Truculent Llama
New Member
- Joined
- Mar 18, 2021
- Messages
- 12
- Office Version
- 365
- Platform
- Windows
Hi all,
I have a very large spreadsheet (currently at 130k+ rows and growing), which I'd like to perform a few functions on:
1. validate the fields within one of the columns, and alert the user if there are invalid entries
2. remove duplicate entries, keeping bottom rows instead of top rows
Request #1
My list of valid values to compare against is 136 values.
I have sought help from other users on Reddit r/excel and have had lots of help, however the validation involved a looping solution (and another a loop within a loop) to check through each cell in the column, and compare it against the list of valid options. There were all sorts of issues I encountered because of the large data set, and in the end I ended up settling on Excel conditional formatting to highlight invalid cells (which by comparison is instantaneous), and then sorting the data based on coloured fill. My macro also selects the last cell in the column, hopefully drawing the user's eye to the shaded cells. So far so good.
What I would like is for a quick VBA script that can assess whether or not any of those cells have been conditionally formatted, and if so throw an error message to alert the user. The MsgBox I'm fine with. It's the checking for the error that I can't work out.
Request #2
The way the data is managed, sometimes we need to overwrite certain rows with new data. Using a solution like VLOOKUP would be too cumbersome for the workflow in place, so what I'd like to implement is a simple solution to create a helper column to sort based on "newest" to "oldest", then simply remove duplicates top to bottom.
The data does not contain date or time columns. What I thought might be the easiest would be to:
1. Helper column has a simple count (+1 based on the row above)
2. Turn off auto calculation
3. Force a recalculation on the helper column with
4. Sort the data descending
5. Remove duplicates
6. Turn auto calculation back on
So I think during the writing of this post, I've basically answered my own Request #2, but would love some help with Request #1.
Having said that, if anyone has any great suggestions for either, I'm open to ideas.
Thanking you in advance,
Llama
I have a very large spreadsheet (currently at 130k+ rows and growing), which I'd like to perform a few functions on:
1. validate the fields within one of the columns, and alert the user if there are invalid entries
2. remove duplicate entries, keeping bottom rows instead of top rows
Request #1
My list of valid values to compare against is 136 values.
I have sought help from other users on Reddit r/excel and have had lots of help, however the validation involved a looping solution (and another a loop within a loop) to check through each cell in the column, and compare it against the list of valid options. There were all sorts of issues I encountered because of the large data set, and in the end I ended up settling on Excel conditional formatting to highlight invalid cells (which by comparison is instantaneous), and then sorting the data based on coloured fill. My macro also selects the last cell in the column, hopefully drawing the user's eye to the shaded cells. So far so good.
What I would like is for a quick VBA script that can assess whether or not any of those cells have been conditionally formatted, and if so throw an error message to alert the user. The MsgBox I'm fine with. It's the checking for the error that I can't work out.
Request #2
The way the data is managed, sometimes we need to overwrite certain rows with new data. Using a solution like VLOOKUP would be too cumbersome for the workflow in place, so what I'd like to implement is a simple solution to create a helper column to sort based on "newest" to "oldest", then simply remove duplicates top to bottom.
The data does not contain date or time columns. What I thought might be the easiest would be to:
1. Helper column has a simple count (+1 based on the row above)
2. Turn off auto calculation
3. Force a recalculation on the helper column with
Application.Calculate
4. Sort the data descending
5. Remove duplicates
6. Turn auto calculation back on
So I think during the writing of this post, I've basically answered my own Request #2, but would love some help with Request #1.
Having said that, if anyone has any great suggestions for either, I'm open to ideas.
Thanking you in advance,
Llama