Sorting data

Brookevcl

New Member
Joined
Jul 22, 2024
Messages
11
Office Version
  1. 365
Platform
  1. MacOS
Hi! I have a huge list (60,000) of contact data including information I want and a lot I don’t need. It is unevenly spread across cells in a way that isn’t patterned throughout (like some of the data is in correct columns but some is not). I basically want to delete the excess data and sort the rest. An example end product I would want is:
(Name, Address, City, State, Zip)
but because it’s all jumbled, I have to hand sort it. Also, I need to delete out all the extra data like account data and business names. Is there an easier way to do this through code? Or do I just need to hand do it all? Any help would be greatly appreciated!
 
The new endless formula is supposed to be dragged down, i.e. you can use on all rows at once and see the result (no need to change 1 to 2 manually). To identify problematic rows you could indeed e.g. create a helper column and use =LEN(A1) where A1 represents the STATE column; if you drag it down you can then filter rows which are > 2 because it should probably contain only two.
It seems like, when I drag down the new formula, it states false for every row except the problem ones so I think I would need to filter it so it keeps the data on the non-problem rows
 
Upvote 0

Excel Facts

Can a formula spear through sheets?
Use =SUM(January:December!E7) to sum E7 on all of the sheets from January through December
It seems like, when I drag down the new formula, it states false for every row except the problem ones so I think I would need to filter it so it keeps the data on the non-problem rows
I noticed that and edited the formula (but you probably copied & used it before that) - my apologies; please copy it again on page 1, it should now address the "FALSE" problem, i.e. be usable on all rows.
 
Upvote 0
I noticed that and edited the formula (but you probably copied & used it before that) - my apologies; please copy it again on page 1, it should now address the "FALSE" problem, i.e. be usable on all rows.
I just tried that formula and this is the result I'm getting:
 

Attachments

  • Screenshot 2024-07-24 at 6.32.39 PM.png
    Screenshot 2024-07-24 at 6.32.39 PM.png
    41.7 KB · Views: 2
Upvote 0
I might simply have overthought that, my apologies. Test the slightly tuned original formula on the whole dateset:

Excel Formula:
=LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,2))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip))

On top of that, could you please test on the whole dataset the following formula and post how many rows are there where the result is > 2:

Excel Formula:
=LET(
array,TOROW(TRIM(A1:S1),1),
SUM(--IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,1))))))
 
Upvote 0
I might simply have overthought that, my apologies. Test the slightly tuned original formula on the whole dateset:

Excel Formula:
=LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,2))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip))

On top of that, could you please test on the whole dataset the following formula and post how many rows are there where the result is > 2:

Excel Formula:
=LET(
array,TOROW(TRIM(A1:S1),1),
SUM(--IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,1))))))
Still getting that same response of the VALUE or FALSE. As for the second part, it looks like there are 13 rows with 3
 
Upvote 0
If the original formula works, this should too because there is only one change - are you sure that if you use this:

Excel Formula:
=LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,2))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip))

it returns VALUE or FALSE?

As for the latter part, if there are only 3 rows out of 60,000 that returns 3, I guess you can address it manually. On the other hand, are there any rows where it returns 0? 1 and 2 should be fine with the formula above (you will still need to verify the data though).
 
Upvote 0
If the original formula works, this should too because there is only one change - are you sure that if you use this:

Excel Formula:
=LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,2))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip))

it returns VALUE or FALSE?

As for the latter part, if there are only 3 rows out of 60,000 that returns 3, I guess you can address it manually. On the other hand, are there any rows where it returns 0? 1 and 2 should be fine with the formula above (you will still need to verify the data though).
It takes forever to apply to all, but it seems like that formula works great. The other formula looks like it is affecting the city column rather than the state column? I could be wrong in what it is doing, though. I can tell there are many other problem areas but when I filter it, they disappear.
 
Upvote 0
I would need to see the data to think about those problematic parts...
 
Upvote 0
I would need to see the data to think about those problematic parts...
So sorry, I forgot to link them. Here is what happens when I don't filter it and when I filter it down to include 0 and 3:
 

Attachments

  • Image 7-28-24 at 9.53 AM.jpg
    Image 7-28-24 at 9.53 AM.jpg
    54.2 KB · Views: 5
  • Screenshot 2024-07-28 at 9.51.24 AM.png
    Screenshot 2024-07-28 at 9.51.24 AM.png
    87.2 KB · Views: 4
Upvote 0
Those CALC errors (rows with 0) mean that the formula does not work because at that particular row the data has a different structure - how many zeros are there out of those 60,000 rows + could you post some original data (e.g. 10 rows) where it returns CALC error?
 
Upvote 0

Forum statistics

Threads
1,224,884
Messages
6,181,555
Members
453,053
Latest member
Kiranm13

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top