Sorting data

Brookevcl

New Member
Joined
Jul 22, 2024
Messages
11
Office Version
  1. 365
Platform
  1. MacOS
Hi! I have a huge list (60,000) of contact data including information I want and a lot I don’t need. It is unevenly spread across cells in a way that isn’t patterned throughout (like some of the data is in correct columns but some is not). I basically want to delete the excess data and sort the rest. An example end product I would want is:
(Name, Address, City, State, Zip)
but because it’s all jumbled, I have to hand sort it. Also, I need to delete out all the extra data like account data and business names. Is there an easier way to do this through code? Or do I just need to hand do it all? Any help would be greatly appreciated!
 

Excel Facts

Square and cube roots
The =SQRT(25) is a square root. For a cube root, use =125^(1/3). For a fourth root, use =625^(1/4).
It would be helpful if you could post a sample of the data so that it is possible to answer your question...
 
Upvote 0
It would be helpful if you could post a sample of the data so that it is possible to answer your question...
Here is a sample of the data and what I need to sort it into :)
 

Attachments

  • Screenshot 2024-07-22 at 3.47.10 PM.jpeg
    Screenshot 2024-07-22 at 3.47.10 PM.jpeg
    123.2 KB · Views: 28
  • Screenshot 2024-07-22 at 3.49.19 PM.jpeg
    Screenshot 2024-07-22 at 3.49.19 PM.jpeg
    53.3 KB · Views: 26
Upvote 0
Address, state, city, and zip code could probably be extracted with a combination of REGEX and INDEX functions, name seems to be more complicated; I guess you are not allowed to share the whole dataset or some larger sample?
 
Upvote 0
Address, state, city, and zip code could probably be extracted with a combination of REGEX and INDEX functions, name seems to be more complicated; I guess you are not allowed to share the whole dataset or some larger sample?
How would I go about using those functions? I am not allowed to share a larger sample, but it basically goes on exactly like that for 60,000 rows
 
Upvote 0
Please test this, insert it in the ADDRESS column of the output table and drag it down (adjust range so that the last column that contains data is selected):

Excel Formula:
=LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,1))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip))

Then check where there are some problems, return here and describe them/post another sample with those problematic rows.

As for the NAME column, could you please post another 5-10 rows of the data?
 
Upvote 0
Please test this, insert it in the ADDRESS column of the output table and drag it down (adjust range so that the last column that contains data is selected):

Excel Formula:
=LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,1))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip))

Then check where there are some problems, return here and describe them/post another sample with those problematic rows.

As for the NAME column, could you please post another 5-10 rows of the data?
This is so helpful! Thank you so much. As for the problem areas, it looks as though they are occurring in areas where the data is misplaced in the input. I will link a photo example, the top being a correct input and output and the bottom being incorrect.
 

Attachments

  • Screenshot 2024-07-23 at 12.42.42 PM.png
    Screenshot 2024-07-23 at 12.42.42 PM.png
    31.8 KB · Views: 11
  • Screenshot 2024-07-23 at 12.42.50 PM.png
    Screenshot 2024-07-23 at 12.42.50 PM.png
    25.2 KB · Views: 8
Upvote 0
This is not due to misplaced input but due to the fact the SEARCH part of the formula returns two strings that match the pattern instead of one. You could try to address this particular problem by changing 1 to 2 in this part of the formula:
Excel Formula:
(RIGHT(array,1)
whenever there is a problematic row.

Or you could test this:

Excel Formula:
=LET(
a,
LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,1))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip)),
IF(COLUMNS(a)>4,
LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,2))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip));a))

I am not sure it will address all the problems - could you post how many of 60,000 rows are still problematic after this "patch"?
 
Upvote 0
This is not due to misplaced input but due to the fact the SEARCH part of the formula returns two strings that match the pattern instead of one. You could try to address this particular problem by changing 1 to 2 in this part of the formula:
Excel Formula:
(RIGHT(array,1)
whenever there is a problematic row.

Or you could test this:

Excel Formula:
=LET(
a,
LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,1))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip)),
IF(COLUMNS(a)>4,
LET(
array,TOROW(TRIM(A1:O1),1),
string,FILTER(array,IF(ISNUMBER(SEARCH("* ?? *",array)),ISNUMBER(--(RIGHT(array,2))))=TRUE),
address,INDEX(array,,XMATCH(string,array,0)-1),
state,TEXTBEFORE(TEXTAFTER(string," ",-2)," "),
city,TEXTBEFORE(string," ",-2),
zip,TEXTAFTER(string," ",-1),
HSTACK(address,state,city,zip));a))

I am not sure it will address all the problems - could you post how many of 60,000 rows are still problematic after this "patch"?
This works but because there are so many problematic rows, it would take forever to individually input. Is there a way to sort out a keyword? Like only have all rows with more than the two letters in the state column be shown? That way I could possibly use the formula for all of them in a chunk rather than having to sort it all myself? Thank you so much again!
 
Upvote 0
The new "endless" formula is supposed to be dragged down, i.e. you can use on all rows at once (i.e. even on those that are fine) and see the result (no need to change 1 to 2 manually). To identify problematic rows you could indeed e.g. create a helper column and use =LEN(A1) where A1 represents the STATE column; if you drag it down you can then filter rows which are > 2 because it should probably contain only two.
 
Upvote 0

Forum statistics

Threads
1,223,882
Messages
6,175,164
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top