Hey All,
Here is my problem, I have millions of addresses in multiple workbooks that were supplied to me.
The data is currently in two columns, colA Name colB Address.
ColA is fine like it is. ColB is the problem.
Right now I am trying to clean up ColB so it can be separated out as Street, City, Zip
The problem is cleaning the contents of misspellings and missing spaces between (words concatenated) by the source export (or data input)
Using the built in Excel text to columns, sort and filter has not really reduced the problems.
here is an example of the biggest problem
[TABLE="width: 500"]
<tbody>[TR]
[TD]
<colgroup><col width="408"></colgroup><tbody>
[TD="width: 408"]429 CHRISTOPHER AVE T2L14GAITHERSBURG MD 20879
[/TD]
</tbody>[/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
That 14Gaithersburg could be anything 14GGaithersburg, 14LeGaithersburg,
253b-tGaithersburg, etc.
I do have a list of all the cities that should be a possible valid entry in each state book. What I am thinking is to find a way to use that list as a range search and then place a space between the legitimate City Name and the mistyped ones.
any suggestions on how I can consistently make this happen? The good news is that the state code is always correct in each workbook and is the second to last "word" (from the right) in each string. The zip is always the last word.
any help would be appreciated
Thanks
Mike
Here is my problem, I have millions of addresses in multiple workbooks that were supplied to me.
The data is currently in two columns, colA Name colB Address.
ColA is fine like it is. ColB is the problem.
Right now I am trying to clean up ColB so it can be separated out as Street, City, Zip
The problem is cleaning the contents of misspellings and missing spaces between (words concatenated) by the source export (or data input)
Using the built in Excel text to columns, sort and filter has not really reduced the problems.
here is an example of the biggest problem
[TABLE="width: 500"]
<tbody>[TR]
[TD]
<colgroup><col width="408"></colgroup><tbody>
[TD="width: 408"]429 CHRISTOPHER AVE T2L14GAITHERSBURG MD 20879
[/TD]
</tbody>
[TD][/TD]
[/TR]
</tbody>[/TABLE]
That 14Gaithersburg could be anything 14GGaithersburg, 14LeGaithersburg,
253b-tGaithersburg, etc.
I do have a list of all the cities that should be a possible valid entry in each state book. What I am thinking is to find a way to use that list as a range search and then place a space between the legitimate City Name and the mistyped ones.
any suggestions on how I can consistently make this happen? The good news is that the state code is always correct in each workbook and is the second to last "word" (from the right) in each string. The zip is always the last word.
any help would be appreciated
Thanks
Mike