cleaning address list of stray characters and bad spacing

mikenola

New Member
Joined
Jul 29, 2012
Messages
6
Hey All,

Here is my problem, I have millions of addresses in multiple workbooks that were supplied to me.

The data is currently in two columns, colA Name colB Address.

ColA is fine like it is. ColB is the problem.

Right now I am trying to clean up ColB so it can be separated out as Street, City, Zip

The problem is cleaning the contents of misspellings and missing spaces between (words concatenated) by the source export (or data input)

Using the built in Excel text to columns, sort and filter has not really reduced the problems.

here is an example of the biggest problem
[TABLE="width: 500"]
<tbody>[TR]
[TD]

<colgroup><col width="408"></colgroup><tbody>
[TD="width: 408"]429 CHRISTOPHER AVE T2L14GAITHERSBURG MD 20879
[/TD]

</tbody>
[/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]


That 14Gaithersburg could be anything 14GGaithersburg, 14LeGaithersburg,
253b-tGaithersburg, etc.

I do have a list of all the cities that should be a possible valid entry in each state book. What I am thinking is to find a way to use that list as a range search and then place a space between the legitimate City Name and the mistyped ones.

any suggestions on how I can consistently make this happen? The good news is that the state code is always correct in each workbook and is the second to last "word" (from the right) in each string. The zip is always the last word.

any help would be appreciated

Thanks

Mike
 

Excel Facts

What did Pito Salas invent?
Pito Salas, working for Lotus, popularized what would become to be pivot tables. It was released as Lotus Improv in 1989.
I tested this but only using one workbook, so you should probably set up your own test with one or two copies before trying it on the whole lot. This assumes that all workbooks will be open at runtime. It will have to be modified if you want to open, execute the changes, save and close each one, one at a time. This procedure uses columns D and E of the sheet in the host workbook to store the Workbook names and city names, respectively. If you put them in a different column it will require modification to the code. The procedure will find a city from col E and then manipulate the data in the cell to enter a space in front of the city name.

Code:
Sub sidiary()
Dim sh As Worksheet, sh2 As Worksheet, lr As Long, rng As Range, c As Range, _
wb As Workbook, r As Range, delPoint As Long, dRng As Range
Set sh = ThisWorkbook.Sheets(1)
lr = sh.Cells(Rows.Count, 5).End(xlUp).Row 'List of cities in Col E.
Set rng = sh.Range("E2:E" & lr)
For Each wb In sh.Range("D2:D" & sh.Cells(Rows.Count, 4).End(xlUp).Row)
Set sh2 = Workbooks(wb).Sheets(1)
For Each c In rng
For Each r In sh2.Range("B2", sh2.Cells(Rows.Count, 2).End(xlUp))
If InStr(LCase(r), LCase(c.Value)) > 0 Then
If InStr(LCase(r), LCase(c.Value)) - 1 <> " " Then
MyStr = Left(r, InStr(LCase(r), LCase(c.Value)) - 1) & " "
delPoint = InStr(LCase(r), LCase(c.Value)) - 1
r.Characters(1, delPoint).Delete
r = MyStr & r.Value
End If
End If
Next
Next
Workbooks(wb).Save
Next
End Sub
Code:
 
Upvote 0
Jlgwhiz, thanks for the quick reply.

I tried your code and all I get is a type mismatch error on execution.
If i step into it, it highlights the name Sub sidiary() and then on F5 it throws Type Mismatch (13) error. What am I doing wrong?

I can't upload the book so let me try and give you more information:

Book is called MD.XLSM
has 3 sheets, default names sheet1, 2, 3
on Sheet1 in ColB I have the list of addresses to be cleaned. The list is 113519 rows of data
This book is only for maryland.

IN Col E I have the list of cities starting in E2 (heading of List in E1). There are 427 city names in the list
In Col D I have only the maryland book in D2 (Col heading in D1 is Books). (though I don't see that this is needed since I am running the code from the same book as the data. As a note I did try running this from a different book called cleaning.xlsm and the error is the same).

I am using excel 2010.
The help screen on the error just says:
[h=1]Type mismatch (Error 13)[/h]This page is specific to the Visual Basic for Applications (VBA) Language Reference for Office 2010. Visual Basic is able to convert and coerce many values to accomplish data type assignments that weren't possible in earlier versions. However, this error can still occur and has the following causes and solutions:

  • The variable or property isn't of the correct type. For example, a variable that requires an integer value can't accept a string value unless the whole string can be recognized as an integer.
    Try to make assignments only between compatible data types. For example, an Integer can always be assigned to a Long, a Single can always be assigned to a Double, and any type (except a user-defined type) can be assigned to a Variant.
  • An object was passed to a procedure that is expecting a single property or value.
    Pass the appropriate single property or call a method appropriate to the object.






Any guidance would be most appreciated.

Mike
 
Upvote 0
If you are going to run the code from the same workbook as the data, then this revision should run without error, if you have your cities listed in Column E. Column D will be unnecessary to list workbooks if you only do one at a time, so I deleted that feature.

Code:
Sub sidiary2()
Dim sh As Worksheet, lr As Long, rng As Range, c As Range, _
wb As Workbook, r As Range, delPoint As Long
Set sh = ThisWorkbook.Sheets(1)
Set wb = ThisWorkbook
lr = sh.Cells(Rows.Count, 5).End(xlUp).Row 'List of cities in Col E.
Set rng = sh.Range("E2:E" & lr)
For Each c In rng
For Each r In sh.Range("B2", sh.Cells(Rows.Count, 2).End(xlUp))
If InStr(LCase(r), LCase(c.Value)) > 0 Then
If InStr(LCase(r), LCase(c.Value)) - 1 <> " " Then
MyStr = Left(r, InStr(LCase(r), LCase(c.Value)) - 1) & " "
delPoint = InStr(LCase(r), LCase(c.Value)) - 1
r.Characters(1, delPoint).Delete
r = MyStr & r.Value
End If
End If
Next
Next
wb.Save
End Sub
Code
 
Upvote 0
Jlgwhiz, wow that was quick. thanks.

sadly it still does not work. It does not error, but it does lock up my screen for a couple of minutes.

Is there some place I can post the whole workbook so you can see the actual behavior?

it is 113519 rows long, and would only contain the address field in col B and the city field in col e

btw I really appreciate your assistance.
 
Upvote 0
Jlgwhiz, wow that was quick. thanks.

sadly it still does not work. It does not error, but it does lock up my screen for a couple of minutes.

Is there some place I can post the whole workbook so you can see the actual behavior?

it is 113519 rows long, and would only contain the address field in col B and the city field in col e

btw I really appreciate your assistance.

Did you look to see if there are spaces in front of the city names? This code does not flicker and flash like recorded macros, so if it did not error, it probably created th space. It worked fine in testing. If it did actually create the space as desired, post back and I will modify the code so that it will apply to all sheets in the workbook.
 
Upvote 0
Also, be sure that your city names do not include the state or eny extraneous characters in column E. ie. Should be Baltimore and NOT Baltimore, MD in Column E.
 
Upvote 0
JLgWhiz,

I filtered the data after the procedure runs and I still have 123baltimore, #23TAnnapolis, CCumberland, etc.

From playing with this book before your code, it looks to be at least 30K of the city names have some sort of extra data attached to the front of the city name.

The address list is all in caps, but the city list is in proper form like "Annapolis", The state is not appended in the City list.

can you tell me what you tested? 1 line? or a specific pattern? I tried to make it clear that almost anything can appear at the front of these city names. including some with "-"', "#", "@" and "/" basically anything but quotes.

I have run =clean() and =trim() on both columns.

question? would approaching this by finding the city name and then finding the first blank to the left of it and then splitting the string at that point make this easier?

For the purposes of this data that extra stuff is not important in the final result and will be deleted.
so 214 south rd #34-ppasadena md 21212 needs to be 214 South Road, Pasadena MD 21212
 
Upvote 0
I have debugged this the best that I can on a test bed. Hopefully, it will work on your system.

This was tested using multiple sheets in a single workbook. The list of cities is in column E of sheet 1. The procedure will check each cell in column B for a match against each city listed in column E, if a match is found, then a space will be created between the city name and the data preceding it. When each page is completed it will then advance to the next page. If there are any sheets with different data in column B it will simply ignore that data unless it coincidentally contains the city name.

Code:
Sub sidiary3()
Dim sh As Worksheet, lr As Long, rng As Range, c As Range, _
wb As Workbook, r As Range, delPoint As Long
Set wb = ThisWorkbook
lr = Sheets(1).Cells(Rows.Count, 5).End(xlUp).Row 'List of cities in Col E.
Set rng = Sheets(1).Range("E2:E" & lr)
For Each sh In ThisWorkbook.Sheets
For Each c In rng
For Each r In sh.Range("B2", sh.Cells(Rows.Count, 2).End(xlUp))
If InStr(LCase(r), LCase(c.Value)) > 0 Then
If Mid(r, InStr(LCase(r), LCase(c.Value)) - 1, 1) <> " " Then
MyStr = Left(r, InStr(LCase(r), LCase(c.Value)) - 1) & " "
delPoint = InStr(LCase(r), LCase(c.Value)) - 1
r.Characters(1, delPoint).Delete
r = MyStr & r.Value
End If
End If
Next
Next
Next
wb.Save
End Sub
Code:

There was a problem with the reference to column E in the prvious version.
 
Upvote 0
I really do appreciate all the effort.

This version throws invalid argument or procedure call.

I don't know if it is legal to do here but if you want to create a throwaway email addy I will send you this book so you can see what it is doing....
 
Upvote 0

Forum statistics

Threads
1,223,243
Messages
6,170,971
Members
452,371
Latest member
Frana

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top