Data scrubbing a list of street addresses - how to trim off street number?

broken keyboard

New Member
Joined
Oct 10, 2017
Messages
8
Hi all,

Hoping to find some help on this one! I have a list of thousands of addresses containing numbers and street names, and I'm trying to "shave off" the number (and sometimes, unit name - e.g. 15/A or 64C) from this list in order to isolate the street name. Unfortunately there are no clear delimiters to work with that I can see. Here's a sample of some of the data below:

[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]187 Puriri Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]221A-D Puriri Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]48A&B Great North Road
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]76/A Treadwell Street[/TD]
[/TR]
</tbody>[/TABLE]
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]60,62 Great North Road
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]63a,b,c Polson Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]4 Seafront Road
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]336 Wicksteed Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]6, 1 -20 Kawakawa Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]41a 1-3 Rimu Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]68,68C,B,A Nixon Street
[TABLE="width: 176"]
<tbody>[TR]
[TD="width: 176"]75 - 85 Ridgway Street
[TABLE="width: 176"]
<tbody>[TR]
[TD="width: 176"]40 D'Arcy Road[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]

[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
Every address always starts with a number without fail. Any help would be much appreciated!

PS: I'm using Excel 2010 on Windows 8.
 
Quite possibly not as this sort of thing can be tricky if the data varies considerably. The 'rule' I worked on was
"Find the first space after the last digit and chop from the start to there off"
It seems to fit the sample data and the extra examples I introduced but wouldn't cope with "221A - D Puriri Street". It wouldn't surprise me at all if something like that actually turns up in the data. :eek:
One possible concern with your code would be addresses that we have here in the US in large cities (not sure if the rest of the world duplicates this type of street naming) like these...

123 W45th St

123 West 45th Street
 
Upvote 0

Excel Facts

What is the fastest way to copy a formula?
If A2:A50000 contain data. Enter a formula in B2. Select B2. Double-click the Fill Handle and Excel will shoot the formula down to B50000.
Hi all,

Thanks for all the help so far with the macro. I do have some more examples below of the sort of data I'm working with:

10a Brassey Road
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]22a Kings Avenue
279 St Hill Street
335 Victoria Avenue
104-106 Bell Street




[/TD]
[/TR]
</tbody>[/TABLE]

I haven't had a lot of experience with macros however - lets say I have all my data in Column A - does this need to be specified in the macro code itself? Or is the macro "function" called "back()" and I should be able to run =back(A1) to apply the macro to my data? Thanks for the continued help :)
 
Upvote 0
Hi all,

Thanks for all the help so far with the macro. I do have some more examples below of the sort of data I'm working with:
What about any addresses that look anything like what I posted in Message #11 where there is a number later in the address that you would not want to break at?
 
Upvote 0
Or is the macro "function" called "back()" and I should be able to run =back(A1) to apply the macro to my data?
Brian's 'back' code is not a function so cannot, as it currently stands, be used like that and the column letter needs to included in his code (which it is).

However, my code is a function and you can use it as I described in post 6 where I listed in detail the steps you needed to take to implement the code. Did you give that a try yet?
You could use the function on any column without altering the code at all, you would just use the function like, say,
=Street(F4) to extract the street name from that cell.

You do need to also address Rick's question from post 11 though. If you do have any data like that, do you have a 'rule' or 'set of rules' for us to work out what to extract and what to omit that would apply to all addresses?

BTW, my existing function works fine on those extra examples in post 12.
 
Last edited:
Upvote 0
@Peter, @Rick,
My code copes with everything so far which does include Rick's examples BUT does NOT accommodated Peter's example of "63c Polson Street".
I haven't tried to code that yet.
 
Last edited:
Upvote 0
I haven't tried to code that yet.
I'm not going to bother unless the OP tells us
a) That Rick's or my alternatives are possible with their data, and
b) If so, some guidance about how to distinguish a street name from a non-street name for their data.
 
Upvote 0
What about any addresses that look anything like what I posted in Message #11 where there is a number later in the address that you would not want to break at?


Hi Rick, while I have 3 different addresses with a number at the end or within the address, I could manually manage those (I.E. manually adjust outside of a formula/macro) since there's only 3 of them. Those addresses were: State Highway 3, State Highway 4 and No. 3 Line. While the particular town I'm looking at doesn't have many addresses with numbers inside the street name, there are some towns in New Zealand that have similarly named streets - so something I'd be curious about how to handle in the future. Perhaps though, because that system of naming roads is uncommon in NZ, it may be something I could manually manage.


Brian's 'back' code is not a function so cannot, as it currently stands, be used like that and the column letter needs to included in his code (which it is).


However, my code is a function and you can use it as I described in post 6 where I listed in detail the steps you needed to take to implement the code. Did you give that a try yet?
You could use the function on any column without altering the code at all, you would just use the function like, say,
=Street(F4) to extract the street name from that cell.


You do need to also address Rick's question from post 11 though. If you do have any data like that, do you have a 'rule' or 'set of rules' for us to work out what to extract and what to omit that would apply to all addresses?


BTW, my existing function works fine on those extra examples in post 12.


Hi Peter, I have now been able to run the macro you had written in post #6 . It has done an excellent job, thank you! There are, out of about 60000, perhaps around 200 - 300 manual adjustments required for the addresses. These are the types of addresses that needed adjusting. I could manually adjust them though, since 99% of the addresses are looking perfect - so if it might be difficult to allow for those special circumstances, the macro is more than enough as-is to help resolve my original problem.


271 A-D Somme Parade
270 A-C Somme Parade
70 A- D Keith Street
95 A & B Niblett Street
13 A - E Pitt Street
14A - E Campbell Street
39 A - C Harrison Street
11 A-C Somme Parade
111 A - D Somme Parade
93A - C Campbell Street
59A - D Pitt Street
146 A- C Bell Street
86 A & B Bell Street
11A - D Campbell Street
14A - D Keith Street
16A - D Keith Street
14 A - C Springvale Road
11 A-E Fox Road
8 A -D Parsons Street
3A & B Sheffield Place
41 A Cumbrae Pl
23 A Cumbrae Pl
13 A Collingwood St
25 A Brassey Rd
48 A College St
39 A College St
11a - d Campbell Street
27 A - H Campbell Street
20a - d Gonville Avenue
26d - l Puriri Street
27 (a,b) Treadwell Street
2a - e Roberts Avenue
16 a -d Keith Street
24 a - d Bignell Street
194 a-c Somme Parade
83 A,B Heads Road
2a - d Russell Street
15 a - d Niblett Street
42a - c Fitzherbert Avenue


Thanks again all for your helpful guidance :)
 
Upvote 0
Great that you've had considerable success.
if you wish to further reduce your manual work then run my code over those last few. It will work on most of what your examples above, things with inclusions like a-c, (a,b) are only partially processed. I note your "Highway 3", etc. My code skips them.

What I probably should have mentioned is that my code looks at the contents of each cell in Column A and replaces them.

If you want to leave the original address intact then you can simply modify the Macro to house the modifications in another column. I note however that it won't move "Highway 3" into that new column.
The slightly adjusted macro is:
Code:
Sub back()
Dim a As Long
Dim x As Long
Dim St As String
Dim ln As Long
Dim Lastrow As Long
Lastrow = Cells(Rows.Count, "[B]A[/B]").End(xlUp).Row
For x = 1 To Lastrow
St = Range("[B]A[/B]" & x).Value
ln = Len(St)

For a = ln To 1 Step -1
'Identify Capital followed by space
If Chr(64) < Mid(St, a, 1) And Mid(St, a, 1) < Chr(91) And Mid(St, a + 1, 1) = Chr(32) Then
St = Right(St, ln - a - 1)
Range("[COLOR=#ff0000][B]C[/B][/COLOR]" & x).Value = St
End If
'Identify Integer followed by space
If Chr(47) < Mid(St, a, 1) And Mid(St, a, 1) < Chr(58) And Mid(St, a + 1, 1) = Chr(32) Then
St = Right(St, ln - a - 1)
Range("[B][COLOR=#ff0000]C[/COLOR][/B]" & x).Value = St
End If
'Identify Comma followed by minuscule followed by space
If a > 3 Then
If Chr(44) = Mid(St, a, 1) And Mid(St, a + 1, 1) < Chr(123) And Mid(St, a + 2, 1) = Chr(32) Then
St = Right(St, ln - a - 2)
Range("[COLOR=#ff0000][B]C[/B][/COLOR]" & x).Value = St
End If
End If

Next
Next
End Sub
If you do choose this option then copy those remaining addresses to another column, eg,Col D and change the A in the Macro to D and the C to E.
 
Upvote 0
Give this version a try. It includes your 'special cases' of "State Highway 4" etc. If you encounter more of those, you can add more into the blue part of the first Pattern line. Separate each with a "|" and put a "#" where any digit/number occurs.

If you end up with some like Rick's example of "West 45th Street" then you may need several additions to that Pattern line ...<noparse>
"West #st Street" to cover West 21st Street etc
"West #nd Street" to cover West 22nd Street etc
"West #rd Street"</noparse>
etc
.. or if it becomes a problem, there may be something else we can do within the code but I won't look at that unless there is an issue with your data.

Rich (BB code):
Function Street(s As String) As String
  With CreateObject("VBScript.RegExp")
    .Pattern = Replace(Replace("State Highway #|No. # Line", "#", "\d+"), ".", "\.")
    If .Test(s) Then
      Street = .Execute(s)(0)
    Else
      .Pattern = ".*? (?=[A-Z][a-z']\D*)"
      Street = .Replace(s, "")
    End If
  End With
End Function

Sample data and results


Book1
AB
2187 Puriri StreetPuriri Street
3221A-D Puriri StreetPuriri Street
460,62 Great North RoadGreat North Road
563a,b,c Polson StreetPolson Street
64 Seafront RoadSeafront Road
7271 A-D Somme ParadeSomme Parade
895 A & B Niblett StreetNiblett Street
913 A - E Pitt StreetPitt Street
103A & B Sheffield PlaceSheffield Place
1127 (a,b) Treadwell StreetTreadwell Street
1242A State Highway 4State Highway 4
132a,c A-D No. 45 LineNo. 45 Line
1426a, Xb,c State Highway 1State Highway 1
Sheet3
Cell Formulas
RangeFormula
B2=Street(A2)
 
Last edited:
Upvote 0
Solution
If you end up with some like Rick's example of "West 45th Street" ...
.. there may be something else we can do within the code
That would be like this
Rich (BB code):
.Pattern = Replace(Replace("State Highway #|No. # Line|West #(st|nd|rd|th) Street", "#", "\d+"), ".", "\.")


Book1
AB
23A & B Sheffield PlaceSheffield Place
327 (a,b) Treadwell StreetTreadwell Street
442A State Highway 4State Highway 4
52a,c A-D No. 45 LineNo. 45 Line
626a, Xb,c State Highway 1State Highway 1
723 West 22nd StreetWest 22nd Street
82 (a - b) West 45th StreetWest 45th Street
Sheet4
Cell Formulas
RangeFormula
B2=Street(A2)
 
Upvote 0

Forum statistics

Threads
1,225,754
Messages
6,186,826
Members
453,377
Latest member
JoyousOne

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top