Data scrubbing a list of street addresses - how to trim off street number?

broken keyboard

New Member
Joined
Oct 10, 2017
Messages
8
Hi all,

Hoping to find some help on this one! I have a list of thousands of addresses containing numbers and street names, and I'm trying to "shave off" the number (and sometimes, unit name - e.g. 15/A or 64C) from this list in order to isolate the street name. Unfortunately there are no clear delimiters to work with that I can see. Here's a sample of some of the data below:

[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]187 Puriri Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]221A-D Puriri Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]48A&B Great North Road
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]76/A Treadwell Street[/TD]
[/TR]
</tbody>[/TABLE]
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]60,62 Great North Road
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]63a,b,c Polson Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]4 Seafront Road
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]336 Wicksteed Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]6, 1 -20 Kawakawa Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]41a 1-3 Rimu Street
[TABLE="width: 233"]
<tbody>[TR]
[TD="width: 233"]68,68C,B,A Nixon Street
[TABLE="width: 176"]
<tbody>[TR]
[TD="width: 176"]75 - 85 Ridgway Street
[TABLE="width: 176"]
<tbody>[TR]
[TD="width: 176"]40 D'Arcy Road[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]

[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]
Every address always starts with a number without fail. Any help would be much appreciated!

PS: I'm using Excel 2010 on Windows 8.
 

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest
Using the Find & Replace I was able to reduce the posted list to this below. Find > 1,2,3... Replace 'with nothing'. I went through all the numbers, 0 to 9. Then the comma (,) and the backslash ( /).

See CBNixon Street as a result which should be Nixon Street from the original 68,68C,B,A Nixon Street.
I should have used something like B,A'space' (B comma A space) in the Find and nothing in the Replace.

Then in a column used this formula =Trim(A2) and pulled down, copy > paste special > values.

You would select a column of the address's and Find & Replace with nothing the unwanted characters. If you have a lone A, for instance, use A+space in find. Otherwise it will take the A's (upper and lower) case from the names.

Not a sure fire method but may whittle it down a great deal.

Howard


[TABLE="width: 116"]
<tbody>[TR]
[TD]Puriri Street[/TD]
[/TR]
[TR]
[TD]Puriri Street[/TD]
[/TR]
[TR]
[TD]Great North Road[/TD]
[/TR]
[TR]
[TD]Treadwell Street[/TD]
[/TR]
[TR]
[TD]Great North Road[/TD]
[/TR]
[TR]
[TD]Polson Street[/TD]
[/TR]
[TR]
[TD]Seafront Road[/TD]
[/TR]
[TR]
[TD]Wicksteed Street[/TD]
[/TR]
[TR]
[TD]KawakawStreet[/TD]
[/TR]
[TR]
[TD]Rimu Street[/TD]
[/TR]
[TR]
[TD]CBNixon Street[/TD]
[/TR]
[TR]
[TD]Ridgway Street[/TD]
[/TR]
[TR]
[TD]D'Arcy Road[/TD]
[/TR]
</tbody><colgroup><col></colgroup>[/TABLE]
 
Upvote 0
Hm!
You have many many addresses to strip.
This macro may provide some solace but .... I'm wondering how long it would take to do its duty.
On the other hand, if it only has to run once at a down time then you may not worry.

Let me explain.
At this point I've only focused on your third listed item, Great North Road.
The macro take the full address and counts backwards until it finds a Capital B that is followed by a space, note the code:
If Mid(St, a, 2) = Chr(66) & Chr(32)
At that point, and I must acknowledge L.Howard here
L.Howard said:
If you have a lone A, for instance, use A+space in find.
for this prompt.
By starting at the end of the address and backtracking I wanted to find a space and then a capital letter (which as one would read would be a Capital followed by a space).

To progress the Macro needs to find if that character is a capital, ie, between Chr(64) and Chr(91) but not equal to either.
Then it will replace the text with the required Street name.

There is also one other thing that the macro needs to do. In 'counting back' if it finds an integer (0-9) then it can safely determine that what has been processed is the Street name.

Then what is lacking is the For .... Next loop to the Lastrow.

I don't have time to follow this through at the moment. Are you able to understand what I am about and be able to do the same?
If not, and noone joins us I shall address this in my morning.

Code:
Sub back()
Dim a As Long
Dim St As String
Dim ln As Long
Dim Cptl As String
St = Range("A3").Value
ln = Len(St)
Range("d1").Value = ln
For a = ln To 1 Step -1
If Mid(St, a, 2) = Chr(66) & Chr(32) Then
St = Right(St, ln - a)
Range("A3").Value = St
End If
Next
End Sub
 
Upvote 0
Sorry to take so long but I had some issues with this code, mostly with the first IF block.
I've managed to run this macro over the 13 items that you supplied and it works over that sample.

I don't know if the addition of:
Application.ScreenUpdating=False
Application.Calculation=xlCalculationManual
before the first FOR loop will speed things up, but if you do use it then add these lines at the end:
Application.ScreenUpdating=True
Application.Calculation=xlCalculationAutomatic

I trust this meets your requirements.
Code:
Sub back()
Dim a As Long
Dim x As Long
Dim St As String
Dim ln As Long
Dim Lastrow As Long
Lastrow = Cells(Rows.Count, "A").End(xlUp).Row
For x = 1 To Lastrow
St = Range("A" & x).Value
ln = Len(St)

For a = ln To 1 Step -1
'Identify Capital followed by space
If Chr(64) < Mid(St, a, 1) And Mid(St, a, 1) < Chr(91) And Mid(St, a + 1, 1) = Chr(32) Then
St = Right(St, ln - a)
Range("A" & x).Value = St
End If
'Identify Integer followed by space
If Chr(47) < Mid(St, a, 1) And Mid(St, a, 1) < Chr(58) And Mid(St, a + 1, 1) = Chr(32) Then
St = Right(St, ln - a)
Range("A" & x).Value = St
End If
'Identify Comma followed by minuscule followed by space
If a > 3 Then
If Chr(44) = Mid(St, a, 1) And Mid(St, a + 1, 1) < Chr(123) And Mid(St, a + 2, 1) = Chr(32) Then
St = Right(St, ln - a - 1)
Range("A" & x).Value = St
End If
End If

Next
Next
End Sub
 
Upvote 0
Welcome to the MrExcel board!

Could you carefully list all the expected results for the sample data you gave. Add any further explanation for any particularly 'tricky' ones if you can.
 
Upvote 0
@BrianJN1
I believe you need one further tweak to your code as it leaves a space at the beginning of each street name eg " Puriri Street"


@broken keyboard
Here is another possible approach for you to consider. This is a user-defined function (UDF). To implement ..
1. Right click the sheet name tab and choose "View Code".
2. In the Visual Basic window use the menu to Insert|Module
3. Copy and Paste the code below into the main right hand pane that opens at step 2.
4. Close the Visual Basic window.
5. Enter the formula as shown in the screen shot below and copy down.
6. Your workbook will need to be saved as a macro-enabled workbook (*.xlsm)

Code:
Function Street(s As String) As String
  With CreateObject("VBScript.RegExp")
    .Pattern = ".*\d[^ ]* (?!\d)"
    Street = .Replace(s, "")
  End With
End Function


Book1
AB
1
2187 Puriri StreetPuriri Street
3221A-D Puriri StreetPuriri Street
448A&B Great North RoadGreat North Road
576/A Treadwell StreetTreadwell Street
6
760,62 Great North RoadGreat North Road
863a,b,c Polson StreetPolson Street
94 Seafront RoadSeafront Road
10336 Wicksteed StreetWicksteed Street
116, 1 -20 Kawakawa StreetKawakawa Street
1241a 1-3 Rimu StreetRimu Street
1368,68C,B,A Nixon StreetNixon Street
1475 - 85 Ridgway StreetRidgway Street
1540 D'Arcy RoadD'Arcy Road
Sheet1
Cell Formulas
RangeFormula
B2=Street(A2)
 
Last edited:
Upvote 0
@BrianJN1
I believe you need one further tweak to your code as it leaves a space at the beginning of each street name eg " Puriri Street"
Thanks Peter. I was noting that all irrelevant information had gone but overlooked where my index terminated.
Code:
Sub back()
Dim a As Long
Dim x As Long
Dim St As String
Dim ln As Long
Dim Lastrow As Long
Lastrow = Cells(Rows.Count, "A").End(xlUp).Row
For x = 1 To Lastrow
St = Range("A" & x).Value
ln = Len(St)

For a = ln To 1 Step -1
'Identify Capital followed by space
If Chr(64) < Mid(St, a, 1) And Mid(St, a, 1) < Chr(91) And Mid(St, a + 1, 1) = Chr(32) Then
St = Right(St, ln - a[COLOR=#ff0000] - 1[/COLOR])
Range("A" & x).Value = St
End If
'Identify Integer followed by space
If Chr(47) < Mid(St, a, 1) And Mid(St, a, 1) < Chr(58) And Mid(St, a + 1, 1) = Chr(32) Then
St = Right(St, ln - a [COLOR=#ff0000]- 1[/COLOR])
Range("A" & x).Value = St
End If
'Identify Comma followed by minuscule followed by space
If a > 3 Then
If Chr(44) = Mid(St, a, 1) And Mid(St, a + 1, 1) < Chr(123) And Mid(St, a + 2, 1) = Chr(32) Then
St = Right(St, ln - a [COLOR=#ff0000]- 2[/COLOR])
Range("A" & x).Value = St
End If
End If

Next
Next
End Sub
I've highlighted changes in that code in red.
 
Upvote 0
I've highlighted changes in that code in red.
Brian, clearly I don't know what the OPs data may all be like, but looking at examples these 2 examples

63a,b,c Polson Street
41a 1-3 Rimu Street

I'm thinking that it may well be possible to have addresses like
63c Polson Street
41a Rimu Street

If that is the case, your code does not detect those.
 
Upvote 0
Point taken.
Yes, what other possibilities need to be considered?

I see that your Function uses wild card pattern matching (I don't understand that completely), is that likely to cover all possibilities that he would require?
Could such a pattern be used in a SUB?
 
Upvote 0
I see that your Function uses wild card pattern matching (I don't understand that completely), is that likely to cover all possibilities that he would require?
Quite possibly not as this sort of thing can be tricky if the data varies considerably. The 'rule' I worked on was
"Find the first space after the last digit and chop from the start to there off"
It seems to fit the sample data and the extra examples I introduced but wouldn't cope with "221A - D Puriri Street". It wouldn't surprise me at all if something like that actually turns up in the data. :eek:
In the end, you have to have a 'rule' or 'set of rules' to apply and the difficulty is often to get the OP to express those rules.


Could such a pattern be used in a SUB?
Yes
 
Upvote 0

Forum statistics

Threads
1,223,231
Messages
6,170,884
Members
452,364
Latest member
springate

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top