Extract address segments from string

TheCman81

Well-known Member
Joined
Feb 28, 2012
Messages
535
Hi All,

Excel 2007

I'm looking to extract parts of an address into different columns, I've already extract the Country, Post Code & Town. But what i'm having trouble doing is extract the street address and county, the reason I'm having trouble is because not every address has a county.

Below is what I have so far:

I need to extract Street (which is in red) into the Street Column
County (which is highthlighted in dark blue) into county

Excel 2010
ABCDEF
Address1StreetCountyTownPost CodeCountry
123 Bridge Street Kent KA4 1AA UNITED KINGDOM
KentKA4 1AAUNITED KINGDOM
Bridge Street Kent KA4 1AA UNITED KINGDOM
KentKA4 1AAUNITED KINGDOM
1 Random Road Edinburgh EH12 9AA UNITED KINGDOM
EdinburghEH12 9AAUNITED KINGDOM
East Avenue Maidstone Kent K1 1AA UNITED KINGDOM
KentK1 1AAUNITED KINGDOM
19 Random London S1 1AA UNITED KINGDOM
LondonS1 1AAUNITED KINGDOM
178 East Wellingotn St Mid-Lothian Edinburgh EH1 1AA UNITED KINGDOM
EdinburghEH1 1AAUNITED KINGDOM

<colgroup><col style="width: 25pxpx"><col><col><col><col><col><col></colgroup><thead>
</thead><tbody>
[TD="align: center"]1[/TD]

[TD="align: center"]2[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]3[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]4[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]5[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]6[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

[TD="align: center"]7[/TD]

[TD="align: right"][/TD]
[TD="align: right"][/TD]

</tbody>
Sheet1

[TABLE="width: 85%"]
<tbody>[TR]
[TD]Worksheet Formulas[TABLE="width: 100%"]
<thead>[TR="bgcolor: #DAE7F5"]
[TH="width: 10px"]Cell[/TH]
[TH="align: left"]Formula[/TH]
[/TR]
</thead><tbody>[TR]
[TH="width: 10px, bgcolor: #DAE7F5"]D2[/TH]
[TD="align: left"]=TRIM(RIGHT(SUBSTITUTE(TRIM(SUBSTITUTE(SUBSTITUTE(A2,F2,""),E2,""))," ",REPT(" ",99)),99))[/TD]
[/TR]
[TR]
[TH="width: 10px, bgcolor: #DAE7F5"]E2[/TH]
[TD="align: left"]=TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(A2," "&F2,"")," ",REPT(" ",99)),115))[/TD]
[/TR]
[TR]
[TH="width: 10px, bgcolor: #DAE7F5"]F2[/TH]
[TD="align: left"]=RIGHT(A2,14)[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
</tbody>[/TABLE]

Thanks in advance for our help :)
 

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)
Do you have a reverse lookup for the postal codes available? Usually in a situation like this I would recommend using a VLookup or Index-Match on the Postal Code to populate most of the fields. If the records are like the ones in Canada then there will be multiple addresses per postal code, it should be a simple matter of taking what you can glean off such as that and then taking the specific Address number substituting out that data to leave you with the first field.

Apparently, you can get such a list at this link:

UK government database of all 1,841,177 post codes together with precise geographic coordinates and other information, 8 Jul 2009 - WikiLeaks

Cheers, :)
 
Upvote 0
No I don't have a reverse look up. it's just a big data dump of addesses I have and I just to find out the most efficient way to do it and I don't think using another big dump like the one posted, the file is already pretty big
 
Upvote 0
I think the issue you are going to run into is that there just isn't a particularily efficient way of doing this kind of thing. Addresses are pretty notorious for not having patterns that are easy to parse. As such, I would take a look atb the pre-existing list I linked to above.

If you really don't want to do it that way, you can always find the last word in the remaining text you have above(after removing the town and post code) and validating that against a list of counties (assuming you have this). This won't work if you have more than one word in the county name(this seems to be a problem with your expression for towns too). Most likely there is no easy pattern that will allow you to separate the address and street number either.

Cheers, :)
 
Upvote 0
Hi

This will extract the street from what you already have :-
Code:
=SUBSTITUTE(A2,CONCATENATE(" ",D2," ",E2," ",F2),"")

May I suggest that you create a table of Counties to assist with filtering out those addresses which don't have the county present, eg Edinburgh.

hth
 
Upvote 0
I'm sorry, but I'm going to give you a lot of negatives, without much positive. :(

Attempting this with formulas would be a near-impossible task - see further comments below.
An attempt with vba without a complete list of Counties & Towns (& Countries if that varies) & street types/names would be pretty much as hard.

You say that you have formulas to extract Country, Post Code & Town, however ...

a) Country: If your Country is anything other than "UNITED KINGDOM", or some other 14-character Country, your formula will fail.
If the Country is always "UNITED KINGDOM" then you might as well just write UNITED KINGDOM in F2 and copy down rather than bothering with a formula.

b) Post Code: If you can get the Country right then getting the Post code should be okay with your type of method. (That's my bit of good news :))

c) Town: Your formula is based on the premise that all town names will consist of a single word. That is not the case. (Well, I suppose it may be with your data, but in general terms that is not always the case.)

d) County: Determining the County will be just as daunting since it may also consist of more than one word. Let's just for a moment suppose we could determine where the end of the Street name was. Between the street name and Post code will be a town and possibly a County. I don't know UK towns and Counties very well but a Quick Google search would lead me to believe that the number of words in the section between Street and Post Code could be from 1 to quite a few (eg South Kirkby and Moorthorpe in West Yorkshire, South Shields in Tyne and Wear, Burnham-on-Sea and Highbridge in Somerset .. you get the idea). Trying to determine if there is a County name and where the County ends and Town starts is pretty much impossible by direct formula.

e) Street: Without a complete list of street types (Street, Road, Avenue, etc AND all their abbreviations), it would be nigh impossible to tell where the Street name ends. That is made even worse by the look of your second-last data sample that appears to have no street designation at all.
 
Upvote 0
Thanks both,

I did think it would be near impossible without a pre exsisting list of towns and counties. So this is probably the route i'll have to take.

As for the Country, the data dump always ends with "UNITED KINGDOM", it's the same for every report so I know I can extract the post code quite easily. And to be honest, I don't even need the country in my list so I could just Ctrl H and replace that with blanks.

Thanks for your input anyway guys

Colin
 
Upvote 0

Forum statistics

Threads
1,223,236
Messages
6,170,917
Members
452,366
Latest member
TePunaBloke

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top