Table Join based on like string match.

PeteG

New Member
Joined
Sep 9, 2004
Messages
3
Access 2003, OS is XP.

Can/how would I create a join between two tables returning only those records from table two that has a common 10 digit string found within the field?

Table one contains purchase order numbers field (text) (ex. 6200005330, 4201005875 etc) from our system of record with a char len of 10.
Table two contains miscellaneous shipment reference number information (text) sent electronically by our service providers. This information will contain the purchase order numbers however it is frequently hidden between alpha/numeric characters (ex. 6200005330/10/1 or PO4201005875 etc…).

I would like join and return the data from table two when I can find a string match with the data in table 1 (ex 6200005330 = 6200005330/10/1 or 4201005875 = PO4201005875).
 

Excel Facts

Last used cell?
Press Ctrl+End to move to what Excel thinks is the last used cell.
I think that you will need to clean up your numbers first. Is there any logic that can be used to strip your number from the rest of the string? Will it always be a 10 digit string bracketed by non-numbers, in other words it wont be part of an 11 digit string for example?

Peter
 
Upvote 0
What logic could be used to clean this type of data. I have basically the same issue and no way to join my tables.
 
Upvote 0
The logic will depend on the data, In the data above if the 10 digit number will always be isolated from any other digits and the only number that length then we just need to search for a 10 digit string. In other cases it may be the first set of numbers after a word or something, the rules depend on the data.

peter
 
Upvote 0
Can we assume the purchase order number will always be 10 digits in length? Or will there be any values less than 1,000 million? (Such numbers will have a maximum of 9 digits). The reason I ask is because I suspect you might find a good proportion of the purchase order numbers will either be the first 10 characters of the string or the last 10 characters - I realise this is a massive assumption but you might get enough matches to make it worthwhile.
 
Upvote 0
What logic could be used to clean this type of data

How about this?

PurchaseOrderNo:Iif(Left([ShipmentRefNo], 2) ="PO", Right([ShipmentRefNo], 10), Left([ShipmentRefNo], 10))

You could create a query from the table with the shipment reference nos with the above.

This query could then be linked to the table with the purchase order nos using this field.
 
Upvote 0
I was thinking along the same lines but with a formula like this:

IIf(IsNumeric(Left([SRF],10)),Left([SRF],10),IIf(IsNumeric(Right([SRF],10)),Right([SRF],10),Null))

where SRF = ShipmentRefNo.

The formula above assumes that if the first 10 characters are numeric then use them, otherwise test the last 10 characters and if there is no numeric match, then return nothing.

However, if the last character is a period (".") then the formula incorrectly takes the period as the last character in the string where the first 9 characters are numeric but the last is a period. Please note that the formula above will not detect instances where the 10 digits are buried within the reference number (e.g.PO1234567980/1) and in such instances returns nothing.

Hope this helps, Andrew :biggrin:
 
Upvote 0
Thanks to everyone for their suggestions. The PO numbers from the system of record (table 1) is always 10 char in length.

Challenge is the poor data quality and lack of standardization in the reference number field (table 2). Ref number field is an EDI feed, 30 char in length, frequently containing multiple pieces of information on the same line. Initially I tried to clean the reference number data but found there to be so many prefix (PO, P.O., PO: etc…) and suffix variations that it became overwhelming, thus my hope to match to the string when ever a match was found.

Not being strong at access I originally took a two step approach, first trying to find and then mark the end of each prefix with logic:
PONBR: trim(mid([Shipper_ref_no],instr([shipper_ref_no],”P.O.”)+4))
Then using left([PONBR],10) to bring back the actual number.

After 20 such queries I had only matched to approximately 60% of the records.

Building on Andrew’s suggestion, is there be a way to identify when a 10 digit numeric string exists and at what point it starts? I could then return the 10 digits from that point and then try to run the match.

Thanks again.
Pete
 
Upvote 0
Here is some code to play with.

Rich (BB code):
' stolen from Richie(UK)
' in post http://www.mrexcel.com/board2/viewtopic.php?t=103884&highlight=regexp
Function RE10(strData As String) As String
    Dim RE As Object, REMatches As Object

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = False
        .Global = False
        .IgnoreCase = True
        .Pattern = "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"
    End With
    
    Set REMatches = RE.Execute(strData)
    RE10 = REMatches(0)

End Function

Not sure how it works :) but it should get your values for you.

Peter
 
Upvote 0
Challenge is the poor data quality and lack of standardization in the reference number field (table 2). Ref number field is an EDI feed, 30 char in length, frequently containing multiple pieces of information on the same line. Initially I tried to clean the reference number data but found there to be so many prefix (PO, P.O., PO: etc…) and suffix variations that it became overwhelming, thus my hope to match to the string when ever a match was found.

If you are having trouble with the previous suggestion from bat17, then there is an alternative you can try. The previous suggestion is a lot more elegant than mine and I recommend you try that first.

Given the field from the EDI field is 30 characters in length, you will need to clean up the data (by inserting a new field into your table and then cleaning the data with an update query) before trying to match the 10-character (digit? :wink: ) number per my previous post.

I have used some nested replace statements to clean up the data as follows :

replace(replace(replace(replace(replace(replace(replace([Shipper_Ref_No],".",""),",","")," ",""),"#",""),":",""),";",""),"PO","")

You can add more arguments to my example (e.g. Purchase, Order, Number, Num, etc.). The nested replace statement might give you the first or last 10 characters as a 10-digit number more than 60% of the time. Then you can try to link this “trimmed” field using the logic from my previous post.

This won't work with an item such as “PO Num - 1234567890/1” – because I haven’t stripped out “Num” nor the “-” or the “/”. I purposely haven't stripped out the "/" in case there are “/n” suffixes which will give you the wrong number if the “/” is removed. I’m not sure of the limitations of this, but you can always create a 2nd field to strip out additional data. NB : the nested replace functions work from the inside out, such that the “.” is removed first, then the comma, then the space, then the hash and so forth.

You may also want to wrap all of this up in a "Trim" function. I trust you would get a better hit rate than 60% with something like this.

Hope this helps, Andrew. :)
 
Upvote 0

Forum statistics

Threads
1,221,813
Messages
6,162,126
Members
451,743
Latest member
matt3388

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top