Problem – my company has an in house database for monitoring clients. We often refer to outside databases to gather additional information on these clients, but our naming conventions aren’t the same. Therefore only about a third of our names are an exact match.
I can’t change the name in our or the outside database to match up exactly, so instead I’m going to add (as an additional field) to our database the name as it appears in the outside database for future reference.
Goal – Identify and pair up partial matches across two large data sets.
Info – Data has several values including full name and full address.
Previously tried –
I have downloaded and been using an extension for excel called Fuzzy look up which helps me find partial duplicate strings. It can be found here. http://www.microsoft.com/en-us/downl....aspx?id=15011
I think this could be the answer to my problem, but I don’t think I’m using it completely correctly. I’ve built my two tables and run the application to compare them with a 50% similarity minimum threshold. Sometimes the results that it spits out are spot on, sometimes they are close and other times they are way off. They are way off despite still being over the 50% similarity.
One way I know that I can improve the results is to have it search and filter first by state and then by name, but I'm unable to figure out how to tell it to look at the states first, and then the names. In other words have fuzzy look at the states in table A and match them up with the states in table B. Of the states that match up then match the names from table A with table B, then repeat for every state. Every result should have the same state. If there is no match within 50% similarity in the same state, then leave it blank, don't try another state. If I wanted to I supposed I could filter Tables A and B by state then make new "state tables" and do fuzzy lookups by each state table, but I don't relish doing that fifty times.
Once I get comfortable with this solution I'd like to extend it to other fuzzy analyses like matching one e-mail up against another, but comparing the ending of the e-mail (everything after the @ sign), and then looking at the first part of the e-mail.
This is the file format (although I have plenty of flexibility to move it around)
Name (Database A) State (A) Name (Database B) State (B)
PS I'm all ears for a different solution for this. In the past I've used the fuzzy lookup macro found here - http://www.mrexcel.com/forum/excel-q...planation.html - but it was PAINFULLY slow and seemed to have the same functionality as this add in, but was just used for pre office 2010 versions.
PS 2 - can I attach a file to show you guys what I mean?
Thanks!
I can’t change the name in our or the outside database to match up exactly, so instead I’m going to add (as an additional field) to our database the name as it appears in the outside database for future reference.
Goal – Identify and pair up partial matches across two large data sets.
Info – Data has several values including full name and full address.
Previously tried –
I have downloaded and been using an extension for excel called Fuzzy look up which helps me find partial duplicate strings. It can be found here. http://www.microsoft.com/en-us/downl....aspx?id=15011
I think this could be the answer to my problem, but I don’t think I’m using it completely correctly. I’ve built my two tables and run the application to compare them with a 50% similarity minimum threshold. Sometimes the results that it spits out are spot on, sometimes they are close and other times they are way off. They are way off despite still being over the 50% similarity.
One way I know that I can improve the results is to have it search and filter first by state and then by name, but I'm unable to figure out how to tell it to look at the states first, and then the names. In other words have fuzzy look at the states in table A and match them up with the states in table B. Of the states that match up then match the names from table A with table B, then repeat for every state. Every result should have the same state. If there is no match within 50% similarity in the same state, then leave it blank, don't try another state. If I wanted to I supposed I could filter Tables A and B by state then make new "state tables" and do fuzzy lookups by each state table, but I don't relish doing that fifty times.
Once I get comfortable with this solution I'd like to extend it to other fuzzy analyses like matching one e-mail up against another, but comparing the ending of the e-mail (everything after the @ sign), and then looking at the first part of the e-mail.
This is the file format (although I have plenty of flexibility to move it around)
Name (Database A) State (A) Name (Database B) State (B)
PS I'm all ears for a different solution for this. In the past I've used the fuzzy lookup macro found here - http://www.mrexcel.com/forum/excel-q...planation.html - but it was PAINFULLY slow and seemed to have the same functionality as this add in, but was just used for pre office 2010 versions.
PS 2 - can I attach a file to show you guys what I mean?
Thanks!