Fuzzy Duplicate matching

Nondricek

New Member
Joined
Aug 15, 2014
Messages
1
Problem – my company has an in house database for monitoring clients. We often refer to outside databases to gather additional information on these clients, but our naming conventions aren’t the same. Therefore only about a third of our names are an exact match.
I can’t change the name in our or the outside database to match up exactly, so instead I’m going to add (as an additional field) to our database the name as it appears in the outside database for future reference.

Goal – Identify and pair up partial matches across two large data sets.

Info – Data has several values including full name and full address.

Previously tried –
I have downloaded and been using an extension for excel called Fuzzy look up which helps me find partial duplicate strings. It can be found here. http://www.microsoft.com/en-us/downl....aspx?id=15011

I think this could be the answer to my problem, but I don’t think I’m using it completely correctly. I’ve built my two tables and run the application to compare them with a 50% similarity minimum threshold. Sometimes the results that it spits out are spot on, sometimes they are close and other times they are way off. They are way off despite still being over the 50% similarity.
One way I know that I can improve the results is to have it search and filter first by state and then by name, but I'm unable to figure out how to tell it to look at the states first, and then the names. In other words have fuzzy look at the states in table A and match them up with the states in table B. Of the states that match up then match the names from table A with table B, then repeat for every state. Every result should have the same state. If there is no match within 50% similarity in the same state, then leave it blank, don't try another state. If I wanted to I supposed I could filter Tables A and B by state then make new "state tables" and do fuzzy lookups by each state table, but I don't relish doing that fifty times.

Once I get comfortable with this solution I'd like to extend it to other fuzzy analyses like matching one e-mail up against another, but comparing the ending of the e-mail (everything after the @ sign), and then looking at the first part of the e-mail.

This is the file format (although I have plenty of flexibility to move it around)
Name (Database A) State (A) Name (Database B) State (B)

PS I'm all ears for a different solution for this. In the past I've used the fuzzy lookup macro found here - http://www.mrexcel.com/forum/excel-q...planation.html - but it was PAINFULLY slow and seemed to have the same functionality as this add in, but was just used for pre office 2010 versions.

PS 2 - can I attach a file to show you guys what I mean?
Thanks!
 

Excel Facts

What is the last column in Excel?
Excel columns run from A to Z, AA to AZ, AAA to XFD. The last column is XFD.

Forum statistics

Threads
1,225,531
Messages
6,185,482
Members
453,297
Latest member
alvintranvcu123

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top