Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,546
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
I have a problem where this fuzzymatching might help. I have 2 excel sheets of 10,000+ records each. One of the common elements is NAME. I need to compare these two sheets, and when the computer finds a close to 100% match (say 80%), then to combine the 2 records into a 3rd sheet. Ideally, at the end of this comparison, I will have a 3rd sheet with just several hundret names and all the information from that row from sheet 1 and 2. Then I can eyeball these few records and see if the computer was right. Can fuzzymatching be used for such a problem? Thanks for the help.
 
Upvote 0

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce
Hi ttawd, Welcome to the board!

FuzzyVLookup is merely a fuzzy replacement for VLookup, so if you can envisage how you would set out the sheets with VLookup, you can do something similar with FuzzyVLookup.

Having said that, I dont know the format of your sheets, but from your description, it sounds like you need a bespoke macro (which may possibly utilise the FuzzyPercent function).

In either case, you have to define what form of fuzzy matching you require - there are 3 algorithms available:
Algorithm 1: a fuzzy match for minor misspellings
Algorithm 2: a fuzzy match for words / phrases not in the same order (e.g. matches 'Bill Jelen' with 'Jelen Bill')
Algorithm 3 a compromise match of the above.
 
Upvote 0
Hi ttawd, can you adapt this:

Input sheet 1:
Excel Workbook
ABCDE
1NameDataA 1DataA 2DataA 3DataA 4
2LndonB$B$2B$C$2B$B$2B$C$2
3MnchestrB$B$3B$C$3B$B$3B$C$3
4LvrplB$B$4B$C$4B$B$4B$C$4
5GlsgwB$B$5B$C$5B$B$5B$C$5
6HllB$B$6B$C$6B$B$6B$C$6
7YrkB$B$7B$C$7B$B$7B$C$7
8BrmnghmB$B$8B$C$8B$B$8B$C$8
Sheet1


Continued ...
 
Upvote 0
... continued

Input sheet 2:
Excel Workbook
ABC
1NameDataB 1DataB 2
2LondonB$B$2B$C$2
3ManchesterB$B$3B$C$3
4LiverpoolB$B$4B$C$4
5GlasgowB$B$5B$C$5
6HullB$B$6B$C$6
7YorkB$B$7B$C$7
8BirminghamB$B$8B$C$8
Sheet2
 
Upvote 0
Hi all,

I am not sure if this has already been posted but :)

If you are using the original example sheets format then this works great for just giving the result with the highest % match.

{=IF(MIN(IF(G3:I3=MAX(G3:I3),COLUMN(G3:I3),""))=7,C3,IF(MIN(IF(G3:I3=MAX(G3:I3),COLUMN(G3:I3),""))=8,D3,E3))}

I am curiose about using this kind of format to froduce a fuzzy match array formula all in one cell so that it takes up less data ink?

Any ideas.
 
Upvote 0
Thanks Alan for the solution to my problem. I made your code work for my application and my sheets that I was comparing had 30,000 rows each, so it took about 24 hours to run. Now, I have a new file with 30,000 rows, and with meaningful data in about 5% of the rows with values > 0.8. The computer will not let me save the file due to insuficcient resources.

Basically this is what I need:
IF B1 to B30000 >.8 THEN copy the VALUES of the entire row to a new worksheet in a new file.

Hopefully whatever macro I run, will not disturb my original sheet, because I cannot wait another 24 hours to generate the results.

Thanks for all your help. I can tell that the Computer Scientists/Engineers that are on this forum are really knowleageable. Your help made an impossible problem possible. Thanks again.

-Mike-
ttawd
 
Upvote 0
Hi Mike, place this macro in the original workbook. It will create a new w/book & write the appropriate rows to it.

The macro assumes that the data is in Sheet3:
Code:
Sub CopyRows()
Dim dblCurValue As Double
Dim lRow As Long, lRowEnd As Long, lCounter As Long
Dim rCur As Range
Dim wbNew As Workbook
Dim wsInput As Worksheet, wsOutput As Worksheet

Set wsInput = Sheets("Sheet3")
lRowEnd = wsInput.Cells(Rows.Count, "B").End(xlUp).Row

Set wbNew = Workbooks.Add
Set wsOutput = wbNew.Sheets(1)

lRow = 1
lCounter = 1000
For Each rCur In wsInput.Range("B2:B" & lRowEnd)
    lCounter = lCounter - 1
    If lCounter < 1 Then
        Application.StatusBar = "Processing row " & rCur.Row & " of " & lRowEnd
        lCounter = 1000
    End If
    
    dblCurValue = 0
    On Error Resume Next
    dblCurValue = Val(rCur.Value)
    On Error GoTo 0
    If dblCurValue > 0.8 Then
        lRow = lRow + 1
        wsOutput.Rows(lRow).Value = wsInput.Rows(rCur.Row).Value
    End If
Next rCur
Application.StatusBar = False
End Sub
 
Upvote 0
Alan,

Wow! Thanks a million. I just applied your macro to my original results (which I obtained with your help this morning after more than a full day of processing for my computer). It took less than 1 minute for this latest COPYROWS macro to run through my 30,000 rows of data and pull out 1700 records. Hurray! You are awesome. You saved me a a day of cut and paste.

-Mike-
 
Upvote 0
Hi,

Ur code works fine.but i need some wht clear ..For ex i have list of names in a table(table array).lookup value of the word ,length is 7.it will search in table array and find the closest match.whether the closest match word of the lenght is 6 or 7 or 8 or 9.it would print the result.But i need a exact lenghtwise match..

table array lookup value result

George Harrison george harros ?
george harris

our lookup value is george harros.instead of harris, i hve mispelled harros.if any chance to get correct answer by using exact length match?
 
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,334
Members
452,636
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top