Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,546
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
The code is not Normal Visual Basic!

It is Excel Visual Basic for Applications (Excel VBA)

It must be stored in a Standard code module, like: Module1, as is!

Alt + F11
Toolbar: Insert - Module
Paste code in the Editor window for that module!

Click the top-most-Right Close "X" to return to the sheet.

Hi All,

I copy the same code and paster it in personal Macro module. and after that copy the same example above and appy the formula , but their is Name error showing it it. Can anybody assist , wat I am doin wrong .

Regards,
Rajender
 
Upvote 0

Excel Facts

Does the VLOOKUP table have to be sorted?
No! when you are using an exact match, the VLOOKUP table can be in any order. Best-selling items at the top is actually the best.
Hi Alan,

Thanks for sharing this code with us. I am trying to still figure it out, but am having trouble getting accurate results with algorythm #4. I am using the code from the post in your signature since there are several versions throughout this thread. Is anything newer that I should be using instead?

Excel Workbook
ABCDEFGHIJ
1FuzzyVLookup - AlgorithmFuzzyPercent - Algorithm
212341234
3Paul McCartneyGeorge BushGeorgeGeorgeGeorgePaul McCartney100.00%100.00%100.00%100.00%
4George HarrisGorge hrsGeorge HarrisGeorgeGeorge HarrisPaul McCartney100.00%57.14%65.63%100.00%
5Ringo StarrStarr,RingoRingo StarrRingo StarrRingo StarrPaul McCartney45.45%62.07%57.50%100.00%
6John LennonBush, GeorgeBruce LeeGeorgeGeorgePaul McCartney33.33%100.00%80.00%100.00%
7Tony Blair
8George Clooney
9Chuck Norris
10Claude Van Damme
11Bruce Lee
12George
Sheet2
Excel 2010
Cell Formulas
RangeFormula
C3=FuzzyVLookup($B3,$A:$A,1,,,C$2)
C4=FuzzyVLookup($B4,$A:$A,1,,,C$2)
C5=FuzzyVLookup($B5,$A:$A,1,,,C$2)
C6=FuzzyVLookup($B6,$A:$A,1,,,C$2)
D3=FuzzyVLookup($B3,$A:$A,1,,,D$2)
D4=FuzzyVLookup($B4,$A:$A,1,,,D$2)
D5=FuzzyVLookup($B5,$A:$A,1,,,D$2)
D6=FuzzyVLookup($B6,$A:$A,1,,,D$2)
E3=FuzzyVLookup($B3,$A:$A,1,,,E$2)
E4=FuzzyVLookup($B4,$A:$A,1,,,E$2)
E5=FuzzyVLookup($B5,$A:$A,1,,,E$2)
E6=FuzzyVLookup($B6,$A:$A,1,,,E$2)
F3=FuzzyVLookup($B3,$A:$A,1,,,F$2)
F4=FuzzyVLookup($B4,$A:$A,1,,,F$2)
F5=FuzzyVLookup($B5,$A:$A,1,,,F$2)
F6=FuzzyVLookup($B6,$A:$A,1,,,F$2)
G3=FuzzyPercent($B3,C3,G$2)
G4=FuzzyPercent($B4,C4,G$2)
G5=FuzzyPercent($B5,C5,G$2)
G6=FuzzyPercent($B6,C6,G$2)
H3=FuzzyPercent($B3,D3,H$2)
H4=FuzzyPercent($B4,D4,H$2)
H5=FuzzyPercent($B5,D5,H$2)
H6=FuzzyPercent($B6,D6,H$2)
I3=FuzzyPercent($B3,E3,I$2)
I4=FuzzyPercent($B4,E4,I$2)
I5=FuzzyPercent($B5,E5,I$2)
I6=FuzzyPercent($B6,E6,I$2)
J3=FuzzyPercent($B3,F3,J$2)
J4=FuzzyPercent($B4,F4,J$2)
J5=FuzzyPercent($B5,F5,J$2)
J6=FuzzyPercent($B6,F6,J$2)
 
Upvote 0
Hi Amas (that's Latin for "You Love" isn't it? I seem to recall amo, amas, amat - I love, you love, he, she or it loves)

But I digress.

Yes, the code in my signature is the latest.

Algorithm 4 is based on code written by Dan Ostrander a while back. Whilst the code may be sound, the performance wasn't, so I tended to ignore it when updating my envelope code, so it may well not work :eeek:

It very much depends on what you're matching (hence the algorithm options), but these days I tend to prefer algorthm 2.
 
Upvote 0
Hi Alan,

I am testing #2 vs. #3 for my purposes. I am checking a long list of paragraphs (each paragraph in a cell) to check for similarities. I don't think that typos is the biggest problem here, so probably #2 will be best. What do think?

Also is there any way to visualize the progress of calculations? For example, I want to check 1000 paragraphs, is there a way to add a progress bar?

AMAS

P.S. Not sure if my screen name has any meanings in other languages
 
Upvote 0
Hi Amas,

Algorithm #3 is, in fact, a compromise of algorithms #1 and #2.

Have you also considered Levenshtein algorithm?

There's an example on page 22 (post #215), where I've included the logic in a function "GetLevenshteinPercentMatch"

For some indication of progress you would need a macro which would loop thru the rows. For that, I would need to know the layout of your worksheet - i.e. where the inputs are (i.e. what columns) , what row they start in, where you want the output to go and in what format, etc.

I normally use the cheap & cheerful status bar (bottom left, which usually says "Ready" to show something like "Processing row x of y - z% complete"
 
Upvote 0
Hi

This is a really helpful UDF. It works perfectly when placed in a module in the current workbook but is there a way to get it to work in PERSONAL.XLS? I've tried pasting it into a module in that but just get the NAME? error.

Thanks
Rob
 
Upvote 0
Hi Rob, Welcome to the board!

I always envisaged it as an addin:

Open a new w/book, delete all but 1 sheet and paste the code into a module.
Save it as FuzzyMatch, but as an Excel add-in (*.xla or *.xlam)

You can then selectively include the addin when loading your 'user' workbook.
 
Upvote 0
Hi All

Sorry, I am new to the board but I stumbled across fuzzyvlookup while searching online. Firstly, it is brilliant so thanks a lot for the work.

My problem is slightly complicated though.

I have two spreadsheets. The first one has a lot of customer leads data including the customer name, the date they were contacted and the order value expected from the customer. Then I have a second spreadsheet where I have customer name, date order received and order value. (all data is in its own column)

What I would like to do is to append the status of each lead (whether it transalted into an order or not) to the first spreadsheet.

The issue is two fold though. The customer names in the first spreadsheet don't exactly match the second spreadsheet, but that can be resolved by fuzzyvlookup. However, both the first and second spreadhsheet can have multiple instances of the same customer (for different orders). So what I would like to do is only show a match if the customer name matches (fuzzy match), order value matches (absolute match) and the date of the order is greater than or equal to the date when customer was contacted.

I have done this sort of thing in the past with index and match, where I continued to shift the the lookup range based on whether the all conditions were met. Is there anyway I can do the same here, or maybe there is a better way of doing this which I am missing?
 
Upvote 0
Hi, welcome to the board!

Be aware that the code as-is will not work across workbooks.

If you have source and search table in the same workbook, maybe you can use the Rank parameter, e.g. setting Rank to "2" will return the 2nd best match, "3" will return the 3rd best match etc.

Follow the link in my signature for the latest version of the code.
 
Upvote 0
Cheers Alan

However, using index and match, I was able to do this because using Countif, I knew if there were mutiple matches and so I could set the range accordingly. How would that work here? i.e. how do I know which customers have multiple instances? Sorry, am not a big VBE expert, though I can read and tweak the code slightly, so maybe I am getting a bit confused, but don't I need to know exactly how many instances of a particular customer are there?
 
Upvote 0

Forum statistics

Threads
1,223,275
Messages
6,171,123
Members
452,381
Latest member
Nova88

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top