Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,546
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
Hi duckuser,
I'm constantly amazed that this piece of code is still useful after over 15 years!
Instead of the Filter function, try the parameters "GroupColOffset" and "GroupValue", as described in the comments in the code, the extract of which is here:
'** The following parameters allow matching by group, and only those values
'** which are in the group specified will be considered for matching.
'** GroupColOffset: This parameter specifies the offset column which contains the group values.
'** To trigger group matching, this must be a non-zero integer
'** GroupValue : This parameter specifies the Group to be considered for matching

Hope this helps

Alan

Thank you, I think I had another version of the function. And thank you for the function, it's amazing - well deserved of 15 years of usage!

Maybe I'm a little unclear or maybe it's not working right for me. If I try to use a GroupColOffset of "2" - which, as far as I understand, should be checking the BreweryID if my array is D:F - and a value of E9, I get a "1" as a returned value. In this particular beer, if I just leave out all the extra parameters and groupcoloffset, I get the correct Key value back (in this case, it's 1125), so it's doing the search correctly. Just not checking by group.

"Beers" sheet (which I'm trying to do the lookup against)
1637457465443.png


Sheet with the junk data I'm trying to correct:
1637457767710.png


Excel Formula:
=fuzzyvlookup(H9,Beers!D:F,3,,,,,,2,E9)
 
Upvote 0

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
Thank you, I think I had another version of the function. And thank you for the function, it's amazing - well deserved of 15 years of usage!

Maybe I'm a little unclear or maybe it's not working right for me. If I try to use a GroupColOffset of "2" - which, as far as I understand, should be checking the BreweryID if my array is D:F - and a value of E9, I get a "1" as a returned value. In this particular beer, if I just leave out all the extra parameters and groupcoloffset, I get the correct Key value back (in this case, it's 1125), so it's doing the search correctly. Just not checking by group.

"Beers" sheet (which I'm trying to do the lookup against)
View attachment 51739

Sheet with the junk data I'm trying to correct:
View attachment 51740

Excel Formula:
=fuzzyvlookup(H9,Beers!D:F,3,,,,,,2,E9)
Hi Duckuser,
The GroupOffset parameter is an offset NOT a column, so column D would be offset 0, column E would be offset 1.
Sorry for the confusion,
Best wishers
Alan
 
Upvote 0
Alan, you rock! This is a HUGE help. Thank you! Thank you! Thank you!
Hello,

I am a novice user of excel. I am interested in knowing if there can be 2 fuzzy criteria?
Example:
Lookup Value 1 = Co Name
Lookup Value 2 = Invoice number

I am looking for a fuzzy match on Co Name (since there are misspellings) and the invoice number. I need 2 criteria since there can be multiple companies with the same invoice number.

thank you very much
 
Upvote 0
Hello,

I am a novice user of excel. I am interested in knowing if there can be 2 fuzzy criteria?
Example:
Lookup Value 1 = Co Name
Lookup Value 2 = Invoice number

I am looking for a fuzzy match on Co Name (since there are misspellings) and the invoice number. I need 2 criteria since there can be multiple companies with the same invoice number.

thank you very much
Hi pdvsa,
You may want to try the "AdditionalCols" parameter.
Any problems let me know.
Best wishes
Alan
 
Upvote 0
Hi Alan, happy to see you replied. I am not a programmer and I have not understood a lot on this post. Could you kindly give me an example of how to use that column parameter mentioned? I don’t know where the additional columns parameter is found. Thank you very much.
 
Upvote 0
Hi pdvsa,
Book1
ABCDEFG
1Name LookupInv LookupLookup ResultCo. NameInv NumLookup Value
2AlphzEe100$G$6AlphaAAA100$G$2
3AlphaBBB100$G$3
4AlphaCCC100$G$4
5AlphaDDD100$G$5
6AlphaEEE100$G$6
7BetaAAA110$G$7
8BetaBBB110$G$8
9BetaCCC110$G$9
10BetaDDD110$G$10
11BetaEEE110$G$11
12BetaAAA120$G$12
13BetaBBB120$G$13
14GammaCCC120$G$14
15GammaDDD120$G$15
16GammaEEE120$G$16
17GammaAAA130$G$17
Sheet1
Cell Formulas
RangeFormula
C2C2=fuzzyvlookup(A2&B2,E:G,3,,,,1)
G2:G17G2=ADDRESS(ROW(),COLUMN())
F7:F17F7=LEFT(F2,3)&H7

The lookup values are in cells A2 and B2, note that the FuzzyVLookup call in C2 concatenates these lookup values.
The formulae in columns F and G are just a quick way of generating some data for the example.

The "AdditionalCols" value is set to 1 which forces the code to concatenate the two adjacent columns (starting at the first column) before comparison.

Hope that helps

Alan
 
Upvote 0
Hi pdvsa,
Book1
ABCDEFG
1Name LookupInv LookupLookup ResultCo. NameInv NumLookup Value
2AlphzEe100$G$6AlphaAAA100$G$2
3AlphaBBB100$G$3
4AlphaCCC100$G$4
5AlphaDDD100$G$5
6AlphaEEE100$G$6
7BetaAAA110$G$7
8BetaBBB110$G$8
9BetaCCC110$G$9
10BetaDDD110$G$10
11BetaEEE110$G$11
12BetaAAA120$G$12
13BetaBBB120$G$13
14GammaCCC120$G$14
15GammaDDD120$G$15
16GammaEEE120$G$16
17GammaAAA130$G$17
Sheet1
Cell Formulas
RangeFormula
C2C2=fuzzyvlookup(A2&B2,E:G,3,,,,1)
G2:G17G2=ADDRESS(ROW(),COLUMN())
F7:F17F7=LEFT(F2,3)&H7

The lookup values are in cells A2 and B2, note that the FuzzyVLookup call in C2 concatenates these lookup values.
The formulae in columns F and G are just a quick way of generating some data for the example.

The "AdditionalCols" value is set to 1 which forces the code to concatenate the two adjacent columns (starting at the first column) before comparison.

Hope that helps

Alan
HI Alan,
I think I am understanding now. Very helpful indeed.
I know I initially I said I need 2 criteria and I see the above does clearly show there are 2 criteria (co name and invoice number)
I now think that I do not need a fuzzy on invoice number since the invoice number will not be misspelled like company name would be and it could match on an invoice number that is close for same company and I need to avoid that possibility for what I am doing.

Do you happen to know how to adjust for this? So basically 1 fuzzy criteria (Co Name) and inv no is not fuzzy but is still a criteria...just not fuzzy. I hope that is not confusing. I am working on a separate project and I have many cases of misspelled companies but I have to also include a match on inv no. Power query does fuzzy too but I am needing a solution that in the form of a formula that I can drag down the list whereas powerquery creates another sheet that makes it difficult to compare to the original list in the way I am doing it. thank you Alan. Been very helpful.
 
Upvote 0
Hi pdvsa,
Silly question but are the invoice numbers unique, could you just match on that?
Assuming not, we would require some bespoke code, which would incidentally make the process faster.
Assuming that invoice numbers are not unique (?) the code will need to only get the best match company name for entries which have a 100% match on invoice number.

What is the layout of your s/sheet? Can you post a sample with dummy data?

Alan
 
Upvote 0

Forum statistics

Threads
1,223,635
Messages
6,173,481
Members
452,516
Latest member
archcalx

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top