Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,546
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
Hi Amy,

The file you sent did not contain the FuzzyVLookup code in a module, hence the #NAME? error.
 
Upvote 0

Excel Facts

Can you AutoAverage in Excel?
There is a drop-down next to the AutoSum symbol. Open the drop-down to choose AVERAGE, COUNT, MAX, or MIN
al_b_cnu - this is fantastic and has tremendously reduced much time sorting through data. many, many thanks.


would it be possible to weight values to certain keywords? for example, words like "the", "and", "of", etc... are of little value when trying to match the content of two strings.

so would it be possible to set up a number of key words like:
"analyst"
"manager"
"recommends"
"strategic"
and if the original string contains these keywords, then increase the match score if the comparator strings also include these keywords? this would be tremendously valuable as I search through 000s of rows of data trying to find similar content.
 
Upvote 0
al_b_cnu - this is fantastic and has tremendously reduced much time sorting through data. many, many thanks.


would it be possible to weight values to certain keywords? for example, words like "the", "and", "of", etc... are of little value when trying to match the content of two strings.

so would it be possible to set up a number of key words like:
"analyst"
"manager"
"recommends"
"strategic"
and if the original string contains these keywords, then increase the match score if the comparator strings also include these keywords? this would be tremendously valuable as I search through 000s of rows of data trying to find similar content.


Hi,

That would be difficult to implement as part of FuzzyVLookup, but as a seperate UDF, it should be fairly straightforward.
I'm a bit busy right now, but watch this space!
 
Upvote 0
Hi,

ok, try KeyLookup UDF:
Code:
Option Explicit

Function KeyLookup(ByVal LookupString As String, _
                    ByVal LookupTable As Range, _
                    ByVal KeyTable As Range, _
                    ByVal Index As Integer) As Variant
Dim baLookupInd() As Boolean, baTableInd() As Boolean
Dim iPtr As Integer, iCurScore As Integer, iTablePtr As Integer, iLookupPtr As Integer
Dim iBestScore As Integer, iCount As Integer
Dim oKeyDict As Object
Dim rCur As Range
Dim sCur As String, sCurKey As String
Dim saLookupString() As String, saTableEntry() As String
Dim vBestScoreValue As Variant

If Trim$(LookupString) = "" Then
    KeyLookup = CVErr(xlErrNA)
    Exit Function
End If

'-- Populate dictionary with key words --
Set oKeyDict = Nothing
Set oKeyDict = CreateObject("Scripting.Dictionary")
For Each rCur In Intersect(KeyTable, Sheets(KeyTable.Parent.Name).UsedRange)
    sCur = CStr(rCur.Value)
    sCurKey = LCase$(Replace(sCur, " ", ""))
    If sCurKey <> "" Then
        On Error Resume Next
        oKeyDict.Add key:=sCurKey, Item:=sCur
        On Error GoTo 0
    End If
Next rCur

saLookupString = Split(LCase$(WorksheetFunction.Trim(LookupString)), " ")
iCount = 0
For iPtr = 0 To UBound(saLookupString)
    If oKeyDict.exists(saLookupString(iPtr)) Then
        iCount = iCount + 1
    Else
        saLookupString(iPtr) = ""
    End If
Next iPtr
If iCount = 0 Then
    oKeyDict.RemoveAll
    Set oKeyDict = Nothing
    KeyLookup = CVErr(xlErrNA)
    Exit Function
End If

For Each rCur In Intersect(LookupTable.Resize(, 1), Sheets(LookupTable.Parent.Name).UsedRange)
    saTableEntry = Split(" " & LCase$(WorksheetFunction.Trim(CStr(rCur.Value))), " ")
    ReDim baTableInd(0 To UBound(saTableEntry))
    ReDim baLookupInd(0 To UBound(saLookupString))
    iCurScore = 0
    For iTablePtr = 0 To UBound(saTableEntry)
        sCurKey = saTableEntry(iTablePtr)
        If sCurKey <> "" Then
            If oKeyDict.exists(sCurKey) Then
                For iLookupPtr = 0 To UBound(saLookupString)
                    If baLookupInd(iLookupPtr) = False _
                    And saLookupString(iLookupPtr) = sCurKey Then
                        iCurScore = iCurScore + 1
                        baLookupInd(iLookupPtr) = True
                        Exit For
                    End If
                Next iLookupPtr
            End If
        End If
    Next iTablePtr
    If iCurScore > iBestScore Then
        iBestScore = iCurScore
        If Index = 0 Then
            vBestScoreValue = rCur.Row
        Else
            vBestScoreValue = rCur.Offset(Index - 1).Value
        End If
    End If
Next rCur

oKeyDict.RemoveAll
Set oKeyDict = Nothing

If iBestScore > 0 Then
    KeyLookup = vBestScoreValue
Else
    KeyLookup = CVErr(xlErrNA)
End If

End Function

parameters are:
LookupString = Text to be looked up (String)
LookupTable = Lookup Table to be searched (Range)
KeyTable = List of Keywords (Range)
Index = 0 if row number to be returned, <>0 to return value in column relative to lookup table entry
 
Upvote 0
Hi,

ok, try KeyLookup UDF:
Code:
Option Explicit

Function KeyLookup(ByVal LookupString As String, _
                    ByVal LookupTable As Range, _
                    ByVal KeyTable As Range, _
                    ByVal Index As Integer) As Variant
Dim baLookupInd() As Boolean, baTableInd() As Boolean
Dim iPtr As Integer, iCurScore As Integer, iTablePtr As Integer, iLookupPtr As Integer
Dim iBestScore As Integer, iCount As Integer
Dim oKeyDict As Object
Dim rCur As Range
Dim sCur As String, sCurKey As String
Dim saLookupString() As String, saTableEntry() As String
Dim vBestScoreValue As Variant

If Trim$(LookupString) = "" Then
    KeyLookup = CVErr(xlErrNA)
    Exit Function
End If

'-- Populate dictionary with key words --
Set oKeyDict = Nothing
Set oKeyDict = CreateObject("Scripting.Dictionary")
For Each rCur In Intersect(KeyTable, Sheets(KeyTable.Parent.Name).UsedRange)
    sCur = CStr(rCur.Value)
    sCurKey = LCase$(Replace(sCur, " ", ""))
    If sCurKey <> "" Then
        On Error Resume Next
        oKeyDict.Add key:=sCurKey, Item:=sCur
        On Error GoTo 0
    End If
Next rCur

saLookupString = Split(LCase$(WorksheetFunction.Trim(LookupString)), " ")
iCount = 0
For iPtr = 0 To UBound(saLookupString)
    If oKeyDict.exists(saLookupString(iPtr)) Then
        iCount = iCount + 1
    Else
        saLookupString(iPtr) = ""
    End If
Next iPtr
If iCount = 0 Then
    oKeyDict.RemoveAll
    Set oKeyDict = Nothing
    KeyLookup = CVErr(xlErrNA)
    Exit Function
End If

For Each rCur In Intersect(LookupTable.Resize(, 1), Sheets(LookupTable.Parent.Name).UsedRange)
    saTableEntry = Split(" " & LCase$(WorksheetFunction.Trim(CStr(rCur.Value))), " ")
    ReDim baTableInd(0 To UBound(saTableEntry))
    ReDim baLookupInd(0 To UBound(saLookupString))
    iCurScore = 0
    For iTablePtr = 0 To UBound(saTableEntry)
        sCurKey = saTableEntry(iTablePtr)
        If sCurKey <> "" Then
            If oKeyDict.exists(sCurKey) Then
                For iLookupPtr = 0 To UBound(saLookupString)
                    If baLookupInd(iLookupPtr) = False _
                    And saLookupString(iLookupPtr) = sCurKey Then
                        iCurScore = iCurScore + 1
                        baLookupInd(iLookupPtr) = True
                        Exit For
                    End If
                Next iLookupPtr
            End If
        End If
    Next iTablePtr
    If iCurScore > iBestScore Then
        iBestScore = iCurScore
        If Index = 0 Then
            vBestScoreValue = rCur.Row
        Else
            vBestScoreValue = rCur.Offset(Index - 1).Value
        End If
    End If
Next rCur

oKeyDict.RemoveAll
Set oKeyDict = Nothing

If iBestScore > 0 Then
    KeyLookup = vBestScoreValue
Else
    KeyLookup = CVErr(xlErrNA)
End If

End Function
parameters are:
LookupString = Text to be looked up (String)
LookupTable = Lookup Table to be searched (Range)
KeyTable = List of Keywords (Range)
Index = 0 if row number to be returned, <>0 to return value in column relative to lookup table entry

thanks for that. I spent a few days tinkering with it. Am I correct that this would not sum the values of multiple key words? That is, if a string compared mutliple values from the keyword table, it would only return one value?
 
Upvote 0
HI,

Not sure what you mean, here's an example:
<b>Excel 2003</b><table cellpadding="2.5px" rules="all" style=";background-color: #FFFFFF;border: 1px solid;border-collapse: collapse; border-color: #A6AAB6"><colgroup><col width="25px" style="background-color: #E0E0F0" /><col /><col /><col /><col /><col /><col /></colgroup><thead><tr style=" background-color: #E0E0F0;text-align: center;color: #161120"><th></th><th>A</th><th>B</th><th>C</th><th>D</th><th>E</th><th>F</th></tr></thead><tbody><tr ><td style="color: #161120;text-align: center;">1</td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="font-weight: bold;text-align: center;;">Result</td><td style="font-weight: bold;text-align: center;;">Result</td></tr><tr ><td style="color: #161120;text-align: center;">2</td><td style="font-weight: bold;;">Lookup Table</td><td style="font-weight: bold;;">Keywords</td><td style="text-align: right;;"></td><td style="font-weight: bold;;">Search String</td><td style="font-weight: bold;text-align: center;;">0</td><td style="font-weight: bold;text-align: center;;">1</td></tr><tr ><td style="color: #161120;text-align: center;">3</td><td style=";">analyst Minion</td><td style=";">analyst</td><td style="text-align: right;;"></td><td style=";">Strategic or analyst and  Manager</td><td style="text-align: center;;">6</td><td style="text-align: center;;">Strategic  and Analyst</td></tr><tr ><td style="color: #161120;text-align: center;">4</td><td style=";">Manager Recommends</td><td style=";">manager</td><td style="text-align: right;;"></td><td style=";">manager xyz recommends</td><td style="text-align: center;;">4</td><td style="text-align: center;;">Manager Recommends</td></tr><tr ><td style="color: #161120;text-align: center;">5</td><td style=";">Strategic  Minion</td><td style=";">recommends</td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: center;;"></td><td style="text-align: center;;"></td></tr><tr ><td style="color: #161120;text-align: center;">6</td><td style=";">Strategic  and Analyst</td><td style=";">strategic</td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: center;;"></td><td style="text-align: center;;"></td></tr><tr ><td style="color: #161120;text-align: center;">7</td><td style=";">Strategic or and Manager</td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: center;;"></td><td style="text-align: center;;"></td></tr><tr ><td style="color: #161120;text-align: center;">8</td><td style=";">Recommends if Strategic</td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: center;;"></td><td style="text-align: center;;"></td></tr></tbody></table><p style="width:3.6em;font-weight:bold;margin:0;padding:0.2em 0.6em 0.2em 0.5em;border: 1px solid #A6AAB6;border-top:none;text-align: center;background-color: #E0E0F0;color: #161120">Sheet1</p><br /><br /><table width="85%" cellpadding="2.5px" rules="all" style=";border: 2px solid black;border-collapse:collapse;padding: 0.4em;background-color: #FFFFFF" ><tr><td style="padding:6px" ><b>Worksheet Formulas</b><table cellpadding="2.5px" width="100%" rules="all" style="border: 1px solid;text-align:center;background-color: #FFFFFF;border-collapse: collapse; border-color: #A6AAB6"><thead><tr style=" background-color: #E0E0F0;color: #161120"><th width="10px">Cell</th><th style="text-align:left;padding-left:5px;">Formula</th></tr></thead><tbody><tr><th width="10px" style=" background-color: #E0E0F0;color: #161120">E3</th><td style="text-align:left">=KeyLookup(<font color="Blue">$D3,$A:$A,$B:$B,E$2</font>)</td></tr><tr><th width="10px" style=" background-color: #E0E0F0;color: #161120">F3</th><td style="text-align:left">=KeyLookup(<font color="Blue">$D3,$A:$A,$B:$B,F$2</font>)</td></tr><tr><th width="10px" style=" background-color: #E0E0F0;color: #161120">E4</th><td style="text-align:left">=KeyLookup(<font color="Blue">$D4,$A:$A,$B:$B,E$2</font>)</td></tr><tr><th width="10px" style=" background-color: #E0E0F0;color: #161120">F4</th><td style="text-align:left">=KeyLookup(<font color="Blue">$D4,$A:$A,$B:$B,F$2</font>)</td></tr></tbody></table></td></tr></table><br />
 
Upvote 0
Thanks so much for this! Brilliant work.

In case anyone's interested, and it hasn't already been mentioned somewhere in this very long thread, FuzzyPercent() can also be used in Word VBA (Word 2010, here). Just change occurrences of Application.Trim to Trim.
 
Upvote 0
Hi, thanks for your input. Can I just point out that application.worksheetfunction.trim removes multiple internal spaces as well as leading and trailing spaces.
 
Upvote 0
Thanks for that. I should have checked to see if there was something magical about XL's Application.Trim! And sure enough, there is. Oops.

In that case, ignore my previous bad advice. To use FuzzyPercent() with Word, replace--

Code:
If Normalised = False Then
   String1 = LCase$(Application.Trim(String1))
   String2 = LCase$(Application.Trim(String2))
End If
--with--

Code:
If Normalised = False Then
   String1 = LCase$(Replace(String1, " ", ""))
   String2 = LCase$(Replace(String2, " ", ""))
End If
 
Upvote 0
And another tip for any Word VBA folks who stumble on this thread during a web search and are blissfully unaware of all the history, as I was. Follow advice in al_b_cnu's sig, and use the current version, which has evolved far beyond the one to which Bing led me. It makes a BIG difference!

Thanks, again, al_b_cnu; this is a real find. It looks like it was really well crafted, too. Its functionality ought to be built in to VBA for those trying to write code that does what the user intends instead of what they type.

You're a VBA god<G>.:bow:
 
Upvote 0

Forum statistics

Threads
1,224,862
Messages
6,181,463
Members
453,043
Latest member
Sronquest

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top