Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,546
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
Hi, Not sure what format you want - can you reformat the example sheet I supplied to show what you actually want?
 
Upvote 0

Excel Facts

Format cells as date
Select range and press Ctrl+Shift+3 to format cells as date. (Shift 3 is the # sign which sort of looks like a small calendar).
Hi,

This version:
Code:
Option Explicit
Const miResultsStartColumn As Integer = 1   'Output results starting in column A

Const msInputSheet As String = "Sheet1"
Const msOutputSheet As String = "Sheet2"

Const msngThresholdValue As Single = 0.5    'Threshold value of 50%

Const msLookupTableColumn As String = "A"   'Column A for lookup table
Const msLookupValueColumn As String = "B"   'Column B for lookup values

Const mlDataStartRow As Long = 3            'start of data in input & output sheet

Sub GetAddresses()
Dim iResultsColumn As Integer

Dim lLookupValuesRow As Long
Dim lLookupTableRow As Long
Dim lResultsRow As Long

Dim sCurrentLookupValue As String
Dim sCurrentTableValue As String

Dim sngCurrentPercent As Single

Dim vaLookupTable As Variant
Dim vaLookupValues As Variant
Dim vaResults As Variant

Dim wsInput As Worksheet
Dim wsOutput As Worksheet

Set wsInput = Sheets(msInputSheet)
Set wsOutput = Sheets(msOutputSheet)

With wsInput
    vaLookupTable = Intersect(.UsedRange, .Columns(msLookupTableColumn))
    vaLookupValues = Intersect(.UsedRange, .Columns(msLookupValueColumn))
End With

wsOutput.UsedRange.ClearContents
iResultsColumn = miResultsStartColumn
lResultsRow = 0
ReDim vaResults(1 To 1, 1 To 1)

For lLookupValuesRow = mlDataStartRow To UBound(vaLookupValues, 1)
    sCurrentLookupValue = WorksheetFunction.Trim(LCase$(CStr(vaLookupValues(lLookupValuesRow, 1))))
    For lLookupTableRow = mlDataStartRow To UBound(vaLookupTable, 1)
        sCurrentTableValue = WorksheetFunction.Trim(LCase$(CStr(vaLookupTable(lLookupTableRow, 1))))
        sngCurrentPercent = FuzzyPercent(String1:=sCurrentLookupValue, _
                                         String2:=sCurrentTableValue, _
                                         Algorithm:=2, _
                                         Normalised:=True)
        If sngCurrentPercent >= msngThresholdValue Then
            lResultsRow = lResultsRow + 1
            ReDim Preserve vaResults(1 To 1, 1 To lResultsRow)
            vaResults(1, lResultsRow) = vaLookupTable(lLookupTableRow, 1)
        End If
    Next lLookupTableRow
Next lLookupValuesRow

With wsOutput
    .Range(.Cells(mlDataStartRow, iResultsColumn).Address).Resize(UBound(vaResults, 2), 1).Value = WorksheetFunction.Transpose(vaResults)
End With

End Sub
Gives these results:
<b>Excel 2003</b><table cellpadding="2.5px" rules="all" style=";background-color: #FFFFFF;border: 1px solid;border-collapse: collapse; border-color: #A6AAB6"><colgroup><col width="25px" style="background-color: #E0E0F0" /><col /></colgroup><thead><tr style=" background-color: #E0E0F0;text-align: center;color: #161120"><th></th><th>A</th></tr></thead><tbody><tr ><td style="color: #161120;text-align: center;">1</td><td style="text-align: right;;"></td></tr><tr ><td style="color: #161120;text-align: center;">2</td><td style="text-align: right;;"></td></tr><tr ><td style="color: #161120;text-align: center;">3</td><td style=";">5 grapefruit Grange</td></tr><tr ><td style="color: #161120;text-align: center;">4</td><td style=";">5 grapfruit gotto</td></tr><tr ><td style="color: #161120;text-align: center;">5</td><td style=";">6 grapefruit grotto</td></tr><tr ><td style="color: #161120;text-align: center;">6</td><td style=";">5 grape grotto</td></tr><tr ><td style="color: #161120;text-align: center;">7</td><td style=";">4 banana road</td></tr><tr ><td style="color: #161120;text-align: center;">8</td><td style=";">2 Orange Grove</td></tr><tr ><td style="color: #161120;text-align: center;">9</td><td style=";">1 Apple Avenue</td></tr><tr ><td style="color: #161120;text-align: center;">10</td><td style="text-align: right;;"></td></tr></tbody></table><p style="width:3.6em;font-weight:bold;margin:0;padding:0.2em 0.6em 0.2em 0.5em;border: 1px solid #A6AAB6;border-top:none;text-align: center;background-color: #E0E0F0;color: #161120">Sheet2</p><br /><br />
 
Upvote 0
Hio,

Yep, should work fine, if you use this version, you'll get a progress report.

Code:
Option Explicit
Const miResultsStartColumn As Integer = 1   'Output results starting in column A

Const msInputSheet As String = "Sheet1"
Const msOutputSheet As String = "Sheet2"

Const msngThresholdValue As Single = 0.5    'Threshold value of 50%

Const msLookupTableColumn As String = "A"   'Column A for lookup table
Const msLookupValueColumn As String = "B"   'Column B for lookup values

Const mlDataStartRow As Long = 3            'start of data in input & output sheet

Sub GetAddresses()
Dim iResultsColumn As Integer

Dim lLookupValuesRow As Long
Dim lLookupTableRow As Long
Dim lResultsRow As Long
Dim lNextProgressReport As Long
Dim lProgressReportTrigger As Long

Dim sCurrentLookupValue As String
Dim sCurrentTableValue As String

Dim sngCurrentPercent As Single

Dim vaLookupTable As Variant
Dim vaLookupValues As Variant
Dim vaResults As Variant

Dim wsInput As Worksheet
Dim wsOutput As Worksheet

Set wsInput = Sheets(msInputSheet)
Set wsOutput = Sheets(msOutputSheet)

With wsInput
    vaLookupTable = Intersect(.UsedRange, .Columns(msLookupTableColumn))
    vaLookupValues = Intersect(.UsedRange, .Columns(msLookupValueColumn))
End With

wsOutput.UsedRange.ClearContents
iResultsColumn = miResultsStartColumn
lResultsRow = 0
ReDim vaResults(1 To 1, 1 To 1)

lProgressReportTrigger = Int(UBound(vaLookupValues, 1) / 100)
lNextProgressReport = 0

For lLookupValuesRow = mlDataStartRow To UBound(vaLookupValues, 1)
    If lLookupValuesRow >= lNextProgressReport Then
        Application.StatusBar = "Processing row " & lLookupValuesRow & " of " & UBound(vaLookupValues, 1)
        lNextProgressReport = lLookupValuesRow + lProgressReportTrigger
    End If
    
    sCurrentLookupValue = WorksheetFunction.Trim(LCase$(CStr(vaLookupValues(lLookupValuesRow, 1))))
    For lLookupTableRow = mlDataStartRow To UBound(vaLookupTable, 1)
        sCurrentTableValue = WorksheetFunction.Trim(LCase$(CStr(vaLookupTable(lLookupTableRow, 1))))
        sngCurrentPercent = FuzzyPercent(String1:=sCurrentLookupValue, _
                                         String2:=sCurrentTableValue, _
                                         Algorithm:=2, _
                                         Normalised:=True)
        If sngCurrentPercent >= msngThresholdValue Then
            lResultsRow = lResultsRow + 1
            ReDim Preserve vaResults(1 To 1, 1 To lResultsRow)
            vaResults(1, lResultsRow) = vaLookupTable(lLookupTableRow, 1)
        End If
    Next lLookupTableRow
Next lLookupValuesRow

With wsOutput
    .Range(.Cells(mlDataStartRow, iResultsColumn).Address).Resize(UBound(vaResults, 2), 1).Value = WorksheetFunction.Transpose(vaResults)
End With

Application.StatusBar = False

End Sub
 
Upvote 0
I get the following error when I run:
Compile Error:
Variable not defined.

msInputSheet is highlighted in blue...

Sub GetAddresses() is also highlighted, but in yellow with a yellow arrow to the left next to the line of code.

I'm in Excel 2007.

:: News Flash ::
Placed the code above in a module SEPARATE from the FuzzyVlookup and it appears to be processing.
 
Last edited:
Upvote 0
Ok, so it appears to be processing, but I don't see a progress bar. It just says processing row 3 of 16000 in the bottom left corner.

Just leave it? :)
 
Upvote 0
Hi,

That suggests that your input & output sheets are not named 'Sheet1' and 'Sheet2' respectively.
When trying, it seemed to be complaining that the FuzzyPercent was in a different module, so here's the amalgamated version, you'll have to get rid of the module containiong FuzzyVLookup code, or you'll hit all sorts of problems.

Try the attached
Code:
Option Explicit
Const miResultsStartColumn As Integer = 1   'Output results starting in column A

Const msInputSheet As String = "Sheet1"
Const msOutputSheet As String = "Sheet2"

Const msngThresholdValue As Single = 0.5    'Threshold value of 50%

Const msLookupTableColumn As String = "A"   'Column A for lookup table
Const msLookupValueColumn As String = "B"   'Column B for lookup values

Const mlDataStartRow As Long = 3            'start of data in input & output sheet

Type RankInfo
    Offset As Long
    Percentage As Single
End Type

Dim mudRankData() As RankInfo
Dim mlBestMatchPtr As Long

Dim TopMatch         As Long
Dim strCompare       As String

Function FuzzyPercent(ByVal String1 As String, _
                      ByVal String2 As String, _
                      Optional Algorithm As Integer = 3, _
                      Optional Normalised As Boolean = False) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim intLen1 As Integer, intLen2 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim sngScore As Single
Dim strWork As String

'-------------------------------------------------------
'-- If strings havent been normalised, normalise them --
'-------------------------------------------------------
If Normalised = False Then
    String1 = LCase$(Application.Trim(String1))
    String2 = LCase$(Application.Trim(String2))
End If

'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If String1 = String2 Then
    FuzzyPercent = 1
    Exit Function
End If

intLen1 = Len(String1)
intLen2 = Len(String2)

If intLen1 = 0 Or intLen2 = 0 Then
    FuzzyPercent = 0
    Exit Function
End If

'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
    FuzzyPercent = 0
    Exit Function
End If

intTotScore = 0                   'initialise total possible score
intScore = 0                      'initialise current score

'--------------------------------------------------------
'-- If Algorithm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (Algorithm And 1) <> 0 Then
    If intLen1 < intLen2 Then
        FuzzyAlg1 String1, String2, intScore, intTotScore
    Else
        FuzzyAlg1 String2, String1, intScore, intTotScore
    End If
End If

'-----------------------------------------------------------
'-- If Algorithm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (Algorithm And 2) <> 0 Then
    If intLen1 < intLen2 Then
        FuzzyAlg2 String1, String2, intScore, intTotScore
    Else
        FuzzyAlg2 String2, String1, intScore, intTotScore
    End If
End If

'------------------------------------------------------
'-- If Algorithm = 4,5,6,7, use Dan Ostander's code. --
'------------------------------------------------------
'If (Algorithm And 4) <> 0 Then
'    If intLen1 < intLen2 Then
'        sngScore = FuzzyAlg4(String1, String1)
'    Else
'        sngScore = FuzzyAlg4(String2, String1)
'    End If
'    intScore = intScore + (sngScore * 100)
'    intTotScore = intTotScore + 100
'End If

FuzzyPercent = intScore / intTotScore

End Function

Private Sub FuzzyAlg1(ByVal String1 As String, _
                      ByVal String2 As String, _
                      ByRef Score As Integer, _
                      ByRef TotScore As Integer)
Dim intLen1 As Integer, intPos As Integer, intPtr As Integer, intStartPos As Integer

intLen1 = Len(String1)
TotScore = TotScore + intLen1              'update total possible score
intPos = 0
For intPtr = 1 To intLen1
    intStartPos = intPos + 1
    intPos = InStr(intStartPos, String2, Mid$(String1, intPtr, 1))
    If intPos > 0 Then
        If intPos > intStartPos + 3 Then     'No match if char is > 3 bytes away
            intPos = intStartPos
        Else
            Score = Score + 1          'Update current score
        End If
    Else
        intPos = intStartPos
    End If
Next intPtr
End Sub
Private Sub FuzzyAlg2(ByVal String1 As String, _
                        ByVal String2 As String, _
                        ByRef Score As Integer, _
                        ByRef TotScore As Integer)
Dim intCurLen As Integer, intLen1 As Integer, intTo As Integer, intPtr As Integer, intPos As Integer
Dim strWork As String

intLen1 = Len(String1)
For intCurLen = 1 To intLen1
    strWork = String2                          'Get a copy of String2
    intTo = intLen1 - intCurLen + 1
    TotScore = TotScore + Int(intLen1 / intCurLen)  'Update total possible score
    For intPtr = 1 To intTo Step intCurLen
        intPos = InStr(strWork, Mid$(String1, intPtr, intCurLen))
        If intPos > 0 Then
            Mid$(strWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
            Score = Score + 1     'Update current score
        End If
    Next intPtr
Next intCurLen

End Sub
Private Function FuzzyAlg4(strIn1 As String, strIn2 As String) As Single

'Dim L1               As Integer
'Dim In1Mask(1 To 24) As Long     'strIn1 is 24 characters max
'Dim iCh              As Integer
'Dim N                As Long
'Dim strTry           As String
'Dim strTest          As String
'
'TopMatch = 0
'L1 = Len(strIn1)
'strTest = UCase(strIn1)
'strCompare = UCase(strIn2)
'For iCh = 1 To L1
'    In1Mask(iCh) = 2 ^ iCh
'Next iCh      'Loop thru all ordered combinations of characters in strIn1
'For N = 2 ^ (L1 + 1) - 1 To 1 Step -1
'    strTry = ""
'    For iCh = 1 To L1
'        If In1Mask(iCh) And N Then
'            strTry = strTry & Mid(strTest, iCh, 1)
'        End If
'    Next iCh
'    If Len(strTry) > TopMatch Then FuzzyAlg4Test strTry
'Next N
'FuzzyAlg4 = TopMatch / CSng(L1)
End Function
Sub GetAddresses()
Dim iResultsColumn As Integer

Dim lLookupValuesRow As Long
Dim lLookupTableRow As Long
Dim lResultsRow As Long
Dim lNextProgressReport As Long
Dim lProgressReportTrigger As Long

Dim sCurrentLookupValue As String
Dim sCurrentTableValue As String

Dim sngCurrentPercent As Single

Dim vaLookupTable As Variant
Dim vaLookupValues As Variant
Dim vaResults As Variant

Dim wsInput As Worksheet
Dim wsOutput As Worksheet

Set wsInput = Sheets(msInputSheet)
Set wsOutput = Sheets(msOutputSheet)

With wsInput
    vaLookupTable = Intersect(.UsedRange, .Columns(msLookupTableColumn))
    vaLookupValues = Intersect(.UsedRange, .Columns(msLookupValueColumn))
End With

wsOutput.UsedRange.ClearContents
iResultsColumn = miResultsStartColumn
lResultsRow = 0
ReDim vaResults(1 To 1, 1 To 1)

lProgressReportTrigger = Int(UBound(vaLookupValues, 1) / 100)
lNextProgressReport = 0

For lLookupValuesRow = mlDataStartRow To UBound(vaLookupValues, 1)
    If lLookupValuesRow >= lNextProgressReport Then
        Application.StatusBar = "Processing row " & lLookupValuesRow & " of " & UBound(vaLookupValues, 1)
        lNextProgressReport = lLookupValuesRow + lProgressReportTrigger
    End If
    
    sCurrentLookupValue = WorksheetFunction.Trim(LCase$(CStr(vaLookupValues(lLookupValuesRow, 1))))
    For lLookupTableRow = mlDataStartRow To UBound(vaLookupTable, 1)
        sCurrentTableValue = WorksheetFunction.Trim(LCase$(CStr(vaLookupTable(lLookupTableRow, 1))))
        sngCurrentPercent = FuzzyPercent(String1:=sCurrentLookupValue, _
                                         String2:=sCurrentTableValue, _
                                         Algorithm:=2, _
                                         Normalised:=True)
        If sngCurrentPercent >= msngThresholdValue Then
            lResultsRow = lResultsRow + 1
            ReDim Preserve vaResults(1 To 1, 1 To lResultsRow)
            vaResults(1, lResultsRow) = vaLookupTable(lLookupTableRow, 1)
        End If
    Next lLookupTableRow
Next lLookupValuesRow

With wsOutput
    .Range(.Cells(mlDataStartRow, iResultsColumn).Address).Resize(UBound(vaResults, 2), 1).Value = WorksheetFunction.Transpose(vaResults)
End With

Application.StatusBar = False

End Sub

As I intimated earlier, this may take some time on 16,00 rows, hence the progress status message every 160 rows (i.e. every 1%), controlled by the variable "lProgressReportTrigger" You may wish, initially, to set this value to, say, 10 to get a feel of how long it's going to take.
If too long, we'll have to re-visit (or tailor) the fuzzymatching algorithm.
Keep me posted :)
 
Upvote 0
Hi,

Did you change the statement
Code:
lProgressReportTrigger = Int(UBound(vaLookupValues, 1) / 100
to
Code:
lProgressReportTrigger = 5
The code will update the display at bottom left
"Processing row x of y"

Perhaps we should look at a bespoke fuzzy matching algorithm, what do your adrfesses look like - can we remove 'noise' words for instance, also the algorithm 2 matches & scores on number of matching characters, pairs, triplets, quads etc, perhaps we should just start at, say, matching quads and upwards which should speed things up, but depends on your data.
 
Upvote 0

Forum statistics

Threads
1,223,929
Messages
6,175,455
Members
452,643
Latest member
gjcase

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top