Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,546
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
I stumbled into this board and found what I needed with the fuzzy vlookup, so thank you for the great macro!

However, and I have not read through all 400+ posts here, I was wondering if it is possible to include the closest matched terms (say above a certain threshold, but not including exact matches) into one cell?

For example, is it possible to produce a result like cell C3 from using fuzzy Vlookup on column A with the search term from cell B3, as shown in the below table? Hopefully this poses a decent challenge for you.

Thanks for any assistance you can provide.

<table border="0" cellpadding="0" cellspacing="0" width="659"><colgroup><col style="mso-width-source:userset;mso-width-alt:2706;width:56pt" width="74"> <col style="mso-width-source:userset;mso-width-alt:4864; width:100pt" span="2" width="133"> <col style="mso-width-source:userset;mso-width-alt:11666;width:239pt" width="319"> </colgroup><tbody><tr style="height:16.5pt" height="22"> <td class="xl65" style="height:16.5pt;width:56pt" height="22" width="74">
</td> <td class="xl65" style="border-left:none;width:100pt" width="133">A</td> <td class="xl65" style="border-left:none;width:100pt" width="133">B</td> <td class="xl65" style="border-left:none;width:239pt" width="319">algorithm x
</td> </tr> <tr style="height:16.5pt" height="22"> <td class="xl65" style="height:16.5pt;border-top:none" height="22">3</td> <td class="xl65" style="border-top:none;border-left:none">Jane L Doe</td> <td class="xl65" style="border-top:none;border-left:none">Jane Doe</td> <td class="xl65" style="border-top:none;border-left:none">Jane L Doe, Jane B Doe, Jane Doh, Doe Jan
</td> </tr> <tr style="height:16.5pt" height="22"> <td class="xl65" style="height:16.5pt;border-top:none" height="22">4</td> <td class="xl65" style="border-top:none;border-left:none">Jane Doe</td> <td class="xl65" style="border-top:none;border-left:none">
</td> <td class="xl65" style="border-top:none;border-left:none">
</td> </tr> <tr style="height:16.5pt" height="22"> <td class="xl65" style="height:16.5pt;border-top:none" height="22">5</td> <td class="xl65" style="border-top:none;border-left:none">Jack Dee</td> <td class="xl65" style="border-top:none;border-left:none">
</td> <td class="xl65" style="border-top:none;border-left:none">
</td> </tr> <tr style="height:16.5pt" height="22"> <td class="xl65" style="height:16.5pt;border-top:none" height="22">6</td> <td class="xl65" style="border-top:none;border-left:none">Steve Bart</td> <td class="xl65" style="border-top:none;border-left:none">
</td> <td class="xl65" style="border-top:none;border-left:none">
</td> </tr> <tr style="height:16.5pt" height="22"> <td class="xl65" style="height:16.5pt;border-top:none" height="22">7</td> <td class="xl65" style="border-top:none;border-left:none">Mike Moe</td> <td class="xl65" style="border-top:none;border-left:none">
</td> <td class="xl65" style="border-top:none;border-left:none">
</td> </tr> <tr style="height:16.5pt" height="22"> <td class="xl65" style="height:16.5pt;border-top:none" height="22">8</td> <td class="xl65" style="border-top:none;border-left:none">Jane B Doe</td> <td class="xl65" style="border-top:none;border-left:none">
</td> <td class="xl65" style="border-top:none;border-left:none">
</td> </tr> <tr style="height:16.5pt" height="22"> <td class="xl65" style="height:16.5pt;border-top:none" height="22">9</td> <td class="xl65" style="border-top:none;border-left:none">Jane Doh</td> <td class="xl65" style="border-top:none;border-left:none">
</td> <td class="xl65" style="border-top:none;border-left:none">
</td> </tr><tr><td style="vertical-align: top;">10
</td><td style="vertical-align: top;">Doe Jan
</td><td style="vertical-align: top;">
</td><td style="vertical-align: top;">
</td></tr> </tbody></table>
 
Upvote 0

Excel Facts

Excel motto
Not everything I do at work revolves around Excel. Only the fun parts.
Hi foxwine, welcome to the board!

The 'not exact matches' would be an issue with the code as-is.
The multiple returns in a cell is possible sort of by use of the Rank parameter

I'll have a look at amending the code to accommodate you, but not too sure when, as I have a busy few days ahead of me.

an interesting problem though.
 
Upvote 0
Hi,

Ok most of the way there:
Code:
Option Explicit

Type RankInfo
    Offset As Long
    Percentage As Single
End Type

Dim mudRankData() As RankInfo
Dim mlBestMatchPtr As Long

Dim TopMatch         As Long
Dim strCompare       As String

Function FuzzyCount(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Algorithm As Integer = 3) As Long
'**********************************************************************
'** Simple count of (Fuzzy) Matching strings >= NFPercent threshold  **
'**********************************************************************
Dim lMatchCount As Long

Dim rCur As Range

Dim sString1 As String
Dim sString2 As String

'** Normalise lookup value **
sString1 = LCase$(Application.Trim(LookupValue))

For Each rCur In Intersect(TableArray.Resize(, 1), Sheets(TableArray.Parent.Name).UsedRange)

    '** Normalise current Table entry **
    sString2 = LCase$(Application.Trim(CStr(rCur)))
    
    If sString2 <> "" Then
        If FuzzyPercent(String1:=sString1, _
                        String2:=sString2, _
                        Algorithm:=Algorithm, _
                        Normalised:=True) >= NFPercent Then
            lMatchCount = lMatchCount + 1
        End If
    End If
Next rCur

FuzzyCount = lMatchCount

End Function

Function FuzzyPercent(ByVal String1 As String, _
                      ByVal String2 As String, _
                      Optional Algorithm As Integer = 3, _
                      Optional Normalised As Boolean = False) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim intLen1 As Integer, intLen2 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim sngScore As Single
Dim strWork As String

'-------------------------------------------------------
'-- If strings havent been normalised, normalise them --
'-------------------------------------------------------
If Normalised = False Then
    String1 = LCase$(Application.Trim(String1))
    String2 = LCase$(Application.Trim(String2))
End If

'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If String1 = String2 Then
    FuzzyPercent = 1
    Exit Function
End If

intLen1 = Len(String1)
intLen2 = Len(String2)

If intLen1 = 0 Or intLen2 = 0 Then
    FuzzyPercent = 0
    Exit Function
End If

'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
    FuzzyPercent = 0
    Exit Function
End If

intTotScore = 0                   'initialise total possible score
intScore = 0                      'initialise current score

'--------------------------------------------------------
'-- If Algorithm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (Algorithm And 1) <> 0 Then
    If intLen1 < intLen2 Then
        FuzzyAlg1 String1, String2, intScore, intTotScore
    Else
        FuzzyAlg1 String2, String1, intScore, intTotScore
    End If
End If

'-----------------------------------------------------------
'-- If Algorithm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (Algorithm And 2) <> 0 Then
    If intLen1 < intLen2 Then
        FuzzyAlg2 String1, String2, intScore, intTotScore
    Else
        FuzzyAlg2 String2, String1, intScore, intTotScore
    End If
End If

'------------------------------------------------------
'-- If Algorithm = 4,5,6,7, use Dan Ostander's code. --
'------------------------------------------------------
If (Algorithm And 4) <> 0 Then
    If intLen1 < intLen2 Then
        sngScore = FuzzyAlg4(String1, String1)
    Else
        sngScore = FuzzyAlg4(String2, String1)
    End If
    intScore = intScore + (sngScore * 100)
    intTotScore = intTotScore + 100
End If

FuzzyPercent = intScore / intTotScore

End Function

Private Sub FuzzyAlg1(ByVal String1 As String, _
                      ByVal String2 As String, _
                      ByRef Score As Integer, _
                      ByRef TotScore As Integer)
Dim intLen1 As Integer, intPos As Integer, intPtr As Integer, intStartPos As Integer

intLen1 = Len(String1)
TotScore = TotScore + intLen1              'update total possible score
intPos = 0
For intPtr = 1 To intLen1
    intStartPos = intPos + 1
    intPos = InStr(intStartPos, String2, Mid$(String1, intPtr, 1))
    If intPos > 0 Then
        If intPos > intStartPos + 3 Then     'No match if char is > 3 bytes away
            intPos = intStartPos
        Else
            Score = Score + 1          'Update current score
        End If
    Else
        intPos = intStartPos
    End If
Next intPtr
End Sub
Private Sub FuzzyAlg2(ByVal String1 As String, _
                        ByVal String2 As String, _
                        ByRef Score As Integer, _
                        ByRef TotScore As Integer)
Dim intCurLen As Integer, intLen1 As Integer, intTo As Integer, intPtr As Integer, intPos As Integer
Dim strWork As String

intLen1 = Len(String1)
For intCurLen = 1 To intLen1
    strWork = String2                          'Get a copy of String2
    intTo = intLen1 - intCurLen + 1
    TotScore = TotScore + Int(intLen1 / intCurLen)  'Update total possible score
    For intPtr = 1 To intTo Step intCurLen
        intPos = InStr(strWork, Mid$(String1, intPtr, intCurLen))
        If intPos > 0 Then
            Mid$(strWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
            Score = Score + 1     'Update current score
        End If
    Next intPtr
Next intCurLen

End Sub
Private Function FuzzyAlg4(strIn1 As String, strIn2 As String) As Single

Dim L1               As Integer
Dim In1Mask(1 To 24) As Long     'strIn1 is 24 characters max
Dim iCh              As Integer
Dim N                As Long
Dim strTry           As String
Dim strTest          As String

TopMatch = 0
L1 = Len(strIn1)
strTest = UCase(strIn1)
strCompare = UCase(strIn2)
For iCh = 1 To L1
    In1Mask(iCh) = 2 ^ iCh
Next iCh      'Loop thru all ordered combinations of characters in strIn1
For N = 2 ^ (L1 + 1) - 1 To 1 Step -1
    strTry = ""
    For iCh = 1 To L1
        If In1Mask(iCh) And N Then
            strTry = strTry & Mid(strTest, iCh, 1)
        End If
    Next iCh
    If Len(strTry) > TopMatch Then FuzzyAlg4Test strTry
Next N
FuzzyAlg4 = TopMatch / CSng(L1)
End Function
Sub FuzzyAlg4Test(strIn As String)

Dim l          As Integer
Dim strTry   As String
Dim iCh        As Integer

l = Len(strIn)
If l <= TopMatch Then Exit Sub
strTry = "*"
For iCh = 1 To l
    strTry = strTry & Mid(strIn, iCh, 1) & "*"
Next iCh
If strCompare Like strTry Then
    If l > TopMatch Then TopMatch = l
End If
End Sub

Function FuzzyVLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Rank As String = "1", _
                      Optional Algorithm As Integer = 3, _
                      Optional AdditionalCols As Integer = 0, _
                      Optional LookupColOffset As Integer = 0, _
                      Optional GroupColOffset As Integer = 0, _
                      Optional GroupValue As Variant = "") As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** column 1 of table specified by TableArray.                                 **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in column 1       **
'** is found.                                                                  **
'** For each entry in column 1 of the table, FuzzyPercent is called to match   **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the column number of the entry in TableArray required to be    **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the column entry indicated by IndexNum is     **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset row (starting at 1) is returned.   **
'**                 This value can be used directly in the 'Index' function.   **
'**                                                                            **
'** Algorithm can take one of the following values:                            **
'** Algorithm = 1:                                                             **
'**     This algorithm is best suited for matching mis-spellings.              **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algorithm = 2:                                                             **
'**     This algorithm is best suited for matching sentences, or               **
'**     'firstname lastname' compared with 'lastname firstname' combinations   **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algorithm = 3: Both Algorithms 1 and 2 are performed.                      **
'** Algorithm = 4: Dan Ostrander's algorithm                                   **
'**                                                                            **
'** The following parameters allow matching by group, and only those values    **
'** which are in the group specified will be considered for matching.          **
'** GroupColOffset                                                             **
'** This parameter specifies the offset column which contains the group values **
'** To trigger group matching, this must be a non-zero integer                 **
'** GroupValue                                                                 **
'** This parameter specifies the Group to be considered for matching           **
'********************************************************************************
Dim bWanted As Boolean
Dim bError As Boolean

Dim iPtr1 As Integer
Dim iPtr2 As Integer
Dim iaRank() As Integer

Dim lEndRow As Long

Dim rCur As Range
Dim rSearchRange As Range

Dim sngMinPercent As Single

Dim sChar As String
Dim sPrevChar As String
Dim sRankDelimiter As String
Dim sResult As String

Dim vCurValue As Variant
Dim wsTable As Worksheet

'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------

LookupValue = LCase$(Application.Trim(LookupValue))

If IsMissing(NFPercent) Then
    sngMinPercent = 0.05
Else
    If (NFPercent <= 0) Or (NFPercent > 1) Then
        FuzzyVLookup = "*** 'NFPercent' must be a percentage > zero ***"
        Exit Function
    End If
    sngMinPercent = NFPercent
End If

ReDim iaRank(1 To 2)
sRankDelimiter = ""

If IsNumeric(Rank) Then
    iaRank(1) = Val(Rank)
    iaRank(2) = iaRank(1)
    If iaRank(1) < 1 Then
        FuzzyVLookup = "*** 'Rank' must be a positive integer or format integer-delimiter-integer ***"
        Exit Function
    End If
Else
    iPtr2 = 1
    sPrevChar = "0"
    For iPtr1 = 1 To Len(Rank)
         sChar = Mid$(Rank, iPtr1, 1)
         If IsNumeric(sChar) Then
            iaRank(iPtr2) = (iaRank(iPtr2) * 10) + Val(sChar)
         Else
            If IsNumeric(sPrevChar) Then
                If iPtr2 = 2 Then Exit For
                iPtr2 = 2
            End If
            sRankDelimiter = sRankDelimiter & sChar
         End If
         sPrevChar = sChar
    Next iPtr1
    bError = False
    If iaRank(1) < 1 Then bError = True
    If iaRank(2) < iaRank(1) Then bError = True
    If bError Then
        FuzzyVLookup = "*** 'Rank' must be a positive integer or format integer-delimiter-integer ***"
        Exit Function
    End If
End If

Set wsTable = Sheets(TableArray.Parent.Name)

ReDim mudRankData(1 To iaRank(2))

lEndRow = TableArray.Rows.Count
If VarType(TableArray.Cells(lEndRow, 1).Value) = vbEmpty Then
    lEndRow = TableArray.Cells(lEndRow, 1).End(xlUp).Row
End If
Set rSearchRange = Intersect(TableArray.Resize(, 1), wsTable.UsedRange)
If rSearchRange Is Nothing Then
    FuzzyVLookup = CVErr(xlErrNA)
Else


    '---------------
    '-- Main loop --
    '---------------
    
    If Len(GroupValue) <> 0 Then
        With rSearchRange.Offset(, GroupColOffset)
            For Each rCur In rSearchRange.Offset(, GroupColOffset)
                vCurValue = rCur.Value
                If VarType(vCurValue) = vbString Then
                    bWanted = LCase$(CStr(vCurValue)) = LCase$(CStr(GroupValue))
                Else
                    bWanted = Val(vCurValue) = Val(GroupValue)
                End If
                If bWanted Then
                    FuzzyVlookupMain LookupValue:=LookupValue, _
                                    TableArray:=rCur.Offset(, GroupColOffset * -1), _
                                    IndexNum:=IndexNum, _
                                    NFPercent:=NFPercent, _
                                    Rank:=iaRank(2), _
                                    Algorithm:=Algorithm, _
                                    AdditionalCols:=AdditionalCols, _
                                    LookupColOffset:=LookupColOffset
                End If
            Next rCur
        End With
    Else
        For Each rCur In rSearchRange
            FuzzyVlookupMain LookupValue:=LookupValue, _
                            TableArray:=rCur, _
                            IndexNum:=IndexNum, _
                            NFPercent:=NFPercent, _
                            Rank:=iaRank(2), _
                            Algorithm:=Algorithm, _
                            AdditionalCols:=AdditionalCols, _
                            LookupColOffset:=LookupColOffset
        Next rCur
    End If
    
    If mudRankData(iaRank(1)).Percentage < sngMinPercent Then
        '--------------------------------------
        '-- Return '#N/A' if below NFPercent --
        '--------------------------------------
        FuzzyVLookup = CVErr(xlErrNA)
    Else
        sResult = ""
        For iPtr2 = iaRank(1) To iaRank(2)
            If mudRankData(iPtr2).Percentage < sngMinPercent Then Exit For
            mlBestMatchPtr = mudRankData(iPtr2).Offset - TableArray.Cells(1, 1).Row + 1
            If sResult <> "" Then sResult = sResult & sRankDelimiter
            
            If IndexNum > 0 Then
                '-----------------------------------
                '-- Return column entry specified --
                '-----------------------------------
                sResult = sResult & CStr(TableArray.Cells(mlBestMatchPtr, IndexNum))
            Else
                '-----------------------
                '-- Return offset row --
                '-----------------------
                sResult = sResult & CStr(mlBestMatchPtr)
            End If
        Next iPtr2
        FuzzyVLookup = sResult
    End If
    
End If          'If rSearchRange Is Nothing Then
End Function
Private Sub FuzzyVlookupMain(ByVal LookupValue As String, _
                                    ByVal TableArray As Range, _
                                    ByVal IndexNum As Integer, _
                                    Optional NFPercent As Single = 0.05, _
                                    Optional Rank As Integer = 1, _
                                    Optional Algorithm As Integer = 3, _
                                    Optional AdditionalCols As Integer = 0, _
                                    Optional LookupColOffset As Integer = 0)
Dim I As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer
Dim strListString As String
Dim sngCurPercent As Single
Dim sngMinPercent As Single
Dim vCurValue As Variant

vCurValue = ""
For I = 0 To AdditionalCols
    vCurValue = vCurValue & CStr(TableArray.Offset(0, I + LookupColOffset).Value)
Next I
If VarType(vCurValue) = vbString Then
    strListString = LCase$(Application.Trim(vCurValue))
    
    '------------------------------------------------
    '-- Fuzzy match strings & get percentage match --
    '------------------------------------------------
    sngCurPercent = FuzzyPercent(String1:=LookupValue, _
                                 String2:=strListString, _
                                 Algorithm:=Algorithm, _
                                 Normalised:=True)
    
    If sngCurPercent >= NFPercent Then
        '---------------------------
        '-- Store in ranked array --
        '---------------------------
        For intRankPtr = 1 To Rank
            If sngCurPercent > mudRankData(intRankPtr).Percentage Then
                For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                    With mudRankData(intRankPtr1)
                        .Offset = mudRankData(intRankPtr1 - 1).Offset
                        .Percentage = mudRankData(intRankPtr1 - 1).Percentage
                    End With
                Next intRankPtr1
                With mudRankData(intRankPtr)
                    .Offset = TableArray.Row
                    .Percentage = sngCurPercent
                End With
                Exit Sub
            End If
        Next intRankPtr
    End If
    
End If
End Sub

This version has the "Rank" parameter amended to accept a range , in the example I've specified 2nd to 10th best matches, with a 50% confidence value.

Continued ...
 
Upvote 0
foxwine - Can i assume that "Jane Doe" is the same as "jane doe" which is the same as "JANE DOE" which is the same as "Jane space space space Doe" which is the same as "space space Jane space space space Doe space space"?

fuzzyvlookup currently 'normalises' input to lower case and removes multiple, leading ant trailing spaces ... which would make all of the above a 100% match.
 
Last edited:
Upvote 0
foxwine - Can i assume that "Jane Doe" is the same as "jane doe" which is the same as "JANE DOE" which is the same as "Jane space space space Doe" which is the same as "space space Jane space space space Doe space space"?

fuzzyvlookup currently 'normalises' input to lower case and removes multiple, leading ant trailing spaces ... which would make all of the above a 100% match.

Whew, long work day!

This is great, I am playing around with your solution like it is a Christmas toy :rofl: I will certainly get a lot of mileage out of it. And yes, you are correct, case sensitivity can be ignored.

Thanks very much for the replies and help, Alan.
 
Upvote 0
Hi, ok final solution.

NFPercent parameter amended to allow a min andf max %, so 50%~99% will return ionly those matches with a 50 to 99% confidence.
Rank paramerter amended to return results in rank order implied, so "Rank = 1 10" will return best to worst, "Rank = 10 1" will return worst to best within NFPercent confines.
Code:
Option Explicit

Type RankInfo
    Offset As Long
    Percentage As Single
End Type

Dim mudRankData() As RankInfo
Dim mlBestMatchPtr As Long

Dim TopMatch         As Long
Dim strCompare       As String

Function FuzzyCount(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Algorithm As Integer = 3) As Long
'**********************************************************************
'** Simple count of (Fuzzy) Matching strings >= NFPercent threshold  **
'**********************************************************************
Dim lMatchCount As Long

Dim rCur As Range

Dim sString1 As String
Dim sString2 As String

'** Normalise lookup value **
sString1 = LCase$(Application.Trim(LookupValue))

For Each rCur In Intersect(TableArray.Resize(, 1), Sheets(TableArray.Parent.Name).UsedRange)

    '** Normalise current Table entry **
    sString2 = LCase$(Application.Trim(CStr(rCur)))
    
    If sString2 <> "" Then
        If FuzzyPercent(String1:=sString1, _
                        String2:=sString2, _
                        Algorithm:=Algorithm, _
                        Normalised:=True) >= NFPercent Then
            lMatchCount = lMatchCount + 1
        End If
    End If
Next rCur

FuzzyCount = lMatchCount

End Function

Function FuzzyPercent(ByVal String1 As String, _
                      ByVal String2 As String, _
                      Optional Algorithm As Integer = 3, _
                      Optional Normalised As Boolean = False) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim intLen1 As Integer, intLen2 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim sngScore As Single
Dim strWork As String

'-------------------------------------------------------
'-- If strings havent been normalised, normalise them --
'-------------------------------------------------------
If Normalised = False Then
    String1 = LCase$(Application.Trim(String1))
    String2 = LCase$(Application.Trim(String2))
End If

'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If String1 = String2 Then
    FuzzyPercent = 1
    Exit Function
End If

intLen1 = Len(String1)
intLen2 = Len(String2)

If intLen1 = 0 Or intLen2 = 0 Then
    FuzzyPercent = 0
    Exit Function
End If

'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
    FuzzyPercent = 0
    Exit Function
End If

intTotScore = 0                   'initialise total possible score
intScore = 0                      'initialise current score

'--------------------------------------------------------
'-- If Algorithm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (Algorithm And 1) <> 0 Then
    If intLen1 < intLen2 Then
        FuzzyAlg1 String1, String2, intScore, intTotScore
    Else
        FuzzyAlg1 String2, String1, intScore, intTotScore
    End If
End If

'-----------------------------------------------------------
'-- If Algorithm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (Algorithm And 2) <> 0 Then
    If intLen1 < intLen2 Then
        FuzzyAlg2 String1, String2, intScore, intTotScore
    Else
        FuzzyAlg2 String2, String1, intScore, intTotScore
    End If
End If

'------------------------------------------------------
'-- If Algorithm = 4,5,6,7, use Dan Ostander's code. --
'------------------------------------------------------
If (Algorithm And 4) <> 0 Then
    If intLen1 < intLen2 Then
        sngScore = FuzzyAlg4(String1, String1)
    Else
        sngScore = FuzzyAlg4(String2, String1)
    End If
    intScore = intScore + (sngScore * 100)
    intTotScore = intTotScore + 100
End If

FuzzyPercent = intScore / intTotScore

End Function

Private Sub FuzzyAlg1(ByVal String1 As String, _
                      ByVal String2 As String, _
                      ByRef Score As Integer, _
                      ByRef TotScore As Integer)
Dim intLen1 As Integer, intPos As Integer, intPtr As Integer, intStartPos As Integer

intLen1 = Len(String1)
TotScore = TotScore + intLen1              'update total possible score
intPos = 0
For intPtr = 1 To intLen1
    intStartPos = intPos + 1
    intPos = InStr(intStartPos, String2, Mid$(String1, intPtr, 1))
    If intPos > 0 Then
        If intPos > intStartPos + 3 Then     'No match if char is > 3 bytes away
            intPos = intStartPos
        Else
            Score = Score + 1          'Update current score
        End If
    Else
        intPos = intStartPos
    End If
Next intPtr
End Sub
Private Sub FuzzyAlg2(ByVal String1 As String, _
                        ByVal String2 As String, _
                        ByRef Score As Integer, _
                        ByRef TotScore As Integer)
Dim intCurLen As Integer, intLen1 As Integer, intTo As Integer, intPtr As Integer, intPos As Integer
Dim strWork As String

intLen1 = Len(String1)
For intCurLen = 1 To intLen1
    strWork = String2                          'Get a copy of String2
    intTo = intLen1 - intCurLen + 1
    TotScore = TotScore + Int(intLen1 / intCurLen)  'Update total possible score
    For intPtr = 1 To intTo Step intCurLen
        intPos = InStr(strWork, Mid$(String1, intPtr, intCurLen))
        If intPos > 0 Then
            Mid$(strWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
            Score = Score + 1     'Update current score
        End If
    Next intPtr
Next intCurLen

End Sub
Private Function FuzzyAlg4(strIn1 As String, strIn2 As String) As Single

Dim L1               As Integer
Dim In1Mask(1 To 24) As Long     'strIn1 is 24 characters max
Dim iCh              As Integer
Dim N                As Long
Dim strTry           As String
Dim strTest          As String

TopMatch = 0
L1 = Len(strIn1)
strTest = UCase(strIn1)
strCompare = UCase(strIn2)
For iCh = 1 To L1
    In1Mask(iCh) = 2 ^ iCh
Next iCh      'Loop thru all ordered combinations of characters in strIn1
For N = 2 ^ (L1 + 1) - 1 To 1 Step -1
    strTry = ""
    For iCh = 1 To L1
        If In1Mask(iCh) And N Then
            strTry = strTry & Mid(strTest, iCh, 1)
        End If
    Next iCh
    If Len(strTry) > TopMatch Then FuzzyAlg4Test strTry
Next N
FuzzyAlg4 = TopMatch / CSng(L1)
End Function
Sub FuzzyAlg4Test(strIn As String)

Dim l          As Integer
Dim strTry   As String
Dim iCh        As Integer

l = Len(strIn)
If l <= TopMatch Then Exit Sub
strTry = "*"
For iCh = 1 To l
    strTry = strTry & Mid(strIn, iCh, 1) & "*"
Next iCh
If strCompare Like strTry Then
    If l > TopMatch Then TopMatch = l
End If
End Sub

Function FuzzyVLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As String = "5%", _
                      Optional Rank As String = "1", _
                      Optional Algorithm As Integer = 3, _
                      Optional AdditionalCols As Integer = 0, _
                      Optional LookupColOffset As Integer = 0, _
                      Optional GroupColOffset As Integer = 0, _
                      Optional GroupValue As Variant = "") As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** column 1 of table specified by TableArray.                                 **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in column 1       **
'** is found.                                                                  **
'** For each entry in column 1 of the table, FuzzyPercent is called to match   **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the column number of the entry in TableArray required to be    **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the column entry indicated by IndexNum is     **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset row (starting at 1) is returned.   **
'**                 This value can be used directly in the 'Index' function.   **
'**                                                                            **
'** Algorithm can take one of the following values:                            **
'** Algorithm = 1:                                                             **
'**     This algorithm is best suited for matching mis-spellings.              **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algorithm = 2:                                                             **
'**     This algorithm is best suited for matching sentences, or               **
'**     'firstname lastname' compared with 'lastname firstname' combinations   **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algorithm = 3: Both Algorithms 1 and 2 are performed.                      **
'** Algorithm = 4: Dan Ostrander's algorithm                                   **
'**                                                                            **
'** The following parameters allow matching by group, and only those values    **
'** which are in the group specified will be considered for matching.          **
'** GroupColOffset                                                             **
'** This parameter specifies the offset column which contains the group values **
'** To trigger group matching, this must be a non-zero integer                 **
'** GroupValue                                                                 **
'** This parameter specifies the Group to be considered for matching           **
'********************************************************************************
Dim bWanted As Boolean
Dim bError As Boolean

Dim iPtr1 As Integer
Dim iPtr2 As Integer
Dim iaRank() As Integer
Dim iaRankLowHigh() As Integer
Dim iRankHigh As Integer
Dim iRankLow As Long
Dim iRankStep As Integer
Dim lEndRow As Long

Dim rCur As Range
Dim rSearchRange As Range

Dim sngNFPercent As Single
Dim sngaNFPercent() As Single

Dim sChar As String
Dim sPrevChar As String
Dim sRankDelimiter As String
Dim sResult As String
Dim strNFPercent As String
Dim straNFPercent() As String

Dim vCurValue As Variant
Dim wsTable As Worksheet

'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------

LookupValue = LCase$(Application.Trim(LookupValue))

If IsMissing(NFPercent) Then
    sngaNFPercent(0) = 0.05
    sngaNFPercent(1) = 1
'    sngMinPercent = 0.05
'    sngMaxPercent = 1
Else
    straNFPercent = Split(NFPercent, "~")
    ReDim sngaNFPercent(0 To 1)
    If UBound(straNFPercent) = 0 Then
        ReDim Preserve straNFPercent(0 To 1)
        straNFPercent(1) = "1"
    End If
    For iPtr1 = 0 To 1
        sngaNFPercent(iPtr1) = Val(straNFPercent(iPtr1))
        If sngaNFPercent(iPtr1) > 1 Then sngaNFPercent(iPtr1) = sngaNFPercent(iPtr1) / 100
        If (sngaNFPercent(iPtr1) <= 0) Or (sngaNFPercent(iPtr1) > 1) Then
            FuzzyVLookup = "*** 'NFPercent' is invalid ***"
            Exit Function
        End If
    Next iPtr1
    sngNFPercent = sngaNFPercent(0)
    If sngaNFPercent(0) > sngaNFPercent(1) Then
        sngNFPercent = sngaNFPercent(1)
        sngaNFPercent(1) = sngaNFPercent(0)
        sngaNFPercent(0) = sngNFPercent
    End If
End If

ReDim iaRank(1 To 2)
sRankDelimiter = ""

If IsNumeric(Rank) Then
    iaRank(1) = Val(Rank)
    iaRank(2) = iaRank(1)
    iRankLow = iaRank(1)
    iRankHigh = iaRank(1)
    If iaRank(1) < 1 Then
        FuzzyVLookup = "*** 'Rank' must be a positive integer or format integer-delimiter-integer ***"
        Exit Function
    End If
Else
    iPtr2 = 1
    sPrevChar = "0"
    For iPtr1 = 1 To Len(Rank)
         sChar = Mid$(Rank, iPtr1, 1)
         If IsNumeric(sChar) Then
            iaRank(iPtr2) = (iaRank(iPtr2) * 10) + Val(sChar)
         Else
            If IsNumeric(sPrevChar) Then
                If iPtr2 = 2 Then Exit For
                iPtr2 = 2
            End If
            sRankDelimiter = sRankDelimiter & sChar
         End If
         sPrevChar = sChar
    Next iPtr1
    bError = False
    If iaRank(1) < 1 Then bError = True
    If iaRank(2) < 1 Then bError = True
    If bError Then
        FuzzyVLookup = "*** 'Rank' must be a positive integer or format integer-delimiter-integer ***"
        Exit Function
    End If
    If iaRank(2) < iaRank(1) Then
        iRankStep = -1
        iRankLow = iaRank(2)
        iRankHigh = iaRank(1)
    Else
        iRankStep = 1
        iRankLow = iaRank(1)
        iRankHigh = iaRank(2)
    End If
End If

Set wsTable = Sheets(TableArray.Parent.Name)

ReDim mudRankData(1 To iRankHigh)

lEndRow = TableArray.Rows.Count
If VarType(TableArray.Cells(lEndRow, 1).Value) = vbEmpty Then
    lEndRow = TableArray.Cells(lEndRow, 1).End(xlUp).Row
End If
Set rSearchRange = Intersect(TableArray.Resize(, 1), wsTable.UsedRange)
If rSearchRange Is Nothing Then
    FuzzyVLookup = CVErr(xlErrNA)
Else


    '---------------
    '-- Main loop --
    '---------------
    
    If Len(GroupValue) <> 0 Then
        With rSearchRange.Offset(, GroupColOffset)
            For Each rCur In rSearchRange.Offset(, GroupColOffset)
                vCurValue = rCur.Value
                If VarType(vCurValue) = vbString Then
                    bWanted = LCase$(CStr(vCurValue)) = LCase$(CStr(GroupValue))
                Else
                    bWanted = Val(vCurValue) = Val(GroupValue)
                End If
                If bWanted Then
                    FuzzyVlookupMain LookupValue:=LookupValue, _
                                    TableArray:=rCur.Offset(, GroupColOffset * -1), _
                                    IndexNum:=IndexNum, _
                                    NFPercent:=sngNFPercent, _
                                    Rank:=iRankHigh, _
                                    Algorithm:=Algorithm, _
                                    AdditionalCols:=AdditionalCols, _
                                    LookupColOffset:=LookupColOffset
                End If
            Next rCur
        End With
    Else
        For Each rCur In rSearchRange
            FuzzyVlookupMain LookupValue:=LookupValue, _
                            TableArray:=rCur, _
                            IndexNum:=IndexNum, _
                            NFPercent:=sngNFPercent, _
                            Rank:=iRankHigh, _
                            Algorithm:=Algorithm, _
                            AdditionalCols:=AdditionalCols, _
                            LookupColOffset:=LookupColOffset
        Next rCur
    End If
    
    If (mudRankData(iaRank(1)).Percentage < sngNFPercent) _
    And (mudRankData(iaRank(2)).Percentage < sngNFPercent) Then
        '--------------------------------------
        '-- Return '#N/A' if below NFPercent --
        '--------------------------------------
        FuzzyVLookup = CVErr(xlErrNA)
    Else
        sResult = ""
        For iPtr2 = iaRank(1) To iaRank(2) Step iRankStep
            If (mudRankData(iPtr2).Percentage >= sngaNFPercent(0)) _
            And (mudRankData(iPtr2).Percentage <= sngaNFPercent(1)) Then
                mlBestMatchPtr = mudRankData(iPtr2).Offset - TableArray.Cells(1, 1).Row + 1
                If sResult <> "" Then sResult = sResult & sRankDelimiter
            
                If IndexNum > 0 Then
                    '-----------------------------------
                    '-- Return column entry specified --
                    '-----------------------------------
                    sResult = sResult & CStr(TableArray.Cells(mlBestMatchPtr, IndexNum))
                Else
                    '-----------------------
                    '-- Return offset row --
                    '-----------------------
                    sResult = sResult & CStr(mlBestMatchPtr)
                End If
            End If
        Next iPtr2
        FuzzyVLookup = sResult
    End If
    
End If          'If rSearchRange Is Nothing Then
End Function
Private Sub FuzzyVlookupMain(ByVal LookupValue As String, _
                                    ByVal TableArray As Range, _
                                    ByVal IndexNum As Integer, _
                                    Optional NFPercent As Single = 0.05, _
                                    Optional Rank As Integer = 1, _
                                    Optional Algorithm As Integer = 3, _
                                    Optional AdditionalCols As Integer = 0, _
                                    Optional LookupColOffset As Integer = 0)
Dim I As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer
Dim strListString As String
Dim sngCurPercent As Single
Dim sngMinPercent As Single
Dim vCurValue As Variant

vCurValue = ""
For I = 0 To AdditionalCols
    vCurValue = vCurValue & CStr(TableArray.Offset(0, I + LookupColOffset).Value)
Next I
If VarType(vCurValue) = vbString Then
    strListString = LCase$(Application.Trim(vCurValue))
    
    '------------------------------------------------
    '-- Fuzzy match strings & get percentage match --
    '------------------------------------------------
    sngCurPercent = FuzzyPercent(String1:=LookupValue, _
                                 String2:=strListString, _
                                 Algorithm:=Algorithm, _
                                 Normalised:=True)
    
    If sngCurPercent >= NFPercent Then
        '---------------------------
        '-- Store in ranked array --
        '---------------------------
        For intRankPtr = 1 To Rank
            If sngCurPercent > mudRankData(intRankPtr).Percentage Then
                For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                    With mudRankData(intRankPtr1)
                        .Offset = mudRankData(intRankPtr1 - 1).Offset
                        .Percentage = mudRankData(intRankPtr1 - 1).Percentage
                    End With
                Next intRankPtr1
                With mudRankData(intRankPtr)
                    .Offset = TableArray.Row
                    .Percentage = sngCurPercent
                End With
                Exit Sub
            End If
        Next intRankPtr
    End If
    
End If
End Sub

continued ....
 
Upvote 0
... continued

Example:

Excel 2003
ABCD
1TableLookupFuzzyvLookup
2Jane L DoeJane DoeRank = 1 to 10:Jane L Doe , Jane B Doe , Jane Doh , Doe Jan
3Jane Doe
4Jack DeeRank = 10 to 1:Doe Jan, Jane Doh , Jane B Doe , Jane L Doe
5Steve Bart
6Mike Moe
7Jane B Doe
8Jane Doh
9Doe Jan
Sheet1
Cell Formulas
RangeFormula
D2=FuzzyVLookup(B2,A:A,1,"50%~99%","1, 10",2)
D4=FuzzyVLookup(B2,A:A,1,"50%~99%","10, 1",2)


note that 'Exact' matches do not get included.
 
Upvote 0
Dear al_b_cnu,

Thanks, first of all, for providing this awesome code. I've been able to implement it in my spreadsheet and it's working great. There is just one function I am still missing for my ideal search and I wondered if you knew a way to handle it. What I am interested in is a way that the search will recognize similar words that have transposed characters.

For example, if I search for "Marshalnd" in my database of terms, it comes up with Mars, Marshal, Mary, Rash, and Marshland Area for ranks 1-5 but not simply "Marshland" which is arguably the closest (having all letters in common, but not all in the same order). Do you know how to tackle such a problem or could you point me in a good direction?

Thanks again for all the work you've put into this code over the years!
 
Upvote 0
It's too bad post rating can only be done once ;) Thanks very much for the final solution, Alan. I suspect eventually there will be a point sooner or later where the solution may be better off by using some other language besides vBA + Excel. Until then, I will keep track of this post to see what other type of ways can be applied by the macro. Again, excellent work!
 
Upvote 0

Forum statistics

Threads
1,223,270
Messages
6,171,102
Members
452,379
Latest member
IainTru

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top