Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,546
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
Hi TSO,

I've just run the above on Excel 2007, no problems. Do you want to send me a sample file & I'll look at it - PM me.
 
Upvote 0

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
Hi,

This amended version:
Code:
Sub MatchData()
Dim dblCur As Double
Dim iPtr As Integer
Dim lRow As Long, lRowEnd As Long, lResultRow As Long
Dim objCustDictionary As Object
Dim rCur As Range
Dim sCur As String, sCurSeries As String, sCurKey As String, sCurSplit() As String
Dim sRange As String
Dim vCur As Variant
Dim wsInput As Worksheet

Set wsInput = Sheets(msInputSheet)

wsInput.Range(msResultCol & "2:" & msResultCol & wsInput.UsedRange.Rows.Count).ClearContents

Set objCustDictionary = Nothing
Set objCustDictionary = CreateObject("Scripting.Dictionary")
lRowEnd = wsInput.Cells(Rows.Count, msMyDataCol).End(xlUp).Row
For Each rCur In wsInput.Range(msMyDataCol & "2:" & Cells(lRowEnd, msMyDataCol).Address)
    sCur = CStr(rCur.Value)
    If sCur<> "" Then
        sCurSplit = Split("-" & sCur, "-")
    End If
    iPtr = UBound(sCurSplit)
    If LCase$(sCurSplit(iPtr)) = "series" Then
        ReDim Preserve sCurSplit(0 To iPtr - 1)
        sCurSeries = Mid$(Join(sCurSplit, "-"), 2)
        On Error Resume Next
        objCustDictionary.Add key:=sCurSeries, Item:="XXX"
        On Error GoTo 0
    End If
Next rCur

lRowEnd = wsInput.Cells(Rows.Count, msCustCol).End(xlUp).Row
For Each rCur In wsInput.Range(msCustCol & "2:" & Cells(lRowEnd, msCustCol).Address)
    vCur = rCur.Value
    lRow = 0
    On Error Resume Next
    lRow = WorksheetFunction.Match(vCur, wsInput.Columns(msMyDataCol), 0)
    On Error GoTo 0
    If lRow = 0 Then
        sCurSplit = Split("-" & CStr(vCur), "-")
        iPtr = UBound(sCurSplit)
        ReDim Preserve sCurSplit(0 To iPtr - 1)
        sCurKey = Mid$(Join(sCurSplit, "-"), 2)
        If objCustDictionary.exists(sCurKey) Then wsInput.Range(msResultCol & rCur.Row).Value = sCurKey & "-SERIES"
    Else
        wsInput.Range(msResultCol & rCur.Row).Value = vCur
    End If
Next rCur

On Error Resume Next
objCustDictionary.RemoveAll
Set objCustDictionary = Nothing

End Sub

Returns these results:
Excel Workbook
ABC
1My DataCust DataResults
21231-183744-3
3542-series83744-2
483736-1542-10542-SERIES
583736-11231-11231-1
6020-238-SERIES034100AA000PC0A-4034100AA000PC0A-SERIES
7034100AA000PC0A-SERIES034100AA000PC0A-3034100AA000PC0A-SERIES
8123-123-2020-238-2020-238-SERIES
9324234-6020-238-1020-238-SERIES
1023hb23h412b-4-5123123
11123-123-3132121664
12123-123-4B1421B1421
13324234-7G05315G05315
1423hb23h412b-4-6240545240545
15123123-series
16132121664-series
17B1421
18G05315
19240545
Sheet1
Excel 2003

Regarding the issue that 123123-series not finding 123123, this seems to be a slight rule change here, the code WOULD, however find something like 123123-01. What are the rules?

should I just use these codes?
or add them together?
it says "subscript out of range" if I use these codes
 
Upvote 0
Hi TSO,

Remove the code from the Sheet1 code module, then from the menu [Insert] [Module] then paste i n the following into Module1:
Code:
Option Explicit
Const msInputSheet As String = "Sheet1"
Const msMyDataCol As String = "A"
Const msCustCol As String = "B"
Const msResultCol As String = "C"

Sub CorrectData()
Dim iSplitPtr As Integer
Dim lRow1 As Long, lRow2 As Long, lRowEnd As Long, lPtr As Long
Dim rCur As Range
Dim sFirstAddr As String
Dim sCur As String, saSplit() As String
Dim vaData() As Variant
Dim wsInput As Worksheet

Set wsInput = Sheets(msInputSheet)
lPtr = 0

With wsInput.Columns(msMyDataCol)
    Set rCur = .Find(what:=",", LookIn:=xlValues, lookat:=xlPart)
    If Not rCur Is Nothing Then
        sFirstAddr = rCur.Address
        Do
            sCur = rCur.Value
            saSplit = Split(sCur, ",")
            rCur.Value = saSplit(0) & saSplit(1)
            For iSplitPtr = 2 To UBound(saSplit)
                lPtr = lPtr + 1
                ReDim Preserve vaData(1 To 1, 1 To lPtr)
                vaData(1, lPtr) = saSplit(0) & saSplit(iSplitPtr)
            Next iSplitPtr
            Set rCur = .FindNext(rCur)
            If rCur Is Nothing Then Exit Do
        Loop While rCur.Address <> sFirstAddr
    End If
End With
If lPtr > 0 Then
    lRow1 = wsInput.Cells(Rows.Count, msMyDataCol).End(xlUp).Row + 1
    lRow2 = lRow1 + lPtr - 1
    wsInput.Range(msMyDataCol & lRow1 & ":" & msMyDataCol & lRow2).Value = WorksheetFunction.Transpose(vaData)
End If
End Sub

Sub MatchData()
Dim dblCur As Double
Dim iPtr As Integer
Dim lRow As Long, lRowEnd As Long, lResultRow As Long
Dim objCustDictionary As Object
Dim rCur As Range
Dim sCur As String, sCurSeries As String, sCurKey As String, sCurSplit() As String
Dim sRange As String
Dim vCur As Variant
Dim wsInput As Worksheet

Set wsInput = Sheets(msInputSheet)

wsInput.Range(msResultCol & "2:" & msResultCol & wsInput.UsedRange.Rows.Count).ClearContents

Set objCustDictionary = Nothing
Set objCustDictionary = CreateObject("Scripting.Dictionary")
lRowEnd = wsInput.Cells(Rows.Count, msMyDataCol).End(xlUp).Row
For Each rCur In wsInput.Range(msMyDataCol & "2:" & Cells(lRowEnd, msMyDataCol).Address)
    sCur = CStr(rCur.Value)
    If sCur <> "" Then
        sCurSplit = Split("-" & sCur, "-")
    End If
    iPtr = UBound(sCurSplit)
    If LCase$(sCurSplit(iPtr)) = "series" Then
        ReDim Preserve sCurSplit(0 To iPtr - 1)
        sCurSeries = Mid$(Join(sCurSplit, "-"), 2)
        On Error Resume Next
        objCustDictionary.Add Key:=sCurSeries, Item:="XXX"
        On Error GoTo 0
    End If
Next rCur

lRowEnd = wsInput.Cells(Rows.Count, msCustCol).End(xlUp).Row
For Each rCur In wsInput.Range(msCustCol & "2:" & Cells(lRowEnd, msCustCol).Address)
    vCur = rCur.Value
    lRow = 0
    On Error Resume Next
    lRow = WorksheetFunction.Match(vCur, wsInput.Columns(msMyDataCol), 0)
    On Error GoTo 0
    If lRow = 0 Then
        sCurSplit = Split("-" & CStr(vCur), "-")
        iPtr = UBound(sCurSplit)
        ReDim Preserve sCurSplit(0 To iPtr - 1)
        sCurKey = Mid$(Join(sCurSplit, "-"), 2)
        If objCustDictionary.exists(sCurKey) Then wsInput.Range(msResultCol & rCur.Row).Value = sCurKey & "-SERIES"
    Else
        wsInput.Range(msResultCol & rCur.Row).Value = vCur
    End If
Next rCur

On Error Resume Next
objCustDictionary.RemoveAll
Set objCustDictionary = Nothing

End Sub
 
Upvote 0
Hi TSO,

Remove the code from the Sheet1 code module, then from the menu [Insert] [Module] then paste i n the following into Module1:
Code:
Option Explicit
Const msInputSheet As String = "Sheet1"
Const msMyDataCol As String = "A"
Const msCustCol As String = "B"
Const msResultCol As String = "C"

Sub CorrectData()
Dim iSplitPtr As Integer
Dim lRow1 As Long, lRow2 As Long, lRowEnd As Long, lPtr As Long
Dim rCur As Range
Dim sFirstAddr As String
Dim sCur As String, saSplit() As String
Dim vaData() As Variant
Dim wsInput As Worksheet

Set wsInput = Sheets(msInputSheet)
lPtr = 0

With wsInput.Columns(msMyDataCol)
    Set rCur = .Find(what:=",", LookIn:=xlValues, lookat:=xlPart)
    If Not rCur Is Nothing Then
        sFirstAddr = rCur.Address
        Do
            sCur = rCur.Value
            saSplit = Split(sCur, ",")
            rCur.Value = saSplit(0) & saSplit(1)
            For iSplitPtr = 2 To UBound(saSplit)
                lPtr = lPtr + 1
                ReDim Preserve vaData(1 To 1, 1 To lPtr)
                vaData(1, lPtr) = saSplit(0) & saSplit(iSplitPtr)
            Next iSplitPtr
            Set rCur = .FindNext(rCur)
            If rCur Is Nothing Then Exit Do
        Loop While rCur.Address <> sFirstAddr
    End If
End With
If lPtr > 0 Then
    lRow1 = wsInput.Cells(Rows.Count, msMyDataCol).End(xlUp).Row + 1
    lRow2 = lRow1 + lPtr - 1
    wsInput.Range(msMyDataCol & lRow1 & ":" & msMyDataCol & lRow2).Value = WorksheetFunction.Transpose(vaData)
End If
End Sub

Sub MatchData()
Dim dblCur As Double
Dim iPtr As Integer
Dim lRow As Long, lRowEnd As Long, lResultRow As Long
Dim objCustDictionary As Object
Dim rCur As Range
Dim sCur As String, sCurSeries As String, sCurKey As String, sCurSplit() As String
Dim sRange As String
Dim vCur As Variant
Dim wsInput As Worksheet

Set wsInput = Sheets(msInputSheet)

wsInput.Range(msResultCol & "2:" & msResultCol & wsInput.UsedRange.Rows.Count).ClearContents

Set objCustDictionary = Nothing
Set objCustDictionary = CreateObject("Scripting.Dictionary")
lRowEnd = wsInput.Cells(Rows.Count, msMyDataCol).End(xlUp).Row
For Each rCur In wsInput.Range(msMyDataCol & "2:" & Cells(lRowEnd, msMyDataCol).Address)
    sCur = CStr(rCur.Value)
    If sCur <> "" Then
        sCurSplit = Split("-" & sCur, "-")
    End If
    iPtr = UBound(sCurSplit)
    If LCase$(sCurSplit(iPtr)) = "series" Then
        ReDim Preserve sCurSplit(0 To iPtr - 1)
        sCurSeries = Mid$(Join(sCurSplit, "-"), 2)
        On Error Resume Next
        objCustDictionary.Add Key:=sCurSeries, Item:="XXX"
        On Error GoTo 0
    End If
Next rCur

lRowEnd = wsInput.Cells(Rows.Count, msCustCol).End(xlUp).Row
For Each rCur In wsInput.Range(msCustCol & "2:" & Cells(lRowEnd, msCustCol).Address)
    vCur = rCur.Value
    lRow = 0
    On Error Resume Next
    lRow = WorksheetFunction.Match(vCur, wsInput.Columns(msMyDataCol), 0)
    On Error GoTo 0
    If lRow = 0 Then
        sCurSplit = Split("-" & CStr(vCur), "-")
        iPtr = UBound(sCurSplit)
        ReDim Preserve sCurSplit(0 To iPtr - 1)
        sCurKey = Mid$(Join(sCurSplit, "-"), 2)
        If objCustDictionary.exists(sCurKey) Then wsInput.Range(msResultCol & rCur.Row).Value = sCurKey & "-SERIES"
    Else
        wsInput.Range(msResultCol & rCur.Row).Value = vCur
    End If
Next rCur

On Error Resume Next
objCustDictionary.RemoveAll
Set objCustDictionary = Nothing

End Sub

Thanks..I could use the macro now but it did not get the things I mentioned...
 
Upvote 0
That's a remarkable piece of work!

I looked into the same issue a few years ago, in 2006, and posted up a solution on an obscure personal blog: my initial cut used a very similar solution to you: ranking, an alternative algorithm, a modified version for addresses.

In the end, I concluded that you can be too clever, and and up with too many features that clutter the interface and add sources of error: I stripped out the ranking option, and a 'column 0' undocumented parameter, which returned the match score instead a value retrieved from column 'n'. It's interesting that we ended up developing such similar features, though! It seems clear that the problems of address-matching impose similar solutions.

The choice of algorithms will never be perfect: I started out with a Levenshtein 'Edit Distance' algorithm, and there's a version of that on my LiveJournal pages which runs at an acceptable speed. 'Edit Distance' solutions are the gold standard of text-matching algorithms, but they tend to be slow and most of them require an additional layer of word-order and syntax analysis in order to cope with longer samples of text.

Later solutions (and the current version of FuzzyVLookup() on Excellerando: http://excellerando.blogspot.com/2010/03/vlookup-with-fuzzy-matching-to-get.html ) use a sum-of-common-strings approach. This is fast and consistent, but it has a sequencing bias in its results (it'll match 'wednesday' to 'wednXXday' in preference to ''dayXwedne', which has a better substring match) - but this sequence bias is functionally equivalent to 'scoring' the transposition as an additional edit distance, and works very well indeed in terms of 'real-world' text-matching and scoring, as humans regard large transpositions that change the sequence of the text as a much more significant 'edit' than swapping letters or even entire words.

Have you performed this kind of analysis on your algorithm? And how well does it work on postal addresses?
 
Last edited:
Upvote 0
is it possible to return the best match and another column of data connected with it? lets say fuzzymatch returns Jim as the best match, could I have it return Jim and his email address which is in the column to the right

Name | Match

Will | Jim |jim@gmail.com
Jim |
Bob |
Rob |
 
Upvote 0
Hi,

The most efficient way would be to use a helper column, use FuzzyVLookup to return the row and use OFFSET to return the desired value. This would eliminate the need to call FuzzyVLookup twice. e.g.:
Excel Workbook
ABCDEFGH
1NameEmailSearchResult RowResult NameResult Email
2WillWill@hotmail.comJ I M3JimJim@hotmail.com
3JimJim@hotmail.com
4BobBob@hotmail.com
5DaveDave@hotmail.com
Sheet1
Excel 2003
Cell Formulas
RangeFormula
F2=FuzzyVLookup(E2,A:B,0)
G2=OFFSET($A$1,$F2-1,0)
H2=OFFSET($A$1,$F2-1,1)


Make sure you use the latest version indicated in my signature.
 
Upvote 0
You could alternatively use the INDEX function:
Excel Workbook
ABCDEFGH
1NameEmailSearchResult RowResult NameResult Email
2WillWill@hotmail.comJ I M3JimJim@hotmail.com
3JimJim@hotmail.com
4BobBob@hotmail.com
5DaveDave@hotmail.com
Sheet1
Excel 2003
Cell Formulas
RangeFormula
F2=FuzzyVLookup(E2,A:B,0)
G2=INDEX($A:$B,$F2,1)
H2=INDEX($A:$B,$F2,2)
 
Upvote 0
Hi Folks,

karan_1317 has found a bug whereby if your data extends beyond row 32767, '#Value' may be returned.

Here's the corrected version:
Code:
Option Explicit

Type RankInfo
    Offset As Long
    Percentage As Single
End Type

Dim mudRankData() As RankInfo
Dim mlBestMatchPtr As Long

Dim TopMatch         As Integer
Dim strCompare       As String


Function FuzzyPercent(ByVal String1 As String, _
                      ByVal String2 As String, _
                      Optional Algorithm As Integer = 3, _
                      Optional Normalised As Boolean = False) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim intLen1 As Integer, intLen2 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim sngScore As Single
Dim strWork As String

'-------------------------------------------------------
'-- If strings havent been normalised, normalise them --
'-------------------------------------------------------
If Normalised = False Then
    String1 = LCase$(Application.Trim(String1))
    String2 = LCase$(Application.Trim(String2))
End If

'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If String1 = String2 Then
    FuzzyPercent = 1
    Exit Function
End If

intLen1 = Len(String1)
intLen2 = Len(String2)

If intLen1 = 0 Or intLen2 = 0 Then
    FuzzyPercent = 0
    Exit Function
End If

'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
    FuzzyPercent = 0
    Exit Function
End If

intTotScore = 0                   'initialise total possible score
intScore = 0                      'initialise current score

'--------------------------------------------------------
'-- If Algorithm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (Algorithm And 1) <> 0 Then
    If intLen1 < intLen2 Then
        FuzzyAlg1 String1, String2, intScore, intTotScore
    Else
        FuzzyAlg1 String2, String1, intScore, intTotScore
    End If
End If

'-----------------------------------------------------------
'-- If Algorithm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (Algorithm And 2) <> 0 Then
    If intLen1 < intLen2 Then
        FuzzyAlg2 String1, String2, intScore, intTotScore
    Else
        FuzzyAlg2 String2, String1, intScore, intTotScore
    End If
End If

'------------------------------------------------------
'-- If Algorithm = 4,5,6,7, use Dan Ostander's code. --
'------------------------------------------------------
If (Algorithm And 4) <> 0 Then
    If intLen1 < intLen2 Then
        sngScore = FuzzyAlg4(String1, String1)
    Else
        sngScore = FuzzyAlg4(String2, String1)
    End If
    intScore = intScore + (sngScore * 100)
    intTotScore = intTotScore + 100
End If

FuzzyPercent = intScore / intTotScore

End Function

Private Sub FuzzyAlg1(ByVal String1 As String, _
                      ByVal String2 As String, _
                      ByRef Score As Integer, _
                      ByRef TotScore As Integer)
Dim intLen1 As Integer, intPos As Integer, intPtr As Integer, intStartPos As Integer

intLen1 = Len(String1)
TotScore = TotScore + intLen1              'update total possible score
intPos = 0
For intPtr = 1 To intLen1
    intStartPos = intPos + 1
    intPos = InStr(intStartPos, String2, Mid$(String1, intPtr, 1))
    If intPos > 0 Then
        If intPos > intStartPos + 3 Then     'No match if char is > 3 bytes away
            intPos = intStartPos
        Else
            Score = Score + 1          'Update current score
        End If
    Else
        intPos = intStartPos
    End If
Next intPtr
End Sub
Private Sub FuzzyAlg2(ByVal String1 As String, _
                        ByVal String2 As String, _
                        ByRef Score As Integer, _
                        ByRef TotScore As Integer)
Dim intCurLen As Integer, intLen1 As Integer, intTo As Integer, intPtr As Integer, intPos As Integer
Dim strWork As String

intLen1 = Len(String1)
For intCurLen = 1 To intLen1
    strWork = String2                          'Get a copy of String2
    intTo = intLen1 - intCurLen + 1
    TotScore = TotScore + Int(intLen1 / intCurLen)  'Update total possible score
    For intPtr = 1 To intTo Step intCurLen
        intPos = InStr(strWork, Mid$(String1, intPtr, intCurLen))
        If intPos > 0 Then
            Mid$(strWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
            Score = Score + 1     'Update current score
        End If
    Next intPtr
Next intCurLen

End Sub
Private Function FuzzyAlg4(strIn1 As String, strIn2 As String) As Single

Dim L1               As Integer
Dim In1Mask(1 To 24) As Long     'strIn1 is 24 characters max
Dim iCh              As Integer
Dim N                As Long
Dim strTry           As String
Dim strTest          As String

TopMatch = 0
L1 = Len(strIn1)
strTest = UCase(strIn1)
strCompare = UCase(strIn2)
For iCh = 1 To L1
    In1Mask(iCh) = 2 ^ iCh
Next iCh      'Loop thru all ordered combinations of characters in strIn1
For N = 2 ^ (L1 + 1) - 1 To 1 Step -1
    strTry = ""
    For iCh = 1 To L1
        If In1Mask(iCh) And N Then
            strTry = strTry & Mid(strTest, iCh, 1)
        End If
    Next iCh
    If Len(strTry) > TopMatch Then FuzzyAlg4Test strTry
Next N
FuzzyAlg4 = TopMatch / CSng(L1)
End Function
Sub FuzzyAlg4Test(strIn As String)

Dim l          As Integer
Dim strTry   As String
Dim iCh        As Integer

l = Len(strIn)
If l <= TopMatch Then Exit Sub
strTry = "*"
For iCh = 1 To l
    strTry = strTry & Mid(strIn, iCh, 1) & "*"
Next iCh
If strCompare Like strTry Then
    If l > TopMatch Then TopMatch = l
End If
End Sub

Function FuzzyVLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Rank As Integer = 1, _
                      Optional Algorithm As Integer = 3, _
                      Optional AdditionalCols As Integer = 0, _
                      Optional LookupColOffset As Integer = 0, _
                      Optional GroupColOffset As Integer = 0, _
                      Optional GroupValue As Variant = "") As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** column 1 of table specified by TableArray.                                 **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in column 1       **
'** is found.                                                                  **
'** For each entry in column 1 of the table, FuzzyPercent is called to match   **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the column number of the entry in TableArray required to be    **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the column entry indicated by IndexNum is     **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset row (starting at 1) is returned.   **
'**                 This value can be used directly in the 'Index' function.   **
'**                                                                            **
'** Algorithm can take one of the following values:                            **
'** Algorithm = 1:                                                             **
'**     This algorithm is best suited for matching mis-spellings.              **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algorithm = 2:                                                             **
'**     This algorithm is best suited for matching sentences, or               **
'**     'firstname lastname' compared with 'lastname firstname' combinations   **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algorithm = 3: Both Algorithms 1 and 2 are performed.                      **
'** Algorithm = 4: Dan Ostrander's algorithm                                   **
'**                                                                            **
'** The following parameters allow matching by group, and only those values    **
'** which are in the group specified will be considered for matching.          **                  **
'** GroupColOffset                                                             **
'** This parameter specifies the offset column which contains the group values **
'** To trigger group matching, this must be a non-zero integer                 **
'** GroupValue                                                                 **
'** This parameter specifies the Group to be considered for matching           **
'********************************************************************************
Dim bWanted As Boolean
Dim rCur As Range
Dim rSearchRange As Range

Dim sngMinPercent As Single

Dim lEndRow As Long

Dim vCurValue As Variant

'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------

LookupValue = LCase$(Application.Trim(LookupValue))

If IsMissing(NFPercent) Then
    sngMinPercent = 0.05
Else
    If (NFPercent <= 0) Or (NFPercent > 1) Then
        FuzzyVLookup = "*** 'NFPercent' must be a percentage > zero ***"
        Exit Function
    End If
    sngMinPercent = NFPercent
End If

If Rank < 1 Then
    FuzzyVLookup = "*** 'Rank' must be an integer > 0 ***"
    Exit Function
End If

ReDim mudRankData(1 To Rank)

lEndRow = TableArray.Rows.Count
If VarType(TableArray.Cells(lEndRow, 1).Value) = vbEmpty Then
    lEndRow = TableArray.Cells(lEndRow, 1).End(xlUp).Row
End If
Set rSearchRange = Range(TableArray.Cells(1, 1).Address, TableArray.Cells(lEndRow, 1).Address)

'---------------
'-- Main loop --
'---------------

If Len(GroupValue) <> 0 Then
    With rSearchRange.Offset(, GroupColOffset)
        For Each rCur In rSearchRange.Offset(, GroupColOffset)
            vCurValue = rCur.Value
            If VarType(vCurValue) = vbString Then
                bWanted = LCase$(CStr(vCurValue)) = LCase$(CStr(GroupValue))
            Else
                bWanted = Val(vCurValue) = Val(GroupValue)
            End If
            If bWanted Then
                FuzzyVlookupMain LookupValue:=LookupValue, _
                                TableArray:=rCur.Offset(, GroupColOffset * -1), _
                                IndexNum:=IndexNum, _
                                NFPercent:=NFPercent, _
                                Rank:=Rank, _
                                Algorithm:=Algorithm, _
                                AdditionalCols:=AdditionalCols, _
                                LookupColOffset:=LookupColOffset
            End If
        Next rCur
    End With
Else
    For Each rCur In rSearchRange
        FuzzyVlookupMain LookupValue:=LookupValue, _
                        TableArray:=rCur, _
                        IndexNum:=IndexNum, _
                        NFPercent:=NFPercent, _
                        Rank:=Rank, _
                        Algorithm:=Algorithm, _
                        AdditionalCols:=AdditionalCols, _
                        LookupColOffset:=LookupColOffset
    Next rCur
End If

If mudRankData(Rank).Percentage < sngMinPercent Then
    '--------------------------------------
    '-- Return '#N/A' if below NFPercent --
    '--------------------------------------
    FuzzyVLookup = CVErr(xlErrNA)
Else
    mlBestMatchPtr = mudRankData(Rank).Offset - TableArray.Cells(1, 1).Row + 1
    If IndexNum > 0 Then
        '-----------------------------------
        '-- Return column entry specified --
        '-----------------------------------
        FuzzyVLookup = TableArray.Cells(mlBestMatchPtr, IndexNum)
    Else
        '-----------------------
        '-- Return offset row --
        '-----------------------
        FuzzyVLookup = mlBestMatchPtr
    End If
End If
End Function
Private Sub FuzzyVlookupMain(ByVal LookupValue As String, _
                                    ByVal TableArray As Range, _
                                    ByVal IndexNum As Integer, _
                                    Optional NFPercent As Single = 0.05, _
                                    Optional Rank As Integer = 1, _
                                    Optional Algorithm As Integer = 3, _
                                    Optional AdditionalCols As Integer = 0, _
                                    Optional LookupColOffset As Integer = 0)
Dim I As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer
Dim strListString As String
Dim sngCurPercent As Single
Dim sngMinPercent As Single
Dim vCurValue As Variant

vCurValue = ""
For I = 0 To AdditionalCols
    vCurValue = vCurValue & CStr(TableArray.Offset(0, I + LookupColOffset).Value)
Next I
If VarType(vCurValue) = vbString Then
    strListString = LCase$(Application.Trim(vCurValue))
    
    '------------------------------------------------
    '-- Fuzzy match strings & get percentage match --
    '------------------------------------------------
    sngCurPercent = FuzzyPercent(String1:=LookupValue, _
                                 String2:=strListString, _
                                 Algorithm:=Algorithm, _
                                 Normalised:=True)
    
    If sngCurPercent >= sngMinPercent Then
        '---------------------------
        '-- Store in ranked array --
        '---------------------------
        For intRankPtr = 1 To Rank
            If sngCurPercent > mudRankData(intRankPtr).Percentage Then
                For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                    With mudRankData(intRankPtr1)
                        .Offset = mudRankData(intRankPtr1 - 1).Offset
                        .Percentage = mudRankData(intRankPtr1 - 1).Percentage
                    End With
                Next intRankPtr1
                With mudRankData(intRankPtr)
                    .Offset = TableArray.Row
                    .Percentage = sngCurPercent
                End With
                Exit Sub
            End If
        Next intRankPtr
    End If
    
End If
End Sub
Function FuzzyHLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Rank As Integer = 1, _
                      Optional Algorithm As Integer = 3) As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** row 1 of table specified by TableArray.                                    **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in row 1          **
'** is found.                                                                  **
'** For each entry in row 1 of the table, FuzzyPercent is called to match      **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the row number of the entry in TableArray required to be       **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the row entry indicated by IndexNum is        **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset col (starting at 0) is returned.   **
'**                 This value can be used directly in the 'OffSet' function.  **
'**                                                                            **
'** Algorithm can take one of the following values:                            **
'** Algorithm = 1:                                                             **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algorithm = 2:                                                             **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algorithm = 3: Both Algorithms 1 and 2 are performed.                      **
'********************************************************************************
Dim R As Range

Dim strListString As String
Dim strWork As String

Dim sngMinPercent As Single
Dim sngWork As Single
Dim sngCurPercent  As Single

Dim mlBestMatchPtr As Integer
Dim intPtr As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer

Dim iEndCol As Integer

Dim vCurValue As Variant
'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------
LookupValue = LCase$(Application.Trim(LookupValue))

If IsMissing(NFPercent) Then
    sngMinPercent = 0.05
Else
    If (NFPercent <= 0) Or (NFPercent > 1) Then
        FuzzyHLookup = "*** 'NFPercent' must be a percentage > zero ***"
        Exit Function
    End If
    sngMinPercent = NFPercent
End If

If Rank < 1 Then
    FuzzyHLookup = "*** 'Rank' must be an integer > 0 ***"
    Exit Function
End If

ReDim mudRankData(1 To Rank)
'**************************
iEndCol = TableArray.Columns.Count
If VarType(TableArray.Cells(1, iEndCol).Value) = vbEmpty Then
    iEndCol = TableArray.Cells(1, iEndCol).End(xlToLeft).Column
End If

'---------------
'-- Main loop --
'---------------
For Each R In Range(TableArray.Cells(1, 1), TableArray.Cells(1, iEndCol))
    vCurValue = R.Value
    If VarType(vCurValue) = vbString Then
        strListString = LCase$(Application.Trim(vCurValue))
        
        '------------------------------------------------
        '-- Fuzzy match strings & get percentage match --
        '------------------------------------------------
        sngCurPercent = FuzzyPercent(String1:=LookupValue, _
                                     String2:=strListString, _
                                     Algorithm:=Algorithm, _
                                     Normalised:=True)
        
        If sngCurPercent >= sngMinPercent Then
            '---------------------------
            '-- Store in ranked array --
            '---------------------------
            For intRankPtr = 1 To Rank
                If sngCurPercent > mudRankData(intRankPtr).Percentage Then
                    For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                        With mudRankData(intRankPtr1)
                            .Offset = mudRankData(intRankPtr1 - 1).Offset
                            .Percentage = mudRankData(intRankPtr1 - 1).Percentage
                        End With
                    Next intRankPtr1
                    With mudRankData(intRankPtr)
                        .Offset = R.Column
                        .Percentage = sngCurPercent
                    End With
                    Exit For
                End If
            Next intRankPtr
        End If
        
    End If
Next R

If mudRankData(Rank).Percentage < sngMinPercent Then
    '--------------------------------------
    '-- Return '#N/A' if below NFPercent --
    '--------------------------------------
    FuzzyHLookup = CVErr(xlErrNA)
Else
    mlBestMatchPtr = mudRankData(Rank).Offset - TableArray.Cells(1, 1).Column + 1
    If IndexNum > 0 Then
        '-----------------------------------
        '-- Return row entry specified --
        '-----------------------------------
        FuzzyHLookup = TableArray.Cells(IndexNum, mlBestMatchPtr)
    Else
        '-----------------------
        '-- Return offset col --
        '-----------------------
        FuzzyHLookup = mlBestMatchPtr
    End If
End If
End Function
 
Upvote 0
Hi,

The most efficient way would be to use a helper column, use FuzzyVLookup to return the row and use OFFSET to return the desired value. This would eliminate the need to call FuzzyVLookup twice. e.g.:
Excel Workbook
ABCDEFGH
1NameEmailSearchResult RowResult NameResult Email
2WillWill@hotmail.comJ I M3JimJim@hotmail.com
3JimJim@hotmail.com
4BobBob@hotmail.com
5DaveDave@hotmail.com
Sheet1
Excel 2003
Cell Formulas
RangeFormula
F2=FuzzyVLookup(E2,A:B,0)
G2=OFFSET($A$1,$F2-1,0)
H2=OFFSET($A$1,$F2-1,1)


Make sure you use the latest version indicated in my signature.

how do I modify it to return the row number?
 
Upvote 0

Forum statistics

Threads
1,223,914
Messages
6,175,351
Members
452,638
Latest member
Oluwabukunmi

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top