Fuzzy Matching - new version plus explanation

al_b_cnu

Well-known Member
Joined
Jul 18, 2003
Messages
4,546
It has been a while since I originally posted my Fuzzy matching UDF’s on the board, and several variants have appeared subsequently.

I thought it time to ‘put the record straight’ & post a definitive version which contains slightly more efficient code, and better matching algorithms, so here it is.

Firstly, I must state that the Fuzzy matching algorithms are very CPU hungry, and should be used sparingly. If for instance you require to lookup a match for a string which starts with, contains or ends with a specified value, this can be performed far more efficiently using the MATCH function:
Fuzzy Examples.xls
ABCDE
1Starts WithEndsContains
2BilljelenBill
3Mr Bill Jelen433
4Bill Jelen
5Joe Bloggs
6Fred Smith
MATCH Example


... Continued ...
 
Hi Alan,

I was thinking of doing both keyword and location.

I was thinking of putting around six to seven search fields. So I would have additional rows of info about each excel file name in the dataset.
Search HeadingMatchExcel File TitleKeywordsLocations%Match
KeywordsDesserts, Soda, cigarettesairlines.xlsxAirlines, Transportations Services, Taxis, BusesFinland, Greece28.57%
LocationsCanada, Iceland,Polandfruits.xlsxFruits, Vegetables, Health-based foodsEngland, South Africa28.57%
smokeba.xlsxSmoke based products, Ciggaratees, tobaccoPoland, Germany86.36%
codings.xlsxCoding Solutions, Oranges, ApplesIndia, Japan0.00%
insuranc.xlsxInsurance Coverages, Deductible, Limit of InsuranceCanada50.00%
desserts.xlsxDesserts, Ice Cream, CakesCanada, USA100.00%
pearsp.xlsxPears, Peaches, Orange ConesJapan, Italy0.00%
sodaca.xlsxSoda, Cakes, BurgersIceland, Portugal100.00%
videoga.xlsxVideo Games, PC Games, Virtial EquipmentSpain, Argentina0.00%
marines.xlsxMarines, Army, Strength based GoodsGreece, Italy0.00%

Hi Rishm,
The following assumptions have been made:
* The text "Search Heading" is present in a cell and the rows immediately below it contain the File Data headings required to be checked with the search criteria in the corresponding column to the right.
* The text "Excel File Title" is present in a cell and is the heading for the Excel files column.
* The text "%Match" is present in a cell on the same row as the "Excel File Title" text and is the heading for the results
* The Cells in the same row as "Excel File Title" and "%Match" will contain the headings matching the "Search Heading" entries, and will contain the Keywords / Locations criteria for the Excel file.
VBA Code:
Option Explicit

Const msSheetName As String = "Sheet1"                      '** Defines the worksheet
Dim msMatchCriteriaCell As String                           '** Defines the cell containing the match Keywords heading
Const msMatchCriteriaHeading As String = "Search Heading"   '** Heading for search criteria headings
Const msFileTitleHeading As String = "Excel File Title"     '** Heading for file column
Const msMatchHeading As String = "%Match"                   '** Heading for results
Const msngMinPercent As Single = 0.5                        '** Minimum % Match

Sub FuzzySearch()

Dim lSearchRowPtr As Long
Dim lDataHeadingColumn As Long
Dim lFileDataEndRow As Long
Dim lFileDataRow As Long
Dim lMatchPercentColumn As Long

Dim rDataHeadingRow As Range

Dim sFileAddress As String
Dim sPercentMatchHeadingAddress As String

Dim sCurHeading As String
Dim sCurCriteria As String
Dim sCurFileKeywords As String
Dim sUsedRangeEndAddress As String
Dim sngCurMatch As Single
Dim sngTotalScore As Single
Dim sngMaxScore As Single

Dim vaSearchHeadingCriteria As Variant
Dim vaData As Variant

Dim WS As Worksheet

Set WS = Sheets(msSheetName)

With WS.UsedRange
    '** Get all the data in the worksheet **
    sUsedRangeEndAddress = WS.Cells(.Rows.Count, .Columns.Count).Address
    vaData = WS.Range("A1", sUsedRangeEndAddress).Value
End With

'** Find cell containing File Criteria heading **
msMatchCriteriaCell = FindHeading(msMatchCriteriaHeading, WS.Range("A1:" & sUsedRangeEndAddress))
If msMatchCriteriaCell = "" Then
    Exit Sub
End If

'** Get match criteria info **
vaSearchHeadingCriteria = GetMatchCriteria()

'** Find cell containing File Title heading **
sFileAddress = FindHeading(msFileTitleHeading, WS.Range("A1:" & sUsedRangeEndAddress))
If sFileAddress = "" Then
    Exit Sub
End If

'** find cell containing '%Match' heading **
sPercentMatchHeadingAddress = FindHeading(msMatchHeading, WS.Range("A1:" & sUsedRangeEndAddress))
If sPercentMatchHeadingAddress = "" Then
    Exit Sub
End If
lMatchPercentColumn = WS.Range(sPercentMatchHeadingAddress).Column

Set rDataHeadingRow = WS.Rows(WS.Range(sFileAddress).Row)

lFileDataEndRow = WS.Cells(WS.Rows.Count, WS.Range(sFileAddress).Column).End(xlUp).Row

'** Loop thru file data entries **
For lFileDataRow = WS.Range(sFileAddress).Row + 1 To lFileDataEndRow
    sngMaxScore = 0
    sngTotalScore = 0
    
    '** Loop thru match criteria **
    For lSearchRowPtr = 2 To UBound(vaSearchHeadingCriteria, 1)
        
        '** Get next criteria heading **
        sCurHeading = Trim$(vaSearchHeadingCriteria(lSearchRowPtr, 1))
        If sCurHeading <> "" Then
            '** Get file data column containing the required criteria heading **
            lDataHeadingColumn = FindDataHeadingColumn(Heading:=sCurHeading, _
                                                       SearchRange:=rDataHeadingRow)
            If lDataHeadingColumn > 0 Then
                sngMaxScore = sngMaxScore + 1
                
                '** Get criteria keywords **
                sCurCriteria = NormaliseKeywords(CStr(vaSearchHeadingCriteria(lSearchRowPtr, 2)))
                
                '** Get matching File keywords **
                sCurFileKeywords = NormaliseKeywords(CStr(vaData(lFileDataRow, lDataHeadingColumn)))
                sngCurMatch = GetMatchPercent(String1:=sCurCriteria, _
                                                String2:=sCurFileKeywords)
                If sngCurMatch >= msngMinPercent Then
                    sngTotalScore = sngTotalScore + sngCurMatch
                End If
                
            End If
        End If
    Next lSearchRowPtr
    sngTotalScore = sngTotalScore / sngMaxScore
    WS.Cells(lFileDataRow, lMatchPercentColumn).Value = sngTotalScore
Next lFileDataRow
End Sub

Private Function FindCellHeading()

End Function

Private Function GetMatchPercent(ByVal String1 As String, ByVal String2 As String) As Single
Dim lPtr1 As Long
Dim lPtr2 As Long

Dim sngMatchPercent As Single
Dim sngCurMatchPercent As Single

Dim saString1() As String
Dim saString2() As String

saString1 = Split(String1, ",")
saString2 = Split(String2, ",")

sngMatchPercent = 0
For lPtr1 = 0 To UBound(saString1)
    If saString1(lPtr1) <> "" Then
        For lPtr2 = 0 To UBound(saString2)
            If saString2(lPtr2) <> "" Then
                sngCurMatchPercent = GetLevenshteinPercentMatch(String1:=saString1(lPtr1), _
                                                                String2:=saString2(lPtr2), _
                                                                Normalised:=True)
                If sngCurMatchPercent > sngMatchPercent Then
                    sngMatchPercent = sngCurMatchPercent
                End If
            End If
        Next
    End If
Next lPtr1
GetMatchPercent = sngMatchPercent
End Function
Private Function FindDataHeadingColumn(ByVal Heading As String, ByRef SearchRange As Range) As Long

Dim vColumn As Variant

FindDataHeadingColumn = 0
On Error Resume Next
vColumn = Application.Match(Heading, SearchRange, 0)
If Not IsError(vColumn) Then
    FindDataHeadingColumn = CLng(vColumn)
End If
On Error GoTo 0

End Function

Private Function GetMatchCriteria() As Variant
Dim lEndRow As Long
Dim lColumn As Long

Dim vaData As Variant

With Sheets(msSheetName)
    lColumn = .Range(msMatchCriteriaCell).Column
    lEndRow = .Cells(.Rows.Count, lColumn).End(xlUp).Row
    vaData = .Range(msMatchCriteriaCell, .Cells(lEndRow, lColumn + 1)).Value
End With
GetMatchCriteria = vaData

End Function

Function FindHeading(ByVal SearchHeading As String, ByRef SearchRange As Range, Optional MessageIfNF As Boolean = True) As String
    Dim Rng As Range
    
    FindHeading = ""
    If Trim(SearchHeading) <> "" Then
        With SearchRange
            Set Rng = .Find(What:=SearchHeading, _
                            After:=.Cells(.Cells.Count), _
                            LookIn:=xlValues, _
                            LookAt:=xlWhole, _
                            SearchOrder:=xlByRows, _
                            SearchDirection:=xlNext, _
                            MatchCase:=False)
            If Not Rng Is Nothing Then
                FindHeading = Rng.Resize(1, 1).Address
            End If
        End With
    End If
    
    If FindHeading = "" _
    And MessageIfNF = True Then
        MsgBox prompt:="Heading '" & SearchHeading & "' not found!", _
                Buttons:=vbCritical + vbOKOnly
    End If
    
End Function

Private Function NormaliseKeywords(ByVal Keywords As String) As String
Dim lPtr As Long

Dim saKeywords() As String
Dim sResult As String

sResult = LCase$(Trim$(Keywords))
If sResult <> "" Then
    saKeywords = Split(sResult, ",")
    For lPtr = 0 To UBound(saKeywords)
        saKeywords(lPtr) = Trim$(saKeywords(lPtr))
    Next lPtr
    sResult = Join(saKeywords, ",")
End If
NormaliseKeywords = sResult
End Function

Public Function GetLevenshteinPercentMatch(ByVal String1 As String, _
                                            ByVal String2 As String, _
                                            Optional Normalised As Boolean = True) As Single
Dim iLen As Integer
If Normalised = False Then
    String1 = UCase$(WorksheetFunction.Trim(String1))
    String2 = UCase$(WorksheetFunction.Trim(String2))
End If
iLen = WorksheetFunction.Max(Len(String1), Len(String2))
GetLevenshteinPercentMatch = (iLen - LevenshteinDistance(String1, String2)) / iLen
End Function

Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
'********************************
'*** Compute Levenshtein Distance
'********************************

Dim d() As Integer ' matrix
Dim m As Integer ' length of t
Dim N As Integer ' length of s
Dim I As Integer ' iterates through s
Dim j As Integer ' iterates through t
Dim s_i As String ' ith character of s
Dim t_j As String ' jth character of t
Dim cost As Integer ' cost

  ' Step 1

  N = Len(s)
  m = Len(t)
  If N = 0 Then
    LevenshteinDistance = m
    Exit Function
  End If
  If m = 0 Then
    LevenshteinDistance = N
    Exit Function
  End If
  ReDim d(0 To N, 0 To m) As Integer

  ' Step 2

  For I = 0 To N
    d(I, 0) = I
  Next I

  For j = 0 To m
    d(0, j) = j
  Next j

  ' Step 3

  For I = 1 To N
    s_i = Mid$(s, I, 1)
    ' Step 4
    For j = 1 To m
      t_j = Mid$(t, j, 1)
      
      ' Step 5
      If s_i = t_j Then
        cost = 0
      Else
        cost = 1
      End If
      ' Step 6

      d(I, j) = WorksheetFunction.Min(d(I - 1, j) + 1, d(I, j - 1) + 1, d(I - 1, j - 1) + cost)
    Next j
  Next I

  ' Step 7
  LevenshteinDistance = d(N, m)
End Function
 
Upvote 0

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.
Please share the user friendly version of this macro as we dont have the exact data for this macro to try.
Hi Earthworm,
Not sure what you mean by "User friendly version" ...

I HAVE commented the vba code above, probably not enough though :/
Best wishes
Alan
 
Upvote 0
Search HeadingMatchExcel File TitleKeywordsLocations%Match
KeywordsDesserts, Soda, cigarettesairlines.xlsxAirlines, Transportations Services, Taxis, BusesFinland, Greece28.57%
LocationsCanada, Iceland,Polandfruits.xlsxFruits, Vegetables, Health-based foodsEngland, South Africa28.57%
smokeba.xlsxSmoke based products, Ciggaratees, tobaccoPoland, Germany86.36%
codings.xlsxCoding Solutions, Oranges, ApplesIndia, Japan0.00%
insuranc.xlsxInsurance Coverages, Deductible, Limit of InsuranceCanada50.00%
desserts.xlsxDesserts, Ice Cream, CakesCanada, USA100.00%
pearsp.xlsxPears, Peaches, Orange ConesJapan, Italy0.00%
sodaca.xlsxSoda, Cakes, BurgersIceland, Portugal100.00%
videoga.xlsxVideo Games, PC Games, Virtial EquipmentSpain, Argentina0.00%
marines.xlsxMarines, Army, Strength based GoodsGreece, Italy0.00%

Hi Rishm,
The following assumptions have been made:
* The text "Search Heading" is present in a cell and the rows immediately below it contain the File Data headings required to be checked with the search criteria in the corresponding column to the right.
* The text "Excel File Title" is present in a cell and is the heading for the Excel files column.
* The text "%Match" is present in a cell on the same row as the "Excel File Title" text and is the heading for the results
* The Cells in the same row as "Excel File Title" and "%Match" will contain the headings matching the "Search Heading" entries, and will contain the Keywords / Locations criteria for the Excel file.
VBA Code:
Option Explicit

Const msSheetName As String = "Sheet1"                      '** Defines the worksheet
Dim msMatchCriteriaCell As String                           '** Defines the cell containing the match Keywords heading
Const msMatchCriteriaHeading As String = "Search Heading"   '** Heading for search criteria headings
Const msFileTitleHeading As String = "Excel File Title"     '** Heading for file column
Const msMatchHeading As String = "%Match"                   '** Heading for results
Const msngMinPercent As Single = 0.5                        '** Minimum % Match

Sub FuzzySearch()

Dim lSearchRowPtr As Long
Dim lDataHeadingColumn As Long
Dim lFileDataEndRow As Long
Dim lFileDataRow As Long
Dim lMatchPercentColumn As Long

Dim rDataHeadingRow As Range

Dim sFileAddress As String
Dim sPercentMatchHeadingAddress As String

Dim sCurHeading As String
Dim sCurCriteria As String
Dim sCurFileKeywords As String
Dim sUsedRangeEndAddress As String
Dim sngCurMatch As Single
Dim sngTotalScore As Single
Dim sngMaxScore As Single

Dim vaSearchHeadingCriteria As Variant
Dim vaData As Variant

Dim WS As Worksheet

Set WS = Sheets(msSheetName)

With WS.UsedRange
    '** Get all the data in the worksheet **
    sUsedRangeEndAddress = WS.Cells(.Rows.Count, .Columns.Count).Address
    vaData = WS.Range("A1", sUsedRangeEndAddress).Value
End With

'** Find cell containing File Criteria heading **
msMatchCriteriaCell = FindHeading(msMatchCriteriaHeading, WS.Range("A1:" & sUsedRangeEndAddress))
If msMatchCriteriaCell = "" Then
    Exit Sub
End If

'** Get match criteria info **
vaSearchHeadingCriteria = GetMatchCriteria()

'** Find cell containing File Title heading **
sFileAddress = FindHeading(msFileTitleHeading, WS.Range("A1:" & sUsedRangeEndAddress))
If sFileAddress = "" Then
    Exit Sub
End If

'** find cell containing '%Match' heading **
sPercentMatchHeadingAddress = FindHeading(msMatchHeading, WS.Range("A1:" & sUsedRangeEndAddress))
If sPercentMatchHeadingAddress = "" Then
    Exit Sub
End If
lMatchPercentColumn = WS.Range(sPercentMatchHeadingAddress).Column

Set rDataHeadingRow = WS.Rows(WS.Range(sFileAddress).Row)

lFileDataEndRow = WS.Cells(WS.Rows.Count, WS.Range(sFileAddress).Column).End(xlUp).Row

'** Loop thru file data entries **
For lFileDataRow = WS.Range(sFileAddress).Row + 1 To lFileDataEndRow
    sngMaxScore = 0
    sngTotalScore = 0
  
    '** Loop thru match criteria **
    For lSearchRowPtr = 2 To UBound(vaSearchHeadingCriteria, 1)
      
        '** Get next criteria heading **
        sCurHeading = Trim$(vaSearchHeadingCriteria(lSearchRowPtr, 1))
        If sCurHeading <> "" Then
            '** Get file data column containing the required criteria heading **
            lDataHeadingColumn = FindDataHeadingColumn(Heading:=sCurHeading, _
                                                       SearchRange:=rDataHeadingRow)
            If lDataHeadingColumn > 0 Then
                sngMaxScore = sngMaxScore + 1
              
                '** Get criteria keywords **
                sCurCriteria = NormaliseKeywords(CStr(vaSearchHeadingCriteria(lSearchRowPtr, 2)))
              
                '** Get matching File keywords **
                sCurFileKeywords = NormaliseKeywords(CStr(vaData(lFileDataRow, lDataHeadingColumn)))
                sngCurMatch = GetMatchPercent(String1:=sCurCriteria, _
                                                String2:=sCurFileKeywords)
                If sngCurMatch >= msngMinPercent Then
                    sngTotalScore = sngTotalScore + sngCurMatch
                End If
              
            End If
        End If
    Next lSearchRowPtr
    sngTotalScore = sngTotalScore / sngMaxScore
    WS.Cells(lFileDataRow, lMatchPercentColumn).Value = sngTotalScore
Next lFileDataRow
End Sub

Private Function FindCellHeading()

End Function

Private Function GetMatchPercent(ByVal String1 As String, ByVal String2 As String) As Single
Dim lPtr1 As Long
Dim lPtr2 As Long

Dim sngMatchPercent As Single
Dim sngCurMatchPercent As Single

Dim saString1() As String
Dim saString2() As String

saString1 = Split(String1, ",")
saString2 = Split(String2, ",")

sngMatchPercent = 0
For lPtr1 = 0 To UBound(saString1)
    If saString1(lPtr1) <> "" Then
        For lPtr2 = 0 To UBound(saString2)
            If saString2(lPtr2) <> "" Then
                sngCurMatchPercent = GetLevenshteinPercentMatch(String1:=saString1(lPtr1), _
                                                                String2:=saString2(lPtr2), _
                                                                Normalised:=True)
                If sngCurMatchPercent > sngMatchPercent Then
                    sngMatchPercent = sngCurMatchPercent
                End If
            End If
        Next
    End If
Next lPtr1
GetMatchPercent = sngMatchPercent
End Function
Private Function FindDataHeadingColumn(ByVal Heading As String, ByRef SearchRange As Range) As Long

Dim vColumn As Variant

FindDataHeadingColumn = 0
On Error Resume Next
vColumn = Application.Match(Heading, SearchRange, 0)
If Not IsError(vColumn) Then
    FindDataHeadingColumn = CLng(vColumn)
End If
On Error GoTo 0

End Function

Private Function GetMatchCriteria() As Variant
Dim lEndRow As Long
Dim lColumn As Long

Dim vaData As Variant

With Sheets(msSheetName)
    lColumn = .Range(msMatchCriteriaCell).Column
    lEndRow = .Cells(.Rows.Count, lColumn).End(xlUp).Row
    vaData = .Range(msMatchCriteriaCell, .Cells(lEndRow, lColumn + 1)).Value
End With
GetMatchCriteria = vaData

End Function

Function FindHeading(ByVal SearchHeading As String, ByRef SearchRange As Range, Optional MessageIfNF As Boolean = True) As String
    Dim Rng As Range
  
    FindHeading = ""
    If Trim(SearchHeading) <> "" Then
        With SearchRange
            Set Rng = .Find(What:=SearchHeading, _
                            After:=.Cells(.Cells.Count), _
                            LookIn:=xlValues, _
                            LookAt:=xlWhole, _
                            SearchOrder:=xlByRows, _
                            SearchDirection:=xlNext, _
                            MatchCase:=False)
            If Not Rng Is Nothing Then
                FindHeading = Rng.Resize(1, 1).Address
            End If
        End With
    End If
  
    If FindHeading = "" _
    And MessageIfNF = True Then
        MsgBox prompt:="Heading '" & SearchHeading & "' not found!", _
                Buttons:=vbCritical + vbOKOnly
    End If
  
End Function

Private Function NormaliseKeywords(ByVal Keywords As String) As String
Dim lPtr As Long

Dim saKeywords() As String
Dim sResult As String

sResult = LCase$(Trim$(Keywords))
If sResult <> "" Then
    saKeywords = Split(sResult, ",")
    For lPtr = 0 To UBound(saKeywords)
        saKeywords(lPtr) = Trim$(saKeywords(lPtr))
    Next lPtr
    sResult = Join(saKeywords, ",")
End If
NormaliseKeywords = sResult
End Function

Public Function GetLevenshteinPercentMatch(ByVal String1 As String, _
                                            ByVal String2 As String, _
                                            Optional Normalised As Boolean = True) As Single
Dim iLen As Integer
If Normalised = False Then
    String1 = UCase$(WorksheetFunction.Trim(String1))
    String2 = UCase$(WorksheetFunction.Trim(String2))
End If
iLen = WorksheetFunction.Max(Len(String1), Len(String2))
GetLevenshteinPercentMatch = (iLen - LevenshteinDistance(String1, String2)) / iLen
End Function

Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
'********************************
'*** Compute Levenshtein Distance
'********************************

Dim d() As Integer ' matrix
Dim m As Integer ' length of t
Dim N As Integer ' length of s
Dim I As Integer ' iterates through s
Dim j As Integer ' iterates through t
Dim s_i As String ' ith character of s
Dim t_j As String ' jth character of t
Dim cost As Integer ' cost

  ' Step 1

  N = Len(s)
  m = Len(t)
  If N = 0 Then
    LevenshteinDistance = m
    Exit Function
  End If
  If m = 0 Then
    LevenshteinDistance = N
    Exit Function
  End If
  ReDim d(0 To N, 0 To m) As Integer

  ' Step 2

  For I = 0 To N
    d(I, 0) = I
  Next I

  For j = 0 To m
    d(0, j) = j
  Next j

  ' Step 3

  For I = 1 To N
    s_i = Mid$(s, I, 1)
    ' Step 4
    For j = 1 To m
      t_j = Mid$(t, j, 1)
    
      ' Step 5
      If s_i = t_j Then
        cost = 0
      Else
        cost = 1
      End If
      ' Step 6

      d(I, j) = WorksheetFunction.Min(d(I - 1, j) + 1, d(I, j - 1) + 1, d(I - 1, j - 1) + cost)
    Next j
  Next I

  ' Step 7
  LevenshteinDistance = d(N, m)
End Function
Hi Alan,

Thank you so much. I really appreciate it. That's what I was looking for in your screenshot. Hopefully, I can reach this level of coding in VBA! Thanks again.

Regards,
Rish
 
Upvote 0
Hi Earthworm,
Not sure what you mean by "User friendly version" ...

I HAVE commented the vba code above, probably not enough though :/
Best wishes
Alan
The Location of cell are confusing

suppose I have a list in column A and i need to match with Column C and Compare column A & B
 
Upvote 0
The Location of cell are confusing

suppose I have a list in column A and i need to match with Column C and Compare column A & B
Hi Earthworm.
I rewrote the code in my last post above, maybe that's a little easier as it looks for headings not named ranges.
Hth
Alan
 
Upvote 0
Search HeadingMatchExcel File TitleKeywordsLocations%Match
KeywordsDesserts, Soda, cigarettesairlines.xlsxAirlines, Transportations Services, Taxis, BusesFinland, Greece28.57%
LocationsCanada, Iceland,Polandfruits.xlsxFruits, Vegetables, Health-based foodsEngland, South Africa28.57%
smokeba.xlsxSmoke based products, Ciggaratees, tobaccoPoland, Germany86.36%
codings.xlsxCoding Solutions, Oranges, ApplesIndia, Japan0.00%
insuranc.xlsxInsurance Coverages, Deductible, Limit of InsuranceCanada50.00%
desserts.xlsxDesserts, Ice Cream, CakesCanada, USA100.00%
pearsp.xlsxPears, Peaches, Orange ConesJapan, Italy0.00%
sodaca.xlsxSoda, Cakes, BurgersIceland, Portugal100.00%
videoga.xlsxVideo Games, PC Games, Virtial EquipmentSpain, Argentina0.00%
marines.xlsxMarines, Army, Strength based GoodsGreece, Italy0.00%

Hi Rishm,
The following assumptions have been made:
* The text "Search Heading" is present in a cell and the rows immediately below it contain the File Data headings required to be checked with the search criteria in the corresponding column to the right.
* The text "Excel File Title" is present in a cell and is the heading for the Excel files column.
* The text "%Match" is present in a cell on the same row as the "Excel File Title" text and is the heading for the results
* The Cells in the same row as "Excel File Title" and "%Match" will contain the headings matching the "Search Heading" entries, and will contain the Keywords / Locations criteria for the Excel file.
VBA Code:
Option Explicit

Const msSheetName As String = "Sheet1"                      '** Defines the worksheet
Dim msMatchCriteriaCell As String                           '** Defines the cell containing the match Keywords heading
Const msMatchCriteriaHeading As String = "Search Heading"   '** Heading for search criteria headings
Const msFileTitleHeading As String = "Excel File Title"     '** Heading for file column
Const msMatchHeading As String = "%Match"                   '** Heading for results
Const msngMinPercent As Single = 0.5                        '** Minimum % Match

Sub FuzzySearch()

Dim lSearchRowPtr As Long
Dim lDataHeadingColumn As Long
Dim lFileDataEndRow As Long
Dim lFileDataRow As Long
Dim lMatchPercentColumn As Long

Dim rDataHeadingRow As Range

Dim sFileAddress As String
Dim sPercentMatchHeadingAddress As String

Dim sCurHeading As String
Dim sCurCriteria As String
Dim sCurFileKeywords As String
Dim sUsedRangeEndAddress As String
Dim sngCurMatch As Single
Dim sngTotalScore As Single
Dim sngMaxScore As Single

Dim vaSearchHeadingCriteria As Variant
Dim vaData As Variant

Dim WS As Worksheet

Set WS = Sheets(msSheetName)

With WS.UsedRange
    '** Get all the data in the worksheet **
    sUsedRangeEndAddress = WS.Cells(.Rows.Count, .Columns.Count).Address
    vaData = WS.Range("A1", sUsedRangeEndAddress).Value
End With

'** Find cell containing File Criteria heading **
msMatchCriteriaCell = FindHeading(msMatchCriteriaHeading, WS.Range("A1:" & sUsedRangeEndAddress))
If msMatchCriteriaCell = "" Then
    Exit Sub
End If

'** Get match criteria info **
vaSearchHeadingCriteria = GetMatchCriteria()

'** Find cell containing File Title heading **
sFileAddress = FindHeading(msFileTitleHeading, WS.Range("A1:" & sUsedRangeEndAddress))
If sFileAddress = "" Then
    Exit Sub
End If

'** find cell containing '%Match' heading **
sPercentMatchHeadingAddress = FindHeading(msMatchHeading, WS.Range("A1:" & sUsedRangeEndAddress))
If sPercentMatchHeadingAddress = "" Then
    Exit Sub
End If
lMatchPercentColumn = WS.Range(sPercentMatchHeadingAddress).Column

Set rDataHeadingRow = WS.Rows(WS.Range(sFileAddress).Row)

lFileDataEndRow = WS.Cells(WS.Rows.Count, WS.Range(sFileAddress).Column).End(xlUp).Row

'** Loop thru file data entries **
For lFileDataRow = WS.Range(sFileAddress).Row + 1 To lFileDataEndRow
    sngMaxScore = 0
    sngTotalScore = 0
   
    '** Loop thru match criteria **
    For lSearchRowPtr = 2 To UBound(vaSearchHeadingCriteria, 1)
       
        '** Get next criteria heading **
        sCurHeading = Trim$(vaSearchHeadingCriteria(lSearchRowPtr, 1))
        If sCurHeading <> "" Then
            '** Get file data column containing the required criteria heading **
            lDataHeadingColumn = FindDataHeadingColumn(Heading:=sCurHeading, _
                                                       SearchRange:=rDataHeadingRow)
            If lDataHeadingColumn > 0 Then
                sngMaxScore = sngMaxScore + 1
               
                '** Get criteria keywords **
                sCurCriteria = NormaliseKeywords(CStr(vaSearchHeadingCriteria(lSearchRowPtr, 2)))
               
                '** Get matching File keywords **
                sCurFileKeywords = NormaliseKeywords(CStr(vaData(lFileDataRow, lDataHeadingColumn)))
                sngCurMatch = GetMatchPercent(String1:=sCurCriteria, _
                                                String2:=sCurFileKeywords)
                If sngCurMatch >= msngMinPercent Then
                    sngTotalScore = sngTotalScore + sngCurMatch
                End If
               
            End If
        End If
    Next lSearchRowPtr
    sngTotalScore = sngTotalScore / sngMaxScore
    WS.Cells(lFileDataRow, lMatchPercentColumn).Value = sngTotalScore
Next lFileDataRow
End Sub

Private Function FindCellHeading()

End Function

Private Function GetMatchPercent(ByVal String1 As String, ByVal String2 As String) As Single
Dim lPtr1 As Long
Dim lPtr2 As Long

Dim sngMatchPercent As Single
Dim sngCurMatchPercent As Single

Dim saString1() As String
Dim saString2() As String

saString1 = Split(String1, ",")
saString2 = Split(String2, ",")

sngMatchPercent = 0
For lPtr1 = 0 To UBound(saString1)
    If saString1(lPtr1) <> "" Then
        For lPtr2 = 0 To UBound(saString2)
            If saString2(lPtr2) <> "" Then
                sngCurMatchPercent = GetLevenshteinPercentMatch(String1:=saString1(lPtr1), _
                                                                String2:=saString2(lPtr2), _
                                                                Normalised:=True)
                If sngCurMatchPercent > sngMatchPercent Then
                    sngMatchPercent = sngCurMatchPercent
                End If
            End If
        Next
    End If
Next lPtr1
GetMatchPercent = sngMatchPercent
End Function
Private Function FindDataHeadingColumn(ByVal Heading As String, ByRef SearchRange As Range) As Long

Dim vColumn As Variant

FindDataHeadingColumn = 0
On Error Resume Next
vColumn = Application.Match(Heading, SearchRange, 0)
If Not IsError(vColumn) Then
    FindDataHeadingColumn = CLng(vColumn)
End If
On Error GoTo 0

End Function

Private Function GetMatchCriteria() As Variant
Dim lEndRow As Long
Dim lColumn As Long

Dim vaData As Variant

With Sheets(msSheetName)
    lColumn = .Range(msMatchCriteriaCell).Column
    lEndRow = .Cells(.Rows.Count, lColumn).End(xlUp).Row
    vaData = .Range(msMatchCriteriaCell, .Cells(lEndRow, lColumn + 1)).Value
End With
GetMatchCriteria = vaData

End Function

Function FindHeading(ByVal SearchHeading As String, ByRef SearchRange As Range, Optional MessageIfNF As Boolean = True) As String
    Dim Rng As Range
   
    FindHeading = ""
    If Trim(SearchHeading) <> "" Then
        With SearchRange
            Set Rng = .Find(What:=SearchHeading, _
                            After:=.Cells(.Cells.Count), _
                            LookIn:=xlValues, _
                            LookAt:=xlWhole, _
                            SearchOrder:=xlByRows, _
                            SearchDirection:=xlNext, _
                            MatchCase:=False)
            If Not Rng Is Nothing Then
                FindHeading = Rng.Resize(1, 1).Address
            End If
        End With
    End If
   
    If FindHeading = "" _
    And MessageIfNF = True Then
        MsgBox prompt:="Heading '" & SearchHeading & "' not found!", _
                Buttons:=vbCritical + vbOKOnly
    End If
   
End Function

Private Function NormaliseKeywords(ByVal Keywords As String) As String
Dim lPtr As Long

Dim saKeywords() As String
Dim sResult As String

sResult = LCase$(Trim$(Keywords))
If sResult <> "" Then
    saKeywords = Split(sResult, ",")
    For lPtr = 0 To UBound(saKeywords)
        saKeywords(lPtr) = Trim$(saKeywords(lPtr))
    Next lPtr
    sResult = Join(saKeywords, ",")
End If
NormaliseKeywords = sResult
End Function

Public Function GetLevenshteinPercentMatch(ByVal String1 As String, _
                                            ByVal String2 As String, _
                                            Optional Normalised As Boolean = True) As Single
Dim iLen As Integer
If Normalised = False Then
    String1 = UCase$(WorksheetFunction.Trim(String1))
    String2 = UCase$(WorksheetFunction.Trim(String2))
End If
iLen = WorksheetFunction.Max(Len(String1), Len(String2))
GetLevenshteinPercentMatch = (iLen - LevenshteinDistance(String1, String2)) / iLen
End Function

Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer
'********************************
'*** Compute Levenshtein Distance
'********************************

Dim d() As Integer ' matrix
Dim m As Integer ' length of t
Dim N As Integer ' length of s
Dim I As Integer ' iterates through s
Dim j As Integer ' iterates through t
Dim s_i As String ' ith character of s
Dim t_j As String ' jth character of t
Dim cost As Integer ' cost

  ' Step 1

  N = Len(s)
  m = Len(t)
  If N = 0 Then
    LevenshteinDistance = m
    Exit Function
  End If
  If m = 0 Then
    LevenshteinDistance = N
    Exit Function
  End If
  ReDim d(0 To N, 0 To m) As Integer

  ' Step 2

  For I = 0 To N
    d(I, 0) = I
  Next I

  For j = 0 To m
    d(0, j) = j
  Next j

  ' Step 3

  For I = 1 To N
    s_i = Mid$(s, I, 1)
    ' Step 4
    For j = 1 To m
      t_j = Mid$(t, j, 1)
     
      ' Step 5
      If s_i = t_j Then
        cost = 0
      Else
        cost = 1
      End If
      ' Step 6

      d(I, j) = WorksheetFunction.Min(d(I - 1, j) + 1, d(I, j - 1) + 1, d(I - 1, j - 1) + cost)
    Next j
  Next I

  ' Step 7
  LevenshteinDistance = d(N, m)
End Function
Hi Alan,

I just have a question about the assumptions. If my file data had other headings, besides keywords and location, would this code work for other file data headings like maybe if I had "Company Type" or "Year Founded".

I just asked since the last assumption said: "will contain the Keywords / Locations criteria for the Excel file."

Regards,
Rish
 
Upvote 0
Hi Alan,

I just have a question about the assumptions. If my file data had other headings, besides keywords and location, would this code work for other file data headings like maybe if I had "Company Type" or "Year Founded".

I just asked since the last assumption said: "will contain the Keywords / Locations criteria for the Excel file."

Regards,
Rish
Search HeadingMatchExcel File TitleKeywordsCompany TypeYear FoundedLocations%Match
KeywordsDesserts, Soda, cigarettesairlines.xlsxAirlines, Transportations Services, Taxis, BusesAlpha2005Finland, Greece26.79%
LocationsCanada, Iceland,Polandfruits.xlsxFruits, Vegetables, Health-based foodsBeta2001England, South Africa45.54%
Company TypeEta, Gammasmokeba.xlsxSmoke based products, Ciggaratees, tobaccoGamma2010Poland, Germany93.18%
Year founded2020,2010codings.xlsxCoding Solutions, Oranges, ApplesDelta2018India, Japan33.75%
insuranc.xlsxInsurance Coverages, Deductible, Limit of InsuranceEpsilon2018Canada43.75%
desserts.xlsxDesserts, Ice Cream, CakesZeta2006Canada, USA81.25%
pearsp.xlsxPears, Peaches, Orange ConesEta2020Japan, Italy50.00%
sodaca.xlsxSoda, Cakes, BurgersTheta2020Iceland, Portugal90.00%
videoga.xlsxVideo Games, PC Games, Virtial EquipmentIota2013Spain, Argentina31.25%
marines.xlsxMarines, Army, Strength based GoodsKappa2005Greece, Italy12.50%

Hi Rish,
just add the headings under the "Search Headings" column along with their corresponding criteria.
Best wishes
Alan
 
Upvote 0
Search HeadingMatchExcel File TitleKeywordsCompany TypeYear FoundedLocations%Match
KeywordsDesserts, Soda, cigarettesairlines.xlsxAirlines, Transportations Services, Taxis, BusesAlpha2005Finland, Greece26.79%
LocationsCanada, Iceland,Polandfruits.xlsxFruits, Vegetables, Health-based foodsBeta2001England, South Africa45.54%
Company TypeEta, Gammasmokeba.xlsxSmoke based products, Ciggaratees, tobaccoGamma2010Poland, Germany93.18%
Year founded2020,2010codings.xlsxCoding Solutions, Oranges, ApplesDelta2018India, Japan33.75%
insuranc.xlsxInsurance Coverages, Deductible, Limit of InsuranceEpsilon2018Canada43.75%
desserts.xlsxDesserts, Ice Cream, CakesZeta2006Canada, USA81.25%
pearsp.xlsxPears, Peaches, Orange ConesEta2020Japan, Italy50.00%
sodaca.xlsxSoda, Cakes, BurgersTheta2020Iceland, Portugal90.00%
videoga.xlsxVideo Games, PC Games, Virtial EquipmentIota2013Spain, Argentina31.25%
marines.xlsxMarines, Army, Strength based GoodsKappa2005Greece, Italy12.50%

Hi Rish,
just add the headings under the "Search Headings" column along with their corresponding criteria.
Best wishes
Alan
Thank you Alan for clarifying. Makes sense!
 
Upvote 0
Hi Earthworm.
I rewrote the code in my last post above, maybe that's a little easier as it looks for headings not named ranges.
Hth
Alan
I tried to use the code but its not working . please refer attached image.
 

Attachments

  • Test.jpg
    Test.jpg
    43.9 KB · Views: 18
Upvote 0
I tried to use the code but its not working . please refer attached image.
Hi Earthworm.
I'm not at all surprised that the code doesn't work using your data. To explain:
Regarding these constants declared at the start of the code:
VBA Code:
Const msSheetName As String = "Sheet1"                      '** Defines the worksheet
Dim msMatchCriteriaCell As String                           '** Defines the cell containing the match Keywords heading
Const msMatchCriteriaHeading As String = "Search Heading"   '** Heading for search criteria headings
Const msFileTitleHeading As String = "Excel File Title"     '** Heading for file column
Const msMatchHeading As String = "%Match"                   '** Heading for results
Const msngMinPercent As Single = 0.5                        '** Minimum % Match
msSheetName defines the worksheet name that the code will perform the code on
msMatchCriteriaHeading defines the heading of the column which contains the search criteria headings, The column immediately to the right of this column contains the search criteria
msFileTitleHeading defines the heading at the start of the data and, in Rishm's case, defines the column heading for the filenames
msMatchHeading defines the heading to contain the %age match results (written to by the code)
msngMinPercent defines the minimum %age match to deem the entry a match
The code performs the following actions:
1) Find the criteria columns as defined by the heading specified by msMatchCriteriaHeading
2) For each entry in this column perform the criteria matching as defined by column+1 against the specified heading
3) Return the %age match in the column as specified by msMatchHeading
So ....
In the example I posted for Rishm the code will search for and match against the data in headings "Keywords", Locations", "Company Type" and "Year Founded" and return the %age match in the column headed "%Match"
Hope this helps

Alan
 
Upvote 0

Forum statistics

Threads
1,224,847
Messages
6,181,318
Members
453,032
Latest member
Pauh

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top