vLok a like?

fredrerik84

Active Member
Joined
Feb 26, 2017
Messages
383
Ive started a gigantic task where Im gonna try to convert soccer team names scraped from several different sources. problem is that many team names have small variation in how they are spelled from site to site. here is a sample to show what I'm struggling with:

POleksandriaOleksandriaOleksandria FCOleksandriaOleksandria
Crvena ZvezdaCrvena zvezdaCrvena ZvezdaCrvena ZvezdaCrvena Zvezda
FS Metta/LuFS METTA LUFS Metta-LuFS METTA LU
Chongqing Dangdai Lifan FCChongqing LifanChongqing Lifan FCChongqing LifanChongqing Lifan

<tbody>
</tbody>

Normally at least 2-3 sites have the same spelling but of course this will different from team to team. Up intill now ive been using a vlookup like this when I fetch the team:

Code:
str = Application.VLookup(str, sheet.Range("AC" & 2 & ":AH" & LRowData), 5, False)

And being optimistic I started building this list - but the I started to think "its gotta be a better way" as building this data sheet is a gigantic task

Then around a week ago or so I stumbled upon this code written a few years ago (credit to another forum)

Code:
Function VLookLike(txt As String, rng As Range) As String
    Dim temp As String, e, n As Long, a()
    Static RegX As Object
    If RegX Is Nothing Then
        Set RegX = CreateObject("VBScript.RegExp")
        With RegX
            .Global = True
            .IgnoreCase = True
            .Pattern = "(\S+).*" & Chr(2) & ".*\1"
        End With
    End If
    With RegX
        For Each e In rng.Value
            If UCase$(e) = UCase(txt) Then
                VLookLike = e
                Exit For
            End If
            temp = Join$(Array(e, txt), Chr(2))
            If .test(temp) Then
                n = n + 1
                ReDim Preserve a(1 To 2, 1 To n)
                a(2, n) = e
                Do While .test(temp)
                    a(1, n) = a(1, n) + Len(.Execute(temp)(0).submatches(0))
                    temp = Replace(temp, .Execute(temp)(0).submatches(0), "")
                Loop
            End If
        Next
    End With
    If (VLookLike = "") * (n > 0) Then
        With Application
            VLookLike = .HLookup(.Max(.Index(a, 1, 0)), a, 2, False)
        End With
    End If
End Function

This code is excellent for my problem (although its not perfect) - this will look for Lookalikes and I've started to test this with a test script like this:

Code:
Sub test()
Dim ateam As String, str As String, Search As String, Smatch As String, ateam2 As String
Dim i As Long, Row As Long, Col As Long, cell As Range, ii As Long
Dim sheet As Worksheet
Dim sim As Double
Dim found As Integer
Set sheet = ActiveWorkbook.Sheets("Sheet1")

For i = 2 To 50
   ateam = Cells(i, "I")
   ateam2 = VLookLike(ateam, sheet.Range("H" & 2 & ":H" & 1206))
   Cells(i, "J").Value = ateam2
   On Error Resume Next
   Row = Application.WorksheetFunction.Match(ateam2, Range("H1:H2000"), 0)
   Search = Application.WorksheetFunction.Match(Cells(i, "I"), Range("B" & Row & ":H" & Row), 0)
   sim = Similarity(ateam, ateam2)
   If Search = "" Then
      Cells(i, "K").Value = ateam
   Else
      Cells(i, "K").Value = ateam2
   End If
   Cells(i, "L").Value = sim
   Call test2(ateam, ateam2, i)
   Search = ""
   Row = ""
Next i

End Sub

This code basically runs this vLookLike code then vLookup to check if the lookalike suggestion is a valid suggestion or if its way of. I have also included in my code something that calculates the percent match between the suggestion and the actual name that is imported. so far in my testing ive figgured that a 75% match is enough to use the VlookLike suggestion.

However so far my data sheet has around 1000 rows or so but for all of this to work I guess I will need at least 10k rows , problem is this code here:

Code:
    With RegX
        For Each e In rng.Value
            If UCase$(e) = UCase(txt) Then
                VLookLike = e
                Exit For
            End If
            temp = Join$(Array(e, txt), Chr(2))
            If .test(temp) Then
                n = n + 1
                ReDim Preserve a(1 To 2, 1 To n)
                a(2, n) = e
                Do While .test(temp)
                    a(1, n) = a(1, n) + Len(.Execute(temp)(0).submatches(0))
                    temp = Replace(temp, .Execute(temp)(0).submatches(0), "")
                Loop
            End If
        Next
    End With

its unfortunately to slow for 1000 rows , and 10k will be real slow. I was really hoping that someone could help me rewrite this part so it runs faster ?

(also I have posted a similar thread here:

Close match ?

but in the end there was no help to be received there)

so I try my luck at this forum and I really hope that someone could have a look at this , any help at all is really appreciated. Also this became a real long post sorry about that but its kinda a complicated problem
 
Last edited:

Excel Facts

When did Power Query debut in Excel?
Although it was an add-in in Excel 2010 & Excel 2013, Power Query became a part of Excel in 2016, in Data, Get & Transform Data.
you might consider a simple vlookup table

Excel Workbook
AB
1Chongqing Dangdai Lifan FCCHONGQING DANGDAI LIFAN FC
2Chongqing LifanCHONGQING LIFAN
3Chongqing LifanCHONGQING LIFAN
4Chongqing LifanCHONGQING LIFAN
5Chongqing Lifan FCCHONGQING LIFAN FC
6Crvena ZvezdaCRVENA ZVEZDA
7Crvena zvezdaCRVENA ZVEZDA
8Crvena ZvezdaCRVENA ZVEZDA
9Crvena ZvezdaCRVENA ZVEZDA
10Crvena ZvezdaCRVENA ZVEZDA
11FS Metta/LuFS METTA LU
12FS METTA LUFS METTA LU
13FS Metta-LuFS METTA LU
14FS METTA LUFS METTA LU
15OleksandriaOLEKSANDRIA
16OleksandriaOLEKSANDRIA
17OleksandriaOLEKSANDRIA
18Oleksandria FCOLEKSANDRIA FC
19POleksandriaPOLEKSANDRIA
Sheet2
 
Upvote 0
Hi thanks for your reply.

"you might consider a simple vlookup table"

-actually this is not a bad idea at all, not sure why I didn't do this organizing 5x 1000+ rows are painful LOL. About your second table i'm not really sure what that does ? could it be something like this: (I didnt include it in first post)

Code:
Function soccer(ByVal sTitle As String) As String
    
    sTitle = Replace(sTitle, "FC ", "")
    sTitle = Replace(sTitle, "NK ", "")
    sTitle = Replace(sTitle, "SV ", "")
    sTitle = Replace(sTitle, "FSV ", "")
    sTitle = Replace(sTitle, "(Univ)", "University")
    sTitle = Replace(sTitle, " (Wom)", " (w)")
    sTitle = Replace(sTitle, " FK", "")
    sTitle = Replace(sTitle, " FC", "")
    sTitle = Replace(sTitle, "FC ", "")
    sTitle = Replace(sTitle, "CD ", "")
    sTitle = Replace(sTitle, " AFC", "")
    sTitle = Replace(sTitle, "FK ", "")
    sTitle = Replace(sTitle, "Youth ", "U21")
    sTitle = Replace(sTitle, "(Reserves)", "(Res)")
    sTitle = Replace(sTitle, " DFF", "")
    sTitle = Replace(sTitle, "TSG Hoffenheim", "Hoffenheim")
    sTitle = Replace(sTitle, "1899 Hoffenheim", "Hoffenheim")
    sTitle = Replace(sTitle, "FK ", "")
    sTitle = Replace(sTitle, " BK", "")
    sTitle = Replace(sTitle, "FF ", "")
    sTitle = Replace(sTitle, "Eif", "EIF")
    sTitle = Replace(sTitle, "Ifk ", "")
    sTitle = Replace(sTitle, " Fk", " FK")
    sTitle = Replace(sTitle, " Fc", " FC")
    sTitle = Replace(sTitle, " Ff", "")
    sTitle = Replace(sTitle, " FC", "")
    sTitle = Replace(sTitle, " FF", "")
    sTitle = Replace(sTitle, "Afc ", "")
    sTitle = Replace(sTitle, "Busan I'Park", "Busan I Park")
    sTitle = Replace(sTitle, "Women", "(w)")
    sTitle = Replace(sTitle, "SK Slavia Praha", "Slavia Praha")
    sTitle = Replace(sTitle, "Copenhagen", "FC Copenhagen")
    sTitle = Replace(sTitle, "Koebenhavn", "FC Copenhagen")
    soccer = sTitle
End Function
Have just a 2x vlookup will for sure make it easier as I will hit a quite a few lucky strikes doing it this way . But I was really hoping to get some help rebuilding the vLooklike code so it will run faster., I'm not a specialist but organize the entire rows into an array would be lightning fast ?

My overall goal is to do as little as possible manual work because in the end there are 10 000 + teams thats needs to be sorted :/
 
Upvote 0
the second table is the formula of the first, it shows the replacement of / and - and pushing to UPPER Case
 
Upvote 0
ok . know I understand - I like i. t do you know how to convert this into a code ? as I prefer to have only static text in the cells and let code do all the calculations..
i've started on something like this (just for testing)

Code:
Sub myfirstarray()
Dim lr As Long, i As Long, j As Long
Dim vData As Variant
Dim sheet As Worksheet
Set sheet = ActiveWorkbook.Sheets("Sheet1")
j = 2
lr = sheet.Cells(Rows.Count, "H").End(xlUp).Row
vData = sheet.Range("H2:H" & lr)


For i = LBound(vData, 1) To UBound(vData, 1)
   Cells(j, "O").Value = vData(i, 1)
   j = j + 1
Next i
End Sub

and this is the way to got rough loops, but how to add this to the vlookup code is probably more then I can handle
 
Upvote 0
nevermind the first question I can add a few of those criterias into my soccer function. Im trying to rewrite this (with my little basic knowledge) here is what I got mut it gives mismatch error

Code:
Function VLookLike(txt As String, rng As Variant) As String
    Dim temp As String, e, n As Long, a()
    Static RegX As Object
    If RegX Is Nothing Then
        Set RegX = CreateObject("VBScript.RegExp")
        With RegX
            .Global = True
            .IgnoreCase = True
            .Pattern = "(\S+).*" & Chr(2) & ".*\1"
        End With
    End If
    With RegX
        For e = LBound(vData, 1) To UBound(vData, 1)
            If UCase$(e) = UCase(txt) Then
                VLookLike = e
                Exit For
            End If
            temp = Join$(Array(e, txt), Chr(2))
            If .test(temp) Then
                n = n + 1
                ReDim Preserve a(1 To 2, 1 To n)
                a(2, n) = e
                Do While .test(temp)
                    a(1, n) = a(1, n) + Len(.Execute(temp)(0).submatches(0))
                    temp = Replace(temp, .Execute(temp)(0).submatches(0), "")
                Loop
            End If
        Next e
    End With
    If (VLookLike = "") * (n > 0) Then
        With Application
            VLookLike = .HLookup(.Max(.Index(a, 1, 0)), a, 2, False)
        End With
    End If
End Function
 
Upvote 0
here is my latest tweek:

Code:
Sub VLookLike()
    Dim txt As String
    Dim rng As Variant
    Dim temp As String, e, n As Long, a()
    Static RegX As Object
    Dim Vlook As String
    rng = Range("H" & 2 & ":H" & 1206)
    txt = "Sydney United"
    If RegX Is Nothing Then
        Set RegX = CreateObject("VBScript.RegExp")
        With RegX
            .Global = True
            .IgnoreCase = True
            .Pattern = "(\S+).*" & Chr(2) & ".*\1"
        End With
    End If
    With RegX
        For e = LBound(rng, 1) To UBound(rng, 1)
            If UCase$(e) = UCase(txt) Then
                Vlook = e
                Exit For
            End If
            temp = Join$(Array(e, txt), Chr(2))
            If .test(temp) Then
                n = n + 1
                ReDim Preserve a(1 To 2, 1 To n)
                a(2, n) = e
                Do While .test(temp)
                    a(1, n) = a(1, n) + Len(.Execute(temp)(0).submatches(0))
                    temp = Replace(temp, .Execute(temp)(0).submatches(0), "")
                Loop
            End If
        Next e
    End With
    If (Vlook = "") * (n > 0) Then
        With Application
            Vlook = .HLookup(.Max(.Index(a, 1, 0)), a, 2, False)
        
        End With
    End If
End Sub

and I belive this can work but I don't know how to finish this to actual extract the information , lol
 
Last edited:
Upvote 0

Forum statistics

Threads
1,225,750
Messages
6,186,809
Members
453,374
Latest member
Descant40

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top