fredrerik84
Active Member
- Joined
- Feb 26, 2017
- Messages
- 383
Ive started a gigantic task where Im gonna try to convert soccer team names scraped from several different sources. problem is that many team names have small variation in how they are spelled from site to site. here is a sample to show what I'm struggling with:
<tbody>
</tbody>
Normally at least 2-3 sites have the same spelling but of course this will different from team to team. Up intill now ive been using a vlookup like this when I fetch the team:
And being optimistic I started building this list - but the I started to think "its gotta be a better way" as building this data sheet is a gigantic task
Then around a week ago or so I stumbled upon this code written a few years ago (credit to another forum)
This code is excellent for my problem (although its not perfect) - this will look for Lookalikes and I've started to test this with a test script like this:
This code basically runs this vLookLike code then vLookup to check if the lookalike suggestion is a valid suggestion or if its way of. I have also included in my code something that calculates the percent match between the suggestion and the actual name that is imported. so far in my testing ive figgured that a 75% match is enough to use the VlookLike suggestion.
However so far my data sheet has around 1000 rows or so but for all of this to work I guess I will need at least 10k rows , problem is this code here:
its unfortunately to slow for 1000 rows , and 10k will be real slow. I was really hoping that someone could help me rewrite this part so it runs faster ?
(also I have posted a similar thread here:
Close match ?
but in the end there was no help to be received there)
so I try my luck at this forum and I really hope that someone could have a look at this , any help at all is really appreciated. Also this became a real long post sorry about that but its kinda a complicated problem
POleksandria | Oleksandria | Oleksandria FC | Oleksandria | Oleksandria |
---|---|---|---|---|
Crvena Zvezda | Crvena zvezda | Crvena Zvezda | Crvena Zvezda | Crvena Zvezda |
FS Metta/Lu | FS METTA LU | FS Metta-Lu | FS METTA LU | |
Chongqing Dangdai Lifan FC | Chongqing Lifan | Chongqing Lifan FC | Chongqing Lifan | Chongqing Lifan |
<tbody>
</tbody>
Normally at least 2-3 sites have the same spelling but of course this will different from team to team. Up intill now ive been using a vlookup like this when I fetch the team:
Code:
str = Application.VLookup(str, sheet.Range("AC" & 2 & ":AH" & LRowData), 5, False)
And being optimistic I started building this list - but the I started to think "its gotta be a better way" as building this data sheet is a gigantic task
Then around a week ago or so I stumbled upon this code written a few years ago (credit to another forum)
Code:
Function VLookLike(txt As String, rng As Range) As String
Dim temp As String, e, n As Long, a()
Static RegX As Object
If RegX Is Nothing Then
Set RegX = CreateObject("VBScript.RegExp")
With RegX
.Global = True
.IgnoreCase = True
.Pattern = "(\S+).*" & Chr(2) & ".*\1"
End With
End If
With RegX
For Each e In rng.Value
If UCase$(e) = UCase(txt) Then
VLookLike = e
Exit For
End If
temp = Join$(Array(e, txt), Chr(2))
If .test(temp) Then
n = n + 1
ReDim Preserve a(1 To 2, 1 To n)
a(2, n) = e
Do While .test(temp)
a(1, n) = a(1, n) + Len(.Execute(temp)(0).submatches(0))
temp = Replace(temp, .Execute(temp)(0).submatches(0), "")
Loop
End If
Next
End With
If (VLookLike = "") * (n > 0) Then
With Application
VLookLike = .HLookup(.Max(.Index(a, 1, 0)), a, 2, False)
End With
End If
End Function
This code is excellent for my problem (although its not perfect) - this will look for Lookalikes and I've started to test this with a test script like this:
Code:
Sub test()
Dim ateam As String, str As String, Search As String, Smatch As String, ateam2 As String
Dim i As Long, Row As Long, Col As Long, cell As Range, ii As Long
Dim sheet As Worksheet
Dim sim As Double
Dim found As Integer
Set sheet = ActiveWorkbook.Sheets("Sheet1")
For i = 2 To 50
ateam = Cells(i, "I")
ateam2 = VLookLike(ateam, sheet.Range("H" & 2 & ":H" & 1206))
Cells(i, "J").Value = ateam2
On Error Resume Next
Row = Application.WorksheetFunction.Match(ateam2, Range("H1:H2000"), 0)
Search = Application.WorksheetFunction.Match(Cells(i, "I"), Range("B" & Row & ":H" & Row), 0)
sim = Similarity(ateam, ateam2)
If Search = "" Then
Cells(i, "K").Value = ateam
Else
Cells(i, "K").Value = ateam2
End If
Cells(i, "L").Value = sim
Call test2(ateam, ateam2, i)
Search = ""
Row = ""
Next i
End Sub
This code basically runs this vLookLike code then vLookup to check if the lookalike suggestion is a valid suggestion or if its way of. I have also included in my code something that calculates the percent match between the suggestion and the actual name that is imported. so far in my testing ive figgured that a 75% match is enough to use the VlookLike suggestion.
However so far my data sheet has around 1000 rows or so but for all of this to work I guess I will need at least 10k rows , problem is this code here:
Code:
With RegX
For Each e In rng.Value
If UCase$(e) = UCase(txt) Then
VLookLike = e
Exit For
End If
temp = Join$(Array(e, txt), Chr(2))
If .test(temp) Then
n = n + 1
ReDim Preserve a(1 To 2, 1 To n)
a(2, n) = e
Do While .test(temp)
a(1, n) = a(1, n) + Len(.Execute(temp)(0).submatches(0))
temp = Replace(temp, .Execute(temp)(0).submatches(0), "")
Loop
End If
Next
End With
its unfortunately to slow for 1000 rows , and 10k will be real slow. I was really hoping that someone could help me rewrite this part so it runs faster ?
(also I have posted a similar thread here:
Close match ?
but in the end there was no help to be received there)
so I try my luck at this forum and I really hope that someone could have a look at this , any help at all is really appreciated. Also this became a real long post sorry about that but its kinda a complicated problem
Last edited: