Compare 2 columns that contain sentences and return matching words in the 3rd column

mypa333

New Member
Joined
May 15, 2016
Messages
7
[TABLE="width: 637"]
[TR][TD]Column A[/TD][TD]Column B[/TD][TD]Column C[/TD][/TR]
[TR][TD]appartement a louer[/TD][TD]appartement a louer limoges[/TD][TD]appartement a louer[/TD][/TR]
[TR][TD]appartement a louer[/TD][TD]appartement grece[/TD][TD]appartement[/TD][/TR]
[TR][TD]barcelone[/TD][TD]hotels a gotic[/TD][TD]No Match[/TD][/TR]
[TR][TD]barcelone[/TD][TD]barcelone hotel[/TD][TD]barcelone[/TD][/TR]
[TR][TD]borussia dortmund liverpool[/TD][TD]dortmund hotels[/TD][TD]dortmund[/TD][/TR]
[TR][TD]Hotel in Livigno[/TD][TD]Hotel in Livigno[/TD][TD]Exact Match[/TD][/TR]
[TR][TD]camping car occasion[/TD][TD]camping village rosselba le palme portoferraio[/TD][TD]camping[/TD][/TR]
[TR][TD]camping car occasion[/TD][TD]eurocamping oliva[/TD][TD]No Match[/TD][/TR]
[TR][TD]five nights at freddy 4[/TD][TD]freddy desert nights camp hotel[/TD][TD]nights freddy[/TD][/TR]
[/TABLE]

I need help in comparing each word in column A with each word in column B and returning the matched words in column C.
I can use IF(A2=B2, TRUE, FALSE) to flag exact matches or non-matches, but I've been struggling (a lot, to the point of obsession) to get the list of matched words.
I've been running a macro that applies TextToColumns to both columns, uses HLOOKUP to look up each resulting word from B in A, then concatenates the matched words. It takes 30-60 minutes, depending on the mood of my laptop.

Thank you
 

Maybe this UDF (User Defined Function)

Alt+F11 to open the VB Editor
Menu
Insert > Module
Copy and paste the code below in the right panel

Code:
Function GetMatches(s1 As String, s2 As String) As String
    Dim spl1 As Variant, spl2 As Variant, i As Long

    ' Identical strings are flagged rather than listed word by word
    If s1 = s2 Then GetMatches = "Exact Match": Exit Function

    ' Application.Trim collapses repeated spaces, so Split yields clean words
    spl1 = Split(Application.Trim(s1))
    spl2 = Split(Application.Trim(s2))

    ' Keep each word of s1 that also occurs in s2, in s1's order
    For i = 0 To UBound(spl1)
        If Not IsError(Application.Match(spl1(i), spl2, 0)) Then
            GetMatches = GetMatches & " " & spl1(i)
        End If
    Next i

    If Len(GetMatches) = 0 Then
        GetMatches = "No Match"
    Else
        GetMatches = Application.Trim(GetMatches)
    End If
End Function

Back to Excel

Formula in C2
=GetMatches(A2,B2)
copy down


[Table="class: grid"]
[tr][td="bgcolor: #DCE6F1"][/td][td="bgcolor: #DCE6F1"]A[/td][td="bgcolor: #DCE6F1"]B[/td][td="bgcolor: #DCE6F1"]C[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]1[/td][td]Text1[/td][td]Text2[/td][td="bgcolor:#D9D9D9"]Result[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]2[/td][td]appartement a louer[/td][td]appartement a louer limoges[/td][td="bgcolor:#D9D9D9"]appartement a louer[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]3[/td][td]appartement a louer[/td][td]appartement grece[/td][td="bgcolor:#D9D9D9"]appartement[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]4[/td][td]barcelone[/td][td]hotels a gotic[/td][td="bgcolor:#D9D9D9"]No Match[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]5[/td][td]barcelone[/td][td]barcelone hotel[/td][td="bgcolor:#D9D9D9"]barcelone[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]6[/td][td]borussia dortmund liverpool[/td][td]dortmund hotels[/td][td="bgcolor:#D9D9D9"]dortmund[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]7[/td][td]Hotel in Livigno[/td][td]Hotel in Livigno[/td][td="bgcolor:#D9D9D9"]Exact Match[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]8[/td][td]camping car occasion[/td][td]camping village rosselba le palme portoferraio[/td][td="bgcolor:#D9D9D9"]camping[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]9[/td][td]camping car occasion[/td][td]eurocamping oliva[/td][td="bgcolor:#D9D9D9"]No Match[/td][/tr]
[tr][td="bgcolor: #DCE6F1"]10[/td][td]five nights at freddy 4[/td][td]freddy desert nights camp hotel[/td][td="bgcolor:#D9D9D9"]nights freddy[/td][/tr]
[/table]
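
A possible variant, in case the worksheet lookup becomes a bottleneck on large sheets: the sketch below does the word lookup with a Scripting.Dictionary instead of Application.Match. The name GetMatchesDict is made up for illustration, and the late-bound dictionary assumes Excel on Windows (it is not available on Mac). Whether it is actually faster on your data is something to time, not something to take for granted.

```vba
' Hypothetical variant of GetMatches: same output format, but the word
' lookup uses a Scripting.Dictionary instead of Application.Match.
Function GetMatchesDict(s1 As String, s2 As String) As String
    Dim words1() As String, words2() As String
    Dim seen As Object, w As Variant, result As String

    If s1 = s2 Then GetMatchesDict = "Exact Match": Exit Function

    words1 = Split(Application.Trim(s1))
    words2 = Split(Application.Trim(s2))

    ' Late-bound dictionary (Windows only); TextCompare ignores case
    Set seen = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare
    For Each w In words2
        seen(w) = True
    Next w

    ' Keep the words of s1 that were seen in s2, in s1's order
    For Each w In words1
        If seen.Exists(w) Then result = result & " " & w
    Next w

    If Len(result) = 0 Then
        GetMatchesDict = "No Match"
    Else
        GetMatchesDict = Mid$(result, 2)   ' drop the leading space
    End If
End Function
```

It is used the same way, =GetMatchesDict(A2,B2). One behavioural difference to be aware of: the dictionary compares whole words literally, so characters such as ? and * in a word are not treated as wildcards.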


Hope this helps

M.
 
Marcelo, this is absolutely awesome.
The code is very "elegant" compared to my texttocolumns macro.

My obsession is over. Now, I can rest. Many thanks!
 
Hello again,

I've been running this for the last month, and while it cut the time spent in half, I still need to go through variations manually (e.g. plurals, words with accents, misspelled words), such as:

For example: "apartments in barcalone" vs "apartments in barcelona", or "renting cars" vs "rent a car". The Levenshtein distance is 2 in the first case and 4 in the second (3 edits to turn "ing" into " a", plus 1 for the dropped "s").

Basically, I want to calculate the sum of the Levenshtein distances returned when the above code compares the words. My logic is that the closer the LD is to zero, the better the match.

Integrated in the above code or as a separate function, anything will help! Thank you! The function below only helps on a word-by-word basis. The code above compares each word, so if I can add this in the loop, it will absolutely rock!

Code:
Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long

    Dim i As Long, j As Long
    Dim string1_length As Long
    Dim string2_length As Long
    Dim distance() As Long

    string1_length = Len(string1)
    string2_length = Len(string2)
    ReDim distance(string1_length, string2_length)

    ' Distance from a prefix of string1 to the empty string
    For i = 0 To string1_length
        distance(i, 0) = i
    Next

    ' Distance from the empty string to a prefix of string2
    For j = 0 To string2_length
        distance(0, j) = j
    Next

    For i = 1 To string1_length
        For j = 1 To string2_length
            If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
                ' Same character: no extra cost
                distance(i, j) = distance(i - 1, j - 1)
            Else
                ' Cheapest of deletion, insertion, substitution
                distance(i, j) = Application.WorksheetFunction.Min _
                    (distance(i - 1, j) + 1, _
                     distance(i, j - 1) + 1, _
                     distance(i - 1, j - 1) + 1)
            End If
        Next
    Next

    Levenshtein = distance(string1_length, string2_length)

End Function
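
One way the "sum of distances in the loop" idea could be read, as a rough sketch only: for every word in the first string, take the smallest edit distance to any word in the second string, then add those up. The names SumWordDistances and EditDist are invented for this example, the helper is a compact two-row Levenshtein so the block can be pasted on its own, and both strings are lower-cased first (an assumption, so that "Hotel" and "hotel" count as equal).

```vba
' Hypothetical helper: sum, over the words of s1, of the smallest edit
' distance to any word of s2. 0 means every word of s1 appears in s2.
Function SumWordDistances(s1 As String, s2 As String) As Long
    Dim words1() As String, words2() As String
    Dim i As Long, j As Long, d As Long, best As Long

    words1 = Split(LCase$(Application.Trim(s1)))
    words2 = Split(LCase$(Application.Trim(s2)))

    For i = 0 To UBound(words1)
        ' Upper bound: the cost of dropping the word entirely
        best = Len(words1(i))
        For j = 0 To UBound(words2)
            d = EditDist(words1(i), words2(j))
            If d < best Then best = d
        Next j
        SumWordDistances = SumWordDistances + best
    Next i
End Function

' Two-row Levenshtein distance (insert, delete, substitute cost 1 each)
Private Function EditDist(a As String, b As String) As Long
    Dim prev() As Long, cur() As Long, tmp() As Long
    Dim i As Long, j As Long, cost As Long

    ReDim prev(0 To Len(b)), cur(0 To Len(b))
    For j = 0 To Len(b): prev(j) = j: Next j

    For i = 1 To Len(a)
        cur(0) = i
        For j = 1 To Len(b)
            cost = IIf(Mid$(a, i, 1) = Mid$(b, j, 1), 0, 1)
            cur(j) = prev(j - 1) + cost                        ' substitute or keep
            If prev(j) + 1 < cur(j) Then cur(j) = prev(j) + 1  ' delete
            If cur(j - 1) + 1 < cur(j) Then cur(j) = cur(j - 1) + 1  ' insert
        Next j
        tmp = prev: prev = cur: cur = tmp   ' current row becomes previous
    Next i
    EditDist = prev(Len(b))
End Function
```

With =SumWordDistances(A2,B2), "apartments in barcalone" against "apartments in barcelona" gives 2: the first two words match exactly and the third is two substitutions away. Note the result is not symmetric, since only the words of the first argument are scored.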
 
I realized this makes close to no sense. I'll post better logic and my code if I get it to work in any way. Thanks.
 
This just hit me... What if I merge the words in each column, calculate the Levenshtein distance using the above function, and turn it into a weighted percentage match based on the character lengths of the 2 strings? I would also keep the initial code and use both results to decide whether it's a good match or a bad match.
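
A minimal sketch of that idea, under a few assumptions that are not in the post: "merging" is taken to mean lower-casing and removing spaces, and the percentage is taken as 1 minus the distance divided by the length of the longer merged string. MatchPercent is a made-up name, and the distance is computed inline with the same recurrence as the Levenshtein function above so the block stands on its own.

```vba
' Hypothetical UDF: similarity in the range 0..1 between two phrases,
' computed as 1 - distance / length of the longer merged string.
Function MatchPercent(s1 As String, s2 As String) As Double
    Dim a As String, b As String
    Dim d() As Long, i As Long, j As Long, best As Long
    Dim maxLen As Long

    ' "Merge" the words: lower-case and drop all spaces (an assumption)
    a = Replace(LCase$(s1), " ", "")
    b = Replace(LCase$(s2), " ", "")

    maxLen = IIf(Len(a) > Len(b), Len(a), Len(b))
    If maxLen = 0 Then MatchPercent = 1: Exit Function

    ' Full-matrix Levenshtein distance
    ReDim d(0 To Len(a), 0 To Len(b))
    For i = 0 To Len(a): d(i, 0) = i: Next i
    For j = 0 To Len(b): d(0, j) = j: Next j

    For i = 1 To Len(a)
        For j = 1 To Len(b)
            best = d(i - 1, j - 1)                                   ' keep
            If Mid$(a, i, 1) <> Mid$(b, j, 1) Then best = best + 1   ' substitute
            If d(i - 1, j) + 1 < best Then best = d(i - 1, j) + 1    ' delete
            If d(i, j - 1) + 1 < best Then best = d(i, j - 1) + 1    ' insert
            d(i, j) = best
        Next j
    Next i

    MatchPercent = 1 - d(Len(a), Len(b)) / maxLen
End Function
```

For "apartments in barcalone" against "apartments in barcelona" the merged strings are 21 characters long and 2 edits apart, so =MatchPercent(A2,B2) comes out at roughly 0.90. Where to draw the line between a good and a bad match is a judgment call to tune on real data.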
 