Splitting a Text String into Identifiable Words and Comparing them to other Cells

chrisbwee

New Member
Joined
Sep 5, 2019
Messages
1
Hi! Newbie here, so please be gentle...

I have two columns of data where all the cells are text strings. I want to be able to compare each cell in the Column 2 with all of the cells in Column 1 to find which cell from column 1 has the most words that match the cell in column 2. None of them will exactly match, but some will match more than others (will have more words that match). I want to find the cell that best matches (has the most matching words) with the selected cell in column 2 and then pull off data relevant to the best matching cell, much like a vlookup would if it was a perfect match between the cells. I have provided a table below to try and better explain:

[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Column 1[/TD]
[TD]Info 1[/TD]
[TD]Column 2[/TD]
[TD]Info 2[/TD]
[/TR]
[TR]
[TD]The dog chased the cat down the road[/TD]
[TD]1[/TD]
[TD]The man loved his dog[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][TABLE="width: 276"]
<tbody>[TR]
[TD="width: 276"]The dog ate the man's shoe[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[TD]2[/TD]
[TD]The dog and the cat shared the shoe[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][TABLE="width: 276"]
<tbody>[TR]
[TD="width: 276"]The cat and the man loved the shoe[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[TD]3[/TD]
[TD][TABLE="width: 255"]
<tbody>[TR]
[TD="width: 255"]A cat is a man's best friend[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][TABLE="width: 276"]
<tbody>[TR]
[TD="width: 276"]A dog is a man's best friend[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[TD]4[/TD]
[TD][TABLE="width: 255"]
<tbody>[TR]
[TD="width: 255"]The dog chased the man down the road[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[TD]1[/TD]
[/TR]
</tbody>[/TABLE]

The fourth entry under column 2 best matches (has the most words that match) with the first entry under column 1, therefore it has pulled "1" from Info 1 into Info 2.

I understand that the VBA Split function would at least turn the text string into identifiable words, but where to go from there I do not know...

Your help would be greatly appreciated!
 

Excel Facts

Move date out one month or year
Use =EDATE(A2,1) for one month later. Use EDATE(A2,12) for one year later.
Try this as a possibility :-
NB:- I see from your data that Your "column 2" is actually column 3.
Results in column 4.
Code:
[COLOR="Navy"]Sub[/COLOR] MG05Sep18
[COLOR="Navy"]Dim[/COLOR] RngA [COLOR="Navy"]As[/COLOR] Range, DnA [COLOR="Navy"]As[/COLOR] Range, DnC [COLOR="Navy"]As[/COLOR] Range, RngC [COLOR="Navy"]As[/COLOR] Range, Sp [COLOR="Navy"]As[/COLOR] Variant, Num [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long[/COLOR]
[COLOR="Navy"]Dim[/COLOR] n [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] c [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] p [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long,[/COLOR] oMax [COLOR="Navy"]As[/COLOR] [COLOR="Navy"]Long[/COLOR]

[COLOR="Navy"]Set[/COLOR] RngA = Range(Range("A2"), Range("A" & Rows.Count).End(xlUp))
[COLOR="Navy"]Set[/COLOR] RngC = Range(Range("C2"), Range("c" & Rows.Count).End(xlUp))
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] DnA [COLOR="Navy"]In[/COLOR] RngA
   ReDim ray(1 To RngA.Count, 1 To 2)
    c = 0
   [COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] DnC [COLOR="Navy"]In[/COLOR] RngC
    Sp = Split(DnC, " ")
       [COLOR="Navy"]For[/COLOR] n = 0 To UBound(Sp)
        [COLOR="Navy"]If[/COLOR] InStr(1, DnA, Sp(n), vbTextCompare) > 0 [COLOR="Navy"]Then[/COLOR]
            Num = Num + 1
        [COLOR="Navy"]End[/COLOR] If
      [COLOR="Navy"]Next[/COLOR] n
        c = c + 1
        ray(c, 1) = Num: ray(c, 2) = DnC.Row - 1
        Num = 0
   [COLOR="Navy"]Next[/COLOR] DnC
       [COLOR="Navy"]For[/COLOR] n = 1 To UBound(ray)
         [COLOR="Navy"]If[/COLOR] ray(n, 1) > oMax [COLOR="Navy"]Then[/COLOR]
           oMax = ray(n, 1)
           p = ray(n, 2)
        [COLOR="Navy"]End[/COLOR] If
      [COLOR="Navy"]Next[/COLOR] n
     RngC(p).Offset(, 1) = RngC(p).Offset(, 1) & IIf(RngC(p).Offset(, 1) = "", DnA.Offset(, 1), "," & DnA.Offset(, 1))
    oMax = 0
[COLOR="Navy"]Next[/COLOR] DnA
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0
1. Not sure what "best match" means when there are multiple incidences of the same word
2. Do you want to exclude cetain words (like "the","is")

example
Which is the "best" match for The dog is happy below ? Is it cell1 or cell2

cell1 ( 13 matches , 8 matches if "the" excluded, 5 matches if "is" also excluded)
The head of a dog is in front of the tail of a dog and the dog is grateful that the tail of a dog is not in front of the head of a dog

cel 2 ( 4 matches )
The happy dog is asleep


3. I do not understand why there is nothing against 3 of the cells in column 2
- they all have some kind of match
- why does only one cell return a value ?


4 The dog is happy is being matched with The dog is very happy and The sleeping dog is very happy
- are they ranked equally ?

EDIT

I notice that @MickG has posted something whilst I was dozing!
- so I will stand back for the time being
 
Last edited:
Upvote 0
How about
With the same result as MicKG
Could redefine if the result acceptable
Code:
Sub teste()
    gmth = 0
    For i = 2 To 5
        myt = "(" & Join(Split(Join(Split(Cells(i, 1), " "), " (")), ")|") & ")"
        With CreateObject("VBScript.RegExp")
            .Global = True
            .Pattern = myt
            For j = 2 To 5
                Set mtch = .Execute(Cells(j, 3))
                If mtch.Count > gmth Then
                    gmth = mtch.Count
                    l = j
                End If
            Next
            Cells(l, 4) = Cells(l, 4).Value & " " & i - 1
            gmth = 0
        End With
    Next
End Sub
 
Upvote 0

Forum statistics

Threads
1,223,893
Messages
6,175,240
Members
452,621
Latest member
Laura_PinksBTHFT

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top