# Lookup for cell contains most of the words in a sentence.



## Flashbond (Sunday at 5:26 AM)

Hello everyone,

I am looking for a formula input to D2. The aim is to find the cell in Column A that contains most of the words given in C2 and return Column B value.


Column AColumn BColumn CColumn D1The quick brown1Given sentence:Desired result:2fox2The quick brown fox jumps over the lazy dog.33jumps over the lazy34dog4


----------



## DanteAmor (Sunday at 6:15 AM)

Try with this UDF:


```
Function contains_words(txt As String, rng As Range)
  Dim c As Range
  Dim nmax As Long, n As Long
  
  For Each c In rng.Columns(1).Cells
    If InStr(1, txt, c.Value) > 0 Then
      n = UBound(Split(c.Value, " ")) + 1
      If n > nmax Then
        nmax = n
        contains_words = c.Offset(, 1).Value
      End If
    End If
  Next
End Function
```


*HOW TO INSTALL UDFs*
If you are new to UDFs, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. You can now use *contains_words* just like it was a built-in Excel function. For example:

Dante AmorABCD1The quick brown1Given sentence:Desired result:2fox2The quick brown fox jumps over the lazy dog.33jumps over the lazy34dog4Hoja1Cell FormulasRangeFormulaD2D2=contains_words(C2,A2:B4)

---


----------



## Flashbond (Sunday at 6:19 AM)

Thanks for your effort. I am looking for a pure formula solution if possible.


----------



## DanteAmor (Monday at 12:15 AM)

In column B I put other numbers to do my tests.
Try:

Dante AmorABCD1The quick brown12Given sentence:Desired result:2fox13The quick brown fox jumps over the lazy dog.143jumps over the lazy144dog155some words to compare16Hoja1Cell FormulasRangeFormulaD2D2=INDEX(B1:B5,MAX(IF(LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1=
LARGE(IF(ISNUMBER(SEARCH(A1:A5,C2)),LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1),1),
IF(IF(ISNUMBER(SEARCH(A1:A5,C2)),LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1),ROW(B1:B5)))))Press CTRL+SHIFT+ENTER to enter array formulas.


----------



## Flashbond (Monday at 12:31 AM)

DanteAmor said:


> In column B I put other numbers to do my tests.
> Try:
> 
> Dante AmorABCD1The quick brown12Given sentence:Desired result:2fox13The quick brown fox jumps over the lazy dog.143jumps over the lazy144dog155some words to compare16Hoja1Cell FormulasRangeFormulaD2D2=INDEX(B1:B5,MAX(IF(LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1=
> ...


Flowless


----------



## Peter_SSs (Monday at 5:44 AM)

Does that mean your data could not be like this where all 6 of the words in cell A1 are in C2?

Flashbond.xlsmABCD1The quick brown over lazy dog1Given sentence:Desired result:2fox2The quick brown fox jumps over the lazy dog.33jumps over the lazy34dog4Sheet1Cell FormulasRangeFormulaD2D2=INDEX(B1:B5,MAX(IF(LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1=LARGE(IF(ISNUMBER(SEARCH(A1:A5,C2)),LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1),1),IF(IF(ISNUMBER(SEARCH(A1:A5,C2)),LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1),ROW(B1:B5)))))Press CTRL+SHIFT+ENTER to enter array formulas.


----------



## Flashbond (Monday at 5:57 AM)

Peter_SSs said:


> Does that mean your data could not be like this where all 6 of the words in cell A1 are in C2?
> 
> Flashbond.xlsmABCD1The quick brown over lazy dog1Given sentence:Desired result:2fox2The quick brown fox jumps over the lazy dog.33jumps over the lazy34dog4Sheet1Cell FormulasRangeFormulaD2D2=INDEX(B1:B5,MAX(IF(LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1=LARGE(IF(ISNUMBER(SEARCH(A1:A5,C2)),LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1),1),IF(IF(ISNUMBER(SEARCH(A1:A5,C2)),LEN(A1:A5)-LEN(SUBSTITUTE(A1:A5," ",""))+1),ROW(B1:B5)))))Press CTRL+SHIFT+ENTER to enter array formulas.


Oh, I've never thought of that scenario. Yes, it should be any of the words. In your scenario, the desired result should be 1. Tnaks a lot @Peter_SSs ! Actually, I haven't checked @DanteAmor 's suggestion against this scenario.


----------



## Peter_SSs (Monday at 6:52 AM)

Then in that case I suspect you will need a macro. 
There is a further question though. In the examples so far *all* of the words in each cell in column A exist in the C2 value. Is that always the case or could you have an example like this where the words "red and "cat" in column A do not exist in C2?

Flashbond.xlsmABC1The quick red1Given sentence:2cat2The quick brown fox jumps over the lazy dog.3jumps over the lazy34dog4Sheet1


----------



## Flashbond (Monday at 7:20 AM)

Oh yes. You always look through the pinhole @Peter_SSs ! A more precise example would be:


Column AColumn BColumn CColumn D1the lazy fox1 (3 matches)Given sentence:Desired result:2the quick dog jumps2 (4 matches)The quick brown fox jumps over the lazy dog23over the brown cat3 (3 matches)4jumps cat over4 (2 matches)
I think the example above is only achievable via VBA.

If the words are in order, is still VBA required? Like the example below:

Column AColumn BColumn CColumn D1jumps over the cat1 (3 matches)Given sentence:Desired result:2the lazy dog2 (3 matches)The quick brown fox jumps over the lazy dog33the brown cat jumps over3 (4 matches)4fox over cat4 (2 matches)


----------



## Peter_SSs (Monday at 6:46 PM)

I'm not sure that I will have an answer for you either way, but I can see more possible confusion - or at least different ways to interpret things.
For the sample below

Does A1 have two matches or one for "quick"?
Does A2 have two matches or one for "the" (since "the" occurs twice in C2)?
Does A3 have two matches or one for "the" (there are certainly two of each in each cell but perhaps it is only considered as one word)?
Flashbond.xlsmAC1A quick cat and a quick ratGiven sentence:2The slow horseThe quick brown fox jumps over the lazy dog.3The cat and the ratSheet1


----------



## Flashbond (Sunday at 5:26 AM)

Hello everyone,

I am looking for a formula input to D2. The aim is to find the cell in Column A that contains most of the words given in C2 and return Column B value.


Column AColumn BColumn CColumn D1The quick brown1Given sentence:Desired result:2fox2The quick brown fox jumps over the lazy dog.33jumps over the lazy34dog4


----------



## Flashbond (Monday at 10:47 PM)

Hi Peter,

Surely there will be no difference between picking either of the same amount of matches since they both meets the condition and there is no way not know which one is which. So, in this case first match will be adequate.

Most of the time words will be unique. Let me tell you about my actual need. I have a list of products and sometimes product name varies in purchase list. So let say, my product neme is "Electrical Insulated Gloves" and, I have "Electrical Tape" and "Insulated Gloves" in my purchase list. I want to match the quantity of "Insulation Gloves" in this case.

Briefly, most of the time the words are unique and in order. But there can be some skipped words. So, "Electrical Gloves" is also possible. All I need is some kind of a partial match.


----------



## Peter_SSs (Monday at 11:10 PM)

Does that mean that the column A list might have a fairly small number of words in a single cell, say max of 3 or 4 (or 5)?


----------



## Flashbond (Monday at 11:10 PM)

To be more precise, it is very very unlikely to have both "Electrical Gloves" and "Electrical Work Glows" in the same purchase order so that type scenario can be neglected.  But both have equal significance in general.


----------



## Flashbond (Monday at 11:12 PM)

Peter_SSs said:


> Does that mean that the column A list might have a fairly small number of words in a single cell, say max of 3 or 4 (or 5)?


It won't be safe to limit like that.


----------



## Peter_SSs (Monday at 11:19 PM)

Flashbond said:


> It won't be safe to limit like that.


Then in that case, with your version of Excel, I think that I would be pusuing a vba solution. Is that acceptable?
Also, your goal was to find the row with the highest number of word matches and return the value from column B. What should happen when there is not one clear winner like there has been in all your examples so far? For example ..

Flashbond.xlsmABCD1Electrical Insulated1Desired result:2Insulated Gloves2Electrical Insulated Gloves??3Electrical Gloves3Sheet1


----------



## Flashbond (Tuesday at 12:00 AM)

Peter_SSs said:


> Flashbond.xlsmABCD1Electrical Insulated1Desired result:2Insulated Gloves2Electrical Insulated Gloves??3Electrical Gloves3Sheet1


It is a very unlikely scenario. First match is ok. I can also write a vba also but your approaches always more efficient than mines. I will be looking forward to see. Thanks for your time and effort.


----------



## Peter_SSs (Tuesday at 12:39 AM)

Flashbond said:


> First match is ok.


OK, here is one way employing a UDF. If there is any punctuation involved that could introduce some problems.


```
Function LookupMost(r As Range, s As String) As Variant
  Dim RX As Object
  Dim a As Variant
  Dim i As Long, nMax As Long, n As Long
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.IgnoreCase = True
  a = r.Value
  For i = 1 To UBound(a)
    RX.Pattern = "\b(" & Replace(a(i, 1), " ", "|") & ")\b"
    n = RX.Execute(s).Count
    If n > nMax Then
      nMax = n
      LookupMost = a(i, 2)
    End If
  Next i
End Function
```

Flashbond.xlsmABCD1Electrical1Result2Insulated Gloves2Electrical Insulated Gloves23Electrical Gloves345The quick brown over lazy dog1Result6fox2The quick brown fox jumps over the lazy dog.17jumps over the lazy38dog4910jumps over the lazy1Result11fox2The quick brown fox jumps over the lazy dog.312The quick brown over lazy dog313dog4Sheet1Cell FormulasRangeFormulaD2D2=LookupMost(A1:B3,C2)D6,D11D6=LookupMost(A5:B8,C6)


----------



## Flashbond (Tuesday at 12:42 AM)

Any punctuation can be neglected. Thank you very much again!


----------



## Peter_SSs (Tuesday at 12:49 AM)

Flashbond said:


> Any punctuation can be neglected.


That is simple to say but not so simple to tell Excel how to do it. 
Do you actually have any punctuation in your data? If so, could you post a few sets of *representative *sample data that shows the sort of punctuation variation that can occur?


----------



## Flashbond (Tuesday at 1:21 AM)

It is all in Turkish  Mostly "/", "-" and "."
Like:
Electrical Glowes L/XL
Or
Anti-static Rope 7.00MM

They are mostly the part of the word. So main delimeter is " ". Thats why I said "can be neglected".


----------



## Flashbond (Sunday at 5:26 AM)

Hello everyone,

I am looking for a formula input to D2. The aim is to find the cell in Column A that contains most of the words given in C2 and return Column B value.


Column AColumn BColumn CColumn D1The quick brown1Given sentence:Desired result:2fox2The quick brown fox jumps over the lazy dog.33jumps over the lazy34dog4


----------



## Peter_SSs (Tuesday at 1:59 AM)

Flashbond said:


> Mostly "/", "-" and "."
> 
> They are mostly the part of the word.


You cannot tell vba to "mostly" do something. Those three punctuation symbols all have a special meaning in regular expression syntax, which is what I am using. They would need special treatment to deal with them but to determine the best way to do that I would need to see some good representative sample data and know *all* non-alphabetical and non-numerical characters that can occur as some may require different treatment.

I would also like to know whether any such punctuation does or can occur in the column A values and/or the column C values in the layout that we have been using.

Looks like numerical data is now also to be considered - not mentioned before. If "7.00MM" occurred in both column A and C2 do you consider that a "word" when counting how many words are found?

Any chance that we could still have this?


Peter_SSs said:


> could you post a *few sets* of *representative *sample data that shows the sort of punctuation variation that can occur?


----------



## bebo021999 (Tuesday at 2:18 AM)

Stick to your 1st sample
An approach with UDF (User define formula)
Right click on tab name, View code, Insert, Module then paste below code into:

```
Option Explicit
Function BestMatch(rng As Range, cell As Range)
Dim i&, pos&, max&, count&, ce As Range, sA, st As String
For Each ce In rng
    sA = Split(ce): count = 0: pos = 0
    For i = 0 To UBound(sA)
        pos = InStr(pos + 1, ce, sA(i))
        If pos > 0 Then count = count + 1
    Next
    If count > max And count = UBound(sA) + 1 Then
        max = count
        'BestMatch = ce.Value ===> if expect text string
        'BestMatch = ce.Address(0, 0) ===> if expect cell address
        BestMatch = ce.Row '===> if expect row index
    End If
Next
End Function
```
I leave 3 options of outcome: Textstring, cell address or cell row
BestMatch(rng As Range, cell As Range)
Assums part string in A1:A4 (rng), original string in C2 (cell)
In D2, type:

```
=BestMatch($A$1:$A$4,$C$2)
```


----------



## DanteAmor (Tuesday at 11:23 AM)

Hi @Flashbond , If you end up going with a UDF, you could check the UDF from *post #2* again, I just made a few small changes with the new information you provided:


```
Function contains_words(rng As Range, txt As String)
  Dim c As Range
  Dim nmax As Long, n As Long
  Dim w As Variant
  
  For Each c In rng.Columns(1).Cells
    n = 0
    For Each w In Split(c.Value, " ")
      If InStr(1, txt, w) > 0 Then
        n = n + 1
        If n > nmax Then
          nmax = n
          contains_words = c.Offset(, 1).Value
        End If
      End If
    Next
  Next
End Function
```

Dante AmorABCD1Electrical1Result2Insulated Gloves2Electrical Insulated Gloves23Electrical Gloves345The quick brown over lazy dog1Result6fox2The quick brown fox jumps over the lazy dog.17jumps over the lazy38dog4910jumps over the lazy1Result11fox2The quick brown fox jumps over the lazy dog.312The quick brown over lazy dog313dog4Hoja1Cell FormulasRangeFormulaD2D2=contains_words(A1:A3,C2)D6,D11D6=contains_words(A5:A8,C6)


If there is any case that the UDF does not comply with, then I join Peter's request:



Peter_SSs said:


> Any chance that we could still have this?
> 
> could you post a *few sets* of *representative *sample data that shows the sort of punctuation variation that can occur?


----------

