Extract multiple keywords from text string

HughT

Board Regular
Joined
Jan 6, 2012
Messages
113
Office Version
  1. 365
Platform
  1. Windows
I have searched the forum but I can't find anything that quite matches this.

This is for a healthy eating project. Students record their meals in a daily diary. This is random unformatted text in a single cell (B1, B2 etc). There is a keyword list of healthy foods, each item being in a separate cell. What I would like to do is search the random text for occurrences of the keywords and return the keywords in another cell adjacent to the text cell. I would then like to be able to search the returned cells by the keyword list.

So:

Keywords (each in a separate cell, but doesn't have to be in Column A):

A1 Apple
A2 Fries
A3 Salad
A4 Burger
etc

Text (in B1)
Today I ate a burger with fries, and had an apple afterwards.

Result (in C1)
Apple Fries Burger [order is not important]

C1 to C20 (etc) will be the searchable data. I want to be able to search this by each keyword in the range A1:A4, ie 'Apple', 'Fries', 'Salad' etc so I can see who has been eating Apples, Fries, etc. Using column filters will display the contents of every cell, so if some comedian enters the whole range A1:A4 (which will actually be much larger) the filter will also return the whole range, so I need an alternative method.

I possible I would like to do this by a formula rather than VBA as I have to hand this over to someone who will not understand VBA, and can add to or alter the contents of the lookup range (A1:A4) simply by adding to it or overtyping the existing contents.

Thank you for your help.
 

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.
Try,

In C1 enter:

=TRIM(IF(ISNUMBER(SEARCH(A1,B1)),A1,"")&" "&IF(ISNUMBER(SEARCH(A2,B1)),A2,"")&" "&IF(ISNUMBER(SEARCH(A3,B1)),A3,"")&" "&IF(ISNUMBER(SEARCH(A4,B1)),A4,""))

Regards
 
Upvote 0
Thank you very much for the quick reply!

I tried copying this down column C (ie to C2, C3, C4 etc) and it didn't work, so I fixed the references after the double brackets with $ signs eg (A1,B1)),$A$1,"") etc, and this didn't work either.

The keyword column could have 20 or more entries, how do I alter the formula to accommodate the range? When I tried to do this myself, I was wondering if a range statement eg '$A$1:$A$20' should be in there somewhere to capture the keyword criteria.
 
Upvote 0
Try this to copy down,

=TRIM(IF(ISNUMBER(SEARCH($A$1,B1)),$A$1,"")&" "&IF(ISNUMBER(SEARCH($A$2,B1)),$A$2,"")&" "&IF(ISNUMBER(SEARCH($A$3,B1)),$A$3,"")&" "&IF(ISNUMBER(SEARCH($A$4,B1)),$A$4,""))

You can nest the formula by 20s in the way of : .....&" "&IF(ISNUMBER(SEARCH($A$5,B1)),$A$5,"").....until 20.....&" "&IF(ISNUMBER(SEARCH($A$20,B1)),$A$20,""))

Regards
 
Upvote 0
[Table="width:, class:grid"][tr][td]Row\Col[/td][td]
A​
[/td][td]
B​
[/td][td]
C​
[/td][/tr]
[tr][td]
1​
[/td][td]Apple[/td][td]Today I ate a burger with fries, and had an apple afterwards.[/td][td]Apple, Fries, Burger[/td][/tr]


[tr][td]
2​
[/td][td]Fries[/td][td][/td][td][/td][/tr]


[tr][td]
3​
[/td][td]Salad[/td][td][/td][td][/td][/tr]


[tr][td]
4​
[/td][td]Burger[/td][td][/td][td][/td][/tr]


[tr][td]
5​
[/td][td][/td][td][/td][td][/td][/tr]
[/table]


In C1 control+shift+enter, not just enter:

=REPLACE(aconcat(IF(ISNUMBER(SEARCH($A$1:$A$4,B1)),", "&$A$1:$A$4,""),""),1,2,"")<strike></strike>

<strike></strike>For this formula to work, you need to add the <acronym title="visual basic for applications">VBA</acronym> code for ACONCAT to your workbook as a module, using Alt+F11...

Function aconcat(a As Variant, Optional sep As String = "") As String
' Harlan Grove, Mar 2002
Dim y As Variant
If TypeOf a Is Range Then
For Each y In a.Cells
aconcat = aconcat & y.Value & sep
Next y
ElseIf IsArray(a) Then
For Each y In a
aconcat = aconcat & y & sep
Next y
Else
aconcat = aconcat & a & sep
End If
aconcat = Left(aconcat, Len(aconcat) - Len(sep))
End Function
 
Upvote 0
Thank you both, but I think I may have overcomplicated the question. The basic criteria is that this must be capable of being maintained by someone who has little or not Excel knowledge, other than entering text in a cell (incredibly, such people do exist!)

So thinking around the problem, what I want to do is:

1. Create a list of keywords in A1:A20.
2. Enter unformatted text in B1, B2, to B100 (for the sake of this example)
3. Return results of match of keywords in A1:A20 in C1, C2, etc to C100.

So, spreading this out:

1. Check the text string in B1 for the ONE text string in A1, and return this in D1.
2. Check the text string in B1 for the ONE text string in A2, and return this in E1, and continue so that D1, E1, F1 etc are the matches against A1:A20 in B1 (ie column D is headed with contents of A1, E with contents of A2, F = A3 etc).
3. Repeat for entire range A1:A20 against B2 to B100.
4. Concatenate results of D1, E1, F1, etc in C1 (ie next to the unformatted text) to show all the matches of A1:A20 in B1. This solves the 'how do I extract the keywords in A1:A20 into C1' qustion. The next bit is harder:
5. Enable search / match / filter function of some sort across range D1:W100 (ie 20 columns corresponding to A1:A20) by the number of rows of data entered (B1 to B100) so that if the word 'Apple' is entered as search criteria, it filters the rows to show all the instances of Apple in B1 to B100. (This is the tricky bit).


Sorry if this confuses things, but I have got lots of empty columns to play with!

Many thanks

Hugh
 
Upvote 0
Edit: I hadn't seen your most recent post before submitting mine. I'll review mine once I've had a chance to study your last post.

I think you would be starting to see from bosco_yip's suggestion that dealing with, say, 20 keywords a standard formula would also be pretty horrendous to leave to somebody else later. In fact that formula would have to get even longer to work properly. With the keyword sample list given & text "Today I ate a burger with fries, and had a pineapple afterwards." the formula returns the keyword "Apple", even though none was eaten.

In fact that issue is the same with Aladin's formula at the moment so it also needs a tweak.

My suggestion below, which also uses vba, has something of the opposite problem - refer row 5 of my screen shot where mine did not return Apple because that 'word' was not found in the text.

There will also be the problem of other plurals where the initial ending changes. for example you may have "Cherry" in your list, so you would probably also need "Cherries"

All a minefield really. :eek:

Anyway, this is my attempt, also using a user-defined function. To implement ..

1. Right click the sheet name tab and choose "View Code".

2. In the Visual Basic window use the menu to Insert|Module

3. Copy and Paste the code below into the main right hand pane that opens at step 2.

4. Close the Visual Basic window.

5. Enter the formula as shown in cell C1 in the screen shot below and copy down.

6. If using Excel 2007 or later your workbook will need to be saved as a macro-enabled workbook (*.xlsm)

Rich (BB code):
Function Keywords(s As String, rWords As Range) As String
  Dim kw As Object
  Dim i As Long
  
  With CreateObject("VBScript.RegExp")
    .IgnoreCase = True
    .Global = True
    .Pattern = "\b(" & Join(Application.Transpose(rWords), "|") & ")\b"
    If .test(s) Then
      Set kw = .Execute(s)
      For i = 1 To kw.Count
        Keywords = Keywords & ", " & kw(i - 1)
      Next i
    End If
  End With
  Keywords = Mid(Keywords, 3)
End Function

For comparison of where all the suggestions may have issues, I've included Aladin's and bosco's suggestions in columns D & E respectively.

Excel Workbook
ABCDE
1AppleToday I ate a burger with fries, and had an apple afterwards.burger, fries, appleApple, Fries, BurgerApple Fries Burger
2FriesToday I had eggs
3SaladapplesAppleApple
4BurgerSalad, pineapple & hamSaladApple, SaladApple Salad
5I ate 4 applesAppleApple
6
Sheet1
 
Upvote 0
Thank you both, but I think I may have overcomplicated the question. The basic criteria is that this must be capable of being maintained by someone who has little or not Excel knowledge, other than entering text in a cell (incredibly, such people do exist!)

So thinking around the problem, what I want to do is:

1. Create a list of keywords in A1:A20.
2. Enter unformatted text in B1, B2, to B100 (for the sake of this example)
3. Return results of match of keywords in A1:A20 in C1, C2, etc to C100.

So, spreading this out:

1. Check the text string in B1 for the ONE text string in A1, and return this in D1.
2. Check the text string in B1 for the ONE text string in A2, and return this in E1, and continue so that D1, E1, F1 etc are the matches against A1:A20 in B1 (ie column D is headed with contents of A1, E with contents of A2, F = A3 etc).
3. Repeat for entire range A1:A20 against B2 to B100.
4. Concatenate results of D1, E1, F1, etc in C1 (ie next to the unformatted text) to show all the matches of A1:A20 in B1. This solves the 'how do I extract the keywords in A1:A20 into C1' qustion. The next bit is harder:
5. Enable search / match / filter function of some sort across range D1:W100 (ie 20 columns corresponding to A1:A20) by the number of rows of data entered (B1 to B100) so that if the word 'Apple' is entered as search criteria, it filters the rows to show all the instances of Apple in B1 to B100. (This is the tricky bit).


Sorry if this confuses things, but I have got lots of empty columns to play with!

Many thanks

Hugh

In D1, enter formula, copy across to W1 and all copy down to 100 rows :

=IF(ISNUMBER(SEARCH(INDEX($A$1:$A$20,COLUMNS($A1:A1)),$B1)),INDEX($A$1:$A$20,COLUMNS($A1:A1)),"")

In C1, enter formula and copy down :

=TRIM(CONCATENATE(D1," ",E1," ",F1," ",G1," ",H1," ",I1," ",J1," ",K1," ",L1," ",M1," ",N1," ",O1," ",P1," ",Q1," ",R1," ",S1," ",T1," ",U1," ",V1," ",W1))

Regards
 
Upvote 0
Brilliant!

Thank you very much. It all works beautifully. I now have a 20 column x 100 row table each column headed (for convenience) with the keyword from the index range, and beneath that a cell containing the same word wherever that word is also contained in the freetext box, all concatenated into one cell. But how do I perform a search across this data, for example to find all occurrences of the word 'Burger' when a burger always has another item concatenated with it such as fries, salad etc?

Very many thanks.
 
Upvote 0
I now have a 20 column x 100 row table each column headed (for convenience) with the keyword from the index range, ...
In that case, you can use them (headers) for a much simpler formula in your columns.

Note that you still have the issue of the apple showing up when pineapple is in the text & apple is not, but you seem happy to accept that. :confused: (same issue could occur with "ham" and "hamburger" etc)

Formula in D2 is copied across and down.

Excel Workbook
BCDEFG
1AppleFriesSaladBurger
2Today I ate a burger with fries, and had an apple afterwards.AppleFriesBurger
3Today I had eggs
4applesApple
5Salad, pineapple & hamAppleSalad
6I ate 4 applesApple
Keywords
 
Upvote 0

Forum statistics

Threads
1,224,828
Messages
6,181,204
Members
453,022
Latest member
RobertV1609

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top