Extract multiple keywords from text string

HughT

Board Regular
Joined
Jan 6, 2012
Messages
113
Office Version
  1. 365
Platform
  1. Windows
I have searched the forum but I can't find anything that quite matches this.

This is for a healthy eating project. Students record their meals in a daily diary. This is random unformatted text in a single cell (B1, B2 etc). There is a keyword list of healthy foods, each item being in a separate cell. What I would like to do is search the random text for occurrences of the keywords and return the keywords in another cell adjacent to the text cell. I would then like to be able to search the returned cells by the keyword list.

So:

Keywords (each in a separate cell, but doesn't have to be in Column A):

A1 Apple
A2 Fries
A3 Salad
A4 Burger
etc

Text (in B1)
Today I ate a burger with fries, and had an apple afterwards.

Result (in C1)
Apple Fries Burger [order is not important]

C1 to C20 (etc) will be the searchable data. I want to be able to search this by each keyword in the range A1:A4, ie 'Apple', 'Fries', 'Salad' etc so I can see who has been eating Apples, Fries, etc. Using column filters will display the contents of every cell, so if some comedian enters the whole range A1:A4 (which will actually be much larger) the filter will also return the whole range, so I need an alternative method.

I possible I would like to do this by a formula rather than VBA as I have to hand this over to someone who will not understand VBA, and can add to or alter the contents of the lookup range (A1:A4) simply by adding to it or overtyping the existing contents.

Thank you for your help.
 
Edit: I hadn't seen your most recent post before submitting mine. I'll review mine once I've had a chance to study your last post.

I think you would be starting to see from bosco_yip's suggestion that dealing with, say, 20 keywords a standard formula would also be pretty horrendous to leave to somebody else later. In fact that formula would have to get even longer to work properly. With the keyword sample list given & text "Today I ate a burger with fries, and had a pineapple afterwards." the formula returns the keyword "Apple", even though none was eaten.

In fact that issue is the same with Aladin's formula at the moment so it also needs a tweak.

My suggestion below, which also uses vba, has something of the opposite problem - refer row 5 of my screen shot where mine did not return Apple because that 'word' was not found in the text.

There will also be the problem of other plurals where the initial ending changes. for example you may have "Cherry" in your list, so you would probably also need "Cherries"

All a minefield really. :eek:

Anyway, this is my attempt, also using a user-defined function. To implement ..

1. Right click the sheet name tab and choose "View Code".

2. In the Visual Basic window use the menu to Insert|Module

3. Copy and Paste the code below into the main right hand pane that opens at step 2.

4. Close the Visual Basic window.

5. Enter the formula as shown in cell C1 in the screen shot below and copy down.

6. If using Excel 2007 or later your workbook will need to be saved as a macro-enabled workbook (*.xlsm)

Rich (BB code):
Function Keywords(s As String, rWords As Range) As String
  Dim kw As Object
  Dim i As Long
  
  With CreateObject("VBScript.RegExp")
    .IgnoreCase = True
    .Global = True
    .Pattern = "\b(" & Join(Application.Transpose(rWords), "|") & ")\b"
    If .test(s) Then
      Set kw = .Execute(s)
      For i = 1 To kw.Count
        Keywords = Keywords & ", " & kw(i - 1)
      Next i
    End If
  End With
  Keywords = Mid(Keywords, 3)
End Function

For comparison of where all the suggestions may have issues, I've included Aladin's and bosco's suggestions in columns D & E respectively.

Sheet1

ABCDE
AppleToday I ate a burger with fries, and had an apple afterwards.burger, fries, appleApple, Fries, BurgerApple Fries Burger
FriesToday I had eggs
Saladapples AppleApple
BurgerSalad, pineapple & hamSaladApple, SaladApple Salad
I ate 4 apples AppleApple

<colgroup><col style="font-weight:bold; width:30px; "><col style="width:56px;"><col style="width:399px;"><col style="width:142px;"><col style="width:146px;"><col style="width:141px;"></colgroup><tbody>
[TD="bgcolor: #cacaca, align: center"]1[/TD]

[TD="bgcolor: #cacaca, align: center"]2[/TD]

[TD="bgcolor: #cacaca, align: center"]3[/TD]

[TD="bgcolor: #cacaca, align: center"]4[/TD]

[TD="bgcolor: #cacaca, align: center"]5[/TD]

[TD="bgcolor: #cacaca, align: center"]6[/TD]

</tbody>

Spreadsheet Formulas
CellFormula
C1=Keywords(B1,$A$1:$A$4)
D1{=REPLACE(aconcat(IF(ISNUMBER(SEARCH($A$1:$A$4,B1)),", "&$A$1:$A$4,""),""),1,2,"")}
E1=TRIM(IF(ISNUMBER(SEARCH($A$1,B1)),$A$1,"")&" "&IF(ISNUMBER(SEARCH($A$2,B1)),$A$2,"")&" "&IF(ISNUMBER(SEARCH($A$3,B1)),$A$3,"")&" "&IF(ISNUMBER(SEARCH($A$4,B1)),$A$4,""))

<tbody>
</tbody>
Formula Array:
Produce enclosing
{ } by entering
formula with CTRL+SHIFT+ENTER!

<tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4


Hello Peter,
I was looking for a solution to a requirement I have and I came across this thread.

I tried the suggestions given by you and other posters but it did not work for me. I wanted to request your help on this matter.

Kindly provide your inputs.

My Requirement:


1. Column A has Keywords. Example:
Cell A1: FIELD-NAME-1
Cell A2: FIELD-NAME-2
so on...
Cell A100: FIELD-NAME-100


2. Column W has free form text, that may or may not have keywords from Column A.

Cell W1: This is the free form text for FIELD-NAME-2
Cell W2: This is the free form text for FIELD-NAME-1 and FIELD-NAME-100.
Cell W3: This is the free form text for no fields.
So on..
Cell W20: This is the free form text for FIELD-NAME-1, FIELD-NAME-3, FIELD-NAME-10 and FIELD-NAME-50.


3. I want the below in Column X (Basically check the Keywords in Column A in each of the cells in Column W and give the list of keywords.

Cell X1: FIELD-NAME-2
Cell X2: FIELD-NAME-1, FIELD-NAME-100
Cell X3:
So on..
Cell X20: FIELD-NAME-1, FIELD-NAME-3, FIELD-NAME-10, FIELD-NAME-50.
[TABLE="width: 500"]
<tbody>[TR]
[TD]Row Nbr[/TD]
[TD]A[/TD]
[TD]W[/TD]
[TD]X[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]FIELD-NAME-1[/TD]
[TD]This is the free form text for FIELD-NAME-2[/TD]
[TD]FIELD-NAME-2[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]FIELD-NAME-2[/TD]
[TD]This is the free form text for FIELD-NAME-1 and FIELD-NAME-100[/TD]
[TD]FIELD-NAME-1, FIELD-NAME-100[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]FIELD-NAME-3[/TD]
[TD]This is the free form text for no fields[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]So on..[/TD]
[TD]so on..[/TD]
[TD]so on..[/TD]
[TD]So on..[/TD]
[/TR]
</tbody>[/TABLE]


Solution (Did not work):

Followed your solution.

a. Save excel sheet in .xlsm type. (I have excel 2010).

b. Inserted the below module in vba

Function Keywords(s As String, rWords As Range) As String
Dim kw As Object
Dim i As Long

With CreateObject("VBScript.RegExp")
.IgnoreCase = True
.Global = True
.Pattern = "\b(" & Join(Application.Transpose(rWords), "|") & ")\b"
If .test(s) Then
Set kw = .Execute(s)
For i = 1 To kw.Count
Keywords = Keywords & ", " & kw(i - 1)
Next i
End If
End With
Keywords = Mid(Keywords, 3)
End Function

c. In X1 Cell I put the below formula
=Keywords(W1,$A$1:$A$100)
X2 Cell I put the below formula
=Keywords(W2,$A$1:$A$100)
So on...
W20 Cell I put the below forumula
=Keywords(W20,$A$1:$A$100)

d. However the Cells in Column X show blank and not the expected result.

Not sure what I am doing wrong. Kindly help.




 
Upvote 0

Excel Facts

Spell Check in Excel
Press F7 to start spell check in Excel. Be careful, by default, Excel does not check Capitalized Werds (whoops)
Dear Mr. Peter_SSs,

I have more or less same problem, want to extract the exact words from B1 by searching from A1:A10 list and display in column C. Please help me Sir.

It content ;

COLUMN A (LIST )

[TABLE="width: 64"]
<colgroup><col width="64" style="width:48pt"> </colgroup><tbody>[TR]
[TD="class: xl63, width: 64, align: left"]MDB[/TD]
[/TR]
[TR]
[TD="class: xl63, align: left"]SMDB[/TD]
[/TR]
[TR]
[TD="class: xl63, align: left"]ESMDB[/TD]
[/TR]
[TR]
[TD="class: xl63, align: left"]FESMDB[/TD]
[/TR]
[TR]
[TD="class: xl63, align: left"]/f[/TD]
[/TR]
</tbody>[/TABLE]
SWITCH
SWITCHGEAR

COLUMN B

[TABLE="width: 496"]
<colgroup><col></colgroup><tbody>[TR]
[TD="align: left"]testing for MDB, DB[/TD]
[/TR]
[TR]
[TD="align: left"]I am testing SMDB, MDB[/TD]
[/TR]
[TR]
[TD="align: left"]Looking for ESMDB[/TD]
[/TR]
[TR]
[TD="align: left"]Checking for FESMDB[/TD]
[/TR]
[TR]
[TD="align: left"]This is for cable /f[/TD]
[/TR]
[TR]
[TD="align: left"][/TD]
[/TR]
</tbody>[/TABLE]

COLUMN C (Result)


 
Upvote 0
I have more or less same problem, want to extract the exact words from B1 by searching from A1:A10 list and display in column C. Please help me Sir.
Your sample data is not entirely clear and you have not shown the expected results and their layout. Can you clarify both the sample data and the expected results?
 
Upvote 0
Your sample data is not entirely clear and you have not shown the expected results and their layout. Can you clarify both the sample data and the expected results?


Dear Sir,

Many thanks for your reply, Sir, actually I want to extract exact names from Description column and put in a Result Column (The Exacts Names are mentioned in the LIST below). Need command to read Item from Name List, search in Description and display in Result Column). Count column is only counting how many item there in Result i.e. Cable Ladder = 1 and MDB,Switch = 2 and so on......

Sir, Want to extract exact item name from the name list with comma in Result column. Many thanks in advance Sir.

Main Data sample With Result and Count of Result Item

[TABLE="width: 500"]
<tbody>[TR]
[TD]Description[/TD]
[TD]Result[/TD]
[TD]Count [/TD]
[/TR]
[TR]
[TD]HIGH VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Containment including all fittings, couplers, supports, fixings and ancillary work as necessary; HV cable Ladder[/TD]
[TD]Cable Ladder[/TD]
[TD]1[/TD]
[/TR]
[TR]
[TD] LOW VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Switch gear and equipment, including supports, final point-wiring, containment and ancillary work as necessary; MDB-P-GF-02(1887.7kW) 11 HV Switchgear[/TD]
[TD]11 HV Switchgear, MDB, Switch[/TD]
[TD]3[/TD]
[/TR]
[TR]
[TD]LOW VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Switch gear and equipment, including supports, final point-wiring, containment and ancillary work as necessary; MDB[/TD]
[TD]MDB, Switch [/TD]
[TD]2[/TD]
[/TR]
</tbody>[/TABLE]

[TABLE="width: 500"]
<tbody>[TR]
[TD]Name List[/TD]
[/TR]
[TR]
[TD]11 HV Switchgear[/TD]
[/TR]
[TR]
[TD]MDB [/TD]
[/TR]
[TR]
[TD]Switch[/TD]
[/TR]
[TR]
[TD]Cable Ladder[/TD]
[/TR]
[TR]
[TD]Access control[/TD]
[/TR]
[TR]
[TD]ATS[/TD]
[/TR]
[TR]
[TD]BGM[/TD]
[/TR]
[TR]
[TD]Central battery[/TD]
[/TR]
[TR]
[TD]......... many more names are there[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
Dear Sir,

Many thanks for your reply, Sir, actually I want to extract exact names from Description column and put in a Result Column (The Exacts Names are mentioned in the LIST below). Need command to read Item from Name List, search in Description and display in Result Column). Count column is only counting how many item there in Result i.e. Cable Ladder = 1 and MDB,Switch = 2 and so on......

Sir, Want to extract exact item name from the name list with comma in Result column. Many thanks in advance Sir.
Did you try the user-defined function from post #7 ?

Excel Workbook
ABC
1Name ListDescriptionResult
211 HV SwitchgearHIGH VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Containment including all fittings, couplers, supports, fixings and ancillary work as necessary; HV cable Laddercable Ladder
3MDBLOW VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Switch gear and equipment, including supports, final point-wiring, containment and ancillary work as necessary; MDB-P-GF-02(1887.7kW) 11 HV SwitchgearSwitch, MDB, 11 HV Switchgear
4SwitchLOW VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Switch gear and equipment, including supports, final point-wiring, containment and ancillary work as necessary; MDBSwitch, MDB
5Cable Ladder
6Access control
7ATS
8BGM
9Central battery
10
Sheet1
 
Upvote 0
Fantastic Sir, but the only problem is that words are repeating(Please see below in the table) i.e. Security,Access control and Earthing is showing more then ONE TIME, I need them to appear only one time in the Result column.



Did you try the user-defined function from post #7 ?

Sheet1

ABC
Name ListDescriptionResult
11 HV Switchgear SECURITY SYSTEM INSTALLATION; EQUIPMENT; CCTV - Switchgear and equipment, including supports and ancillary work as necessary; Hotel security control room - (including video wall monitors, anpr workstation, operator/supervisor workstation, badge [TABLE="width: 218"]
<tbody>[TR]
[TD="class: xl65, width: 218"]SECURITY, CCTV, security

<colgroup><col style="font-weight:bold; width:30px; "><col style="width:125px;"><col style="width:421px;"><col style="width:212px;"></colgroup><tbody>
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]1[/TD]

[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]2[/TD]

</tbody>
[/TD]
[/TR]
[TR]
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]3[/TD]
[TD]MDB[/TD]
[TD] EARTHING AND BONDING SYSTEM INSTALLATION; EQUIPMENT; Earthing and bonding system, including supports, final point-wiring, containment, test points, air termination network, strike pads, cables and accessories; Earthing for power[/TD]
[TD][TABLE="width: 218"]
<tbody>[TR]
[TD="class: xl65, width: 218"]EARTHING, Earthing, Earthing[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
[TR]
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]4[/TD]
[TD]Switch[/TD]
[TD] ACCESS CONTROL SYSTEM INSTALLATION; EQUIPMENT; Access control Panel; Access control Panel[/TD]
[TD][TABLE="width: 218"]
<tbody>[TR]
[TD="class: xl65, width: 218"]ACCESS CONTROL, Access control, Access control[/TD]
[/TR]
</tbody>[/TABLE]
[/TD]
[/TR]
[TR]
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]5[/TD]
[TD]Cable Ladder[/TD]
[TD] [/TD]
[TD] [/TD]
[/TR]
[TR]
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]6[/TD]
[TD]Access control[/TD]
[TD] [/TD]
[TD] [/TD]
[/TR]
[TR]
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]7[/TD]
[TD]ATS[/TD]
[TD] [/TD]
[TD] [/TD]
[/TR]
[TR]
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]8[/TD]
[TD]BGM[/TD]
[TD] [/TD]
[TD] [/TD]
[/TR]
[TR]
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]9[/TD]
[TD]Access control[/TD]
[TD] [/TD]
[TD] [/TD]
[/TR]
[TR]
[TD="bgcolor: [URL=https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=cacaca]#cacaca[/URL] , align: center"]10[/TD]
[TD]Earthing [/TD]
[TD] [/TD]
[TD] [/TD]
[/TR]
</tbody>[/TABLE]

Spreadsheet Formulas
CellFormula
C2=Keywords(B2,A$2:A$8)

<tbody>
</tbody>

<tbody>
</tbody>


Excel tables to the web >> Excel Jeanie HTML 4
 
Upvote 0
Fantastic Sir, but the only problem is that words are repeating(Please see below in the table) i.e. Security,Access control and Earthing is showing more then ONE TIME, I need them to appear only one time in the Result column.
Pity that wasn't stated, or an example like that given, earlier. ;)

Try changing the line in blue.
Rich (BB code):
For i = 1 To kw.Count
  If InStr(1, ", " & Keywords & ", ", ", " & kw(i - 1) & ", ", 1) = 0 Then Keywords = Keywords & ", " & kw(i - 1)
Next i

Also, I forgot to state earlier, that the column with the list of keywords must not contain any blank cells. If that could happen, then the code would need some further modifications.


BTW, best not to quote whole long posts as it makes the thread harder to read/navigate. If you want/need to quote, quote small, relevant parts only.
 
Last edited:
Upvote 0
Dear Sir,

I have data with description in column A, column B is Component name, Result will be in C column, need to search TEXT in Description (B Column) from given D,E,F,G,H and on..... If found any one of them in Description, then display Component name which is there in column B. Example below;

[TABLE="class: outer_border, width: 500"]
<tbody>[TR]
[TD]A[/TD]
[TD]B[/TD]
[TD]C[/TD]
[TD]D[/TD]
[TD]E[/TD]
[TD]F[/TD]
[TD]G[/TD]
[TD]H[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]Description[/TD]
[TD]Component [/TD]
[TD]Result[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]HIGH VOLTAGE DISTRIBUTION INSTALLATION; EQUIPMENT; Containment including all fittings, couplers, supports, fixings and ancillary work as necessary; HV cable Ladder, MV Switchgear[/TD]
[TD]HV Switchgear[/TD]
[TD]HV Switchgear[/TD]
[TD]MV Panels[/TD]
[TD]MV Switchgear[/TD]
[TD]11 KV Switchgear[/TD]
[TD]RMU[/TD]
[TD]MVS[/TD]
[TD]AND ON...............[/TD]
[/TR]
[TR]
[TD] SECURITY SYSTEM INSTALLATION; EQUIPMENT; CCTV - Switchgear and equipment, Card Reader, including supports and ancillary work as necessary; Hotel security control room - (including video wall monitors, anpr workstation, operator/supervisor workstation, badge [/TD]
[TD]Access control[/TD]
[TD]Access control[/TD]
[TD]Access control panel[/TD]
[TD]Card Reader[/TD]
[TD]Push button for exit[/TD]
[TD]Mortise Lock[/TD]
[TD]Input Module[/TD]
[TD]AND ON...............[/TD]
[/TR]
</tbody>[/TABLE]


Thank you Sir, in advance.
 
Upvote 0

Forum statistics

Threads
1,223,239
Messages
6,170,947
Members
452,368
Latest member
jayp2104

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top