Extract Keyword from text string

LambertyE

New Member
Joined
Nov 19, 2020
Messages
9
Office Version
  1. 2016
Platform
  1. Windows
Hi,

I am looking on how to simplify this formula since the list of keywords keep growing and the formula longer.

What I am trying to do is extract a keyword from a free type text.

1. Create a list of keywords in A1:A36.
2. Enter unformatted text in B1, B2, to B100 (just as an example)
3. Return results of match of keywords in A1:A20 in C1, C2, etc to C100.

So, spreading this out:

1. Check the text string in B1 for the ONE text string in A1, and return this in C1.
2. Check the text string in B1 for the ONE text string in A2, and return this in C2, and continue so that C3, C4, C5 etc are the matches against A1:A20 in B1.
3. Repeat for entire range A1:A20 against B2 to B100.

So, I applied the following formula but as I mentioned above the keyword list keeps growing and concern about the accuracy of the formula if I keep adding to the function of it.

=TRIM(IF(ISNUMBER(SEARCH($Z$2,AA2)),$Z$2,"")&" "&IF(ISNUMBER(SEARCH($Z$3,AA2)),$Z$3,"")&" "&IF(ISNUMBER(SEARCH($Z$4,AA2)),$Z$4,"")&" "&IF(ISNUMBER(SEARCH($Z$5,AA2)),$Z$5,"")&" "&IF(ISNUMBER(SEARCH($Z$6,AA2)),$Z$6,"")&" "&IF(ISNUMBER(SEARCH($Z$7,AA2)),$Z$7,"")&" "&IF(ISNUMBER(SEARCH($Z$8,AA2)),$Z$8,"")&" "&IF(ISNUMBER(SEARCH($Z$9,AA2)),$Z$9,"")&" "&IF(ISNUMBER(SEARCH($Z$10,AA2)),$Z$10,"")&" "&IF(ISNUMBER(SEARCH($Z$10,AA2)),$Z$10,"")&" "&IF(ISNUMBER(SEARCH($Z$11,AA2)),$Z$11,"")&" "&IF(ISNUMBER(SEARCH($Z$12,AA2)),$Z$12,"")&" "&IF(ISNUMBER(SEARCH($Z$13,AA2)),$Z$13,"")&" "&IF(ISNUMBER(SEARCH($Z$14,AA2)),$Z$14,"")&" "&IF(ISNUMBER(SEARCH($Z$15,AA2)),$Z$15,"")&" "&IF(ISNUMBER(SEARCH($Z$16,AA2)),$Z$16,"")&" "&IF(ISNUMBER(SEARCH($Z$17,AA2)),$Z$17,"")&" "&IF(ISNUMBER(SEARCH($Z$18,AA2)),$Z$18,"")&" "&IF(ISNUMBER(SEARCH($Z$19,AA2)),$Z$19,"")&" "&IF(ISNUMBER(SEARCH($Z$20,AA2)),$Z$20,"")&" "&IF(ISNUMBER(SEARCH($Z$21,AA2)),$Z$21,"")&" "&IF(ISNUMBER(SEARCH($Z$22,AA2)),$Z$22,"")&" "&IF(ISNUMBER(SEARCH($Z$23,AA2)),$Z$23,"")&" "&IF(ISNUMBER(SEARCH($Z$24,AA2)),$Z$24,"")&" "&IF(ISNUMBER(SEARCH($Z$25,AA2)),$Z$25,"")&" "&IF(ISNUMBER(SEARCH($Z$26,AA2)),$Z$26,"")&" "&IF(ISNUMBER(SEARCH($Z$27,AA2)),$Z$27,""))&" "&IF(ISNUMBER(SEARCH($Z$28,AA2)),$Z$28,"")&" "&IF(ISNUMBER(SEARCH($Z$29,AA2)),$Z$29,"")&" "&IF(ISNUMBER(SEARCH($Z$30,AA2)),$Z$30,"") &" "&IF(ISNUMBER(SEARCH($Z$31,AA2)),$Z$31,"")&" "&IF(ISNUMBER(SEARCH($Z$32,AA2)),$Z$32,"")&" "&IF(ISNUMBER(SEARCH($Z$33,AA2)),$Z$33,"")&" "&IF(ISNUMBER(SEARCH($Z$34,AA2)),$Z$34,"")&" "&IF(ISNUMBER(SEARCH($Z$35,AA2)),$Z$35,"")&" "&IF(ISNUMBER(SEARCH($Z$36,AA2)),$Z$36,"")

Any way this can be simplified? Help please!

Thanks in advance!

Evelyn L
 
Yeah, didn't know if they really cared about case sensitivity or not. Thanks for the regex info. Advanced regex topics are a bit beyond me. But I totally dig it.
 
Upvote 0

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
didn't know if they really cared about case sensitivity or not.
Going back to the original poster from post 1, they were not interested in case-sensitivity since they had a successful formula using SEARCH, not FIND.

As for the new poster (ACUK) we don't really know, but my guess would be that if they are interested in the case match then perhaps "Dog" should not be returned anywhere (row 8 or row 6) since "Dog" is not in the list of Keywords. We need to hear back from ACUK to be sure. :)
 
Upvote 0
At C2, try this
=SUBSTITUTE(TRIM(CONCAT(IFERROR(INDEX($A:$A,--RIGHT(AGGREGATE(15,6,FIND(($A$2:$A$4),B3)+ROW($A$2:$A$4)/100,ROW($A$2:$A$4)-1),2)),"")&" "))," ",", ")
 
Upvote 0
At C2, try this
=SUBSTITUTE(TRIM(CONCAT(IFERROR(INDEX($A:$A,--RIGHT(AGGREGATE(15,6,FIND(($A$2:$A$4),B3)+ROW($A$2:$A$4)/100,ROW($A$2:$A$4)-1),2)),"")&" "))," ",", ")
Some comments on that formula:
  • It isn't entirely clear what either questioner wants but if looking for words in the text, i note that your formula returns both 'dog' and 'cat' for the text "The underdogs scattered" when neither 'dog' nor 'cat' is a word in that string.
  • After entering your formula, if a user inserts any new rows at the top of the sheet, it begins to lose results.

Given that ACUK has Excel 365, here is another formula using worksheet functions that could be considered. Punctuation is usually a problem when looking for whole words & I have allowed for ".", "'" and "," only. More could be added if required.

ACUK.xlsm
ABC
1KeywordsTextResult
2catThe dog chased the catcat, dog
3dogThe cat ate the bird.cat, bird
4birdThe cat is a cat.cat
5The dog, bird, and cat.cat, dog, bird
6Dog's are big.dog
7I am the lizard king. 
8Big Dog small dog small cat medium dogcat, dog
9The underdogs scattered 
10 
Keywords
Cell Formulas
RangeFormula
C2:C10C2=TEXTJOIN(", ",1,FILTER($A$2:$A$4,ISNUMBER(SEARCH(" "&A$2:A$4&" "," "&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"."," "),","," "),"'"," ")&" ")),""))
 
Upvote 0
This should do the trick.

VBA Code:
Function getKeywords(keys As Range, s As String) As String
Dim AL As Object:   Set AL = CreateObject("System.Collections.ArrayList")
Dim RES As Object:  Set RES = CreateObject("System.Collections.ArrayList")
Dim SP() As String

For Each Key In keys
    AL.Add LCase(Key)
Next Key

With CreateObject("VBScript.RegExp")
    .Global = True
    .ignorecase = True
    .Pattern = "[A-Za-z]+"
    Set matches = .Execute(s)
    For i = 0 To matches.Count - 1
        SP = Split(matches(i), "'")
        For j = LBound(SP) To UBound(SP)
            If AL.contains(LCase(SP(j))) Then If Not RES.contains(SP(j)) Then RES.Add SP(j)
        Next j
    Next i
End With

If RES.Count > 0 Then
    getKeywords = Join(RES.toArray(), ", ")
Else
    getKeywords = "None Found"
End If
End Function
Thanks a million!
 
Upvote 0
if the text string has 6 instances of the word 'Cat', the function currently will return 'Cat, Cat, Cat, Cat, Cat, Cat'. Is it possible that it only shows the word once?

Thanks a million!
Just checking then that if a keyword is "dog" and the text is "Big DOG small dog Medium Dog" that you want all three versions of the word returned by the function?
 
Upvote 0
Just checking then that if a keyword is "dog" and the text is "Big DOG small dog Medium Dog" that you want all three versions of the word returned by the function?
Ideally, it'll return just one instance of the word, regardless of the case used.

So for example, if the string of text has 'COVID' and 'Covid', the code will return just the single 'Covid' (based on how it was spelt under the keywords column).

Many thanks again.
 
Upvote 0
Ideally, it'll return just one instance of the word, regardless of the case used.

So for example, if the string of text has 'COVID' and 'Covid', the code will return just the single 'Covid' (based on how it was spelt under the keywords column).
Might be worth trying some of the other suggestions made in the thread then since more than one of them seem to do exactly that? :)
 
Upvote 0
Hi,
I stumbled upon the solution of Peter in this thread and it worked for me like a charm, but I have one additional question:
I am using the VBA Code for German words, so it would be great if the code wouldn`t return the keywords in all lowercase letters but leave the capital letters in each word untouched. How do I change the Code so it accomplishes that?
Best regards and thank you!
Dag
 
Upvote 0
it would be great if the code wouldn`t return the keywords in all lowercase letters but leave the capital letters in each word untouched.
Welcome to the MrExcel board!

Could you give us a small but varied set of sample data and the expected results (with XL2BB so that we can easily copy for testing)?
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,176
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top