Remove list of keywords in Sentence

goble

New Member
Joined
Oct 7, 2010
Messages
49
Office Version
  1. 2019
Platform
  1. Windows
Hello,

this has been asked before but my problem is a bit different. I want to get rid of some words in a column using a Macro (or formulae)

textrevised textText to remove
This is a sentence containing keywords. For Analysis, for general.sentence,keywords,analysis,generalis
this is a General Analysisgeneral,analysisthis
a
containing
for

Notes:
  1. Revised text should not contain spaces i.e word1,word2
  2. "This" (starting with capital letter) has been removed even though "this" lower case has been specified
  3. Column "Text to remove" includes the list of predefined words that will need to be removed
  4. "Analysis" has not (or part of it) been removed even though it includes "is"
  5. "." and "," should be removed but result should be delimited with ","
  6. Revised text should all be in lower case.
P.S I know I'm asking a lot ! But thanks in advance ;-)
 

Excel Facts

Do you hate GETPIVOTDATA?
Prevent GETPIVOTDATA. Select inside a PivotTable. In the Analyze tab of the ribbon, open the dropown next to Options and turn it off
A couple of options
+Fluff 1.xlsm
ABCD
1textText to remove
2This is a sentence containing keywords. For Analysis, for general.issentence,keywords,analysis,generalsentence,keywords,analysis,general
3this is a General Analysisthisgeneral,analysisgeneral,analysis
4a
5containing
6for
Master
Cell Formulas
RangeFormula
C2:C3C2=TEXTJOIN(",",,FILTERXML("<k><m>"&LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,".",""),",","")," ","</m><m>"))&"</m></k>","//m[(.!='"&$B$2&"')and(.!='"&$B$3&"')and(.!='"&$B$4&"')and(.!='"&$B$5&"')and(.!='"&$B$6&"')]"))
D2:D3D2=SUBSTITUTE(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(" "&LOWER(A2)&" ",".",""),",","")," "&$B$2&" "," ")," "&$B$3&" "," ")," "&$B$4&" "," ")," "&$B$5&" "," ")," "&$B$6&" "," "))," ",",")
 
Upvote 0
Another option
Excel Formula:
=TEXTJOIN(",",,FILTERXML("<k><m>"&TEXTJOIN("</m><m>",0,LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,".",""),",","")," ","</m><m>")),$B$2:$B$6)&"</m></k>","//m[not(. =following-sibling::m)][position()<=last()-"&ROWS($B$2:$B$6)&"]"))
 
Upvote 0
Depending on just what the original texts might be like**, a third option might be a user-defined function.

VBA Code:
Function Remove(s As String, rkeys As Range) As String
  With CreateObject("VBScript.RegExp")
    .Global = True
    .Pattern = "[^A-Za-z]"
    Remove = LCase(.Replace(s, " "))
    .Pattern = "\b(" & Join(Application.Transpose(rkeys), "|") & ")\b"
    Remove = Replace(Application.Trim(.Replace(Remove, "")), " ", ",")
  End With
End Function

goble.xlsm
ABC
1textText to remove
2This is a sentence containing keywords. For Analysis, for general.issentence,keywords,analysis,general
3this is a General Analysisthisgeneral,analysis
4a
5containing
6for
Sheet1
Cell Formulas
RangeFormula
C2:C3C2=Remove(A2,B$2:B$6)



** My comments about the original texts is about whether it is possible to have texts like these and, if so, what results you would want.
This is John-Paul O'Connor's analysis!
What is this? Is it a 6?
 
Upvote 0
Solution
Hi and thanks for these incredible suggestions !

@Fluff, I forgot to specify that the list of "Text to remove" will be a long one. So unfortunately, solution 1 & 2 (in your 1st post) might not be appropriate for me.
The list of "Text to remove" will increase (when I find new ones).I noticed that for solution 3, :
  • if I have the same unwanted words twice in the list. Some allowed text might be removed.
  • If I define the range longer (ex. (B2:B20) and that I haven't got any unwanted text (yet) in some cells. I will get an error.
@Peter_SSs , thanks for your solution working great !
Actually, the texts are meta descriptions of web pages. I need to tag each and every page by their keywords (referred earlier as revised text). For example: The page description is : Beautiful flower in the Forest. My tags will be flower,forest. I won't have complicated sentences like you mentioned.

Thanks again to you guys ! ;-)
 
Upvote 0
Hi again @Peter_SSs,

even though you have find out the right solution for me. There are minor issues that I've noticed later when working with the data.

Example :
"4k desktop wallpaper" will become "k,desktop,wallpaper"
"state-of-the-art technique" will become "state,art,technique" (assuming "of" and "the" are in the list of texts to remove).

I was thinking that I could have a list of words to replace and then get a Marco/Formula which will take care of above (through a second process).

1st Result (as per above)Replace Words (Defined here)Replace by (Defined here)2nd Result
state,art,techniquestate,artstate-of-the-artstate-of-the-art,technique

So the column D "2nd result" would be like if data in column B is contained in A, then replace by data in column C.

Thanks !
 
Upvote 0
Example :
"4k desktop wallpaper" will become "k,desktop,wallpaper"
"state-of-the-art technique" will become "state,art,technique" (assuming "of" and "the" are in the list of texts to remove).
They (digits and hyphens) are exactly the sort of examples I suggested. :)

What do you want them to become? What is the logic of deciding what to remove and what not to remove?
 
Upvote 0
Hi and thanks,

since there might be exceptions where manual interventions will be required for example: "Building in Las Vegas" should be "building,las vegas" (and not "building,las,vegas") I was thinking of getting a 2nd formula which will fetch from a list the only the words that I would like to change from the results (following your macro).

So to summarize I don't want changes in your VBA code but rather a 2nd process so I can intervene (where required).

Make sense ? ;-)
 
Upvote 0
Hello,

let's take an example.

ABCDE
1Original textResult2nd ProcessSpecial wordsTo replace by
2Building in Miami Floridabuilding,miami,floridabuilding,miami,floridalas,vegaslas vegas
3Building in Las Vegasbuilding,las,vegasbuilding,las vegaslos,angeleslos angeles
4Flowers in Las Vegasflowers,las,vegasflowers,las vegas
5Flowers in Los Angelesflowers,los,angelesflowers,los angeles

From your Macro, the result at B2 is what I was expecting. However I need to have a 2nd process for B3 and B4 which will result as C3 and C4.

The idea is that after the Marco gives the result (example B3) , whatever texts listed in column D (for example "las,vegas" in D2) is looked up in column B and if found replaced by corresponding replacement text (in this case E2). So after that, column C3 will output "building,las vegas".
 
Upvote 0

Forum statistics

Threads
1,224,822
Messages
6,181,165
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top