Hello,
I am trying to create an Excel function that will identify two specific phrases embedded in messages of varying lengths (which I scraped from an online message board) and delete all the text between them.
Specifically, the message board that I scraped the data from allowed users to quote other users and then respond to those quoted messages. The quoted text is always presented in the following way:
PREVIOUS MESSAGE AUTHOR Username said:
Message written by the previous author
Click to expand...
Message written by the CURRENT MESSAGE AUTHOR
Here is what it looks like in Excel:
[TABLE="class: grid, width: 500"]
[TR]
[TD]User235235 said:
This is the text I need deleted.
Click to expand...
This is the text I want to save.[/TD]
[/TR]
[TR]
[TD]User8888986655 said:
This is the text I need deleted. Blah blah Blah blah Blah blah Blah blah
Click to expand...
This is the text I want to save. Blah blah Blah blah Blah blah[/TD]
[/TR]
[TR]
[TD]User2222222222 said:
This needs to be deleted.
Click to expand...
Save this please[/TD]
[/TR]
[/TABLE]
To complicate things further, a message sometimes contains more than one quoted message from previous authors, in which case the phrases " said:" and "Click to expand..." each appear more than once.
Therefore, I need an Excel function that will identify the phrases " said:" and "Click to expand..." and delete all the text between them (as well as the two phrases themselves).
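To make the goal concrete, here is a rough Python sketch of the behaviour I am after. It is not an Excel solution, only an illustration of the transformation. The function name strip_quotes is made up for this example, and it assumes the username sits on the same line as " said:" and should be removed along with the quote, as in the sample cells above.

```python
import re

# One quoted block: the rest of the line before " said:" (the username),
# the quoted text, the "Click to expand..." marker, and trailing whitespace.
# Assumption: the username is on the same line as " said:".
QUOTE_BLOCK = re.compile(r"[^\n]* said:.*?Click to expand\.\.\.\s*", re.DOTALL)


def strip_quotes(message: str) -> str:
    """Remove every quoted block, keeping only the current author's text."""
    return QUOTE_BLOCK.sub("", message).strip()


cell = (
    "User235235 said:\n"
    "This is the text I need deleted.\n"
    "Click to expand...\n"
    "This is the text I want to save."
)
print(strip_quotes(cell))  # prints "This is the text I want to save."
```

Because the match is non-greedy, each " said:" is paired with the nearest following "Click to expand...", so a cell with several quotes has each one removed separately. One caveat: if the current author's own text happens to contain " said:" followed later by "Click to expand...", that stretch would be removed too.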
Any suggestions would be greatly appreciated. Thanks so much in advance!!