Hi,
I want to make a macro to search for repeating blocks of text in a text file. I know how to load the text file to memory but I don't know how to find the repeated text.
I've seen examples of searching for text using regular expressions which seems like it might be the way to go, but I don't know how to use them correctly.
The text file consists of HTML. I have tried to access the data with PowerQuery but I can't get it to work. Furthermore, some of the data I need is in the tag attributes and so is not displayed by Excel if opened as an HTML file.
The text file looks like this (simplified for illustration):
The data I need to pick out in the above example are "Marker #6", "Small_Images/045854_157363913466558000_16438.jpg" and "Nov 08, 2019 09:07 am"
The text pattern repeats dozens of times with different details. Thus the next "Marker #" could be "Marker #6" or another number such as "Marker #5". The next "Small_Images" will always be different as will the "Taken on" date and time. The text file does not have consistent line breaks so you can't rely on looping through one line at a time. However, the order of the tags is always the same.
I want the output to be on a new sheet with one row for each data block.
E.g. A1 = "Marker #6" B1 = "'Small_Images/045854_157363913466558000_16438.jpg" and C1 = "Nov 08, 2019 09:07 am". The next block will start at A2 and so on.
I know how to put the output on the new sheet, I just don't know how to extract it from the text file.
It seems to be a similar problem to the one on this thread but I don't have the ":" before each piece of data. Instead I presume I could search for each instance of "Marker #", "src=" and "Taken on:"
Any help would be much appreciated.
I want to make a macro to search for repeating blocks of text in a text file. I know how to load the text file to memory but I don't know how to find the repeated text.
I've seen examples of searching for text using regular expressions which seems like it might be the way to go, but I don't know how to use them correctly.
The text file consists of HTML. I have tried to access the data with PowerQuery but I can't get it to work. Furthermore, some of the data I need is in the tag attributes and so is not displayed by Excel if opened as an HTML file.
The text file looks like this (simplified for illustration):
Rich (BB code):
<div>Images from "Marker #6"</div><img src='Small_Images/045854_157363913466558000_16438.jpg' /></a></div>
<span class='image_taken_on'>Taken on: Nov 08, 2019 09:07 am</span></div>
The data I need to pick out in the above example are "Marker #6", "Small_Images/045854_157363913466558000_16438.jpg" and "Nov 08, 2019 09:07 am"
The text pattern repeats dozens of times with different details. Thus the next "Marker #" could be "Marker #6" or another number such as "Marker #5". The next "Small_Images" will always be different as will the "Taken on" date and time. The text file does not have consistent line breaks so you can't rely on looping through one line at a time. However, the order of the tags is always the same.
I want the output to be on a new sheet with one row for each data block.
E.g. A1 = "Marker #6" B1 = "'Small_Images/045854_157363913466558000_16438.jpg" and C1 = "Nov 08, 2019 09:07 am". The next block will start at A2 and so on.
I know how to put the output on the new sheet, I just don't know how to extract it from the text file.
It seems to be a similar problem to the one on this thread but I don't have the ":" before each piece of data. Instead I presume I could search for each instance of "Marker #", "src=" and "Taken on:"
Any help would be much appreciated.