Hi forum, any help on this, including approaches I haven't considered, would be helpful. I scrape several websites to get raw data on foreclosure sales. One such website has changed their formatting and royally hampered my data gathering efforts. My starting sheet contains these data points:
[TABLE="width: 700"]
<tbody>[TR]
[TD]County[/TD]
[TD]Sale Date/Time[/TD]
[TD]Opening Bid[/TD]
[TD]State[/TD]
[TD]SP #[/TD]
[TD]File #[/TD]
[TD]Address[/TD]
[TD]Book/Page[/TD]
[/TR]
[TR]
[TD]Person[/TD]
[TD]04/08/2019 100000 AM[/TD]
[TD]69358.36[/TD]
[TD]NC[/TD]
[TD]16 SP 134[/TD]
[TD]16-12617-FC01[/TD]
[TD]500 Virgilina Road Roxboro, North Carolina 27573[/TD]
[TD]872/3[/TD]
[/TR]
</tbody>[/TABLE]
Simple enough and was not difficult before. Now, when copying the information directly from the website (displayed horizontally, btw) each individual entry is copied and pasted vertically in this format:
[TABLE="width: 300"]
<tbody>[TR]
[TD]County:[/TD]
[/TR]
[TR]
[TD]Person[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Sale Date:[/TD]
[/TR]
[TR]
[TD]04/08/201910:00:00 AM[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]State:[/TD]
[/TR]
[TR]
[TD]NC[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Court SP#:[/TD]
[/TR]
[TR]
[TD]16 SP 134[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Case #:[/TD]
[/TR]
[TR]
[TD]16-12617-FC01[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Address:[/TD]
[/TR]
[TR]
[TD]500 Virgilina Road Roxboro, NC 27573[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Opening Bid Amount:[/TD]
[/TR]
[TR]
[TD]69538.36[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Book Page:[/TD]
[/TR]
[TR]
[TD]872/3[/TD]
[/TR]
</tbody>[/TABLE]
It's almost as if they are nesting multiple separate tables for each and every entry that they have (the headers get re-copied each time [County, Sale Date, etc.]). I could not ascertain a formula to extract necessary data in adjacent rows and have very limited VBA skills although I love to learn. I essentially labored over it by selecting all blank rows and deleting to compact the data and then used "Find an Replace" function to get rid of all the header info before selecting each entry down to the county data which starts the next entry.
Don't know if I have explained well, appreciate any assistance as there has to be a more efficient way of extracting that data.
[TABLE="width: 700"]
<tbody>[TR]
[TD]County[/TD]
[TD]Sale Date/Time[/TD]
[TD]Opening Bid[/TD]
[TD]State[/TD]
[TD]SP #[/TD]
[TD]File #[/TD]
[TD]Address[/TD]
[TD]Book/Page[/TD]
[/TR]
[TR]
[TD]Person[/TD]
[TD]04/08/2019 100000 AM[/TD]
[TD]69358.36[/TD]
[TD]NC[/TD]
[TD]16 SP 134[/TD]
[TD]16-12617-FC01[/TD]
[TD]500 Virgilina Road Roxboro, North Carolina 27573[/TD]
[TD]872/3[/TD]
[/TR]
</tbody>[/TABLE]
Simple enough and was not difficult before. Now, when copying the information directly from the website (displayed horizontally, btw) each individual entry is copied and pasted vertically in this format:
[TABLE="width: 300"]
<tbody>[TR]
[TD]County:[/TD]
[/TR]
[TR]
[TD]Person[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Sale Date:[/TD]
[/TR]
[TR]
[TD]04/08/201910:00:00 AM[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]State:[/TD]
[/TR]
[TR]
[TD]NC[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Court SP#:[/TD]
[/TR]
[TR]
[TD]16 SP 134[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Case #:[/TD]
[/TR]
[TR]
[TD]16-12617-FC01[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Address:[/TD]
[/TR]
[TR]
[TD]500 Virgilina Road Roxboro, NC 27573[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Opening Bid Amount:[/TD]
[/TR]
[TR]
[TD]69538.36[/TD]
[/TR]
[TR]
[TD][/TD]
[/TR]
[TR]
[TD]Book Page:[/TD]
[/TR]
[TR]
[TD]872/3[/TD]
[/TR]
</tbody>[/TABLE]
It's almost as if they are nesting multiple separate tables for each and every entry that they have (the headers get re-copied each time [County, Sale Date, etc.]). I could not ascertain a formula to extract necessary data in adjacent rows and have very limited VBA skills although I love to learn. I essentially labored over it by selecting all blank rows and deleting to compact the data and then used "Find an Replace" function to get rid of all the header info before selecting each entry down to the county data which starts the next entry.
Don't know if I have explained well, appreciate any assistance as there has to be a more efficient way of extracting that data.