Goal: get data from 300 web pages (each has stock price data in exactly the same layout/format, with a different stock on each page) into an Excel file.
My current process is pretty good, but it takes 3-4 minutes total, during which Excel is "thinking" / hourglassing and unusable for anything else. I'm hoping for a different method that might cut that way down. Current steps:
- Use a Firefox extension to batch-scrape all 300 pages (takes ~20 seconds; I'm pretty happy with the speed of this step) and download the content to a local directory, where it's saved as .aspx files.
- Convert the .aspx files to .html files via a simple Windows batch rename (e.g. `ren *.aspx *.html`); this takes 1-2 seconds, nearly instantaneous. Each file is ~100 KB, so all 300 together are 25-30 MB.
- Here's the time-consuming step that I'm hoping to improve: a VBA script loops through all 300 HTML files, parsing each one and pasting its table data onto the first empty row of a worksheet, so that everything ends up in a single table (a simplified sketch of this loop appears after this list). This is the part that takes ~3 minutes, during which Excel is hourglassing and can't do anything else. It seems like there should be a much faster way: yes, there are 300 files (25-30 MB in total), but they're all just sitting there locally in a single directory. Is there some method I haven't thought of that would suck all of them into Excel in less than 3 minutes?
- I know some versions of Excel have a 'Get Data from Web' wizard, but I don't think this is an option for me because the 300 pages I'm scraping are behind a password/login wall. (Moreover, even if they weren't, I don't think it would be much faster; I'm pretty happy with the ~20 seconds it takes to download all of the data locally. I just feel like it should be much faster to get that local data into Excel.)
- I gather it's the HTML parsing that's causing the slowdown? I don't think I have any option to scrape/download the data in a format other than .aspx/.html. Would parsing the data locally with a different, non-Excel program be better (one idea is in the second sketch after this list)? I tried concatenating the 300 ~100 KB HTML files into a single 25-30 MB HTML file and pointing Excel at that, but it didn't help much speed-wise.
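
For concreteness, here's a simplified sketch of the kind of loop the script runs (not my exact code; the folder path and sheet name are placeholders):

```vba
Sub ImportAllHtml()
    ' Sketch of the current approach: have Excel open and parse each HTML
    ' file as a workbook, then copy its table below the rows collected so far.
    Dim f As String, wb As Workbook, dest As Worksheet, nextRow As Long
    Set dest = ThisWorkbook.Worksheets("Data")   ' placeholder sheet name
    f = Dir("C:\scrape\*.html")                  ' placeholder folder
    Do While Len(f) > 0
        Set wb = Workbooks.Open("C:\scrape\" & f, ReadOnly:=True)
        nextRow = dest.Cells(dest.Rows.Count, 1).End(xlUp).Row + 1
        wb.Worksheets(1).UsedRange.Copy dest.Cells(nextRow, 1)
        wb.Close SaveChanges:=False
        f = Dir()
    Loop
End Sub
```

My guess is that opening 300 workbooks one at a time is where most of the 3 minutes goes.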
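
And here's the kind of alternative I'm wondering about in the last bullet: skip Workbooks.Open entirely, read each file as plain text, parse it with the htmlfile COM object, collect all rows in one array, and write that array to the sheet in a single operation. The folder path, the buffer size, the column count, and "the data is the first <table> in each file" are all assumptions about my files:

```vba
Sub ImportAllHtmlFast()
    Const FOLDER As String = "C:\scrape\"   ' assumed location of the files
    Const MAXCOLS As Long = 10              ' assumed max columns per table
    Dim fso As Object, doc As Object, tbl As Object, r As Object
    Dim out() As Variant, n As Long, c As Long, f As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    ReDim out(1 To 100000, 1 To MAXCOLS)    ' oversized buffer, trimmed on write
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    f = Dir(FOLDER & "*.html")
    Do While Len(f) > 0
        Set doc = CreateObject("htmlfile")
        doc.body.innerHTML = fso.OpenTextFile(FOLDER & f).ReadAll
        Set tbl = doc.getElementsByTagName("table")(0)   ' first table per file
        For Each r In tbl.Rows
            n = n + 1
            For c = 0 To r.Cells.Length - 1
                If c < MAXCOLS Then out(n, c + 1) = r.Cells(c).innerText
            Next c
        Next r
        f = Dir()
    Loop

    ' One bulk write instead of 300 separate pastes.
    If n > 0 Then
        ThisWorkbook.Worksheets("Data").Range("A1").Resize(n, MAXCOLS).Value = out
    End If
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub
```

Would something like this (or an external script that spits out one CSV for Excel to open) plausibly get under 3 minutes?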
TLDR: I have 300 local HTML files with table data in the same format. Seems like I should be able to get it into Excel faster than the ~3 minutes it currently takes, but don't know how.