Importing table data from 300 local HTML files. Can I do it faster than 3-4 minutes?

d0rian

Active Member
Joined
May 30, 2015
Messages
313
Office Version
  1. 365
Goal: getting data from 300 web pages (each has stock price data in exactly the same layout / format, a different stock on each page) into an Excel file.

My current process is pretty good, but takes 3-4 minutes total (during which Excel is "thinking" / hourglass-hanging and thus unusable for anything else.) Hoping for a different method that might cut that way down. Current steps:
  1. Use a Firefox extension to batch scrape all 300 pages (takes ~20 seconds, pretty happy with speed of this step) and download the content to a local directory (which it saves as .aspx files)
  2. Convert the .aspx files to .html files (via simple windows Rename batch function (1-2 seconds, nearly instantaneous). Each file is ~100kb, so all files together are 25-30MB.
  3. Here's the time-consuming step that I'm hoping to improve: I have a VBA script that loops through all 300 HTML files, parsing them and pasting the table data on the first empty row of a worksheet, such that it all ends up in a single table. This is the part that takes ~3 minutes during which Excel is hourglassing and can't do anything else. It seems like there should be a much faster way to accomplish this...yes, there are 300 files (~25MB in total), but they're all just sitting there locally in a single directory. Is there some method I haven't thought of that would suck all of them into Excel in less then 3 minutes?
Misc considerations:
  • I know some versions of Excel have a 'get data from web' wizard, but I don't think this is an option for me because the 300 pages I'm scraping are behind a password/login wall. (Moreover, even if they weren't, I don't think this would be much faster; I'm pretty happy with the ~20 seconds it's taking me to download all of the data locally...I just feel like it should be much faster to get that local data into Excel.)
  • I gather that it's the HTML formatting / parsing that's causing the slowdown? I don't think I have any option to scrape/download the data in a format other than aspx / html. Is parsing the data locally with perhaps a different, non-excel program better? I tried aggregating the 300 100kb HTML files into a single 25MB HTML file and then pointing Excel at that, but it didn't help much speed-wise.
Open to any out-of-the-box suggestions.

TLDR: I have 300 local HTML files with table data in the same format. Seems like I should be able to get it into Excel faster than the ~3 minutes it currently takes, but don't know how.
 

Excel Facts

Can you sort left to right?
To sort left-to-right, use the Sort dialog box. Click Options. Choose "Sort left to right"
Excel 2007, though i have access to Excel 365, which if memory serves has enhanced import-data-from-web functionality, though as I wrote in the OP, I'm not sure that will work since I don't think that can grab data from behind a paywall. (Not sure what you mean by 'confidential'; as i mentioned, the HTML pages that i'm scraping are behind a login wall since it's a paid service).

But just taking a step back for a moment, as I wrote in the OP, I've got the scraping/downloading parts of the process completing in a relatively quick ~20 seconds...I just have to think there's some method of importing data from 300 local HTML files that'd be quicker than the 3 mins it's currently taking me.
 
Upvote 0
if they are not confidential could you zip them and share it via onedrive, googledrive, dropbox or any similar service and paste link to this file here
 
Upvote 0
TLDR: I have 300 local HTML files with table data in the same format. Seems like I should be able to get it into Excel faster than the ~3 minutes it currently takes, but don't know how.

In general, it is recommended to get worksheet access for reading en writing as few times as possible. Using arrays can be helpful in this. Within time consuming procedures it is also recommended (in connection with the architecture and behaviour of a multi-tasking OS like the Windows OS) to let the OS know at regular intervals that your VBA is still running by yielding to the OS using the DoEvents statement.
 
Upvote 0

Forum statistics

Threads
1,223,705
Messages
6,173,985
Members
452,540
Latest member
haasro02

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top