extract data tables from html files into excel worksheet

abhay_547

Board Regular
Joined
Sep 12, 2009
Messages
179
I got the below macro from a old thread. I have the lines on the html page files saved on c drive, which are supposed to be in a table format but they are not and there is no separator between each column data except some undefined spaces as shown in below image, so how can i import it as a proper table .i.e. into separate columns in the excel file, also i have multiple such html files so can i import multiple files into one worksheet at one go.


Code:
Sub WebScraping()

    Dim x As Integer, p As Integer, MyTable As Object    'MSHTML.HTMLTable


    ActiveSheet.UsedRange.Clear


    On Error GoTo ErrH


    s = Timer


    With CreateObject("InternetExplorer.Application")


        .Navigate "http://www.beb.de/internet/servlet/content/5868/StartpageEnglish/Gasstorage/Capacities/Storageutilization/Harsefeld.html"


        Do While .Busy: Loop


        'Loop through all the elements in the document
        For Each MyTable In .Document.getElementsByTagName("table")
            With MyTable
                If .className = "inntertab" Then
                    ReDim q(.Rows.Length - 1, 5)
                    For x = 0 To .Rows.Length - 1
                        For p = 0 To .Rows(x).Cells.Length - 1
                            q(x, p) = .Rows(x).Cells(p).innerText
                        Next
                    Next
                End If
            End With


        Next


        [A1].Resize(x, p) = q


ErrH:         .Quit


    End With


    Set MyTable = Nothing: Erase q


    MsgBox Round(Timer - s, 2)


End Sub


gWSJUG
 

Excel Facts

Which Excel functions can ignore hidden rows?
The SUBTOTAL and AGGREGATE functions ignore hidden rows. AGGREGATE can also exclude error cells and more.
I got the below macro from a old thread. I have the lines on the html page files saved on c drive, which are supposed to be in a table format but they are not and there is no separator between each column data except some undefined spaces as shown in below image, so how can i import it as a proper table .i.e. into separate columns in the excel file, also i have multiple such html files so can i import multiple files into one worksheet at one go.


Code:
Sub WebScraping()

    Dim x As Integer, p As Integer, MyTable As Object    'MSHTML.HTMLTable


    ActiveSheet.UsedRange.Clear


    On Error GoTo ErrH


    s = Timer


    With CreateObject("InternetExplorer.Application")


        .Navigate "http://www.beb.de/internet/servlet/content/5868/StartpageEnglish/Gasstorage/Capacities/Storageutilization/Harsefeld.html"


        Do While .Busy: Loop


        'Loop through all the elements in the document
        For Each MyTable In .Document.getElementsByTagName("table")
            With MyTable
                If .className = "inntertab" Then
                    ReDim q(.Rows.Length - 1, 5)
                    For x = 0 To .Rows.Length - 1
                        For p = 0 To .Rows(x).Cells.Length - 1
                            q(x, p) = .Rows(x).Cells(p).innerText
                        Next
                    Next
                End If
            End With


        Next


        [A1].Resize(x, p) = q


ErrH:         .Quit


    End With


    Set MyTable = Nothing: Erase q


    MsgBox Round(Timer - s, 2)


End Sub


gWSJUG

can anyone advise on this
 
Upvote 0

Forum statistics

Threads
1,223,243
Messages
6,170,967
Members
452,371
Latest member
Frana

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top