VBA to pull HTML table info

Scarlacc

New Member
Joined
Apr 29, 2022
Messages
15
Office Version
  1. 2016
  2. 2010
Platform
  1. Windows
Evening,
been working on some VBA to scrape some information from my company website. I'm kinda new to using excell to scrape for me. i got it to pull the tables i want into an excell workbook. i was wondering if someone could give me ideas on how i can just pull the information i want instead of the entire table. The table Rows = 9 cells i,m trying to pull the date crom cells 1 to ColA (This one is blank alot but some contain information), 2 to ColB , and 9 to ColC. Everything iv tried so far jugles info around when its extracting it to excel columns the tables can get decent size at times 50 + rows but notmally around 20 ish rows. will add couple images of HTML the method im using is the XMLHTTP to get the html document. im sure there is a better way to do this but still figuring out web scraping. Thanks in Advance for any ideas.


Here is the section of code that i'm using the pulls all the data in for me.

Set HTMLDiv = HTMLDoc.getElementById("pnlGridview")
Set HTMLTables = HTMLDiv.getElementsByTagName("table")

For Each HTMLTable In HTMLTables

For Each TableSection In HTMLTable.Children

For Each TableRow In TableSection.Children
ColNum = 1

'move col index +1 (Col A to B) if first cell is blank.
If IsEmpty(TableCell.innerText) Then
ColNum = ColNum + 1
GoTo Label1

'using to skip 3rd cell in eatch row.
ElseIf TableCell.innerText = True Then

GoTo Label1
'using to skip 3rd cell in eatch row.
ElseIf TableCell.innerText = "False" Then

GoTo Label1

'this will skip the first row in every table.
ElseIf TableCell.innerText = "Customer" Then

GoTo Label2

'this will exit the Sub nothing needed past the point of Boxes.
ElseIf TableCell.innerText = " Boxes " Then

GoTo Label3

Else

Sheet2.Cells(Rownum, ColNum).Value = TableCell.innerText
ColNum = ColNum + 1

End If
Label1:
Next TableCell

Rownum = Rownum + 1
Label2:
Next TableRow

Next TableSection

Next HTMLTable

Label3:

End Sub
 

Attachments

  • HTML.png
    HTML.png
    127.2 KB · Views: 35
  • HTML2.png
    HTML2.png
    117.9 KB · Views: 34

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
See if this helps. You'll have to make changes as per your actual needs. The code should loop through each table within the HTML document, and fill an array with the desired information. Then it creates a new worksheet within the workbook running the code, and transfers the information from the array to the newly created worksheet...

VBA Code:
    'Your code here
    '
    '

    Dim HTMLTable As Object
    Dim HTMLRow As Object
    Dim ResultsArray() As Variant
    Dim RowCount As Long
    
    RowCount = 0
    For Each HTMLTable In HTMLTables
        ReDim Preserve ResultsArray(0 To 2, 0 To (HTMLTable.Rows.Length + RowCount) - 1)
        For Each HTMLRow In HTMLTable.Rows
            ResultsArray(0, RowCount) = HTMLRow.Cells(0).innertext
            ResultsArray(1, RowCount) = HTMLRow.Cells(1).innertext
            ResultsArray(2, RowCount) = HTMLRow.Cells(8).innertext
            RowCount = RowCount + 1
        Next HTMLRow
    Next HTMLTable
    
    If RowCount > 0 Then
        Dim ws As Worksheet
        Set ws = ThisWorkbook.Worksheets.Add(before:=ThisWorkbook.Sheets(1))
        ws.Range("A1").Resize(UBound(ResultsArray, 2) + 1, UBound(ResultsArray, 1) + 1).Value = Application.Transpose(ResultsArray)
    Else
        MsgBox "No tables found!", vbInformation
    End If

Hope this helps!
 
Upvote 0
Solution
Worked great i had to add a if statment to handle skipping first row in table other then that pulled exactly what i wanted.
Appriciate the help.
 
Upvote 0

Forum statistics

Threads
1,224,820
Messages
6,181,159
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top