Using XMLHttpRequest to scrape data

jryan15

Board Regular
Joined
Jan 27, 2005
Messages
168
Hi all,

I am working on importing some data from the internet and came across a site which had a seemingly good example to start from. I enabled the reference "Microsoft XML 3.0" and then tried "Microsoft XML 6.0", but every time I get an error, "user type not defined" on the header of the WriteTable sub. I think it is a problem with the HTMLtable but not sure. Any help would be very much appreciated.

Code:
Option Explicit

Public Sub GetRates()
    Dim html As HTMLDocument, hTable As HTMLTable '<== Tools > References > Microsoft HTML Object Library
    
    Set html = New HTMLDocument
      
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.barchart.com/options/price-history/daily-prices?symbol=msft&strikePrice=155.00&symbolType=C&expirationDate=2019-12-20", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" 'to deal with potential caching
        .send
        html.body.innerHTML = .responseText
    End With
    
    Application.ScreenUpdating = False
    
    Set hTable = html.getElementById("cr1")
    WriteTable hTable, 1, ThisWorkbook.Worksheets("Sheet1")
    
    Application.ScreenUpdating = True
End Sub

Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
    Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, C As Long, tBody As Object
    r = startRow: If ws Is Nothing Then Set ws = ActiveSheet
    With ws
        Dim headers As Object, header As Object, columnCounter As Long
        Set headers = hTable.getElementsByTagName("th")
        For Each header In headers
            columnCounter = columnCounter + 1
            Select Case columnCounter
            Case 2
                .Cells(startRow, 1) = header.innerText
            Case 8
                .Cells(startRow, 2) = header.innerText
            End Select
        Next header
        startRow = startRow + 1
        Set tBody = hTable.getElementsByTagName("tbody")
        For Each tSection In tBody
            Set tRow = tSection.getElementsByTagName("tr")
            For Each tr In tRow
                r = r + 1
                Set tCell = tr.getElementsByTagName("td")
                C = 1
                For Each td In tCell
                    Select Case C
                    Case 2
                        .Cells(r, 1).Value = td.innerText
                    Case 8
                        .Cells(r, 2).Value = td.innerText
                    End Select
                    C = C + 1
                Next td
            Next tr
        Next tSection
    End With
End Sub
 

Excel Facts

Which Excel functions can ignore hidden rows?
The SUBTOTAL and AGGREGATE functions ignore hidden rows. AGGREGATE can also exclude error cells and more.
1. Did you set this reference ?:
Dim html As HTMLDocument, hTableX As HTMLTable '<== Tools > References > Microsoft HTML Object Library
2. To get historical data (as I suspect you want to), you have to be log in (if I am correctly read this page)
 
Upvote 0
1. Did you set this reference ?:
Dim html As HTMLDocument, hTableX As HTMLTable '<== Tools > References > Microsoft HTML Object Library
2. To get historical data (as I suspect you want to), you have to be log in (if I am correctly read this page)

Awesome, thank you! The code now compiles :-). I wish to adapt this for a different site, but even get it to start so now I can try to trudge forward.
 
Upvote 0
Well, I knew it wouldn't be easy... Hopefully someone might be able to help guide me in locating a table from this link. I do not see a line in the console code that says "table id". I found a few places that reference "table", "tr", "bc-datatable", but when I change the line "Set hTable = html.getElementById("bc-datatable")", the hTable is nothing everytime...

Also to probably complicate things more. When I open a fresh chrome browser and paste the link above, the ticker, date, and strike fields automatically populate and the page refreshes with data. When I put the same link in IE, the page doesn't load the data and seems to not even have the ticker box. I am logged in on IE so it's not that. Even when I try to click the drop down menus in IE none of them even work.

Oh and let me know if it's better that I create separate threads for these issues... I know that the site filters solved items and KOKOSEK certainly solved my initial issue.

Thanks!
 

Attachments

  • more_table_1024x592.jpg
    more_table_1024x592.jpg
    144.3 KB · Views: 49
  • table_not_getting_date_1024x598.jpg
    table_not_getting_date_1024x598.jpg
    139.2 KB · Views: 48
Upvote 0
I can't get this table as I am not logged in. So I've get total different site content.
 
Upvote 0
I was able to run this code on the site, but all it did was pull in a bunch of meta data and did not have any of the data from the table. If I could just get a screen scrape, I could probably parse the data in code, but I can't seem to extract it.

VBA Code:
Private Sub HTML_VBA_Extract_Data_From_Website_To_Excel()
    Dim oXMLHTTP As Object
    Dim sPageHTML  As String
    Dim sURL As String

    'Change the URL before executing the code. URL to Extract data from.
    sURL = "http://WWW.WebSiteName.com"
    sURL = "https://www.barchart.com/options/price-history/daily-prices?symbol=msft&strikePrice=155.00&symbolType=C&expirationDate=2019-12-20"

    'Extract data from website to Excel using VBA
    Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    oXMLHTTP.Open "GET", sURL, False
    oXMLHTTP.send
    sPageHTML = oXMLHTTP.responseText

    'Get webpage data into Excel
    ThisWorkbook.Worksheets("Sheet3").Cells(1, 1) = sPageHTML

    MsgBox "XMLHTML Fetch Completed"

End Sub
 
Upvote 0
What if you check all TABLE tags (data will be show in immediate window in VBA editor) :
VBA Code:
Sub grabMyData()

    Dim http As New XMLHTTP60, html As New HTMLDocument, x As Long
    sURL = "https://www.barchart.com/options/price-history/daily-prices?symbol=msft&strikePrice=155.00&symbolType=C&expirationDate=2019-12-20"
    
    With http
        .Open "GET", sURL, False
        .send
        html.body.innerHTML = .responseText
    End With

    Do
        x = x + 1
        On Error Resume Next
        Debug.Print x, html.getElementsByTagName("TABLE")(x).innerText
    Loop Until Err.Number = 91
End Sub

You need references as below:

1636099728235.png
 
Upvote 0
Hi again! I ran the code, but nothing returned in debug.print. I even moved the output to a worksheet and nothing. The http.responseText did come back with a lot of stuff, but the data from the table was not present.

FYI - I created a temp. login and PM'd you. If others are interested in helping, let me know and I can PM the temp login.
 
Upvote 0

Forum statistics

Threads
1,225,626
Messages
6,186,093
Members
453,337
Latest member
fiaz ahmad

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top