How to Scrape HTTP to Excel with VBA

jonathanwang003

Board Regular
Joined
May 9, 2014
Messages
133
I saw a snippet of a macro where data from a website was scraped into a spreadsheet. I wasn't sure how it worked so I I added a sample URL and the section of code that could somewhat understand. How would we print the data to a spreadsheet?

VBA Code:
Public HttpRequest As WinHttp.WinHttpRequest
Public qURL As String

Sub Download_Table()
    
    Set HttpRequest = New WinHttp.WinHttpRequest
    
    'Sample URL
    qURL = "https://query1.finance.yahoo.com/v7/finance/download/ANET?period1=1619755602&period2=1651291602&interval=1d&events=history&includeAdjustedClose=true"
    
    With HttpRequest
    .Open "GET", qURL, False
    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
    .send
    .waitForResponse (10)
    End With

End Sub
 

Excel Facts

Spell Check in Excel
Press F7 to start spell check in Excel. Be careful, by default, Excel does not check Capitalized Werds (whoops)
Hello

You need to declare an additional variable below the line "Sub Download_Table()" as follows to store the value that will be returned from the URL:
Dim MyResult as String

Then below the line ".waitForResponse (10)" add the following line to actually store the value returned from the URL:
MyResult = .ResponseText

To then write the value into say cell "A2" on "Sheet1" then below the line "End With", add this line:
ThisWorkbook.Sheets("Sheet1").Range("A2").Value = MyResult

Based on that website you will get a header line and a lot of data below that, all in the same cell. What are your Excel and/or VBA skills like for splitting that out into multiple lines? I presume you will want the data in multiple lines....

...by my reckoning that has returned 254 lines of data - it will be easier to split that in VBA than within Excel. In addition, it is common practice to check the response text for errors before relying on the data.

I trust that helps

VBA Code:
Public HttpRequest As WinHttp.WinHttpRequest
Public qURL As String

Sub Download_Table()
   
    Dim MyResult As String
   
    Set HttpRequest = New WinHttp.WinHttpRequest
   
    'Sample URL
    qURL = "https://query1.finance.yahoo.com/v7/finance/download/ANET?period1=1619755602&period2=1651291602&interval=1d&events=history&includeAdjustedClose=true"
   
    With HttpRequest
    .Open "GET", qURL, False
    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
    .send
    .waitForResponse (10)
    MyResult = .ResponseText
    End With

ThisWorkbook.Sheets("Sheet1").Range("A2").Value = MyResult

End Sub
 
Upvote 0
Hi Andrew,

That's very helpful. I appreciate learning best practices because it helps to know why each step is done the way it is rather than what I did above--grab would I could decipher--and make an attempt at achieving the task.

You were right, it comes in all into one cells. I'm not familiar with splitting. I did hear the folks talking about this snippet that it's important to identify the number of columns and rows in the table we're scraping. How would I be able to do that?
 
Upvote 0
Hello Jonathan

Following is some code that does what you want. I'm glad I found this thread because I have been looking at doing the exact same thing for myself.

For anyone else looking at this, the OP did not state this and nor have I (yet), but you must enable the reference to "Microsoft WinHTTP Services, version 5.1" under Tools > References in the VBA screen.

I think I can guess the next question so I am already working on that for myself.

Here is the adapted code that extracts the data from Yahoo using a sample URL and outputs the results into Sheet1. Change the constant "Sheet1" towards the top of the code if you want the data to be output to another sheet.

Hopefully that helps.
Andrew

VBA Code:
Option Explicit
Option Compare Text

Sub Download_Table()
    
Const UnixDate As Date = #1/1/1970#
Const shtOutput As String = "Sheet1"          '<--- CHANGE THIS IF YOU WANT THE OUTPUT ELSEWHERE

Dim HttpRequest As WinHttp.WinHttpRequest, qURL As String
'variables for string manipulation:
Dim MyResult As String, arrResult() As Variant, tmpArr As Variant
Dim tmpStr As String, Pos1 As Long, Pos2 As Long
'loop counters:
Dim LC As Long, LL As Long, LC2 As Long, colCount As Long
'output variables:
Dim Out1 As String, Out2 As Double, Out3 As Double, rngOut As Range
    
    Set HttpRequest = New WinHttp.WinHttpRequest
    
    'Sample URL
    qURL = "https://query1.finance.yahoo.com/v7/finance/download/ANET?period1=1619755602&period2=1651291602&interval=1d&events=history&includeAdjustedClose=true"
    
    'Extract data from webpage
    With HttpRequest
        .Open "GET", qURL, False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
        .send
        .waitForResponse (10)
        MyResult = .ResponseText
    End With

    'Reset variables
    LL = Len(MyResult) - Len(Replace(MyResult, Chr(10), "")) 'number of rows (is actually 1 more but array starts at zero)
    tmpStr = Left$(MyResult, InStr(1, MyResult, Chr(10)))
    colCount = Len(tmpStr) - Len(Replace(tmpStr, ",", "")) 'number of columns (ditto)
    ReDim arrResult(LL, colCount)
    Pos1 = 1
    
    'Extract data from the return value into a 2D array
    For LC = 0 To LL
        Pos2 = InStr(Pos1 + 1, MyResult, Chr(10))
        If Pos2 = 0 Then
            Pos2 = Len(MyResult) + 1
        End If
        tmpStr = Mid$(MyResult, Pos1, Pos2 - Pos1)
        tmpArr = Split(tmpStr, ",")
        For LC2 = 0 To colCount
            arrResult(LC, LC2) = tmpArr(LC2)
        Next LC2
        Pos1 = Pos2 + 1
    Next LC

    'Get the details of what was extracted from Yahoo Finance
    Pos1 = InStr(1, qURL, "download") + 8
    Pos2 = InStr(Pos1, qURL, "?") - 1
    Out1 = Mid$(qURL, Pos1 + 1, Pos2 - Pos1)
    Pos1 = InStr(Pos2, qURL, "=")
    Pos2 = InStr(Pos1, qURL, "&")
    Out2 = CDbl(Mid$(qURL, Pos1 + 1, Pos2 - Pos1 - 1))
    Pos1 = InStr(Pos2, qURL, "=")
    Pos2 = InStr(Pos1, qURL, "&")
    Out3 = CDbl(Mid$(qURL, Pos1 + 1, Pos2 - Pos1 - 1))
    
    'Output the results
    With ThisWorkbook.Sheets(shtOutput)
        .Range("A1").Value = "Code:"
        .Range("B1").Value = Out1
        .Range("A2").Value = "From"
        .Range("B2").Value = CDate(UnixDate + (Out2 / 86400))
        .Range("A3").Value = "To"
        .Range("B3").Value = CDate(UnixDate + (Out3 / 86400))
        Set rngOut = .Range("A4")
        rngOut.Resize(LL + 1, colCount + 1) = arrResult
    End With
    
    MsgBox "Finished", vbInformation, "Done."

Exit_Here:
    Exit Sub

Error_Handler:
    MsgBox "There has been an unexepcted error." & vbCrLf & vbCrLf & Err.Description, vbCritical, "VBA runtime error " & Err.Number
    Exit Sub

End Sub
 
Upvote 0
Solution
You're welcome. If this solves your question then please tag the relevant post as being the solution.
 
Upvote 0

Forum statistics

Threads
1,223,908
Messages
6,175,304
Members
452,633
Latest member
DougMo

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top