How to clean scrape a website for financial data?

jonathanwang003

Board Regular
Joined
May 9, 2014
Messages
130
I used to scrape financial statements cleanly into spreadsheets by examining the HTML as a table and ensuring there is data by checking the number of columns and rows. This code used to work, but I'm redirecting to advfn.com and I'm having trouble properly grabbing the table.

Code:
'Download Info
Public oXML As MSXML2.XMLHTTP
Public oHTML As MSHTML.HTMLDocument
Public lngRow As Long
Public lngCol As Long
Public lngTable As Long
Public lngX As Long
Public strUrl(1 To 6) As String
Public rngPasteDest As Range

sub Download_Financials()

    Set oXML = New MSXML2.XMLHTTP
    Set oHTML = New MSHTML.HTMLDocument
    Set rngPasteDest = Range("A1")
    
    strUrl(1) = "https://www.advfn.com/stock-market/NASDAQ/SMCI/financials"
    
    With oXML
        .Open "GET", strUrl(1), False
        .send
        oHTML.body.innerHTML = .ResponseText
    End With
    lngX = 0
    
    With oHTML.getElementsByTagName("Table 1")
    
        If .Length <= 1 Then
            If AttemptCount >= 10 Then
                AttemptCount = 0
                sheets(1).Cells.Clear
                Exit Sub
            End If
        
            AttemptCount = AttemptCount + 1
            GoTo ParsePage
        End If
        
        For lngTable = 0 To .Length - 1
            For lngRow = 0 To .Item(lngTable).Rows.Length - 1
                For lngCol = 0 To .Item(lngTable).Rows(lngRow).Cells.Length - 2
                
                rngPasteDest.Offset(lngRow + lngX, lngCol).Value = .Item(lngTable).Rows(lngRow).Cells(lngCol).innerText
                
                Next lngCol
            Next lngRow
            lngX = lngRow + lngX
        Next lngTable
        
    End With
    
    Set rngPasteDest = Nothing

End Sub



This code is a recording of the macro where I can dump the table I want using the data import from web feature in Excel, but it's incredibly sloppy. Is this something that can be cleaned up?


VBA Code:
    ActiveWorkbook.Queries.Add Name:="Table 1 (2)", Formula:= _
        "let" & Chr(13) & "" & Chr(10) & "    Source = Web.BrowserContents(""https://www.advfn.com/stock-market/NASDAQ/SMCI/financials"")," & Chr(13) & "" & Chr(10) & "    #""Extracted Table From Html"" = Html.Table(Source, {{""Column1"", ""DIV[id='quarterIncomeStatement'] > DIV.table-responsive > TABLE.table.financial-table:nth-child(1) > * > TR > :nth-child(1)""}, {""Column2"", ""DIV[id='quarterIncomeStatement'] > DIV.table-res" & _
        "ponsive > TABLE.table.financial-table:nth-child(1) > * > TR > :nth-child(2)""}, {""Column3"", ""DIV[id='quarterIncomeStatement'] > DIV.table-responsive > TABLE.table.financial-table:nth-child(1) > * > TR > :nth-child(3)""}, {""Column4"", ""DIV[id='quarterIncomeStatement'] > DIV.table-responsive > TABLE.table.financial-table:nth-child(1) > * > TR > :nth-child(4)""}, " & _
        "{""Column5"", ""DIV[id='quarterIncomeStatement'] > DIV.table-responsive > TABLE.table.financial-table:nth-child(1) > * > TR > :nth-child(5)""}, {""Column6"", ""DIV[id='quarterIncomeStatement'] > DIV.table-responsive > TABLE.table.financial-table:nth-child(1) > * > TR > :nth-child(6)""}}, [RowSelector=""DIV[id='quarterIncomeStatement'] > DIV.table-responsive > TABLE." & _
        "table.financial-table:nth-child(1) > * > TR""])," & Chr(13) & "" & Chr(10) & "    #""Changed Type"" = Table.TransformColumnTypes(#""Extracted Table From Html"",{{""Column1"", type text}, {""Column2"", type text}, {""Column3"", type text}, {""Column4"", type text}, {""Column5"", type text}, {""Column6"", type text}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    #""Changed Type"""
    With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=""Table 1 (2)"";Extended Properties=""""" _
        , Destination:=Range("$A$1")).QueryTable
        .CommandType = xlCmdSql
        .CommandText = Array("SELECT * FROM [Table 1 (2)]")
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .PreserveColumnInfo = True
        .ListObject.DisplayName = "Table_1__2"
        .Refresh BackgroundQuery:=False
    End With
 
Follow up question. I adapted this code from our conversation above and it had worked until today. I checked the URL to confirm that it is still correct, but I keep getting a debug error. Can you teach me?

VBA Code:
Function GetResponse(url As String) As String
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", url, False
        .send
        GetResponse = .responseText
    End With
End Function

Sub test()
    
    Dim rngPasteDest As Range
    Dim strURL As String
    Dim ws As Worksheet
    Set ws = ActiveSheet
    
    ws.Activate
    ws.Cells.Clear
    
    Set rngPasteDest = ws.Cells(1, 1)
    
    strURL = "https://finance.yahoo.com/quote/^DJI/history/?period1=1695686400&period2=1730246400"
    
    With CreateObject("htmlfile")
        .body.innerHTML = GetResponse(strURL)
        Dim tableRows As Object, row As Object, cell As Object
        Dim r As Long, c As Long
        r = 1 ' start at first row
        For Each row In .getElementsByTagName("table")(0).Rows
            c = 1 ' start at first col
            For Each cell In row.Cells
                With rngPasteDest.Offset(r, c)
                    .Value = cell.innerText
                    .WrapText = False
                End With
                c = c + 1
            Next cell
            r = r + 1
        Next row
    End With
    
End Sub
 
Upvote 0

Excel Facts

How to total the visible cells?
From the first blank cell below a filtered data set, press Alt+=. Instead of SUM, you will get SUBTOTAL(9,)

Forum statistics

Threads
1,223,099
Messages
6,170,114
Members
452,302
Latest member
TaMere

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top