Speed of multiple URL web scraping

johnnyL

Well-known Member
Joined
Nov 7, 2011
Messages
4,546
Office Version
  1. 2007
Platform
  1. Windows
Need some help with this code.

1) The current code leaves a bunch of IE windows open.
2) The current code leads to a Run-time error '-2147437259 (80004005)':
3) It takes forever to run, Hopefully someone can assist me in converting it to use MSXML2.XMLHTTP60 for example, I heard that works faster.


Code:
'
'-----------------------------------------------------
'   Run-time error '-2147437259 (80004005)':    ' This Error Occurs, eventually, in the 'Yahoo_One_Year_Estimates_Scrape_Error' section \/ \/ \/
'                                                   Also many internet explorer windows are left open that should have been closed
'
'   Automation Error
'   Unspecified Error
'-----------------------------------------------------
'
'
'   Global Variables That will be used
'
    Public Doc                                      As HTMLDocument
'
    Public StockMainPageURL                         As String       ' This will be the main portion of the URL that we send to Internet Explorer
    Public TotalURL                                 As String       ' This will be the complete URL that we send to Internet Explorer
'
    Public CellCounter                              As Integer      ' This will be used to adjust left to right on web site cells
    Public RowCounter                               As Integer      ' This adjusts the offset from the top of the spreadsheet to the start of the columns
    Public StockCount                               As Integer      ' This counts the actual stocks being analyzed currently
    Public TotalStocksToLoad                        As Integer      ' This counts the stocks that should be analyzed right now
'
    Public PageLoadAttempt                          As Long         ' This counts the number of times we have tried to load a page
'
'-------------------------------------------------------------------------------------------------------------------------------
'
Private Sub RefreshEntireDocument_Click()
'
'   This will Clear certain cell values in the spreadsheet when the $B$1 'Refresh' cell is clicked
'
    Range("$B$5:$K$254").Select                                 ' Select the range of $B$5 thru $J$254
    Selection.ClearContents                                     ' Delete the contents of this range
'
'
' -------------------------------------------------------------------------------------------------------------------------
'
'   Scrape stocks to consider looking into further from 1st URL page
'
    RowCounter = 5                                              ' Start loading stock values recieved into the 5th row of Excel
    MaxYahooDelay = 0                                           ' Initialize MaxYahooDelay = 0
'
    CellCounter = 0                                             ' Left to right cell counter
    PageLoadAttempt = 0                                         ' Initialize PageLoadAttempt = 0
    TotalStocksToLoad = 100                                     ' we will Scrape this amount of stocks from the 1st loaded page of stocks
'
    Call Scrape_BarChart_Stock_Page_1                           ' Scrape the amount of TotalStocksToLoad into excel
'
' -------------------------------------------------------------------------------------------------------------------------
'
'   Scrape stocks to consider looking into further from 2nd URL page
'
    CellCounter = 0                                             ' Left to right cell counter
    PageLoadAttempt = 0                                         ' Initialize PageLoadAttempt = 0
    TotalStocksToLoad = 100                                     ' we will Scrape this amount of stocks from the 2nd loaded page of stocks
'
    Call Scrape_BarChart_Stock_Page_2                           ' Scrape the amount of TotalStocksToLoad into excel
'
' -------------------------------------------------------------------------------------------------------------------------
'
'   Scrape stocks to consider looking into further from 3rd URL page
'
    CellCounter = 0                                             ' Left to right cell counter
    PageLoadAttempt = 0                                         ' Initialize PageLoadAttempt = 0
    TotalStocksToLoad = 50                                      ' we will Scrape this amount of stocks from the 3rd loaded page of stocks
'
    Call Scrape_BarChart_Stock_Page_3                           ' Scrape the amount of TotalStocksToLoad into excel
'
' -------------------------------------------------------------------------------------------------------------------------
' -------------------------------------------------------------------------------------------------------------------------
'
'   Scrape values from Yahoo to Update the one year estimates from previous pages of stocks scraped
'
    RowCounter = 5                                              ' Start loading stock values recieved into the 5th row of Excel
    PageLoadAttempt = 0                                         ' Initialize PageLoadAttempt = 0
    TotalYahooDelay = 0                                         ' Initialize TotalYahooDelay = 0
    TotalYahooPageAttempts = 0                                  ' Initialize TotalYahooPageAttempts = 0
    TotalStocksToLoad = 250                                     ' we will Scrape this amount of stocks from the 3rd loaded page of stocks

    Call Scrape_Yahoo_One_Year_Estimates                        ' Scrape the amount of TotalStocksToLoad into excel
'
' -------------------------------------------------------------------------------------------------------------------------
'
'   Display some final results in the status bar
    Application.StatusBar = "Spreadsheet Refreshing Complete :)" ' & "    Avg Yahoo Delay = " & AvgYahooDelay & "     Avg Yahoo Page Attempts = " & AvgYahooPageAttempts
'
End Sub
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'
Private Sub Scrape_Yahoo_One_Year_Estimates()                       ' *** Good up to here ***
'
'
    For StockCount = 1 To TotalStocksToLoad                         ' Grab One Year stock price estimate
'
'
ReloadScrape_Yahoo_One_Year_Estimates:
'
'       Load all of the Update one year estimates
        DelaySeconds = 0                                            '   Initialize DelaySeconds to zero
        PageLoadAttempt = PageLoadAttempt + 1                       '   Add 1 to our PageLoadAttempt counter
''''        TotalYahooPageAttempts = TotalYahooPageAttempts + 1         '   This will be the total yahoo Page Attempts
'
        StockMainPageURL = "finance.yahoo.com/quote/"               '   This will be the main portion of the URL that we send to Internet Explorer
        CurrentStockSymbol = Trim(Range("B" & RowCounter).Value)    '   This is the stock symbol that we will be addressing
'
'       Setup and Load the Internet Explorer Page ...
''''        Dim IE As New SHDocVw.InternetExplorer  ' This works
        Dim IE As New InternetExplorer
''      Dim IE As MSXML2.XMLHTTP60
''      Set IE = New MSXML2.XMLHTTP60
'
        TotalURL = "https://" & StockMainPageURL & CurrentStockSymbol   ' This will be the complete URL that we send to Internet Explorer
'
        If CurrentStockSymbol = 0 Or CurrentStockSymbol = "" Or IsEmpty(CurrentStockSymbol) = True Then ' If no stock symbol found @ $B?  then ...
            PageLoadAttempt = 0                                                                         '   Reset PageLoadAttempt = 0
            StockCount = TotalStocksToLoad                                                              '   Indicate no more stocks to load
'
            IE.Quit                                                                                     '   Close Internet Explorer Window
            Set IE = Nothing                                                                            '   Clear Internet Explorer Memory
'
            Exit Sub                                                                                    '   Exit this sub
        Else
'
            On Error GoTo Yahoo_One_Year_Estimates_Scrape_Error                                         '   If Error occurs then goto Yahoo_One_Year_Estimates_Scrape_Error
'
            Set IE = New InternetExplorer                                                               '   Open Internet Explorer Browser
'
'           Browser address that we will be scraping values from
            IE.navigate TotalURL                                                                        '   Load the Internet Explorer URL
'
'           Make the Browser window, that we will be scraping values from, visible
            IE.Visible = True                                           '   Make Internet Explorer Windows Visible
'
'           Allow mouse clicks and such while browser window is loading ... Loop until browser window is fuilly loaded, ie. READYSTATE_COMPLETE
            Do While IE.readyState <> 4 And DelaySeconds <= 19                                          '   Loop while IE is still loading and <= 19 seconds delayed
''              Application.Wait DateAdd("s", 1, Now)
                Application.Wait (Now + TimeValue("00:00:01"))                                          '   Delay for 1 second
                DoEvents                                                                                '   Enable Mouse Clicks
'
'               Update status bar to inform the user of what is occurring
                Application.StatusBar = "Loading website … " & TotalURL & "    Stock # " & (RowCounter - 4) ''''& _
''''                                "   Delay Seconds =  " & DelaySeconds & "    Page Load Attempts = " & PageLoadAttempt & _
''''                                "   Avg Yahoo Delay = " & AvgYahooDelay & "     AvgYahooPageAttempts = " & AvgYahooPageAttempts
'
                DelaySeconds = DelaySeconds + 1                         '   Add 1 to our DelaySeconds Counter
'
''''                If DelaySeconds > MaxYahooDelay Then MaxYahooDelay = DelaySeconds   '   Save the MaxYahooDelay
''                  TotalYahooDelay = TotalYahooDelay + 1
'
            Loop                                                        ' Loop back
'
'           Allow mouse clicks and such while browser window is loading ... Loop until browser window is fuilly loaded, ie. READYSTATE_COMPLETE
            Do While IE.Busy And DelaySeconds <= 19 ' Or IE.readyState <> 4 And DelaySeconds <= 19  ' Loop while IE is still loading and <= 19 seconds delayed
''              Application.Wait DateAdd("s", 1, Now)
                Application.Wait (Now + TimeValue("00:00:01"))          '   Delay for 1 second
                DoEvents                                                '   Enable Mouse Clicks
'
'               Update status bar to inform the user of what is occurring
                Application.StatusBar = "Loading website … " & TotalURL & "    Stock # " & (RowCounter - 4) ''''& _
''''                                "   Delay Seconds =  " & DelaySeconds & "    Page Load Attempts = " & PageLoadAttempt & _
''''                                "   Avg Yahoo Delay = " & AvgYahooDelay & "     AvgYahooPageAttempts = " & AvgYahooPageAttempts
'
                DelaySeconds = DelaySeconds + 1                         '   Add 1 to our DelaySeconds Counter
'
''''                If DelaySeconds > MaxYahooDelay Then MaxYahooDelay = DelaySeconds   '   Save the MaxYahooDelay
            Loop                                                        ' Loop back
'
'
            If DelaySeconds > 19 Then                                   ' If we have delayed for > 19 seconds to allow the page to load then ...
                IE.Quit                                                 '   Close Internet Explorer Window
'
                If PageLoadAttempt <= 4 Then GoTo ReloadScrape_Yahoo_One_Year_Estimates '   If we have'nt tried 4 reloads of this page then reload page again
            End If                                                      ' End If
'
            If PageLoadAttempt > 4 Then                                 ' If we have tried 4 reloads of the URL page then Display a message box & Exit program
                MsgBox "We've reloaded the same web page  " & PageLoadAttempt & " times without success so we're going to pause the program" & _
                " so you can investigate.", , "Multiple errors detected"
'
                PageLoadAttempt = 0                                     '   Reset PageLoadAttempt = 0
'
                Stop                                                    '   Stop this Excel program!
            End If
'
            Set Doc = IE.document
'
        End If
'
'
''''        TotalYahooDelay = TotalYahooDelay + DelaySeconds
''''        AvgYahooDelay = TotalYahooDelay / (RowCounter - 4)
''''        AvgYahooPageAttempts = TotalYahooPageAttempts / (RowCounter - 4)
'
'       Update status bar to inform the user of what is occurring
        Application.StatusBar = "Gathering Data from website … " & TotalURL & "    Stock # " & (RowCounter - 4) ''''& _
''''                                "   Delay Seconds =  " & DelaySeconds & "    Page Load Attempts = " & PageLoadAttempt & _
''''                                "   Avg Yahoo Delay = " & AvgYahooDelay & "    AvgYahooPageAttempts = " & AvgYahooPageAttempts
'
        Range("J" & RowCounter).Value = Doc.getElementsByTagName("td")(11).innerText        '   Scrape the Yahoo 52 Week Price Range
        Range("K" & RowCounter).Value = Doc.getElementsByTagName("td")(31).innerText        '   Scrape the Yahoo One Year Price Estimate
'
        On Error GoTo 0                                                                     '   Clear Errors & Set Excel Error handling to Default
'
        RowCounter = RowCounter + 1                                                         '   Advance to next row in Excel sheet
'
        IE.Quit                                                                             '   Close Internet Explorer Window
        Set IE = Nothing                                                                    '   Clear Internet Explorer Memory
'
        PageLoadAttempt = 0                                                                 '   Reset PageLoadAttempt = 0
'
    Next                                                                                    '   Load next stock until all are loaded
'
    Exit Sub                                                                                ' Exit this Sub
'
Yahoo_One_Year_Estimates_Scrape_Error:
'
'   Tried this solution from google \/ \/ to solve errors, No luck :(                       ' Shut down all Internet Explorer windows
''    Dim wsh As Object
''    Dim windowStyle As Integer: windowStyle = 1
''    Dim waitOnReturn As Boolean: waitOnReturn = True
'
''    Set wsh = VBA.CreateObject("Wscript.Shell")
''    wsh.Run "taskkill /F /IM iexplore.exe", windowStyle, waitOnReturn
'
'
'
''    IE.Quit                                                                             '   Close Internet Explorer Window
    Set IE = Nothing                                                                    '   Clear Internet Explorer Memory
'
'   This works some what
    Set IE = New InternetExplorer                                                           ' Open Internet Explorer Browser
'
'
    Resume Next                                                                             ' Go back to the next line after the previous error occurred
'
End Sub
'________________________________________________________________________________________________________________________________________________________
 
Shoot!

Stand by, the last code I gave the link to is not the right version. :(

I will do some more changes to the correct file and upload that for review in a short while.

Sorry about that folks.
 
Upvote 0

Excel Facts

Create a Pivot Table on a Map
If your data has zip codes, postal codes, or city names, select the data and use Insert, 3D Map. (Found to right of chart icons).
Here it is:

VBA Code:
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'
' Remember to add reference to Microsoft HTML Object Library ;)
'
'-------------------------------------------------------------------------------------------------------------------------------
'
Public Doc                      As HTMLDocument
'
Public ar1(1 To 250, 1 To 9)                        ' Prep Array for 250 rows, 9 values scraped per web page
Public ar2(1 To 250, 1 To 2)                        ' Prep Array for 250 rows, 2 values scraped per web page
Public ar3(1 To 250, 1 To 4)                        ' Prep Array for 250 rows, 4 values scraped per web page
'
Public ArrayRowNumber           As Integer          ' This will be used to count the row number of the array that we are dealing with
'
Public CellCounter              As Long             ' This will be used to adjust left to right on web site cells
Public DelaySeconds             As Long             ' This will be used to track how long we attempt to load a web page before we retry loading the web page
Public PageLoadAttempt          As Long             ' This will be used to track how many times we have attempted to load a particular web page
Public StockCount               As Long             ' This counts the actual stock being analyzed
Public TotalStocksToLoad        As Long             ' This counts the total stocks being analyzed right now
'
Public IE                       As New InternetExplorer
'
Public currentstocksymbol       As String           ' This is the actual stock symbol that we want to get further info on
Public StockMainPageURL         As String           ' This will be the main portion of the URL that we send to Internet Explorer
Public strHTML                  As String
Public TotalURL                 As String           ' This will be the complete URL that we send to Internet Explorer

'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'
Sub ScrapeSite()
'
'-----------------------------
'
    ProgramStartTime = Now                                                  ' Save the Time that this program started into ProgramStartTime
    PhaseStartTime = Now                                                    ' Save the Time that this Phase started into PhaseStartTime
'
'   This will Clear certain cell values in the Excel spreadsheet when the 'Get Data' button is clicked
    Call ClearSomeCellValues
'
    Range("C2").Value = Format(ProgramStartTime, "hh:mm:ss")                ' Save the ProgramStartTime into $C2
'
'-----------------------------
'
'   Phase_1                                                                 ' Takes about 1 min 13 secs to complete :)
'
    ArrayRowNumber = 0                                                      ' Reset ArrayRowNumber to zero
'
    TotalStocksToLoad = 100                                                 ' we will Scrape this amount of stocks/values from the following web page
    StockMainPageURL = "https://www.barchart.com/stocks/performance/percent-change/declines?timeFrame=3m&viewName=main"
    Call Scrape_BarChart_Stock_Page                                         ' Scrape the stocks/values from the web page
'
'
    TotalStocksToLoad = 100                                                 ' we will Scrape this amount of stocks/values from the following web page
    StockMainPageURL = "https://www.barchart.com/stocks/performance/percent-change/declines?timeFrame=3m&viewName=main&page=2"
    Call Scrape_BarChart_Stock_Page                                         ' Scrape the stocks/values from the web page
'
'
    TotalStocksToLoad = 50                                                  ' we will Scrape this amount of stocks/values from the following web page
    StockMainPageURL = "https://www.barchart.com/stocks/performance/percent-change/declines?timeFrame=3m&viewName=main&page=3"
    Call Scrape_BarChart_Stock_Page                                         ' Scrape the stocks/values from the web page
'
    Range("A5").Resize(UBound(ar1, 1), UBound(ar1, 2)).Value = ar1          ' Dump 250 Rows of saved array values into Excel cells, 9 cells wide at a time
'
    PhaseEndTime = Now                                                      ' Stop our timer that is timing this phase
    Phase_1_Timer = Format(PhaseEndTime - PhaseStartTime, "hh:mm:ss")       ' Subtract the PhaseEndTime from the PhaseStartTime to get the time elapsed
    Range("A2").Value = Phase_1_Timer                                       ' Save the Phase_1_Timer into $A2
'
'-----------------------------
'
'   Phase_2                                                                 ' Takes about 6 min 03 secs to complete :)
'
    PhaseStartTime = Now                                                    ' Save the Time that this Phase started into PhaseStartTime
'
    ArrayRowNumber = 0                                                      ' Reset ArrayRowNumber to zero
'
    Call Dan_W_Scrape_Yahoo_One_Year_Estimates                              ' Thank you Dan_W !!!
'
    Range("K5").Resize(UBound(ar2, 1), UBound(ar2, 2)).Value = ar2          ' Load 250 Rows of array values into Excel cells, 2 cells wide at a time
'
    PhaseEndTime = Now                                                      ' Stop our timer that is timing this phase
    Phase_2_Timer = Format(PhaseEndTime - PhaseStartTime, "hh:mm:ss")       ' Subtract the PhaseEndTime from the PhaseStartTime to get the time elapsed
    Range("B2").Value = Phase_2_Timer                                       ' Save the Phase_2_Timer into $B2
'
'-----------------------------
'
'   Phase_3                                                                 ' Takes about 52 min 30 secs to complete :)
'
    PhaseStartTime = Now                                                    ' Save the Time that this Phase started into PhaseStartTime
'
    ArrayRowNumber = 0                                                      ' Reset ArrayRowNumber to zero
'
    PageLoadAttempt = 0                                                     ' Reset the PageLoadAttempt counter to zero
    TotalStocksToLoad = 250                                                 ' Number of stocks to load
'
'   Update status bar to inform the user of what is occurring
    Application.StatusBar = "Loading BarChart Analyst Ratings … "
'
    Application.ScreenUpdating = False                                      ' Test1 to see if we can speed things up / Turn ScreenUpdating Off
''    Application.Calculation = xlCalculationManual                           ' Test2 to see if we can speed things up / Turn AutoCalculation Off
'
    For StockCount = 1 To TotalStocksToLoad                                 ' we will load this amount of stock symbols from the stocks that were loaded
        Call GetBarChartAnalystRatings                                      '   Scrape more values for each stock symbol
    Next                                                                    ' Load next stock symbol to scrape values for
'
    Range("O5").Resize(UBound(ar3), UBound(ar3, 2)).Value = ar3             ' Load 250 Rows of array values into Excel cells, 4 cells wide at a time
'
    PhaseEndTime = Now                                                      ' Stop our timer that is timing this phase
'
''    Application.Calculation = xlCalculationAutomatic                        ' End Test2 / Turn AutoCalculation back on
    Application.ScreenUpdating = True                                       ' End Test1 / Turn ScreenUpdating back on
'
    Phase_3_Timer = Format(PhaseEndTime - PhaseStartTime, "hh:mm:ss")       ' Subtract the PhaseEndTime from the PhaseStartTime to get the time elapsed
    Range("D2").Value = Phase_3_Timer                                       ' Save the Phase_3_Timer into $D2
    Range("E2").Value = Format(PhaseEndTime - ProgramStartTime, "hh:mm:ss") ' Save the total elapsed time of the program into $E2
'
    MsgBox "Phase RunTimes = " & vbCrLf & vbTab & "Time to complete Phase 1 - Scrape stocks to be used = " & Phase_1_Timer & _
                                vbCrLf & vbTab & "Time to complete Phase 2 - Scrape Yahoo 1 Year Estimates = " & Phase_2_Timer & _
                                vbCrLf & vbTab & "Time to complete Phase 3 - GetBarChartAnalystRatings = " & Phase_3_Timer & _
                                vbCrLf & vbCrLf & vbTab & "Total Program RunTime hh:mm:ss = " & Format(PhaseEndTime - ProgramStartTime, "hh:mm:ss")
'
'-----------------------------
'
'   Update status bar to inform the user of what is occurring
    Application.StatusBar = "Done !!!"
'
'-----------------------------
'
End Sub
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'
Private Sub ClearSomeCellValues()
'
'   This will Clear certain cell values in the spreadsheet when the 'Get Data' button is clicked
'
    Application.ScreenUpdating = False                                      ' Test1 to see if we can speed things up / Turn ScreenUpdating Off
    Application.Calculation = xlCalculationManual                           ' Test2 to see if we can speed things up / Turn AutoCalculation Off
'
    Range("$A2").Select                                                     ' Select the range of $B$5 thru $I$254
    Selection.ClearContents                                                 ' Delete the contents of this range
'
    Range("$B2").Select                                                     ' Select the range of $B$5 thru $I$254
    Selection.ClearContents                                                 ' Delete the contents of this range
'
    Range("$C2").Select                                                     ' Select the range of $B$5 thru $I$254
    Selection.ClearContents                                                 ' Delete the contents of this range
'
    Range("$D2").Select                                                     ' Select the range of $B$5 thru $I$254
    Selection.ClearContents                                                 ' Delete the contents of this range
'
    Range("$E2").Select                                                     ' Select the range of $B$5 thru $I$254
    Selection.ClearContents                                                 ' Delete the contents of this range
'
    Range("$B$5:$I$254").Select                                             ' Select the range of $B$5 thru $I$254
    Selection.ClearContents                                                 ' Delete the contents of this range
'
    Range("$K$5:$L$254").Select                                             ' Select the range of $K$5 thru $L$254
    Selection.ClearContents                                                 ' Delete the contents of this range
'
    Range("$O$5:$R$254").Select                                             ' Select the range of $O$5 thru $R$254
    Selection.ClearContents                                                 ' Delete the contents of this range
'
    Application.Calculation = xlCalculationAutomatic                        ' End Test2 / Turn AutoCalculation back on
    Application.ScreenUpdating = True                                       ' End Test1 / Turn ScreenUpdating back on
'
End Sub
'
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'
Private Sub Scrape_BarChart_Stock_Page()                                    ' Using Internet Explorer
'
'   Scrape stocks to consider looking into further, from 3 URL pages
'
'
    Application.ScreenUpdating = False                                      ' Test1 to see if we can speed things up / Turn ScreenUpdating Off
    Application.Calculation = xlCalculationManual                           ' Test2 to see if we can speed things up / Turn AutoCalculation Off
'
    CellCounter = 0                                                         ' Left to right cell counter on the web page that is being scraped
    PageLoadAttempt = 0                                                     ' Reset the PageLoadAttempt counter to zero
'
    Set IE = New InternetExplorer                                           ' Open Internet Explorer Browser
'
'   Update status bar to inform the user of what is occurring
    Application.StatusBar = "Gathering Data of stocks to load from website … " & StockMainPageURL
'
ReloadBarChartStockPage:
'
'   Scrape stocks to consider looking into further from Barchart
'
    DelaySeconds = 0                                                        ' Initialize DelaySeconds to zero
    PageLoadAttempt = PageLoadAttempt + 1                                   ' Add 1 to our PageLoadAttempt counter
'
'   Browser address that we will be scraping values from
    IE.navigate StockMainPageURL                                            ' Load the Internet Explorer URL
'
'   Allow mouse clicks and such while browser window is loading ... Loop until browser window is fuilly loaded, ie. READYSTATE_COMPLETE
    Do While IE.readyState <> 4 And DelaySeconds <= 19                      ' Loop while IE is still loading and <= 19 seconds delayed
        Application.Wait (Now + TimeValue("00:00:01"))                      '   Delay for 1 second
        DoEvents                                                            '   Enable Mouse Clicks
'
        DelaySeconds = DelaySeconds + 1                                     '   Add 1 to our DelaySeconds Counter
    Loop                                                                    ' Loop back
'
'   Allow mouse clicks and such while browser window is loading ... Loop until browser window is fuilly loaded, ie. READYSTATE_COMPLETE
    Do While IE.Busy And DelaySeconds <= 19
        Application.Wait (Now + TimeValue("00:00:01"))                      '   Delay for 1 second
        DoEvents                                                            '   Enable Mouse Clicks
'
        DelaySeconds = DelaySeconds + 1                                     '   Add 1 to our DelaySeconds Counter
    Loop                                                                    ' Loop back
'
    If DelaySeconds > 19 Then                                               ' If we have delayed for > 19 seconds to allow the page to load then ...
        IE.Quit                                                             '   Close Internet Explorer Window
'
        If PageLoadAttempt <= 4 Then                                        '   If we haven't tried to reload the web page > 4 times then ...
            Set IE = New InternetExplorer                                   '       Open Internet Explorer Browser
            GoTo ReloadBarChartStockPage                                    '       Reload the web page again
        End If                                                              '   End If
    End If                                                                  ' End If
'
    If PageLoadAttempt > 4 Then                                         ' If we have tried 4 reloads of the URL page then Display a message box & Exit program
        MsgBox "We've reloaded the same web page  " & PageLoadAttempt & " times without success so we're going to pause the program" & _
            " so you can investigate.", , "Multiple errors detected"
'
        PageLoadAttempt = 0                                                 '   Reset the PageLoadAttempt counter to zero
        Stop                                                                '   Stop this Excel program!
    End If                                                                  ' End If
'
    Set Doc = IE.document
'
'   Make the Browser window, that we will be scraping values from, visible/Invisible
''  IE.Visible = True                                                       ' Default ... Make Internet Explorer Windows Visible
    IE.Visible = False                                                      ' Make Internet Explorer Windows InVisible
'
    On Error GoTo RetryErroredLineFix                                       ' See if we exceeded the speed limit :)
'
    For StockCount = 1 To TotalStocksToLoad                                 ' Grab One Year stock price estimate
'
        ArrayRowNumber = ArrayRowNumber + 1                                                         '   Add 1 to our ArrayRowNumber Counter
'
        ar1(ArrayRowNumber, 1) = ArrayRowNumber                                                     '   Stock number Counter
        ar1(ArrayRowNumber, 2) = Trim(Doc.getElementsByTagName("td")(CellCounter).innerText)        '   Stock Symbol
        ar1(ArrayRowNumber, 3) = Trim(Doc.getElementsByTagName("td")(CellCounter + 1).innerText)    '   Stock Name
        ar1(ArrayRowNumber, 4) = Doc.getElementsByTagName("td")(CellCounter + 2).innerText          '   3 Month % Change
        ar1(ArrayRowNumber, 5) = Doc.getElementsByTagName("td")(CellCounter + 3).innerText          '   Last Price
        ar1(ArrayRowNumber, 6) = Doc.getElementsByTagName("td")(CellCounter + 4).innerText          '   Change
        ar1(ArrayRowNumber, 7) = Doc.getElementsByTagName("td")(CellCounter + 5).innerText          '   % Change
        ar1(ArrayRowNumber, 8) = Doc.getElementsByTagName("td")(CellCounter + 7).innerText          '   3 Month High
        ar1(ArrayRowNumber, 9) = Doc.getElementsByTagName("td")(CellCounter + 8).innerText          '   3 Month Low
'
        CellCounter = CellCounter + 12                                                              '   Advance to next row on URL page
'
    Next                                                                    ' Get Next if more remain
'
    IE.Quit                                                                 ' Close Internet Explorer Window
    Set IE = Nothing                                                        ' Clear Internet Explorer Memory
'
    Application.Calculation = xlCalculationAutomatic                        ' End Test2 / Turn AutoCalculation back on
    Application.ScreenUpdating = True                                       ' End Test1 / Turn ScreenUpdating back on
'
    Exit Sub                                                                ' Exit this Sub
'
RetryErroredLineFix:                                                        ' See if we exceeded the speed limit :)
    On Error GoTo 0                                                         '   Set Excel Error handling to Default
    On Error GoTo -1                                                        '   Clear Excel Error flag
    On Error GoTo RetryErroredLineFix                                       '   Turn back on our Error handling
    Resume                                                                  '  Return back to the line that originally caused the error to see if all good now
'
End Sub

'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'
Private Sub Dan_W_Scrape_Yahoo_One_Year_Estimates()
'
    Application.ScreenUpdating = False                                      ' Test1 to see if we can speed things up / Turn ScreenUpdating Off
    Application.Calculation = xlCalculationManual                           ' Test2 to see if we can speed things up / Turn AutoCalculation Off
'
    Set Doc = New HTMLDocument
'
'   Update status bar to inform the user of what is occurring
    Application.StatusBar = "Loading values ..."
'
    StockMainPageURL = "https://finance.yahoo.com/quote/"                   ' This will be the main portion of the URL
'
    TotalStocksToLoad = 250                                                 ' Number of stocks to load
'
    For StockCount = 1 To TotalStocksToLoad                                 ' Grab One Year stock price estimate
'
        ArrayRowNumber = ArrayRowNumber + 1                                 '   Add 1 to our ArrayRowNumber Counter
'
Resume_From_Error:
'
        currentstocksymbol = Range("B" & (ArrayRowNumber + 4)).Value        '   This is the stock symbol that we will be addressing
'
        If currentstocksymbol = vbNullString Then Exit For                  '   If no stock symbol found then stop looking
'
        On Error GoTo Yahoo_One_Year_Estimates_Scrape_Error                 '   If Error occurs then goto Yahoo_One_Year_Estimates_Scrape_Error
'
        TotalURL = StockMainPageURL & currentstocksymbol                    '   This will be the complete URL
        strHTML = GetHTML(TotalURL)
'
        Doc.body.innerhtml = strHTML
'
        ar2(ArrayRowNumber, 1) = Doc.getElementsByTagName("td")(11).innerText   '   Past 1 year price range
        ar2(ArrayRowNumber, 2) = Doc.getElementsByTagName("td")(31).innerText   '   Yahoo 1 year price estimate
'
    Next                                                                    ' Get Next if more remain
'
    Set Doc = Nothing
'
    Application.Calculation = xlCalculationAutomatic                        ' End Test2 / Turn AutoCalculation back on
    Application.ScreenUpdating = True                                       ' End Test1 / Turn ScreenUpdating back on
'
    Exit Sub                                                                ' Exit this Sub
'
Yahoo_One_Year_Estimates_Scrape_Error:
'
    On Error GoTo 0                                                         ' Set Excel Error handling to Default
    On Error GoTo -1                                                        ' Clear Excel Error flag
'
    GoTo Resume_From_Error                                                  ' Go back to see if error has been eliminated
'
End Sub

'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'
Private Sub GetBarChartAnalystRatings() '*
'
GetBarChartAnalystRatingsStart:
'
    ArrayRowNumber = ArrayRowNumber + 1                                     ' Add 1 to our ArrayRowNumber Counter
'
    currentstocksymbol = Range("B" & (ArrayRowNumber + 4)).Value            ' Set CurrentStockSymbol = $B?
'
    If currentstocksymbol = vbNullString Then                               ' If no stock symbol found @ $B?  then ...
        PageLoadAttempt = 0                                                 '   Reset the PageLoadAttempt counter to zero
        StockCount = TotalStocksToLoad                                      '   Indicate that all stocks have been loaded
'
        IE.Quit                                                             '   Close Internet Explorer Window
        Set IE = Nothing                                                    '   Clear Internet Explorer memory
'
        Exit Sub                                                            '   Exit this Sub
    End If                                                                  ' End If
'
    PageLoadAttempt = PageLoadAttempt + 1                                   ' Add 1 to our PageLoadAttempt counter
'
'   Set up the URL to load
    StockMainPortionURL = "https://www.barchart.com/stocks/quotes/"
    TotalURL = StockMainPortionURL & currentstocksymbol & "/analyst-ratings"
'
    DelaySeconds = 0                                                        ' Initialize DelaySeconds to zero
'
    On Error GoTo ReloadGetBarChartAnalystRatingsSubRoutine                 ' If Error occurs then goto ReloadGetBarChartAnalystRatingsSubRoutine
'
'   Browser address that we will be scraping values from
    IE.navigate TotalURL
'
'   Allow mouse clicks and such while browser window is loading ... Loop until browser window is fuilly loaded, ie. READYSTATE_COMPLETE
    Do While IE.readyState <> 4 And DelaySeconds <= 29                      ' Loop While Internet Explorer Not Fully Loaded and <= 29 seconds delayed
        Application.Wait (Now + TimeValue("00:00:01"))                      '   Delay 1 second
        DoEvents                                                            '   Enable Mouse Clicks
'
        DelaySeconds = DelaySeconds + 1                                     '   Add 1 to our DelaySeconds Counter
    Loop                                                                    ' Loop back
'
'   Allow mouse clicks and such while browser window is loading ... Loop until browser window is fuilly loaded, ie. READYSTATE_COMPLETE
    Do While IE.Busy And DelaySeconds <= 29                                 ' Loop While Internet Explorer Not Fully Loaded and <= 29 seconds delayed
        Application.Wait (Now + TimeValue("00:00:01"))                      '   Delay 1 second
        DoEvents                                                            '   Enable Mouse Clicks
'
        DelaySeconds = DelaySeconds + 1                                     '   Add 1 to our DelaySeconds Counter
    Loop                                                                    ' Loop back
'
    If DelaySeconds > 29 Then                                               ' If we have delayed for 30 seconds to allow the page to load then ...
        IE.Quit                                                             '   Close Internet Explorer Window
'
        If PageLoadAttempt <= 4 Then Call GetBarChartAnalystRatings         '   If we have'nt tried 5 reloads of page then reload entire subroutine
    End If                                                                  ' End If
'
    If PageLoadAttempt > 4 Then                                             ' If we have tried 5 reloads of the URL page then Display a message box &...
        MsgBox "We've reloaded the same web page  " & PageLoadAttempt & " times without success so we're going to halt the program" & _
            " so you can investigate.", , "Multiple errors detected"
'
        PageLoadAttempt = 0                                                 '   Reset the PageLoadAttempt counter to zero
        Stop                                                                '   Stop this Excel program!
    End If                                                                  ' End If
'
    On Error GoTo 0                                                         ' Set Excel Error handling to Default
'
    Set Doc = IE.document
'
'   Make the Browser window, that we will be scraping values from, visible/Invisible
''  IE.Visible = True                                                       ' Default ... Make Internet Explorer Windows Visible
    IE.Visible = False                                                      ' Make Internet Explorer Windows InVisible
'
    On Error GoTo NoAnalystRatings
'
    ar3(ArrayRowNumber, 1) = num(Doc.getElementsByClassName("right-border-separator")(1).innerText) ' Avg Analyst 1 yr price
    ar3(ArrayRowNumber, 2) = Doc.getElementsByClassName("block__colored-header")(3).innerText       ' Analyst stock strength
    ar3(ArrayRowNumber, 3) = Doc.getElementsByClassName("block__average_value")(3).innerText        ' Analyst rating 1 - 5
    ar3(ArrayRowNumber, 4) = Doc.getElementsByClassName("bold")(3).innerText                        ' # of analysts
'
    PageLoadAttempt = 0                                                     ' Reset the PageLoadAttempt counter to zero
'
    IE.Quit                                                                 ' Close Internet Explorer Window
    Set IE = Nothing                                                        ' Clear Internet Explorer memory
'
    Exit Sub                                                                ' Exit this Sub
'
NoAnalystRatings:
'
''  Indicate that no analysts have made any estimates
    ar3(ArrayRowNumber, 1) = "No Estimates"                                 ' Avg Analyst 1 yr price
    ar3(ArrayRowNumber, 2) = "No Estimates"                                 ' Analyst stock strength
    ar3(ArrayRowNumber, 3) = "No Estimates"                                 ' Analyst rating 1 - 5
    ar3(ArrayRowNumber, 4) = "No Estimates"                                 ' # of analysts
'
    PageLoadAttempt = 0                                                     ' Reset the PageLoadAttempt counter to zero
'
    IE.Quit                                                                 ' Close Internet Explorer Window
    Set IE = Nothing                                                        ' Clear Internet Explorer memory
'
    Exit Sub                                                                ' Exit this Sub
'
ReloadGetBarChartAnalystRatingsSubRoutine:
'
    On Error GoTo 0                                                         ' Set Excel Error handling to Default
    On Error GoTo -1                                                        ' Clear Excel Error flag
    Call GetBarChartAnalystRatings                                          ' Reload this Sub
'
End Sub

'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
Function GetHTML(strURL As String) As String
'
    Dim objHTTP As Object, strTemp As String
'
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
'
    objHTTP.Open "GET", strURL, False
    objHTTP.send
'
    If objHTTP.Status = 200 Then
        strTemp = objHTTP.responseText
    Else
        'There has been an error
        strTemp = ""
    End If
'
    GetHTML = strTemp
End Function

'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'-------------------------------------------------------------------------------------------------------------------------------
'
Function num(myStr As String)
'
    Dim b() As Byte, I As Long, myNum As String
'
    b = StrConv(myStr, vbFromUnicode)
'
    For I = LBound(b) To UBound(b)
        If IsNumeric(ChrW$(b(I))) Or (ChrW$(b(I)) = ".") Or (ChrW$(b(I)) = "-") Then
            myNum = myNum & ChrW$(b(I))
        End If
    Next
'
    num = myNum
'
End Function

I added the arrays as Dan_W suggested ;)

I still don't see any speed increase with any of the arrays added. Maybe I am not doing it right. I dunno.
 
Upvote 0
Hi. Thank you for this. I've just started to look at it now. Can I ask - what is the rationale behind PageLoadAttempt and DelaySeconds? I've read your comments in the code (it's very well commented - you're far better at it than I am!), and I understand what you've written and conceptually what it is that you've done, but I don't understand why. Have you, for example, ever hit the PageLoadAttempt = 5 trigger?
 
Upvote 0
Hi Dan_W!

The DelaySeconds was put in there because many times the web page hasn't fully loaded within 20 seconds and it will just sit there waiting for the web page to load. I found that if I reload the web page, (PageLoadAttempt), often it will load on the 2nd or 3rd attempt within 8 - 15 seconds.

The PageLoadAttempt counter stops the program from running if it has tried to load the web page 5 times with no success. Yes, I have had that triggered, but on very rare occasions. The PageLoadAttempt and DelaySeconds variables are basically included to deter endless looping of waiting for a page to completely load.

Hope that helps.

I could eliminate the updating of the status bar in the 3rd phase, but if I do that, basically the screen will just sit there for 50 mi nutes or so and I will have no way of knowing if the program is still running or is just stuck in a loop somewhere. Basically would have to wait like an hour and then hope that everything was working properly and it completes properly.
 
Upvote 0
That does help, but do you know how often the DelaySeconds trigger happens?
 
Upvote 0
Hi Dan_W!

Not sure what you are asking there. :( I thought I covered that in my previous response.

The Do while/loop delays, that test for the web page fully loaded, delay for 1 second prior to the DoEvents, increment the DelaySeconds counter, then loops back to see if the web page is fully loaded. Within that loop, there is also a check to see if that loop has been running(delayed) for 20 seconds, if so, then it increments the PageLoadAttempt counter and then reloads the web page and tries again.

If you are asking how often the PageLoadAttempts counter gets incremented, ie. the 20 second delay is reached to trigger the reload of the webpage, I would have to guess probably 40% of the time. I can run a test if you would like. Normally all web pages load within 2 tries, rarely 3 times.

Is that what you were inquiring about? If not, please let me know.

Thank you.
 
Upvote 0
I ran a test that deleted all of the DelaySeconds & PageLoadAttempts coding, in thinking that Dan_W possibly believes that those variables might be slowing the program down ...

Well it took a few minutes longer to run without that code included. :(
 

Attachments

  • 4.14Capturetry.JPG
    4.14Capturetry.JPG
    47.8 KB · Views: 13
Upvote 0
Here is a second run of that ...

I think it is clear why I instituted the DelaySeconds and the PageLoadAttempt approach I took.
 

Attachments

  • 4.14TryV2Capture.JPG
    4.14TryV2Capture.JPG
    45.1 KB · Views: 15
Upvote 0
Dan_W said:
Hi
I see that the timer has been applied to Phases 1, 2 and 3 - what I mean is that you should apply it within Phase 3, because it appears to be broken down in to a number of elements, and those elements are being looped over 250 times, so it would be helpful to know what part of that is consuming the most time/energy.

Hey Dan_W!

Can you be more specific which sections that you would like to see the timings for within Phase 3?

I would be more than happy to time them. Would you like it broken down to avg per loop, or total time, or both, or something else?
 
Upvote 0
johnnyL said:
...
From my limited understanding, the loop is the 'CPU cycle burner'. The DoEvents is a very small(fast) window to allow mouse clicks and such. The loop appears to be the problem being that it fires as fast as it can (Computer processor speed dependent) to run over and over again, unless an additional delay is included in that loop via sleep/wait,etc.

I could set up a timer to see how many times the Loop/DoEvents/Loop back is triggered each second, but I think it would be a very large amount, especially when some of the web pages I am loading appear to take upwards of 10 seconds or more.
...

In the Mean time, I decided to test the Delay loop for pages to load, without, and with, an additional delay of 1 second per loop to see how many times the Delay loop was executed. I decided to test Phase 1 for this trial.

The attached images are what I discovered ...

Quite a difference between the two, 180k vs 30, not 30k, just 30.
 

Attachments

  • CPU_Battery_Burner_Test_No_Additional_Delay_V1.JPG
    CPU_Battery_Burner_Test_No_Additional_Delay_V1.JPG
    28.1 KB · Views: 11
  • CPU_Battery_Burner_Test_With_1_Second_Delay_Added_V1.JPG
    CPU_Battery_Burner_Test_With_1_Second_Delay_Added_V1.JPG
    27.8 KB · Views: 11
Upvote 0

Forum statistics

Threads
1,225,741
Messages
6,186,763
Members
453,370
Latest member
juliewar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top