Pull entire web table not just what is visible on webpage

mrmmickle1

Well-known Member
Joined
May 11, 2012
Messages
2,461
I have been using some code I found originally posted by Norie to pull down web tables off of a page:

Rich (BB code):
Option Explicit
 
Sub TableExample()
    Dim IE As Object
    Dim doc As Object
    Dim strURL As String
     
    strURL = "Player Rater - Free Fantasy Basketball - ESPN?" ' replace with URL of your choice
     
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
         '.Visible = True
         
        .Navigate strURL
        Do Until .ReadyState = 4: DoEvents: Loop
            Do While .Busy: DoEvents: Loop
                Set doc = IE.Document
                GetAllTables doc
                 
                .Quit
            End With
        End Sub
         
        Sub GetAllTables(doc As Object)
             
             ' get all the tables from a webpage document, doc, and put them in a new worksheet
             
            Dim ws As Worksheet
            Dim rng As Range
            Dim tbl As Object
            Dim rw As Object
            Dim cl As Object
            Dim tabno As Long
            Dim nextrow As Long
            Dim I As Long
             
            Set ws = Worksheets.Add
             
            For Each tbl In doc.getElementsByTagName("TABLE")
                tabno = tabno + 1
                nextrow = nextrow + 1
                Set rng = ws.Range("B" & nextrow)
                rng.Offset(, -1) = "Table " & tabno
                For Each rw In tbl.Rows
                    For Each cl In rw.Cells
                        rng.Value = cl.outerText
                        Set rng = rng.Offset(, 1)
                        I = I + 1
                    Next cl
                    nextrow = nextrow + 1
                    Set rng = rng.Offset(1, -I)
                    I = 0
                Next rw
            Next tbl
             
            ws.Cells.ClearFormats
             
        End Sub

However this only pulls down only first 50 rows of the table....or what is visible on the web page.

Is there a way to pull down the entire table?

The table ID is "playerrater_0"

When I navigate to the next 50 results the url changes to:

Player Rater - Free Fantasy Basketball - ESPN

The problem is there are 905 rows in the table

Any help would be appreciated.
 

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
Try this instead. It retrieves all the pages and is much quicker than using IE.
Code:
Public Sub Get_Player_Data()
    
    Dim URL As String
    Dim XMLreq As Object
    Dim HTMLdoc As Object
    Dim playerTable As Object
    Dim tableRows As Object
    Dim tableCell As Object
    Dim nextLink As Object
    Dim dest As Range
    Dim playerData As Variant
    Dim HTMLrow As Integer, i As Integer, c As Integer
    
    Set dest = ActiveSheet.Range("A1")
    dest.Parent.Activate
    dest.Parent.cells.Clear

    URL = "http://games.espn.go.com/fba/playerrater"
    Set XMLreq = CreateObject("MSXML2.XMLhttp")

    'For first page of results start at HTML row index 1 to extract column headings
    
    HTMLrow = 1
    
    'Request all pages
    
    Do
        With XMLreq
            Debug.Print Now, URL
            .Open "GET", URL, False
            .send
            Set HTMLdoc = CreateObject("HTMLFile")
            HTMLdoc.body.innerHTML = .responseText
        End With
        
        'Extract player table into array
        '< table id="playertable_0" class="playerTableTable
        
        Set playerTable = HTMLdoc.getElementById("playertable_0")
        Set tableRows = playerTable.Rows
        ReDim playerData(1 To tableRows.Length - HTMLrow, 1 To tableRows(HTMLrow).cells.Length)
        
        i = 1
        While HTMLrow < tableRows.Length
            c = 1
            For Each tableCell In tableRows(HTMLrow).cells
                If tableCell.innerText <> "" Then
                    playerData(i, c) = tableCell.innerText
                    c = c + 1
                End If
            Next
            i = i + 1
            HTMLrow = HTMLrow + 1
        Wend
        
        'Copy array to sheet cells
        
        dest.Resize(UBound(playerData, 1), UBound(playerData, 2)).Value = playerData
        Set dest = dest.Offset(UBound(playerData, 1))
        dest.Select
        DoEvents
        
        'For subsequent pages start at HTML row index 2 to ignore column headings
    
        HTMLrow = 2
        
        'Find NEXT» link
        '< a href="http://games.espn.go.com/fba/playerrater?startIndex=50">
        'Next
        '< span style="font-size:12px;">»< /span>
        '< /a>
        
        Set nextLink = Nothing
        i = 0
        While i < HTMLdoc.Links.Length And nextLink Is Nothing
            If HTMLdoc.Links(i).innerText = "NEXT»" Then
                Set nextLink = HTMLdoc.Links(i)
                URL = nextLink.href
            End If
            i = i + 1
        Wend
        
    Loop Until nextLink Is Nothing
    
    MsgBox "Finished"
    
End Sub
 
Upvote 0
John,

Wow! Thank you so much! I appreciate you adding the comments in as well! Just what the doctor ordered!
 
Upvote 0
I am ofcourse very appreciative for your work. I was trying to pull down a different table from ESPN and it also appears to have the id "player table_0"

If I change the URL to
Code:
http://games.espn.go.com/fba/freeagency?leagueId=306149&teamId=2&seasonId=2013#&seasonId=2013&gamesInScoringPeriodId=66&scoringPeriodId=65&view=stats&context=freeagency&version=last7"

however I now get an error on this line:
Code:
Set tableRows = playerTable.Rows

Is there anyway to resolve this [h=1]Object variable or With block variable not set Error?[/h]If you could help I would be grateful, however you have already given me a lot. THank you.
 
Upvote 0
That web page requires a slightly different technique. Note - you must remove the underscore in "o_nclick" in the code (and comments) for it to work. I've had to add the underscore to prevent the forum displaying *******.
Code:
Public Sub Get_Player_Data2()
    
    Dim baseURL As String, URL As String, params As String
    Dim XMLreq As Object
    Dim HTMLdoc As Object
    Dim playerTable As Object
    Dim tableRows As Object
    Dim tableCell As Object
    Dim dest As Range
    Dim playerData As Variant
    Dim HTMLrow As Integer, i As Integer, c As Integer
    Dim p1 As Long, p2 As Long
    
    Set dest = ActiveSheet.Range("A1")
    dest.Parent.Activate
    dest.Parent.cells.Clear

    baseURL = "http://games.espn.go.com/fba/playertable/prebuilt/freeagency"
    params = "leagueId=306149&teamId=2&seasonId=2013&=undefined&gamesInScoringPeriodId=66&scoringPeriodId=65&view=stats&context=freeagency&version=last7&startIndex=0"
       
    Set XMLreq = CreateObject("MSXML2.XMLhttp")

    'For first page of results start at HTML row index 1 to include column headings in extracted data
    
    HTMLrow = 1
    
    'Request all pages
    
    Do
        With XMLreq
            URL = baseURL & "?" & params & "&r=" & CLng(Rnd() * 99999999)
            Debug.Print Now, URL
            .Open "POST", URL, False
            .send
            Set HTMLdoc = CreateObject("HTMLFile")
            HTMLdoc.body.innerHTML = .responseText
        End With
        
        'Extract player table into array
        '< table id="playertable_0" class="playerTableTable
        
        Set playerTable = HTMLdoc.getElementById("playertable_0")
        Set tableRows = playerTable.Rows
        ReDim playerData(1 To tableRows.Length - HTMLrow, 1 To tableRows(HTMLrow).cells.Length)
        
        i = 1
        While HTMLrow < tableRows.Length
            c = 1
            For Each tableCell In tableRows(HTMLrow).cells
                If tableCell.innerText <> "" Then
                    playerData(i, c) = tableCell.innerText
                    c = c + 1
                End If
            Next
            i = i + 1
            HTMLrow = HTMLrow + 1
        Wend
        
        'Copy array to sheet cells
        
        dest.Resize(UBound(playerData, 1), UBound(playerData, 2)).Value = playerData
        Set dest = dest.Offset(UBound(playerData, 1))
        dest.Select
        DoEvents
        
        'For subsequent pages start at HTML row index 2 to ignore column headings
    
        HTMLrow = 2
        
        'Find NEXT» link and extract parameters from o_nclick
        '< a href="#" o_nclick="players('leagueId=306149&teamId=2&seasonId=2013&=undefined&gamesInScoringPeriodId=66&scoringPeriodId=65&view=stats&context=freeagency&version=last7&startIndex=50'); return false;">NEXT<span style="font-size:12px;">»< /span>< /a>
        
        params = ""
        i = 0
        While i < HTMLdoc.Links.Length And params = ""
            If HTMLdoc.Links(i).innerText = "NEXT»" Then
                p1 = InStr(HTMLdoc.Links(i).o_nclick, "'") + 1
                p2 = InStr(p1, HTMLdoc.Links(i).o_nclick, "'")
                params = Mid(HTMLdoc.Links(i).o_nclick, p1, p2 - p1)
            End If
            i = i + 1
        Wend
        
    Loop Until params = ""
    
    MsgBox "Finished"
    
End Sub
 
Upvote 0
John,

Thank you again. This is very impressive. It worked perfectly when I removed the underscores. I was wondering. Is there any text that provides more knowledge about the HTML Object library. I am unfamiliar with these objects and their properties. This powerful display leaves me yearning for more. You are an inspiration for me. I appreciate your efforts.

So I assume when pulling data off the web there is no particular solution. It all depends on how the website has been coded?
 
Last edited:
Upvote 0
Correct. It depends on how the web site is designed and coded. The main methods for retrieving/scraping/pulling data off web sites are Excel File->Open (if the site offers a direct .csv file link for example), Excel web query, VBA XMLhttp, VBA XML DOM and VBA IE automation or Excel WebBrowser. That list is roughly in ascending order of complexity and the actual choice of method and VBA coding varies according to the particular web site.

Some useful references:

Internet Explorer Development - Hosting and Reuse

WebBrowser Control

InternetExplorer object (Internet Explorer)

HTML Elements

HTML/XHTML Reference (Internet Explorer)

HTML and DHTML Reference

VBA web services

Using the XML HTTP Request object

The two routines I posted use late binding of XMLHTTP and HTML library objects for ease of use. If you want to learn more about these objects and their properties and methods I recommend you use early binding instead. To do this, set a reference to Microsoft XML v6.0 and MS HTML Object Library via the VBE Tools-References menu, and change each Object data type to its appropriate data type (e.g. Dim HTMLdoc As Object becomes Dim HTMLdoc As HTMLDocument). Set a breakpoint on the End Sub line, run the code and look in the VBE Locals window to determine the specific XMLHTTP or HTML data type of each object.
 
Upvote 0
Hate to be the guy to resurrect an 18-month old thread, but this thread is spot-on for what I am trying to accomplish and I didn't want to start from scratch.

John_W, I am trying to use your "Get_Player_Data_2" code to pull data from a similar series of ESPN web pages. The URL of the first page of data is:
Code:
http://games.espn.go.com/ffl/freeagency?leagueId=228988&teamId=10&seasonId=2014#&seasonId=2014&=undefined&avail=-1&context=freeagency&view=overview

The URL of the second page of data is:
Code:
http://games.espn.go.com/ffl/freeagency?leagueId=228988&teamId=10&seasonId=2014#&seasonId=2014&=undefined&avail=-1&context=freeagency&view=overview&startIndex=50

Subsequent pages increment by 50. I have updated your code by (1) replacing the instances of "o_nclick" as you suggested; (2) defining baseURL = "http://games.espn.go.com/ffl/freeagency"; and (3) defining params = "leagueId=228988&teamId=10&seasonId=2014#&seasonId=2014&=undefined&avail=-1&context=freeagency&view=overview&startIndex=". The code throws the following error at the line “Set tableRows = playerTable.Rows”:
Run-time error ‘91’:
Object variable or With block variable not set

I have also tried (1) changing the definition of URL = baseURL & "?" & params & CLng(Rnd() * 99999999), and (2) changing "Set playerTable = HTMLdoc.getElementById("playertable_0")" to "Set playerTable = HTMLdoc.getElementById("playertable_1")". Neither changed the resulting error.

Any help would be greatly appreciated!
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,181
Members
453,022
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top