I am looking for assistance with some VBA code intended to collect information from a web site that displays data in groups of 50 rows, requiring you to click the "Next" button each time you want to view the next set of rows. I am trying to craft some code that will grab all of the rows at once and dump them into Excel, or at least loop through the various pages and consolidate the results in Excel. I even tried just setting up 20 different tabs in Excel that would each use the Excel Web Query to grab 50 rows of data, but no matter how I play with the target URL every tab grabs the first 50 rows. That is why I ended up working on the approach below, which unfortunately is resulting in errors.
The URL of the first page of data I am trying to scrape is:
Code:
http://games.espn.go.com/ffl/freeagency?leagueId=228988&teamId=10&seasonId=2014#&seasonId=2014&=undefined&avail=-1&context=freeagency&view=overview
The URL of the second page of data is:
Code:
http://games.espn.go.com/ffl/freeagency?leagueId=228988&teamId=10&seasonId=2014#&seasonId=2014&=undefined&avail=-1&context=freeagency&view=overview&startIndex=50
On each subsequent page, the startIndex value increases by 50 (100, 150, and so on).
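To make the pattern concrete, here is a minimal sketch of how I understand the page URLs to be formed. PageURL is a hypothetical helper written only for illustration (it is not part of the code below), and it assumes that every page after the first simply appends startIndex in steps of 50, which I have only confirmed for the first two pages.

```vba
' Hypothetical helper, for illustration only.
' Assumes page 0 has no startIndex and each later page adds 50.
Public Function PageURL(ByVal pageNumber As Long) As String
    Const FIRST_PAGE As String = "http://games.espn.go.com/ffl/freeagency?leagueId=228988&teamId=10&seasonId=2014#&seasonId=2014&=undefined&avail=-1&context=freeagency&view=overview"
    If pageNumber = 0 Then
        ' First page: the URL as shown above, with no startIndex
        PageURL = FIRST_PAGE
    Else
        ' Later pages: append startIndex = 50, 100, 150, ...
        PageURL = FIRST_PAGE & "&startIndex=" & CStr(pageNumber * 50)
    End If
End Function
```

For example, PageURL(1) reproduces the second-page URL quoted above, and PageURL(2) would end in &startIndex=100.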
Here is the VBA code I have so far (which is based on the work of John_W in this earlier thread):
Code:
Public Sub ScrapeData()
    'based on solution in http://www.mrexcel.com/forum/excel-questions/677031-pull-entire-web-table-not-just-what-visible-webpage.html
    Dim baseURL As String, URL As String, params As String
    Dim XMLreq As Object
    Dim HTMLdoc As Object
    Dim playerTable As Object
    Dim tableRows As Object
    Dim tableCell As Object
    Dim dest As Range
    Dim playerData As Variant
    Dim HTMLrow As Integer, i As Integer, c As Integer
    Dim p1 As Long, p2 As Long

    Set dest = ActiveSheet.Range("A1")
    dest.Parent.Activate
    dest.Parent.Cells.Clear

    baseURL = "http://games.espn.go.com/ffl/freeagency"
    params = "leagueId=228988&teamId=10&seasonId=2014#&seasonId=2014&=undefined&avail=-1&context=freeagency&view=overview&startIndex="
    Set XMLreq = CreateObject("MSXML2.XMLhttp")

    'For first page of results start at HTML row index 1 to include column headings in extracted data
    HTMLrow = 1

    'Request all pages
    Do
        With XMLreq
            URL = baseURL & "?" & params & CLng(Rnd() * 99999999)
            Debug.Print Now, URL
            .Open "POST", URL, False
            .send
            Set HTMLdoc = CreateObject("HTMLFile")
            HTMLdoc.body.innerHTML = .responseText
        End With

        'Extract player table into array
        '< table id="playertable_0" class="playerTableTable
        Set playerTable = HTMLdoc.getElementById("playertable_0")
        Set tableRows = playerTable.Rows
        ReDim playerData(1 To tableRows.Length - HTMLrow, 1 To tableRows(HTMLrow).Cells.Length)
        i = 1
        While HTMLrow < tableRows.Length
            c = 1
            For Each tableCell In tableRows(HTMLrow).Cells
                If tableCell.innerText <> "" Then
                    playerData(i, c) = tableCell.innerText
                    c = c + 1
                End If
            Next
            i = i + 1
            HTMLrow = HTMLrow + 1
        Wend

        'Copy array to sheet cells
        dest.Resize(UBound(playerData, 1), UBound(playerData, 2)).Value = playerData
        Set dest = dest.Offset(UBound(playerData, 1))
        dest.Select
        DoEvents

        'For subsequent pages start at HTML row index 2 to ignore column headings
        HTMLrow = 2

        'Find NEXT» link and extract parameters from *******
        '< a href="#" *******="players('leagueId=306149&teamId=2&seasonId=2013&=undefined&gamesInScoringPeriodId=66&scoringPeriodId=65&view=stats&context=freeagency&version=last7&startIndex=50'); return false;">NEXT»< /span>< /a>
        params = ""
        i = 0
        While i < HTMLdoc.Links.Length And params = ""
            If HTMLdoc.Links(i).innerText = "NEXT»" Then
                p1 = InStr(HTMLdoc.Links(i).*******, "'") + 1
                p2 = InStr(p1, HTMLdoc.Links(i).*******, "'")
                params = Mid(HTMLdoc.Links(i).*******, p1, p2 - p1)
            End If
            i = i + 1
        Wend
    Loop Until params = ""

    MsgBox "Finished"
End Sub
Please note that this message board removes every instance of "o_nclick" without the underscore from the above code and inserts "*******", but rest assured that my VBA contains "o_nclick" without the underscore.
The code above is throwing the following error at the line “Set tableRows = playerTable.Rows”:
Run-time error ‘91’:
Object variable or With block variable not set
I have tried (1) changing how the URL is built in the line URL = baseURL & "?" & params & CLng(Rnd() * 99999999), and (2) changing "Set playerTable = HTMLdoc.getElementById("playertable_0")" to "Set playerTable = HTMLdoc.getElementById("playertable_1")". Neither attempt changed the error.
Any help would be greatly appreciated!