VBA - Pull Html, href links

jamescooper

Well-known Member
Joined
Sep 8, 2014
Messages
840
Hello,

I am trying to pull the data from this link:

http://greyhoundbet.racingpost.com/#card/race_id=1592960&r_date=2018-02-27&tab=form

My code so far below, gives all the URLs and loops through, placing them in sheet1 below one another.

I only require the links of each dog in the form, so there are 6 links which for the above URL are:

http://greyhoundbet.racingpost.com/#dog/race_id=1592960&r_date=2018-02-27&dog_id=491247
http://greyhoundbet.racingpost.com/#dog/race_id=1592960&r_date=2018-02-27&dog_id=503888
http://greyhoundbet.racingpost.com/#dog/race_id=1592960&r_date=2018-02-27&dog_id=516549
http://greyhoundbet.racingpost.com/#dog/race_id=1592960&r_date=2018-02-27&dog_id=516104
http://greyhoundbet.racingpost.com/#dog/race_id=1592960&r_date=2018-02-27&dog_id=494253
http://greyhoundbet.racingpost.com/#dog/race_id=1592960&r_date=2018-02-27&dog_id=512296

Does anyone have any idea how I would adapt my code, the href appears to be linked in an 'a class' called "gh DogName" when I inspect the code.

As ever - many thanks.

Code:
Sub Dog_URLs()

Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim ElementCol As Object
Dim Link As Object
Dim erow As Long
Dim DLine As Range
Dim LastRow As Long


Application.Calculation = xlCalculationManual


Sheets("Sheet1").Select
LastRow = Sheets("Races").Range("C" & Rows.Count).End(xlUp).row
For Each DLine In Sheets("Races").Range("C1:C" & LastRow)
Set ie = New InternetExplorer
ie.Visible = True
With ie
.Navigate DLine.Value


Do While ie.ReadyState <> READYSTATE_COMPLETE


Application.StatusBar = "Trying to go to website…"
DoEvents
Loop
Application.Wait Now + TimeValue("00:00:01")


Set html = ie.Document
Set ElementCol = html.getElementsByTagName("a")


For Each Link In ElementCol
erow = Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(erow, erow).row
Cells(erow, 2).Value = Link
Cells(erow, 2).Columns.AutoFit
Next


ie.Quit
  Set ie = Nothing


End With


Next DLine


Application.Calculation = xlCalculationAutomatic


End Sub
 

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
You want something like this (don't automate Internet Explorer, it's slow, error prone and icky ;))

Test in a new workbook:
Rich (BB code):
Option Explicit

Sub test()

 Const race_id = "1592960"
 Const date_id = "2018-02-27"
 
 Dim json   As String
 Dim spl()  As String
 Dim outp() As String
 Dim x      As Long
 
 With CreateObject("msxml2.xmlhttp")
    .Open "GET", "http://greyhoundbet.racingpost.com/card/blocks.sd?race_id=" & race_id & "&r_date=" & date_id & "&tab=form&blocks=form", False
    .send
    json = .responsetext
 End With
 
 spl = Split(json, "dogId"":""")
 
 If UBound(spl) > 1 Then
    ReDim outp(1 To UBound(spl), 1 To 1)
    For x = 1 To UBound(spl)
        outp(x, 1) = formatLink(Val(spl(x)), race_id, date_id)
    Next x
 End If
 
 With Sheets(1).Cells(1, 1).Resize(UBound(outp))
    .Value = outp
 End With
 
End Sub

Private Function formatLink(ByVal dogId As Long, ByVal raceId As String, ByVal dateId As String) As String
    formatLink = "http://greyhoundbet.racingpost.com/#dog/race_id=" & raceId & "&r_date=" & dateId & "&dog_id=" & dogId
End Function
 
Upvote 0
You want something like this (don't automate Internet Explorer, it's slow, error prone and icky ;))

Test in a new workbook:
Rich (BB code):
Option Explicit

Sub test()

 Const race_id = "1592960"
 Const date_id = "2018-02-27"
 
 Dim json   As String
 Dim spl()  As String
 Dim outp() As String
 Dim x      As Long
 
 With CreateObject("msxml2.xmlhttp")
    .Open "GET", "http://greyhoundbet.racingpost.com/card/blocks.sd?race_id=" & race_id & "&r_date=" & date_id & "&tab=form&blocks=form", False
    .send
    json = .responsetext
 End With
 
 spl = Split(json, "dogId"":""")
 
 If UBound(spl) > 1 Then
    ReDim outp(1 To UBound(spl), 1 To 1)
    For x = 1 To UBound(spl)
        outp(x, 1) = formatLink(Val(spl(x)), race_id, date_id)
    Next x
 End If
 
 With Sheets(1).Cells(1, 1).Resize(UBound(outp))
    .Value = outp
 End With
 
End Sub

Private Function formatLink(ByVal dogId As Long, ByVal raceId As String, ByVal dateId As String) As String
    formatLink = "http://greyhoundbet.racingpost.com/#dog/race_id=" & raceId & "&r_date=" & dateId & "&dog_id=" & dogId
End Function


This is so quick, thanks a lot.

How does it pull the data then without opening IE? How do you adapt for a list of URLs instead of just one? So to add in the loop again?

Many thanks, this is amazing - wish I knew this much on pulling data. Thanks.
 
Upvote 0
Please don't quote whole posts, it clutters up the thread.

Based on my understanding of your code, you'd want:
Rich (BB code):
Option Explicit

Private Enum URLPart
    race_id = 0
    date_id = 1
End Enum

Public Sub PopulateDogUrls()

    Dim lastRow     As Long
    Dim x           As Long
    Dim urls        As Variant
    Dim dogLinks    As Variant
    
    
    lastRow = Sheets("Races").Range("C" & Rows.Count).End(xlUp).Row
    urls = Sheets("races").Range("C1:C" & lastRow).Value
    
    For x = LBound(urls) To UBound(urls)
        dogLinks = getDogLinksFromUrl(urls(x, 1))
        Sheets("Sheet1").Cells(Sheets(1).Rows.Count, 1).End(xlUp).Offset(1).Resize(UBound(dogLinks), 1).Value2 = dogLinks
    Next x
    
    
End Sub


Private Function getDogLinksFromUrl(ByVal url As String) As Variant

    Dim json   As String
    Dim spl()  As String
    Dim ret() As String
    Dim x      As Long
    
    Dim urlParts() As String
     
    Dim raceId As String
    Dim dateId As String
    
    urlParts = getRaceIdAndDateFromUrl(url)
    
    raceId = urlParts(URLPart.race_id)
    dateId = urlParts(URLPart.date_id)
    
    
    With CreateObject("msxml2.xmlhttp")
       .Open "GET", "http://greyhoundbet.racingpost.com/card/blocks.sd?race_id=" & raceId & "&r_date=" & dateId & "&tab=form&blocks=form", False
       .send
       json = .responsetext
    End With
     
    spl = Split(json, "dogId"":""")
     
    If UBound(spl) > 1 Then
       ReDim ret(1 To UBound(spl), 1 To 1)
       For x = 1 To UBound(spl)
           ret(x, 1) = formatLink(Val(spl(x)), raceId, dateId)
       Next x
    End If
     
    getDogLinksFromUrl = ret
 
End Function

Private Function formatLink(ByVal dogId As Long, ByVal raceId As String, ByVal dateId As String) As String

    formatLink = "http://greyhoundbet.racingpost.com/#dog/race_id=" & raceId & "&r_date=" & dateId & "&dog_id=" & dogId
    
End Function

Private Function getRaceIdAndDateFromUrl(ByVal url As String) As String()
    
    Dim ret(0 To 1) As String
    
    ret(0) = Split(Split(url, "race_id=")(1), "&")(0)
    ret(1) = Split(Split(url, "r_date=")(1), "&")(0)
    
    getRaceIdAndDateFromUrl = ret
    
End Function

You never need to use IE for extracting data from the web. You simply need to make the same request that a web browser does and handle the response, this is what we're doing here, racingpost is using an API to return dog data, all we're doing is calling that directly rather than displaying the page
 
Last edited:
Upvote 0
No, not really. You need to learn how web requests are made and the relevant parts that they consist of (this isn't VBA specific), so you need to understand HTTP, something like this would be a good place to start https://code.tutsplus.com/tutorials...ery-web-developer-must-know-part-1--net-31177

Once you've got your head round that, it's just a case of looking at the network requests in your browser tools and working out where the data is coming from (use Chrome for this, not IE, the dev tools make it much easier).
 
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,337
Members
452,637
Latest member
Ezio2866

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top