Web Scrape Assistance getting data from table

SQUIDD

Office Version: 2019, 2016
Platform: Windows

Hello all,

Thanks for looking.

Please see the example web page I am trying to extract data from.
My code to load the page is below.


VBA Code:
Sub get_dogs_a()

Dim Doc     As HTMLDocument
Dim ie      As InternetExplorer
Set ie = New InternetExplorer

  ie.Navigate "https://www.timeform.com/greyhound-racing/greyhound-form/standby-ashley/66258"
  Do While ie.ReadyState = 4: DoEvents: Loop
  Do Until ie.ReadyState = 4: DoEvents: Loop
  Do While Doc Is Nothing: Set Doc = ie.Document: DoEvents: Loop
  DOGNAME = Trim(Replace(Doc.getElementsByClassName("w-dog-ledger-header w-content")(0).getElementsByTagName("h1")(0).innerText, "Greyhound Profile", ""))
  Do While DOGNAME = DOGNAMEOLD: DOGNAME = Doc.getElementById("dogHeaderTitle").innerText: Loop
  
  Worksheets("pos " & C - 3).Cells.ClearContents
  Worksheets("app").Cells(C, 11) = DOGNAME: Worksheets("pos " & C - 3).Cells(1, 1) = DOGNAME
  COUNTER = 0
  resultcount = Doc.getElementsByClassName("recent-form-meeting-date").Length
  
 'It is at this point that I would like to get info out of the table. How would I get, for instance, the 4th column (header "GDE")? In this example row 1 is A2, as is row 2.
 
'As an example, we could simply output all the GDE values into column A. I cannot seem to work out how.

End Sub

I have tried several methods (get by class name, then by tag name, and a mixture of both), but I just do not understand the structure of this table.

Thanks for looking. Any help would be hugely appreciated. I am happy to move to XMLHTTP if that is easier or better, although I know nothing about it.

With an example, I'm sure I can finish my project.

Thanks

Dave
 

This is my proposal:
Rich (BB code):
Sub get_dogs_a()

Dim Doc     As HTMLDocument
Dim ie      As InternetExplorer
Set ie = New InternetExplorer
ie.Visible = True       'IE visible
  ie.Navigate "https://www.timeform.com/greyhound-racing/greyhound-form/standby-ashley/66258"
  Do While ie.Busy: DoEvents: Loop            'Busy Check
  Do Until ie.ReadyState = 4: DoEvents: Loop    'Document Complete check
  Do While Doc Is Nothing: Set Doc = ie.Document: DoEvents: Loop
  DOGNAME = Trim(Replace(Doc.getElementsByClassName("w-dog-ledger-header w-content")(0).getElementsByTagName("h1")(0).innerText, "Greyhound Profile", ""))
  Do While DOGNAME = DOGNAMEOLD: DOGNAME = Doc.getElementById("dogHeaderTitle").innerText: Loop
 
  Worksheets("pos " & C - 3).Cells.ClearContents
  Worksheets("app").Cells(C, 11) = DOGNAME: Worksheets("pos " & C - 3).Cells(1, 1) = DOGNAME
  COUNTER = 0

'My Code:
'Range("A:V").ClearContents        'Clear Columns A:V (now disabled)
Set mytab = Doc.getElementsByClassName("w-dog-ledger-table w-dog-ledger-performances recent-form")(0).getElementsByTagName("table")(0)
For Each trtr In mytab.Rows
    i = i + 1                       'Next output row
    For Each tdtd In trtr.Cells
        j = j + 1                   'Next output column
        If InStr(1, trtr.outerHTML, "display: none", vbTextCompare) = 0 Then
            Cells(i, j).Value = tdtd.innerText
        Else
            i = i - 1               'Hidden row: reuse this output row and skip it
            Exit For
        End If
    Next tdtd
    j = 0                           'Restart at column A for the next row
Next trtr
'Close Ie:
ie.Quit
Set ie = Nothing
End Sub
I didn't touch the first part of your code, except for checking ie.Busy before checking for ie.ReadyState = 4, and making the IE session visible (so that if something goes wrong you can close it manually).
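If the page never finishes loading, those two wait loops will spin forever. Below is a minimal sketch of the same Busy / ReadyState check with a timeout. The helper name WaitForPage and the 30-second default are my own assumptions, not part of the code above.

```vba
' Sketch only: WaitForPage and its default timeout are hypothetical names/values.
' Returns True once the page has loaded, False if the timeout expires first.
Function WaitForPage(ByVal ie As Object, Optional ByVal timeoutSecs As Long = 30) As Boolean
    Dim deadline As Date

    ' A Date is stored in days, so convert seconds to a fraction of a day
    deadline = Now + timeoutSecs / 86400#

    ' Same two conditions as the separate loops: not busy AND ReadyState = 4
    Do While ie.Busy Or ie.ReadyState <> 4
        DoEvents
        If Now > deadline Then Exit Function    ' gives up, result stays False
    Loop

    WaitForPage = True
End Function
```

You would call it right after ie.Navigate, for example: If Not WaitForPage(ie) Then MsgBox "Page did not load"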

Bye
 
Hi Anthony

Many thanks for your time on that; the code works perfectly.
However, I really did not want to grab all of the data if I did not have to.

How could I point to column "FIN" and output just those values? For instance, in the code below, column "FIN", row 4.

VBA Code:
Sub get_6_dogs()

Dim XMLReq As New MSXML2.XMLHTTP60
Dim Doc As New HTMLDocument
XMLReq.Open "GET", Worksheets("DAY").Cells(3, 4), False
    XMLReq.send
    Doc.body.innerHTML = XMLReq.responseText
    Set XMLReq = Nothing
    
   
    DOGNAME = Trim(Replace(Doc.getElementsByClassName("w-dog-ledger-header w-content")(0).getElementsByTagName("h1")(0).innerText, "Greyhound Profile", ""))
    resultcount = Doc.getElementsByClassName("recent-form-meeting-date").Length

'Now output FIN column, row 4, which should be "3rd"

''''''''A = Doc.getElementsByTagName("tr")(1).getElementsByTagName("td")(4).innerText

End Sub
 
Once the whole table has been imported, it's a matter of using INDEX & MATCH to extract whichever row, column, or cell you prefer, so I would not modify the macro: the request can be met with simple formulas.
For example, if you need to extract only column "Fin", you may use the formula
VBA Code:
=INDEX(A1:Z100,0,MATCH("Fin",A1:Z1,0))
This is an array formula; if your Excel version doesn't support Dynamic Arrays, you have to confirm it with Ctrl+Shift+Enter, not just Enter.
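If you would rather do the same lookup from VBA, here is a rough sketch. It assumes the table has already been imported to the active sheet with the headers in row 1 (columns A:Z, as in the formula). The function name TableValue is hypothetical.

```vba
' Sketch only: TableValue is a hypothetical helper, not part of the macro above.
' Returns the cell under the given header at the given worksheet row,
' or #N/A if the header is not found in A1:Z1.
Function TableValue(ByVal headerName As String, ByVal rowNum As Long) As Variant
    Dim colPos As Variant

    ' Application.Match returns an error Variant (not a run-time error)
    ' when there is no match; like MATCH, it ignores upper/lower case
    colPos = Application.Match(headerName, ActiveSheet.Range("A1:Z1"), 0)

    If IsError(colPos) Then
        TableValue = CVErr(xlErrNA)
    Else
        TableValue = ActiveSheet.Cells(rowNum, colPos).Value
    End If
End Function
```

For example, Debug.Print TableValue("Fin", 4) prints whatever sits in worksheet row 4 under the "Fin" header (row 1 being the header row).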

To make it easier to address the table, I suggest you modify this part of the code (1 added line, 1 modified one):
Rich (BB code):
Range("A:V").ClearContents
Range("A:A").NumberFormat = "@"                         'Added line
Set mytab = Doc.getElementsByClassName("w-dog-ledger-table w-dog-ledger-performances recent-form")(0).getElementsByTagName("table")(0)
For Each trtr In mytab.Rows
    i = i + 1
    For Each tdtd In trtr.Cells
        j = j + 1
        If InStr(1, trtr.outerHTML, "display: none", vbTextCompare) = 0 Then
            Cells(i, j).Value = Trim(Replace(Replace(tdtd.innerText, Chr(10), "", , , vbTextCompare), Chr(13), "", , , vbTextCompare))         'Modified
        Else
            i = i - 1
            Exit For
        End If
    Next tdtd
    j = 0
Next trtr
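If you really want only one column straight from the web page, without importing the rest, something along these lines might work. It is an untested sketch that assumes the first row of the HTML table holds the headers and that no cell uses colspan. The names OutputOneColumn and CleanText are hypothetical. Doc can come from either the IE or the XMLHTTP version.

```vba
' Sketch only: OutputOneColumn and CleanText are hypothetical names.
' Needs the same "Microsoft HTML Object Library" reference as the code above.

' Strip line feeds / carriage returns and surrounding spaces from cell text
Function CleanText(ByVal s As String) As String
    CleanText = Trim(Replace(Replace(s, Chr(10), ""), Chr(13), ""))
End Function

' Write the column whose header matches headerName into column A of the active sheet
Sub OutputOneColumn(ByVal Doc As HTMLDocument, ByVal headerName As String)
    Dim mytab As Object, trtr As Object
    Dim colIdx As Long, k As Long, outRow As Long

    Set mytab = Doc.getElementsByClassName("w-dog-ledger-table w-dog-ledger-performances recent-form")(0).getElementsByTagName("table")(0)

    ' Find the zero-based position of the wanted header in the first table row
    colIdx = -1
    For k = 0 To mytab.Rows(0).Cells.Length - 1
        If StrComp(CleanText(mytab.Rows(0).Cells(k).innerText), headerName, vbTextCompare) = 0 Then
            colIdx = k
            Exit For
        End If
    Next k
    If colIdx < 0 Then Exit Sub         ' header not found: nothing to output

    For Each trtr In mytab.Rows
        ' Skip hidden rows (same test as above) and rows too short to hold the column
        If InStr(1, trtr.outerHTML, "display: none", vbTextCompare) = 0 Then
            If trtr.Cells.Length > colIdx Then
                outRow = outRow + 1
                ActiveSheet.Cells(outRow, 1).Value = CleanText(trtr.Cells(colIdx).innerText)
            End If
        End If
    Next trtr
End Sub
```

Call it after Doc has been loaded, for example: OutputOneColumn Doc, "Fin". The header ends up in A1 and the values below it, so "row 4" of the table is simply cell A4.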
Bye
 