VBA Scraping from dynamic web page

Bruzio

Board Regular
Joined
Aug 20, 2020
Messages
85
Office Version
  1. 2016
Platform
  1. Windows
  2. Mobile
Hi everybody. I'm trying to download soccer results from Football results 21 august 2020, I can't as I'd like.
My question was posted on a italian forum, but we were unable to resolve.
I trust in your knowledge.


What do i need?

6 columns:

LEAGUE - START - HOME - AWAY - SCORE- STATUS

What are the problems?

1) Macro doesn't download all scheduled events.
2) The content of the rows isn't correct
2) The web page has a more results button that is always visible and clickable, and i need to load all hidden pages.

I expect this, see screen:

This is my current code:

VBA Code:
Sub fromweb()
    Dim IE As Object
    Dim Doc As Object
    Dim x As Object
    Dim i As Long
    Dim myArray As Variant
    Set IE = CreateObject("InternetExplorer.Application")
    Application.ScreenUpdating = False
    Cells.Clear
    Const myURL As String = "fctables.com/livescore/21_08_2020/"
    With IE
        .navigate myURL
        .Visible = True
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop
        Application.Wait Now + TimeValue("0:00:10")
    End With
    myArray = Array("LEAGUE", "START", "HOME", "AWAY", "SCORE", "STATUS")
    With Range("A1:F1")
        .Value = myArray
    End With
    Set Doc = IE.Document
    For Each x In Doc.getElementsByClassName("league")
        i = i + 1: j = 0
        Cells(i + 1, 1) = Replace(Replace(x.innerText, "Table", ""), Chr(10), "")
        For Each y In Doc.getElementsByClassName("col-xs-8 col-sm-9 col-lg-10 truncate")
            Cells(i + 1, 1) = y.innerText
            Cells(i + 1, 2) = Doc.getElementsByClassName("date")(j).innerText
            Cells(i + 1, 3) = Doc.getElementsByClassName("home")(j).innerText
            Cells(i + 1, 4) = Doc.getElementsByClassName("away")(j).innerText
            Cells(i + 1, 5) = Doc.getElementsByClassName("score")(j).innerText
            Cells(i + 1, 6) = Doc.getElementsByClassName("status_name")(j).innerText
            j = j + 1: i = i + 1
        Next
    Next
    IE.Quit
    Set IE = Nothing
    Set Doc = Nothing
    Application.ScreenUpdating = True
End Sub

How to proceed correctly?
Thanks for the attention
 

Attachments

  • test.jpg
    test.jpg
    109.2 KB · Views: 155
You're welcome.

Use the following version slightly modified for the odds.

VBA Code:
Sub crawlForMe()

Dim sht As Worksheet
Dim rng As Range
Dim i As Integer
Dim strDate As String

Dim http As MSXML2.XMLHTTP60
Dim html As HTMLDocument
Dim league As HTMLDivElement
Dim game As HTMLDivElement
Dim status As HTMLDivElement
Dim teams As HTMLDivElement
Dim odds As HTMLDivElement
    
    strDate = "08-11-2020"
    
    Set sht = ActiveSheet
    Set rng = sht.Range("A1")
    rng.CurrentRegion.ClearContents
    rng.Resize(, 9) = Array("LEAGUE", "START", "HOME", "AWAY", "SCORE", "STATUS", "WIN HOME", "DRAW", "WIN AWAY")
    
    Set http = New MSXML2.XMLHTTP60
    Set html = New HTMLDocument
    
    Do
        DoEvents
        i = i + 1
        
        http.Open "POST", "https://www.fctables.com/xml/livescore/", True
        http.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
        http.send "page=" & i & "&date=" & strDate
        
        Do Until http.readyState = 4
            DoEvents
        Loop
        html.body.innerHTML = http.responseText
        
        If Len(http.responseText) = 0 Then Exit Do
        
        For Each league In html.getElementsByClassName("league")
            For Each game In league.getElementsByClassName("games").Item(0).Children
                Set rng = rng.Offset(1)
                Set status = game.getElementsByClassName("status").Item(0)
                Set teams = game.getElementsByClassName("name game_hover_info").Item(0)
                Set odds = game.getElementsByClassName("godds").Item(0)
                
                rng.Cells(, 1).Value = league.getElementsByClassName("panel-title row").Item(0).getElementsByTagName("a").Item(0).innerText
                rng.Cells(, 2).Value = status.getElementsByClassName("date").Item(0).innerText
                rng.Cells(, 3).Value = teams.getElementsByClassName("home").Item(0).ChildNodes(teams.getElementsByClassName("home").Item(0).ChildNodes.Length - 1).NodeValue
                rng.Cells(, 4).Value = teams.getElementsByClassName("away").Item(0).ChildNodes(0).NodeValue
                rng.Cells(, 5).Value = "'" & teams.getElementsByClassName("score").Item(0).innerText
                rng.Cells(, 6).Value = status.getElementsByClassName("status_name").Item(0).innerText
                
                If odds.ChildNodes.Length = 3 Then
                    rng.Cells(, 7).Value = odds.ChildNodes(0).innerText
                    rng.Cells(, 8).Value = odds.ChildNodes(1).innerText
                    rng.Cells(, 9).Value = odds.ChildNodes(2).innerText
                End If
            Next game
            
        Next league
    Loop
    
End Sub

Just follow my code, and familiarize how I find the parent elements first and then the child elements in these parent elements. Then you can retrieve any information exists in the returned data. The key point is finding the XHR end-point, and make the call in your code as necessary, no matter which scripting language you are using.
 
Upvote 0

Excel Facts

Wildcard in VLOOKUP
Use =VLOOKUP("Apple*" to find apple, Apple, or applesauce
Thank you, fantastic.
Have you tried to change date to 07-11-2020? I' ve an error here " rng.Cells(, 4).Value = teams.getElementsByClassName("away").Item(0).ChildNodes(0).NodeValue"
 
Upvote 0
Have you tried to change date to 07-11-2020? I' ve an error here " rng.Cells(, 4).Value = teams.getElementsByClassName("away").Item(0).ChildNodes(0).NodeValue"
There are missing opponents in some games, see the two samples in the list below:

1604862362729.png


You can check the existence of the items and avoid the error, but again, this is HTML which you can't trust. There might be further errors as well. I just assumed they provide a consistent HTML content. In the real world, you would want to check every element then proceed only if they are valid.

Make the following change to handle this particular error:

VBA Code:
                If teams.getElementsByClassName("home").Item(0).ChildNodes.Length > 0 Then
                    rng.Cells(, 3).Value = teams.getElementsByClassName("home").Item(0).ChildNodes(teams.getElementsByClassName("home").Item(0).ChildNodes.Length - 1).NodeValue
                End If
                If teams.getElementsByClassName("away").Item(0).ChildNodes.Length > 0 Then
                    rng.Cells(, 4).Value = teams.getElementsByClassName("away").Item(0).ChildNodes(0).NodeValue
                End If

You can also use the following code, and get the innerText of the element, but there are other numbers in brackets next to the team name and they will be also printed in that case:

VBA Code:
                rng.Cells(, 3).Value = teams.getElementsByClassName("home").Item(0).innerText
                rng.Cells(, 4).Value = teams.getElementsByClassName("away").Item(0).innerText
 
Upvote 0
Solution
Upvote 0
I'm trying.... I've added this part rng.Cells(, 10).Value = game.getElementsByTagName("a")

This is the result.

Cartel1 (1).xlsm
BCDEFGHIJ
1STARTHOMEAWAYSCORESTATUSWIN HOMEDRAWWIN AWAYLINK
217:55Qarabag FKSivasspor-Not started2,83,252,6about:/livescore/qarabag-fk-sivasspor-g1858828/
317:55LASK LinzAntwerp-Not started1,853,84,15about:/livescore/lask-linz-antwerp-g1858839/
417:55Maccabi Tel AvivVillarreal-Not started4,84,11,68about:/livescore/maccabi-tel-aviv-villarreal-g1858827/
Foglio6
 
Upvote 0
In this new page there are all available odds, you can tell me how i should do to download all odds for every event?
It could be done by extending the macro I wrote above. I believe that the new column can have a clickable link that will create a new sheet with the odds for the selected game. I can see that the game's odds page's HTML source has a good structure to parse necessary data.

However, I won't be able to look at it until later today or tomorrow. If you are in a hurry, then you can create a new thread for this related but actually separated question by also referring to the macro you used in this page, and other helpers might be available sooner than I would be or the other helpers in this thread.

Otherwise, if you are not in a hurry, and would like to follow it here, then I'll see what I can do as soon as I am available.
 
Upvote 0
Update:

For the links i've used a new column:

Cartel1 (1).xlsm
JK
6about:/livescore/monagas-sc-caracas-g1862174/https://www.fctables.com/livescore/monagas-sc-caracas-g1862174/
7about:/livescore/alianza-atletico-union-huaral-g1864016/https://www.fctables.com/livescore/alianza-atletico-union-huaral-g1864016/
8about:/livescore/figueirense-botafogo-sp-g1779723/https://www.fctables.com/livescore/figueirense-botafogo-sp-g1779723/
9about:/livescore/guarani1_2-gremio-g1863581/https://www.fctables.com/livescore/guarani1_2-gremio-g1863581/
Foglio6
Cell Formulas
RangeFormula
K6:K9K6=SUBSTITUTE(J6,"about:/","https://www.fctables.com/")
 
Upvote 0

Forum statistics

Threads
1,223,228
Messages
6,170,871
Members
452,363
Latest member
merico17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top