VBA Scraping from dynamic web page

Bruzio

Board Regular
Joined
Aug 20, 2020
Messages
85
Office Version
  1. 2016
Platform
  1. Windows
  2. Mobile
Hi everybody. I'm trying to download soccer results from Football results 21 august 2020, I can't as I'd like.
My question was posted on a italian forum, but we were unable to resolve.
I trust in your knowledge.


What do i need?

6 columns:

LEAGUE - START - HOME - AWAY - SCORE- STATUS

What are the problems?

1) Macro doesn't download all scheduled events.
2) The content of the rows isn't correct
2) The web page has a more results button that is always visible and clickable, and i need to load all hidden pages.

I expect this, see screen:

This is my current code:

VBA Code:
Sub fromweb()
    Dim IE As Object
    Dim Doc As Object
    Dim x As Object
    Dim i As Long
    Dim myArray As Variant
    Set IE = CreateObject("InternetExplorer.Application")
    Application.ScreenUpdating = False
    Cells.Clear
    Const myURL As String = "fctables.com/livescore/21_08_2020/"
    With IE
        .navigate myURL
        .Visible = True
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop
        Application.Wait Now + TimeValue("0:00:10")
    End With
    myArray = Array("LEAGUE", "START", "HOME", "AWAY", "SCORE", "STATUS")
    With Range("A1:F1")
        .Value = myArray
    End With
    Set Doc = IE.Document
    For Each x In Doc.getElementsByClassName("league")
        i = i + 1: j = 0
        Cells(i + 1, 1) = Replace(Replace(x.innerText, "Table", ""), Chr(10), "")
        For Each y In Doc.getElementsByClassName("col-xs-8 col-sm-9 col-lg-10 truncate")
            Cells(i + 1, 1) = y.innerText
            Cells(i + 1, 2) = Doc.getElementsByClassName("date")(j).innerText
            Cells(i + 1, 3) = Doc.getElementsByClassName("home")(j).innerText
            Cells(i + 1, 4) = Doc.getElementsByClassName("away")(j).innerText
            Cells(i + 1, 5) = Doc.getElementsByClassName("score")(j).innerText
            Cells(i + 1, 6) = Doc.getElementsByClassName("status_name")(j).innerText
            j = j + 1: i = i + 1
        Next
    Next
    IE.Quit
    Set IE = Nothing
    Set Doc = Nothing
    Application.ScreenUpdating = True
End Sub

How to proceed correctly?
Thanks for the attention
 

Attachments

  • test.jpg
    test.jpg
    109.2 KB · Views: 158
In this new page there are all available odds, you can tell me how i should do to download all odds for every event?
@Bruzio - In your latest mini-sheet, you list the URLs that you extracted. Does it mean you solved it? Is it all you need to get the hyperlinks? In this case, you can use HYPERLINK() function to convert column K links to actual hyperlinks that you can click.

If not, then I see the following content in the game's odds page.
1606498178374.png

  • This pages should be parsed individually for each game, so it doesn't make sense to do this during the actual games download solved in the previous step. I recommend doing this step on demand by using a trigger like clicking to a hyperlink saying "Odds" that will be placed in a new column in the big table for each game. So, you click on it, a new macro parses that game's odds page, and create a new table in a new sheet that shows the data above. Is this something that will help your project?
  • Otherwise please explain how you need it to work.
  • In either case, please explain how you need this data to be placed in the worksheet. I have zero experience and interest about these games/odds, so I can't visualize how this data should be listed in a worksheet.
 
Upvote 0

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
I'm proceeding step by step.
In the column K i used a formula becouse using
VBA Code:
rng.Cells(, 10).Value = game.getElementsByTagName("a")
doesn't download the hyperlink. Is it correct this procedure?

This is my idea:
1) I use your powerfull macro (thanks again) to download scheduled events.
2) I should to use a temporary sheet where to download all odds individually for each event, paste to main sheet, clean temp sheet and repeat for all events.

I need to learn to download directly hyerlinks, then simulate click and lastly to download individually odds for events.
I'm good to copy cells and to clean a sheet :ROFLMAO:

What do you think?
 
Last edited:
Upvote 0
2) I should to use a temporary sheet where to download all odds individually for each event, paste to main sheet, clean temp sheet and repeat for all events.

There are many odds for each game. What is your plan to locate them in the main sheet for each game?
Again - please show us how you want to write the game odds in the corresponding column(s).

Edit: Look at the following sample: How do you want to place the odds starting from column J? I am trying to understand the final table structure that you need.

1143549-2.xlsm
ABCDEFGHIJ
1LEAGUESTARTHOMEAWAYSCORESTATUSWIN HOMEDRAWWIN AWAYODDS
2England - Premier League20:00Crystal PalaceNewcastle United-Not started2.133.223.83????
3Spain - La Liga20:00ValladolidLevante-Not started2.543.272.85????
4Germany - 1. Bundesliga19:30WolfsburgWerder Bremen-Not started1.93.654????
5France - Ligue 120:00StrasbourgRennes-Not started2.893.22.55????
Sheet1

There are hundreds of games, and if you need to get all odds instead of getting them individually on demand, then it means hundreds of other web calls that will slow your macro dramatically.
However, if that's what you need, then you should be ok about the slowness.

I need to learn to download directly hyerlinks, then simulate click and lastly to download individually odds for events.
I mostly write comments in my codes to explain what that section is actually doing like many other coders. because we want you to learn instead of just copy and paste action. So, you can follow the comments to learn, and ask questions if you don't understand something.

Side note: I don't know about this particular football website's request monitoring policy, and if they would block your access by suspecting that you are trying to make a DDOS attack. It won't be a massive request in this scenario, but if you try to access thousands of pages of a website in a short amount of time, there is almost a possibility that the responding server would think that it is an automated attack, and try to stop your access. Just as a note for automated access to the website.
 
Upvote 0
Hi, my target is to have an archive, for this reason i need to download automatically.

Cartel1 (1).xlsm
JKLMNOPQRSTUV
4HTFTOVER/UNDER 1,5OVER/UNDER 2,5OVER/UNDER 3,5
5LINK1X21X2OVER 1,5UNDER 1,5OVER 2,5UNDER 2,5OVER 3,5UNDER 3,5
6about:/livescore/monagas-sc-caracas-g1862174/
7about:/livescore/alianza-atletico-union-huaral-g1864016/
8about:/livescore/figueirense-botafogo-sp-g1779723/
9about:/livescore/guarani1_2-gremio-g1863581/
10about:/livescore/atletico-junior-union-la-calera-g1866559/
11about:/livescore/defensa-y-justicia-vasco-da-gama1-g1866556/
12about:/livescore/bucaramanga-alianza-petrolera-g1871896/
13about:/livescore/zhetysu-taldykorgan-fc-astana-g1765294/
14about:/livescore/tianjin-teda-guizhou-hengfeng-zhicheng-g1867215/
15about:/livescore/kaisar-kyzylorda-fc-kyzylzhar-petropavlovsk-g1765308/
Foglio6



I've to study how to replace "about" via vba and how to have hyperlinks without formula.
I need to proceed step by step.

Note: odds are only available when the match hasn't started
 

Attachments

  • ODDS.jpg
    ODDS.jpg
    83.8 KB · Views: 13
Last edited:
Upvote 0
Just like before, the following approach might fail if the source document structure is changed.
And there is no error handling for missing elements. However, it worked without problems for many days I tested. So, the source looks to be consistent.

Please apply the steps carefully below.
  • Create a new workbook.
  • Go to VBE, and insert a new Class Module. Name this new class module as clsOddGroup.
  • Copy and paste the following code into the empty class module you just created.
VBA Code:
Option Explicit
' Class module for Odd Group object
' Odds are grouped as 1X2, HT, Under/Over 1.5, etc.
' And each group has odds for a various number of different results.
' Therefore, I am using a class object to store the group name
' and also the odds for different results in an encapsulated object.
' We can also use the Dictionary object instead of the Collection object
' to avoid using a class object in this project.
' However, Collection is a native object type
' that doesn't require an additional library.

' Odd group name, 1X2, HT, Under/Over 1.5, etc.
Private m_name As String
' Collection object to store the odds for different results, 1, X, 2, etc.
Private m_items As New Collection

' Assign the Name property of the group
Property Let Name(strName As String)
    m_name = strName
End Property

' Retrieve the Name property of the group
Property Get Name() As String
    Name = m_name
End Property

' Function to add an odd result name
Public Sub addItem(itemName As String)
    m_items.add itemName
End Sub

' Retrieve the odd result names collection of the odd group
Property Get items() As Collection
    Set items = m_items
End Property
  • Insert a new Standard Module, and name it as however you like.
  • Copy and paste the following code into the empty standard module you just created.
VBA Code:
Option Explicit

Private Function createOddGroup(strName As String, oddItems As Variant) As clsOddGroup
' An odd group is defined with a name,
' and odds for different results
Dim i As Integer
    ' Create a new odd group class object
    Set createOddGroup = New clsOddGroup
    With createOddGroup
        ' Define the name of the odd group
        .Name = strName
        ' Define the odds for different results
        For i = 0 To UBound(oddItems)
            .addItem CStr(oddItems(i))
        Next i
    End With
End Function

Public Sub crawlForMe()
Dim sht As Worksheet
Dim rng As Range
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim strDate As String

Dim http As MSXML2.XMLHTTP60
Dim html As HTMLDocument
Dim htmlOdds As HTMLDocument
Dim league As HTMLDivElement
Dim game As HTMLDivElement
Dim status As HTMLDivElement
Dim teams As HTMLDivElement
Dim odds As HTMLDivElement
Dim gameOddGroups As HTMLDivElement
Dim gameOdd As HTMLDivElement

Dim collOddGroups As Collection
Dim clsOddGroupItem As clsOddGroup
Dim oddsUrl As String

    strDate = "29-11-2020"
    
    Set sht = ActiveSheet
    sht.Columns.Delete

    Set rng = sht.Range("A1")
    
    ' Create a collection for odd groups
    ' and add each group by providing the group name,
    ' and an array containing the result names.
    ' The beauty is working with objects,
    ' new group names can be added easily.
    Set collOddGroups = New Collection
    With collOddGroups
        .add createOddGroup("1X2", Array("1", "X", "2"))
        .add createOddGroup("HT", Array("1", "X", "2"))
        .add createOddGroup("Under/Over 1.5", Array("Under 1.5", "Over 1.5"))
        .add createOddGroup("Under/Over 2.5", Array("Under 2.5", "Over 2.5"))
    End With
    
    ' The rest is basically the same with the previous code
    ' except parsing the odd group pages
    With rng
        With .Resize(, 9)
            .Merge
            .Value = "GAMES"
            .HorizontalAlignment = xlHAlignCenter
        End With
        .Offset(1).Resize(, 9) = Array("LEAGUE", "START", "HOME", "AWAY", "SCORE", "STATUS", "WIN HOME", "DRAW", "WIN AWAY")
        
        j = 1
        For Each clsOddGroupItem In collOddGroups
            With .Offset(, j)
                With .Resize(, clsOddGroupItem.items.Count)
                    .Merge
                    .Value = clsOddGroupItem.Name
                    For k = 1 To clsOddGroupItem.items.Count
                        .Offset(1).Cells(, k) = clsOddGroupItem.items.Item(k)
                    Next k
                    .Resize(2).HorizontalAlignment = xlHAlignCenter
                End With
            End With
            j = j + clsOddGroupItem.items.Count
        Next clsOddGroupItem
        
    End With
    
    Set rng = rng.Offset(1)
    
    
    Set http = New MSXML2.XMLHTTP60
    Set html = New HTMLDocument
    Set htmlOdds = New HTMLDocument
    
    Do
        DoEvents
        i = i + 1
        
        http.Open "POST", "https://www.fctables.com/xml/livescore/", True
        http.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
        http.send "page=" & i & "&date=" & strDate
        
        Do Until http.readyState = 4
            DoEvents
        Loop
        html.body.innerHTML = http.responseText
        
        If Len(http.responseText) = 0 Then Exit Do
        
        For Each league In html.getElementsByClassName("league")
            For Each game In league.getElementsByClassName("games").Item(0).Children
                Set rng = rng.Offset(1)
                Set status = game.getElementsByClassName("status").Item(0)
                Set teams = game.getElementsByClassName("name game_hover_info").Item(0)
                Set odds = game.getElementsByClassName("godds").Item(0)
                
                
                
                rng.Cells(, 1).Value = league.getElementsByClassName("panel-title row").Item(0).getElementsByTagName("a").Item(0).innerText
                rng.Cells(, 2).Value = status.getElementsByClassName("date").Item(0).innerText
                If teams.getElementsByClassName("home").Item(0).ChildNodes.Length > 0 Then
                    rng.Cells(, 3).Value = teams.getElementsByClassName("home").Item(0).ChildNodes(teams.getElementsByClassName("home").Item(0).ChildNodes.Length - 1).NodeValue
                End If
                If teams.getElementsByClassName("away").Item(0).ChildNodes.Length > 0 Then
                    rng.Cells(, 4).Value = teams.getElementsByClassName("away").Item(0).ChildNodes(0).NodeValue
                End If
                rng.Cells(, 5).Value = "'" & teams.getElementsByClassName("score").Item(0).innerText
                rng.Cells(, 6).Value = status.getElementsByClassName("status_name").Item(0).innerText
                
                If odds.ChildNodes.Length = 3 Then
                    rng.Cells(, 7).Value = odds.ChildNodes(0).innerText
                    rng.Cells(, 8).Value = odds.ChildNodes(1).innerText
                    rng.Cells(, 9).Value = odds.ChildNodes(2).innerText
                End If
                
                ' This is where we start processing the odd results page.
                ' Open the odd results page only if the game is not started yet.
                ' We do this by checking a class name defined in the game element accordingly.
                If InStr(game.className, "status_not_started") Then
                    ' URL for the odds page
                    oddsUrl = Replace(teams.getElementsByTagName("a").Item(0).href, "about:", "https://www.fctables.com")
                    http.Open "GET", oddsUrl, True
                    http.send
                    Do Until http.readyState = 4
                        DoEvents
                    Loop
                    htmlOdds.body.innerHTML = http.responseText
                    
                    'Self-descriptive, althought it requires understanding
                    ' the cell references, html document model objects, etc.
                    j = 0
                    For Each clsOddGroupItem In collOddGroups
                        For Each gameOddGroups In htmlOdds.getElementsByClassName("odds_type")
                            If gameOddGroups.getElementsByClassName("type_header").Item(0).innerText = clsOddGroupItem.Name Then
                                For k = 1 To clsOddGroupItem.items.Count
                                    For Each gameOdd In gameOddGroups.getElementsByClassName("odd")
                                        If gameOdd.getElementsByClassName("name").Item(0).innerText = clsOddGroupItem.items.Item(k) Then
                                            rng.Cells(, 9).Offset(, j).Offset(, k) = gameOdd.getElementsByClassName("value").Item(0).innerText
                                            Exit For
                                        End If
                                    Next gameOdd
                                Next k
                                Exit For
                            End If
                        Next gameOddGroups
                        j = j + clsOddGroupItem.items.Count
                    Next clsOddGroupItem
                End If
            Next game
        Next league
    Loop
End Sub

Execute crawlForMe sub procedure. The following shows the sample result.
1143549-2.xlsm
ABCDEFGHIJKLMNOPQRS
1GAMES1X2HTUnder/Over 1.5Under/Over 2.5
2LEAGUESTARTHOMEAWAYSCORESTATUSWIN HOMEDRAWWIN AWAY1X21X2Under 1.5Over 1.5Under 2.5Over 2.5
3England - Premier League14:00SouthamptonManchester United-Not started3.73.652.013.853.62.044.12.232.553.951.252.021.8
4England - Premier League16:30ChelseaTottenham-Not started2.113.63.472.123.63.62.632.233.94.11.242.061.76
5England - Premier League19:15ArsenalWolverhampton Wanderers-Not started2.23.273.582.183.353.752.82.024.22.951.41.642.28
6Spain - La Liga13:00BarcelonaOsasuna-Not started1.295.959.21.266.4111.582.957.56.11.122.81.44
Sheet1

I am not telling that this is an easy code to understand, however, it is also not difficult if you carefully analyze it "step by step". In fact, the relatively more difficult part is building the underlying logic. I consider this project is completed here. However, if you have questions about the code or the logic, then let me know.
 
Upvote 0
Wow, you did it again ?
It's perfect, now i have to understand what you did, i've to add over/under 3.5 column.
Thank you so much.
 
Upvote 0
I added array
VBA Code:
.add createOddGroup("Under/Over 3.5", Array("Under 3.5", "Over 3.5"))
and i changed position
Code:
rng.Cells(, 11).Offset(, j).Offset(, k) = gameOdd.getElementsByClassName("value").Item(0).innerText

It's ok. Now i've to download HT results when match has finished, it's only visible when has finished.
 
Upvote 0
Hi @smozgur , sorry to bother you again, I've a request.
I realized that when the events are too many macro works slowly and I don't need to download events that don't have odds, so i added <> "" here
VBA Code:
If gameOdd.getElementsByClassName("name").Item(0).innerText = clsOddGroupItem.items.Item(k) Then
but doesn't work. I tried > 0 with the same result.

May it be because it's a class group?
 
Upvote 0
As I mentioned, I have no idea about how the betting logic works, and even the terms on this website. So, if you could explain how you determine if an event doesn't have odds, then I would try to help.
 
Upvote 0
Hi, the website doesn't provide all odds for all matches, I'm referring to the ht and over odds, for this reason there are empty records in my file.
I don't need to download events that haven't these odds, download time are long when there are too many events.
For example here, Atletico Guemes - Sarmiento de Resistencia livescores result Primera B Metropolitana 21 dec 2020
For this event there aren't ht and over odds, it can skip download?

"If clsOddGroupItem.items.Item is empty go next game" a thing like that. As I said i tried with <>"" and > 0 but doesn't work

Ecuador - Ecuadorian Serie A21-12-202001:00LDU de QuitoTecnico UniversitarioNot started2,082,325,301,21
Ecuador - Ecuadorian Serie A21-12-202001:00CSD MacaraIndependiente del ValleNot started3,002,253,051,18
Chile - Primera Division21-12-202001:30Coquimbo UnidoO'HigginsNot started3,402,103,001,32
Costa Rica - Primera Division21-12-202003:00LD AlajuelenseClub Sport HeredianoNot started2,432,204,401,25
Mexico - Ascenso MX21-12-202003:00AtlanteJaibos Tampico MaderoNot started3,101,844,101,55
Saudi Arabia - 1. Division21-12-202013:10Al BatinAl Nassr FCNot started
Egypt - 1. Division21-12-202013:30National BankEl Entag El HarbyNot started
England - Premier Reserve League21-12-202014:00Crystal Palace U23Newcastle U21Not started
Kuwait - 1. Division21-12-202014:00Al-SalibikhaetAl Yarmouk KuwaitNot started
Turkey - 1. Lig21-12-202014:00SamsunsporBolusporNot started2,022,236,251,3
England - Premier Reserve League21-12-202014:30Stoke U21West Bromwich Albion U21Not started
England - Premier Reserve League21-12-202015:00Leeds U23Fulham U21Not started
India - ISL21-12-202015:00Atletico de KolkataBengaluru FCNot started
DR Congo - Super League21-12-202015:30Sanga BalendeManiema UnionNot started
Saudi Arabia - 1. Division21-12-202015:55Al-TaawonAl ShababNot started4,402,232,331,25
Egypt - 1. Division21-12-202016:00El GounahIsmaily SCNot started
Ghana - 1. Division21-12-202016:00Berekum ChelseaAduana StarsNot started
Turkey - 1. Lig21-12-202016:00Adana DemirsporTuzlasporNot started2,552,184,101,22
Greece - Super League21-12-202016:15PAS GianninaAsteras TripolisNot started3,201,853,951,41
Romania - Liga I21-12-202016:30UTA Batrana DoamnaAstra GiurgiuNot started3,551,913,451,45
 
Last edited:
Upvote 0

Forum statistics

Threads
1,225,743
Messages
6,186,777
Members
453,370
Latest member
juliewar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top