Web Scraping using vba

Abdmujib

Board Regular
Joined
May 15, 2022
Messages
123
Office Version
  1. 2021
Platform
  1. Windows
Pls, I want to write a vba code for web scaraping. I want to access betexplorer.com soccer section, for the matches that would be played today (if possible with option to be extend it to to other days i.e tomorrow and any other days).

The program looks at all the games being played that day. It checks each game's record to see if certain pattern happened before in previous games between those teams. For example, if Manchester United is playing against Arsenal, the program will check if both teams scored, times they scored more than 1 goal, times they scored more than 2 goals, times they scored more than 3 goals and times they scored more than 4 goals and times when both team did not score. If the program finds these things happening in at least 70% of the previous games between Manchester United and Arsenal. Then the result will be matches that has the pattern occurrence at least 70%. with the pattern that occurred listed against it and the link to the match.

@Anthony47 pls can you help. thanks for the usual help

pls anybody is free to help.
Thanks

These are the codes suggested by chatgtp. pls kindly help to modify

VBA Code:
Sub CheckMatches()
    Dim xmlHttp As Object
    Dim htmlDoc As Object
    Dim table As Object
    Dim rows As Object
    Dim row As Object
    Dim link As Object
    Dim h2hLink As String
    Dim h2hDoc As Object
    Dim h2hTable As Object
    Dim h2hRows As Object
    Dim h2hRow As Object
    Dim h2hCells As Object
    Dim i As Long
    Dim score As String
    Dim goals As Long
    Dim btts As Boolean
    Dim matches As Collection
    
    ' Create a new XMLHTTP object
    Set xmlHttp = CreateObject("MSXML2.XMLHTTP")
    
    ' Send a GET request to the website
    xmlHttp.Open "GET", "https://www.betexplorer.com/soccer/", False
    xmlHttp.send
    
    ' Parse the HTML response using the HTMLDocument object
    Set htmlDoc = CreateObject("HTMLFile")
    htmlDoc.body.innerHTML = xmlHttp.responseText
    
    ' Extract the links to the matches
    Set table = htmlDoc.getElementById("sortable-1")
    Set rows = table.getElementsByTagName("tr")
    
    ' Loop through the rows of the table
    For Each row In rows
        ' Get the link to the match
        Set link = row.getElementsByTagName("a")(0)
        h2hLink = "https://www.betexplorer.com" & link.getAttribute("href") & "&h2h=1"
        
        ' Send a GET request to the H2H page
        xmlHttp.Open "GET", h2hLink, False
        xmlHttp.send
        
        ' Parse the HTML response using the HTMLDocument object
        Set h2hDoc = CreateObject("HTMLFile")
        h2hDoc.body.innerHTML = xmlHttp.responseText
        
        ' Extract the rows of the H2H table
        Set h2hTable = h2hDoc.getElementById("h2h-matches")
        Set h2hRows = h2hTable.getElementsByTagName("tr")
        
        ' Loop through the rows of the H2H table
        For Each h2hRow In h2hRows
            ' Skip the header row
            If h2hRow.className = "hl" Then
                Continue For
            End If
            
            ' Extract the cells of the H2H row
            Set h2hCells = h2hRow.getElementsByTagName("td")
            
            ' Check if the H2H row matches the pattern
            score = h2hCells(2).innerText
            goals = CLng(h2hCells(3).innerText) + CLng(h2hCells(4).innerText)
            btts = (h2hCells(5).innerText = "BTS")
            
            ' Update the match list if the H2H row matches the pattern
            If (score = "1:1" Or score = "2:2") And goals >= 3 And btts Then
                matches.Add link.getAttribute("href")
                Exit For
            End If
        Next h2hRow
    Next row
    
    ' Print the links to the matches that match the pattern
    For i = 1 To matches.Count
        Debug.Print matches.Item(i)
    Next i
End Sub


The second code

VBA Code:
Sub CheckMatchesForPatterns()
    ' Set up variables
    Dim url As String
    Dim response As String
    Dim soup As Object
    Dim today_matches As Object
    Dim match As Object
    Dim match_link As String
    Dim match_url As String
    Dim h2h_stats As Variant
    Dim patterns As Variant
    
    ' Set the URL to scrape
    url = "https://www.betexplorer.com/soccer/"
    
    ' Create a new HTTP request and send it to the URL
    Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
    xmlhttp.Open "GET", url, False
    xmlhttp.send
    
    ' Get the response text from the request
    response = xmlhttp.responseText
    
    ' Parse the HTML response text with BeautifulSoup
    Set soup = CreateObject("HTMLFile")
    soup.body.innerHTML = response
    
    ' Get all the matches in today's list
    Set today_matches = soup.getElementById("sortable-1").getElementsByClassName("dark")
    
    ' Loop through today's matches and check h2h stats and patterns
    For Each match In today_matches
        ' Get the match link and URL
        match_link = match.getElementsByTagName("a")(0).getAttribute("href")
        match_url = "https://www.betexplorer.com" & match_link
        
        ' Call the function to calculate the h2h stats
        h2h_stats = GetH2HStats(match_url)
        
        ' Check if there are any h2h stats for this match
        If Not IsEmpty(h2h_stats) Then
            ' Call the function to check for patterns in the h2h stats
            patterns = CheckH2HPatterns(h2h_stats)
            
            ' Check if there are any patterns for this match
            If Not IsEmpty(patterns) Then
                ' Print the match link and patterns to the immediate window
                Debug.Print "Match: " & match_link
                Debug.Print "Patterns: " & Join(patterns, ", ")
            End If
        End If
    Next match
End Sub

Function GetH2HStats(match_url As String) As Variant
    ' Set up variables
    Dim match_response As String
    Dim match_soup As Object
    Dim h2h_table As Object
    Dim h2h_rows As Object
    Dim total_matches As Long
    Dim both_scored As Long
    Dim total_over2 As Long
    Dim total_under2 As Long
    Dim no_score As Long
    Dim row As Object
    Dim columns As Object
    Dim goals_home As Long
    Dim goals_away As Long
    Dim both_scored_pct As Double
    Dim total_over2_pct As Double
    Dim total_under2_pct As Double
    Dim no_score_pct As Double
    
    ' Create a new HTTP request and send it to the match URL
    Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
    xmlhttp.Open "GET", match_url, False
    xmlhttp.send
    
    ' Get the response text from the request
    match_response = xmlhttp.responseText
    
    ' Parse the HTML response text with BeautifulSoup
    Set match_soup = CreateObject("HTMLFile")
    match_soup.body.innerHTML = match_response
    
    ' Get the h2h table from the match page
    Set h2h_table = match_soup.getElementsByClassName("h2h")(0).getElementsByTagName("tbody")(0)
    
    ' Get all the rows in the h2h table
    Set h2h_rows = h2h_table.getElementsByTagName("tr")
 

Excel Facts

Fastest way to copy a worksheet?
Hold down the Ctrl key while dragging tab for Sheet1 to the right. Excel will make a copy of the worksheet.
Please don't ask us to test ChatGpt dreamlike visions, often plausible, more than often just invented

If you need help specify which are the url of the pages and which information you would like to catch.
 
Upvote 0
Please don't ask us to test ChatGpt dreamlike visions, often plausible, more than often just invented

If you need help specify which are the url of the pages and which information you would like to catch.
Thank you so much

I want a vba code that will scrape through this link "https://www.betexplorer.com/next/soccer/" the link contains matches to be played today. (if possible, I would need an option to be able to go to other days like tomorrow).

I want a program that will enter into each of the matches there and only return the link to the matches that satisfy the conditions.

I want the program to access each matches listed and check the head to head section of each particular match for patterns like, when both teams scored at least a goal, when both team did not score at least a goal, when the total goals scored is more than a goal, when the total goals scored is more than two goals, when the total goals scored is more than three goals and when the score more than 4 goals. And bring the result if any of the pattern occured at least 70% in there previous matches.


Then the result would be of those that happened 70% of the time and it will list those matches and the link address of the matches and also the patterns that occurred. A match can have ore than one partern occurrence.


Thank you so much
 
Upvote 0
I can probably help to catch the raw tables, I guess you are able to perform the additional operations.

The code:
VBA Code:
Sub GetBExpl()
Dim WPage As Object         'Selenium.WebDriver
Dim myUrl As String
Dim myTim As Single
Dim TDate As Date
'
Sheets("Foglio5").Select                            '<<< The working sheet
myTim = Timer
'Crea Driver:
Set WPage = CreateObject("Selenium.WebDriver")
If IsDate(Selection) And Selection.Row = 1 Then     'Get the date
    TDate = Selection.Value
Else
    TDate = Int(Now)
End If
myUrl = "https://www.betexplorer.com/next/soccer/?year=" & Year(TDate) & "&month=" & _
   Month(TDate) & "&day=" & Day(TDate) '
WPage.Start "chrome", myUrl
WPage.Get "/"
WPage.Wait 500
Debug.Print "URL=" & myUrl
Debug.Print ">Page loaded", Format(Timer - myTim, "0.0")
With Selection.Range("A2:F10000")
    .ClearContents
    .ColumnWidth = 4
    .NumberFormat = "@"
End With
Debug.Print "Ready; rows:", Application.WorksheetFunction.CountA(Selection.EntireColumn), "Column: " & Selection.Column
WPage.FindElementsByTag("Table")(1).AsTable.ToExcel Selection.Offset(2, 0)
Selection.Range("A1:F1").EntireColumn.AutoFit
Selection.Range("A2:F10000").WrapText = False
Debug.Print "Completed; rows:", Application.WorksheetFunction.CountA(Selection.EntireColumn), "Column: " & Selection.Column
End Sub
The code will check if the selected starting cell contains a Date and will use that date or "today" to query the site.
Then the result table will be imported in the column of the selected cell

I run the macro twice; first starting from A1 (march 11), then starting from H1 (march 12); what the macro collect is shown in the attached image
 

Attachments

  • BE_Immagine 2023-03-11 155749.jpg
    BE_Immagine 2023-03-11 155749.jpg
    226.6 KB · Views: 57
Upvote 0
Thank you so much, I appreciate.

The
I can probably help to catch the raw tables, I guess you are able to perform the additional operations.

The code:
VBA Code:
Sub GetBExpl()
Dim WPage As Object         'Selenium.WebDriver
Dim myUrl As String
Dim myTim As Single
Dim TDate As Date
'
Sheets("Foglio5").Select                            '<<< The working sheet
myTim = Timer
'Crea Driver:
Set WPage = CreateObject("Selenium.WebDriver")
If IsDate(Selection) And Selection.Row = 1 Then     'Get the date
    TDate = Selection.Value
Else
    TDate = Int(Now)
End If
myUrl = "https://www.betexplorer.com/next/soccer/?year=" & Year(TDate) & "&month=" & _
   Month(TDate) & "&day=" & Day(TDate) '
WPage.Start "chrome", myUrl
WPage.Get "/"
WPage.Wait 500
Debug.Print "URL=" & myUrl
Debug.Print ">Page loaded", Format(Timer - myTim, "0.0")
With Selection.Range("A2:F10000")
    .ClearContents
    .ColumnWidth = 4
    .NumberFormat = "@"
End With
Debug.Print "Ready; rows:", Application.WorksheetFunction.CountA(Selection.EntireColumn), "Column: " & Selection.Column
WPage.FindElementsByTag("Table")(1).AsTable.ToExcel Selection.Offset(2, 0)
Selection.Range("A1:F1").EntireColumn.AutoFit
Selection.Range("A2:F10000").WrapText = False
Debug.Print "Completed; rows:", Application.WorksheetFunction.CountA(Selection.EntireColumn), "Column: " & Selection.Column
End Sub
The code will check if the selected starting cell contains a Date and will use that date or "today" to query the site.
Then the result table will be imported in the column of the selected cell

I run the macro twice; first starting from A1 (march 11), then starting from H1 (march 12); what the macro collect is shown in the attached image
Thank you so much sir, I appreciate.

I believe, It populate the sheet with the fixtures on the page but I only want the matches that satisfy the requirements (70% of the occurrence). Like I want it to scan through the h2h section of each matches one by one and check for thses patterns then bring only the matches that satisfy the requirements (70% of the occurrence) together with the link address of the match and the pattern(s) that occurred.

Thank so much, I appreciate.
 
Upvote 0
I don't speak your language, ie don't know what you mean for h2h and where is that section and which information you would like to obtain.
Se please be more explicit on what you are trying to do and we'll see what can be done
 
Upvote 0
I don't speak your language, ie don't know what you mean for h2h and where is that section and which information you would like to obtain.
Se please be more explicit on what you are trying to do and we'll see what can be done
Oh, I'm so sorry. I didn't know.

H2h means head to head. In literal meaning, it means previous matches both teams has played against each other in the past.

For each matches listed on https://www.betexplorer.com/next/soccer/

There is a link address for each matches listed, so i want the program to go through those link and check the h2h section. The first picture is an example of a match, it's h2h is located towards the end of the page. While the second photo is theh2h matches they have played in the past.

So, I want the program to look for pattern that occur 70% in the h2h results. I.e to look for when both team score at least a goal, when both team didn't score, when the total goals is more than a goal, total goal more than 2, total goal more than 3 and total goal more than 4. If any of those occure at least 70% in their last matches in the past. It will bring that match as a result with, the link to the match and also the pattern that occur. The pattern can be more than 1 for a match. Then it will processd to the next match to see if it satisfy this criteria.
 

Attachments

  • Screenshot_20230312-140038_Chrome~2.jpg
    Screenshot_20230312-140038_Chrome~2.jpg
    156.8 KB · Views: 13
Upvote 0
I understand what you mean, and I can show you some of the methods that can be used.

In the given situation I would work with two separate selenium drivers: one for the overall list of matches and the second for h2h comparison.
As in the following demo macro:
Code:
 Sub GetH2H()
Dim wPage As Object         'Selenium.WebDriver
Dim H2H As Object
Dim myUrl As String
Dim myTim As Single, I As Long
Dim TDate As Date, mColl As Object, mLink, liColl As Object
'
Sheets("Foglio5").Select                            '<<< The working sheet
myTim = Timer
'Crea Driver:
Set wPage = CreateObject("Selenium.WebDriver")      '1st container
Set H2H = CreateObject("Selenium.WebDriver")        '2nd container

If IsDate(Selection) And Selection.Row = 1 Then     'Get the date
    TDate = Selection.Value
Else
    TDate = Int(Now)
End If
myUrl = "https://www.betexplorer.com/next/soccer/?year=" & Year(TDate) & "&month=" & _
   Month(TDate) & "&day=" & Day(TDate) '
wPage.Start "chrome", myUrl
wPage.Get "/"
wPage.Wait 500
Debug.Print "URL=" & myUrl
Debug.Print ">Page loaded", Format(Timer - myTim, "0.0")
H2H.Start "chrome"
Set mColl = wPage.FindElementsByClass("table-main__tt")
For I = 1 To mColl.Count
    mLink = mColl(I).FindElementsByTag("a")(1).Attribute("href")
    H2H.Get mLink
    H2H.Wait 300
    If I = 1 Then H2H.FindElementById("onetrust-accept-btn-handler").Click
    '
    'Open h2h panel:
    H2H.FindElementById("mutual_div").FindElementsByTag("A")(1).Click
    H2H.Wait 800
    Set tObj = H2H.FindElementsById("js-mutual-table")
    If tObj.Count > 0 Then
        Set liColl = H2H.FindElementsByCss("li[class='list-breadcrumb__item']")
        Debug.Print ">H2H for: " & liColl(liColl.Count).Text
        Debug.Print ">Listed: " & tObj(1).FindElementsByTag("tbody").Count
        Stop
            '    tObj(1) refers to the list of h2h matches now listed
            '
            'the information are visible
            'look at the information
            'catch what you need to compare
            'import what you need if the match is of interest or skip
        '
    Else
        Debug.Print ">No h2h for: " & Replace(mLink, "https://www.betexplorer.com", "", , , vbTextCompare)
    End If
    DoEvents
Next I
End Sub
This open the main page, then open in the second page each of the matches and expand the h2h section.
Currently the code halts on the Stop line, and tObj(1) refers to the table with the h2h matches.
But I cannot go further: these automations are a matter of trial and error, i.e. they require time and patience; but my time is limited, and my weekly dose of patience generally is over on Monday evening

So it is on you to go on: collect and analyze the information that you wish and import the information you need. I will still be able to help for specific and defined questions when they might arise
 
Upvote 0
I understand what you mean, and I can show you some of the methods that can be used.

In the given situation I would work with two separate selenium drivers: one for the overall list of matches and the second for h2h comparison.
As in the following demo macro:
Code:
 Sub GetH2H()
Dim wPage As Object         'Selenium.WebDriver
Dim H2H As Object
Dim myUrl As String
Dim myTim As Single, I As Long
Dim TDate As Date, mColl As Object, mLink, liColl As Object
'
Sheets("Foglio5").Select                            '<<< The working sheet
myTim = Timer
'Crea Driver:
Set wPage = CreateObject("Selenium.WebDriver")      '1st container
Set H2H = CreateObject("Selenium.WebDriver")        '2nd container

If IsDate(Selection) And Selection.Row = 1 Then     'Get the date
    TDate = Selection.Value
Else
    TDate = Int(Now)
End If
myUrl = "https://www.betexplorer.com/next/soccer/?year=" & Year(TDate) & "&month=" & _
   Month(TDate) & "&day=" & Day(TDate) '
wPage.Start "chrome", myUrl
wPage.Get "/"
wPage.Wait 500
Debug.Print "URL=" & myUrl
Debug.Print ">Page loaded", Format(Timer - myTim, "0.0")
H2H.Start "chrome"
Set mColl = wPage.FindElementsByClass("table-main__tt")
For I = 1 To mColl.Count
    mLink = mColl(I).FindElementsByTag("a")(1).Attribute("href")
    H2H.Get mLink
    H2H.Wait 300
    If I = 1 Then H2H.FindElementById("onetrust-accept-btn-handler").Click
    '
    'Open h2h panel:
    H2H.FindElementById("mutual_div").FindElementsByTag("A")(1).Click
    H2H.Wait 800
    Set tObj = H2H.FindElementsById("js-mutual-table")
    If tObj.Count > 0 Then
        Set liColl = H2H.FindElementsByCss("li[class='list-breadcrumb__item']")
        Debug.Print ">H2H for: " & liColl(liColl.Count).Text
        Debug.Print ">Listed: " & tObj(1).FindElementsByTag("tbody").Count
        Stop
            '    tObj(1) refers to the list of h2h matches now listed
            '
            'the information are visible
            'look at the information
            'catch what you need to compare
            'import what you need if the match is of interest or skip
        '
    Else
        Debug.Print ">No h2h for: " & Replace(mLink, "https://www.betexplorer.com", "", , , vbTextCompare)
    End If
    DoEvents
Next I
End Sub
This open the main page, then open in the second page each of the matches and expand the h2h section.
Currently the code halts on the Stop line, and tObj(1) refers to the table with the h2h matches.
But I cannot go further: these automations are a matter of trial and error, i.e. they require time and patience; but my time is limited, and my weekly dose of patience generally is over on Monday evening

So it is on you to go on: collect and analyze the information that you wish and import the information you need. I will still be able to help for specific and defined questions when they might arise
Thank you @Anthony47.

it takes turn to load each of the game for the second selenium, which i had to do press play manually for it to go to the next match. sometimes there are over 100 matches in a day, so i would have to do it 100 times for each stop.
pls kindly help makes it automatic if posisble
 
Upvote 0
If you need a help in importing the h2h tables, well I can do that; if you need to analyze the results before doing anything else then that 's beyond my availability.

As the former task is concerned, replace the whole block from Set tObj = H2H.FindElementsById("js-mutual-table") up to End Sub with this new block:
Code:
    Set tObj = H2H.FindElementsById("js-mutual-table")
    If tObj.Count > 0 Then
    Debug.Print I & " out of " & mColl.Count
        Set liColl = H2H.FindElementsByCss("li[class='list-breadcrumb__item']")
        Debug.Print ">H2H for: " & liColl(liColl.Count).Text
        Debug.Print ">Listed: " & tObj(1).FindElementsByTag("tbody").Count
'        Stop
        'Import h2h table:
        Selection.Cells(100000, 1).End(xlUp).Offset(2, 0).Value = ">H2H for: " & liColl(liColl.Count).Text
        tObj(1).AsTable.ToExcel Selection.Cells(100000, 1).End(xlUp).Offset(1, 0)
        '
    Else
        Debug.Print ">No h2h for: " & Replace(mLink, "https://www.betexplorer.com", "", , , vbTextCompare)
    End If
    DoEvents
Next I
MsgBox ("H2H imported")
End Sub
Please remember that the cell selected when Sub GetH2H get started determine the date taken into consideration:
-if the selection is row1 and selected cell contains a date than that date will be used for quering Betexplorer
-if not a date or not row1 then "today" will be used

The h2h info will be imported in the selected column, from the first free row
 
Upvote 0

Forum statistics

Threads
1,223,882
Messages
6,175,164
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top