Extracting data from website HTML using VBA

mattadams84

Board Regular
Joined
Oct 30, 2016
Messages
54
Hi all,

I have literally no idea how to do this, but i have seen that it is possible. I am looking for a VBA script that extracts some data (in the

From the following site i wish to extract two different pieces of data:

https://www.betexplorer.com/soccer/south-korea/k-league-1/stats/

I would like to extract Matches Played and Matches Remaining
I would like to extract Home Goals and Away Goals (per match)

In the HTML code i can see the data so i guess it must be possible to grab the data somehow.

Could anyone help me?
 
Last edited:

Excel Facts

How to create a cell-sized chart?
Tiny charts, called Sparklines, were added to Excel 2010. Look for Sparklines on the Insert tab.
Here is the HTML code from the page that contains the data :

HTML:
<div class="box-overflow"><div class="box-overflow__in"><table class="table-main leaguestats"><tr><th>League Progress</th><th>Total</th><th>%</th></tr><tr><td>Matches played</td><td>198</td><td>100 %</td></tr><tr><td>Matches remaining</td><td>0</td><td>0 %</td></tr><tr><th>Outcomes</th><th>Total</th><th>%</th></tr><tr><td>Home team wins</td><td>78</td><td>39 %</td></tr><tr><td>Draws</td><td>55</td><td>28 %</td></tr><tr><td>Away team wins</td><td>65</td><td>33 %</td></tr><tr><th>Goals</th><th>Total</th><th>Per match</th></tr><tr><td>Goals scored</td><td>551</td><td>2.78</td></tr><tr><td>Home goals</td><td>287</td><td>1.45</td></tr><tr><td>Away goals</td><td>264</td><td>1.33</td></tr><tr><th>Over/Under 2.5 stats</th><th>Total</th><th>%</th></tr><tr><td>Over 2.5</td><td>108</td><td>55 %</td></tr><tr><td>Under 2.5</td><td>90</td><td>45 %</td></tr></table></div></div>
 
Last edited:
Upvote 0
The following macro will retrieve the stats from the specified URL, and place them in a newly created worksheet within the active workbook.

Note that the code uses early binding, and that you will need to set a reference (Visual Basic Editor >> Tools >> References) to the following libraries...

Code:
1) Microsoft XML, v6.0
2) Microsoft HTML Object Library

Here's the code, which should be copied/pasted into a regular module (Visual Basic Editor >> Insert >> Module)...

Code:
Option Explicit

Sub GetSoccerStats()


    'Set a reference (VBE > Tools > References) to the following libraries:
    '   1) Microsoft XML, v6.0
    '   2) Microsoft HTML Object Library
    
    Dim xmlReq As New MSXML2.XMLHTTP60
    Dim objDoc As New MSHTML.HTMLDocument
    Dim objTable As MSHTML.htmlTable
    Dim objTableRow As MSHTML.htmlTableRow
    Dim strURL As String
    Dim strResp As String
    Dim strText As String
    Dim rw As Long
    
    strURL = "https://www.betexplorer.com/soccer/south-korea/k-league-1/stats/"
    
    With xmlReq
        .Open "GET", strURL, False
        .send
        If .Status <> 200 Then
            MsgBox "Error " & .Status & ":  " & .statusText
            Exit Sub
        End If
        strResp = .responseText
    End With
    
    Worksheets.Add
    
    objDoc.body.innerHTML = strResp
    
    Set objTable = objDoc.getElementsByClassName("table-main leaguestats")(0)
    
    If Not objTable Is Nothing Then
        rw = 1
        For Each objTableRow In objTable.Rows
            strText = objTableRow.Cells(0).innerText
            Select Case strText
                Case "Matches played", "Matches remaining", "Home goals", "Away goals"
                    Cells(rw, "a").Value = objTableRow.Cells(0).innerText
                    Cells(rw, "b").Value = objTableRow.Cells(1).innerText
                    Cells(rw, "c").Value = objTableRow.Cells(2).innerText
                    rw = rw + 1
            End Select
        Next objTableRow
        Columns("a").AutoFit
    End If
    
    Set xmlReq = Nothing
    Set objDoc = Nothing
    Set objTable = Nothing
    Set objTableRow = Nothing


End Sub

Hope this helps!
 
Upvote 0
This works perfectly, thank you very much!

I have a request: I actually want to run this macro for many different URLS... I have a worksheet already created that has a list of Football Leagues (in the rows), the columns then hold the data.

You can find the file here : https://www.dropbox.com/s/77sol24sty75w5z/Avg Goals.xlsm?dl=0

This is a file where i will add leagues as i go. Is it possible to adapt your code so that it can populate the columns in this sheet? I do not need to pull in the names of the data (matches remaining, home goals, away goals etc) i only need the figures. They would have to populate the columns as per the sheet. As you can see there are many leagues so it would need to loop through each row and then use the corresponding URL

You will notice that there is a column that contains the word CURRENT. This is to indicate that it should use the URL in the Current URL column. If I change that column to LAST i would like it to use the URL in the Last URL column.

For each league it will be different if use CURRENT or LAST.

Any help is greatly appreciated. Many thanks again for your help!!!!

Kind regards
 
Upvote 0
You're very welcome, glad I could help. For your next question, through, please start a new thread and hopefully someone will be able to provide you with the help you need.

Cheers!
 
Upvote 0
Hi Domenic,

I have a very similar situation with the exception of my HTML(file:///C:/Users/HOME/Dropbox/DAILY_SALES_REPORTS/DAY_END.HTML) has anywhere from 28-35 separate tables daily however the file name, location, table headers and structure are the same everyday. I just need to get it into a designated worksheet(TABLES) with a macro where my look ups in a separate worksheet can go to work and disseminate the information I am after. There are no specific rows or columns in the worksheet where it needs to go so I would say if it started at A1 that would be great. I will write a macro to delete all data on workbook open followed by an import of the data.

Any help would be greatly appreciated.
Thank You.
 
Upvote 0
Hi,

I have got basically the same issue, but just adapting the above code will not work. This is table I am trying to retrieve data from

<table id="errorinjections" class="display datatable_custom dataTable no-footer" role="grid" aria-describedby="errorinjections_info" style="width: 1185px;">
<thead>
<tr role="row"><th class="sorting_asc" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="0" aria-sort="ascending" aria-label="Warehouse Id: activate to sort column descending" style="width: 100px;">Warehouse Id</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="1" aria-label="Event date utc: activate to sort column ascending" style="width: 143px;">Event date utc</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="2" aria-label="Associate login: activate to sort column ascending" style="width: 109px;">Associate login</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="3" aria-label="Fcsku: activate to sort column ascending" style="width: 92px;">Fcsku</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="4" aria-label="Fnsku: activate to sort column ascending" style="width: 45px;">Fnsku</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="5" aria-label="Tool: activate to sort column ascending" style="width: 128px;">Tool</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="6" aria-label="Location Id: activate to sort column ascending" style="width: 86px;">Location Id</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="7" aria-label="ProcessPath: activate to sort column ascending" style="width: 89px;">ProcessPath</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="8" aria-label="Symptom: activate to sort column ascending" style="width: 101px;">Symptom</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="9" aria-label="Symptom Detail: activate to sort column ascending" style="width: 116px;">Symptom Detail</th><th class="sorting" tabindex="0" aria-controls="errorinjections" rowspan="1" colspan="1" data-column-index="10" aria-label="Quantity: activate to sort column ascending" style="width: 66px;">Quantity</th></tr>
</thead>


<tbody><tr role="row" class="odd"><td class="sorting_1">MAN1</td><td>2019-07-28 00:01:20</td><td>ifeanyip</td><td>ZZX2M9OE4G</td><td>None</td><td>AFTWatsonService</td><td>tsX00x921jy</td><td>Stow</td><td>Stow Shortage</td><td>aft-ps</td><td>1.0</td></tr><tr role="row" class="even"><td class="sorting_1">MAN1</td><td>2019-07-28 00:05:50</td><td>chdavidn</td><td>X000J91BGH</td><td>None</td><td>AFTWatsonService</td><td>tsX00k1fisc</td><td>Stow</td><td>Stow Shortage</td><td>aft-ps</td><td>1.0</td></tr><tr role="row" class="odd"><td class="sorting_1">MAN1</td><td>2019-07-28 00:11:55</td><td>omotakee</td><td>ZZX7ETYIK4</td><td>None</td><td>AFTWatsonService</td><td>tsX00r5g42p</td><td>Stow</td><td>Stow Shortage</td><td>aft-ps</td><td>1.0</td></tr><tr role="row" class="even"><td class="sorting_1">MAN1</td><td>2019-07-28 00:13:40</td><td>maltaib</td><td>X000ZW3CMJ</td><td>None</td><td>AFTWatsonService</td><td>tsX00nb2q6w</td><td>Stow</td><td>Stow Overage</td><td>aft-ps</td><td>1.0</td></tr><tr role="row" class="odd"><td class="sorting_1">MAN1</td><td>2019-07-28 00:15:23</td><td>mmutasim</td><td>X000W042FZ</td><td>None</td><td>AFTWatsonService</td><td>tsX007j8t2g</td><td>Stow</td><td>Stow Overage</td><td>aft-ps</td><td>1.0</td></tr><tr role="row" class="even"><td class="sorting_1">MAN1</td><td>2019-07-28 00:26:03</td><td>withensn</td><td>X00109JEA5</td><td>None</td><td>AFTWatsonService</td><td>tsX00zef0ae</td><td>Stow</td><td>Stow Shortage</td><td>aft-ps</td><td>1.0</td></tr><tr role="row" class="odd"><td class="sorting_1">MAN1</td><td>2019-07-28 00:35:07</td><td>asgohabe</td><td>B07CWT52XZ</td><td>None</td><td>AFTWatsonService</td><td>tsX00emn1w3</td><td>Stow</td><td>Stow Overage</td><td>aft-ps</td><td>1.0</td></tr><tr role="row" class="even"><td class="sorting_1">MAN1</td><td>2019-07-28 00:35:26</td><td>abdekara</td><td>X00109AASP</td><td>None</td><td>AFTWatsonService</td><td>tsX00nzijvm</td><td>Stow</td><td>Stow Shortage</td><td>aft-ps</td><td>3.0</td></tr><tr role="row" class="odd"><td class="sorting_1">MAN1</td><td>2019-07-28 00:36:19</td><td>jamhamad</td><td>X0010VIDAP</td><td>None</td><td>AFTWatsonService</td><td>tsX00iqavhm</td><td>Stow</td><td>Stow Shortage</td><td>aft-ps</td><td>1.0</td></tr><tr role="row" class="even"><td class="sorting_1">MAN1</td><td>2019-07-28 00:42:21</td><td>mbbereke</td><td>ZZXZND8W00</td><td>None</td><td>AFTWatsonService</td><td>SC006</td><td>Stow</td><td>Stow Shortage</td><td>aft-ps</td><td>1.0</td></tr></tbody></table>


Thank you for the help
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,177
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top