Unable to download html data from specific site

ozex

New Member
Joined
Mar 27, 2024
Messages
7
Office Version
  1. 365
Platform
  1. Windows
Despite lots of research and testing re scraping I'm not able to download data from the target site, Gold Reserves by Country 2021 | World Gold Council
The code is succesful with other sites which are included in the code for testing purposes only.
The eventual aim is not to download all data, but 5 trs namely, USA, GBR, RUS, CHN and POL and 1 td namely, 'Gold Reserves Tonnes'. I guess that can be done with something like element names but I'm stuck at the start. Maybe the issue is a simple one of parent / child but I am not able to resolve and would be grateful of any help to move on.

VBA Code:
Sub Get_Gold_Data()

    Dim oDom As Object
    Dim sURL As String
    Dim oTable As Object, tbl As Object, oBody As Object, bod As Object, oTR As Object, tr As Object, oTD As Object, td As Object
    Dim r As Long, c As Long

    'target site fails using TagName and ClassName identifiers
    sURL = "https://www.gold.org/goldhub/data/gold-reserves-by-country"

    'fails using generic name table as there is no object named table
    '    sURL = "https://www.barchart.com/options/price-history/daily-prices?symbol=msft&strikePrice=155.00&symbolType=C&expirationDate=2019-12-20"
    
    'ok
    '    sURL = "https://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_winners"
    
    'ok
    'sURL = "http://www.bom.gov.au/products/IDN60701/IDN60701.95929.shtml"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", sURL, False
        .send
        If .readyState = 4 And .Status = 200 Then
            Set oDom = CreateObject("htmlfile")
            oDom.body.innerHTML = .responseText
        Else
            MsgBox "Ready state: " & .readyState & vbLf & "HTTP request status: " & .Status, vbExclamation, "Error"
            Exit Sub
        End If
    End With

    r = 1
    Set oTable = oDom.getElementsByTagName("table") 'generic name
    '    Set oTable = oDom.getElementsByClassName("table table-sm table-hover Qr4mW7qUIzc8plircFzmHg==") 'target site table name
    For Each tbl In oTable                       'target site fails here - oTable has been created - tbl is nothing
        Set oBody = tbl.getElementsByTagName("tbody")
        '        Set oBody = oDom.getElementsByTagName("tbody")'other testing
        For Each bod In oBody
            Set oTR = bod.getElementsByTagName("tr")
            '            Set oTR = oDom.getElementsByTagName("tr")'other testing
            For Each tr In oTR
                Set oTD = tr.getElementsByTagName("td")
                c = 1
                For Each td In oTD
                    Sheet1.Cells(r, c).Value = td.innerText
                    c = c + 1
                Next td
                r = r + 1
            Next tr
        Next bod
    Next tbl

    ''tidy up
    Set oTable = Nothing: Set tbl = Nothing: Set oBody = Nothing: Set bod = Nothing
    Set oTR = Nothing: Set oTD = Nothing: Set tr = Nothing: Set td = Nothing

End Sub
 

Excel Facts

What is the last column in Excel?
Excel columns run from A to Z, AA to AZ, AAA to XFD. The last column is XFD.
If I am not mistaken, though I hope I am, I think I have resolved the issue. Although the target WGC site does not alert you, they appear not to permit site interrogation using vba.
In the belief there was a parent / child issue I continued my research and came across a solution to another problem written long ago by Tim Williams in VBA get parent node of element
I adapted the snippet and ran it on the sites listed in my post. All sites,,,, excepting the target WGC site,,, returned a result using any TagName. I can only conclude there must somehow be a restriction on the WGC site.

I sincerely apologise to posters who have read my post and are perhaps trying to find a solution. Looking on the bright side, I have learned a lot in the many hours of research and will now include Tim Williams' snippet in any future HTML work that I do and hope this experience will help others too.

VBA Code:
Dim allLinks As Object, a As Object
Set allLinks = oDom.getElementsByTagName("tbody") 'or any other available tag
For Each a In allLinks
    Debug.Print a.innerText
Next a
 
Upvote 0
If I am not mistaken, though I hope I am, I think I have resolved the issue. Although the target WGC site does not alert you, they appear not to permit site interrogation using vba.
You are mistaken. By looking at the network tab in the developer tools, I got this URL:
https://fsapi.gold.org/api/cbd/v11/charts/getPage?page=snapshot&periodicity=QTD_FULL

If you go to that URL using a browser, you'll get a JSON object. Please browse through it and tell me which nodes you are trying to get. I suspect it's something inside
chardata > countries

Once you identify the nodes, all you have to do is parse the JSON into a dictionary, you can use the VBA-JSON library on github for that. But you have to get the JSON first, for that, you need to make a "Get" HTTP call. The whole thing will look like this:
> identify the right nodes using your web browser
> get json from vba using http call
> parse json into dictionary using vba-json library
> if you wanted something inside CHN, you'd do something like for each item in json("chardata")("countries") if "CHN" then...
 
Upvote 0
Solution
Power Query can do it too, but I do not have it. According to another user here, you have to specify an agent and an accept header to get the data from the json. It should be simpler than the VBA code I would supply, but less control, I guess.
 
Upvote 0
Thank you for replying so quickly. I am pleased I was wrong, it will make life easier.

I first came across json when downloading from the LBMA site (also gold) and worked out a solution to parsing one. With that possibilty in mind I also used Network on the WGC site but could not see any references to a json though I checked several times. How did you find the link?
Incidently at the same time I was checking WGC I used Network to check my existing links to LBMA (who use two jsons, am and pm for their daily price 'fixes') and though same data, I found different indices to the ones I have been using for some time.

Back to my post, I will continue to work on it now I know data is available. Thank you again for your kind reply.
 
Upvote 0
I first came across json when downloading from the LBMA site (also gold) and worked out a solution to parsing one. With that possibilty in mind I also used Network on the WGC site but could not see any references to a json though I checked several times. How did you find the link?
I just pasted the URL from your VBA code into my web browser. There was a table, so I figured it was your data. Then, I went into Dev Tools, clicked on the Network tab, and reloaded the website to see what popped up. After the reload, I sorted the list by type and searched for HTML and JSON files. By going through them one by one, I found the source of the data. If I had come across an HTML file with the table I needed instead of a JSON file, I would've used some query selector to extract the data, similar to what you were attempting in your first post.
 
Upvote 0
Hello Edgar thank you for that info too. Haha you will find a few more listed enquiries in Network now I've got the parsing of the WGC json sorted. No need to use outside resources for what I needed. When I used the link you provided I noticed "pretty-print" in the top left which delimited the whole object and provided the first clue. I found that spiltiing the default json with "}," left any value at the end of each row. Delimited that way it's over 810k rows long but luckily what I required was in a band of only 600 rows starting around 2100. From there it was easy and time to complete the whole query and write what I need to sheet takes between 0.086 and 0. 094 seconds so I'm very happy with that. I've tried to make it as future proof as possible accepting that numbers and therefore orders of country change too.
Always learning and out of curiosity I counted the number of occurences of required data in the json as follows US 8133.46 = 232, UK 310.29 = 108, RU 2332.74 = 9, CN 2235.39 = 3 and PL 358.69 = 3.
Thank you for your great discovery of the json link and kindness in responding.
 
Upvote 0
If doing string manipulation yourself gives you an acceptable result, then fantastic.
I'm glad I was able to help, good luck.
 
Upvote 0

Forum statistics

Threads
1,225,734
Messages
6,186,711
Members
453,369
Latest member
positivemind

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top