Extracting Data From Kitco.com [Using VBA]

netspeedz

New Member
Joined
Aug 11, 2011
Messages
21
I've been struggling to get the below VBA subroutine to fully extract data from a Kitco website page. I can get everything to populate a table on a spreadsheet except the last 3 columns of the two rows of data I'm extracting.

I'm thinking this is a case of 'not seeing the tree because of the forest' and need fresh eyes on the code. Would appreciate any help/insight on what I need to adjust in the code to get the last three columns of data (Change(%), Low and High). VBA code below:

VBA Code:
Sub ScrapeGoldAndSilverData()
    Dim request As Object
    Dim response As String
    Dim html As Object
    Dim metals As Object
    Dim metal As Object
    Dim targetSheet As Worksheet
    Dim rowNum As Integer
    Dim dataPoints As Object
    Dim headerTitles As Variant
    Dim metalNames As Variant
    
    ' Create XMLHTTP request
    Set request = CreateObject("MSXML2.XMLHTTP")
    
    ' URL of the webpage to scrape
    request.Open "GET", "https://www.kitco.com/price/precious-metals", False
    request.send
    
    ' Get response text
    response = request.responseText
    
    ' Create a new HTML document
    Set html = CreateObject("HTMLFile")
    html.body.innerHTML = response
    
    ' Set the target sheet
    Set targetSheet = ThisWorkbook.Sheets("External Data")
    
    ' Define column headers
    headerTitles = Array("Metal", "Date", "Time(EST)", "Bid", "Ask", "Change", "Change(%)", "Low", "High")
    
    ' Write column headers
    For colNum = LBound(headerTitles) To UBound(headerTitles)
        targetSheet.Cells(20, colNum + 1).Value = headerTitles(colNum)
    Next colNum
    
    ' Start writing data from row 21, column 1 (A)
    rowNum = 21
    
    ' Find all the metals data under New York Spot Price section
    Set metals = html.getElementsByClassName("BidAskGrid_listify__1liIU")(0).getElementsByTagName("li")
    
    ' Array of metals to filter
    metalNames = Array("Gold", "Silver")
    
    ' Loop through each metal's data
    For Each metal In metals
        On Error Resume Next ' Continue on error

        ' Write other data points
        Set dataPoints = metal.getElementsByClassName("BidAskGrid_gridifier__l1T1o")(0).Children
        
        ' Extract the metal name
        Dim currentMetal As String
        currentMetal = dataPoints(0).getElementsByTagName("a")(0).innerText
        
        ' Check if it's Gold or Silver
        If UBound(Filter(metalNames, currentMetal)) > -1 Then
            ' Write metal name
            targetSheet.Cells(rowNum, 1).Value = currentMetal
            
            ' Date, Time(EST), Bid, Ask
            targetSheet.Cells(rowNum, 2).Value = dataPoints(1).innerText
            targetSheet.Cells(rowNum, 3).Value = dataPoints(2).innerText
            targetSheet.Cells(rowNum, 4).Value = dataPoints(3).innerText
            targetSheet.Cells(rowNum, 5).Value = dataPoints(4).innerText
            
            ' Extracting Change, Change(%), Low, and High values
            Dim changeText As String
            changeText = Replace(dataPoints(5).innerText, Chr(34), "") ' Remove double quotes
            
            Dim changeParts() As String
            changeParts = Split(changeText, vbCrLf)
            
            If UBound(changeParts) >= 1 Then
                targetSheet.Cells(rowNum, 6).Value = changeParts(1) ' Change(%)
                targetSheet.Cells(rowNum, 7).Value = Replace(changeParts(0), vbCrLf, "|") ' Change
            End If
            
            Dim lowHighText As String
            lowHighText = dataPoints(6).innerText
            
            Dim lowHighParts() As String
            lowHighParts = Split(lowHighText, vbCrLf)
            
            If UBound(lowHighParts) >= 1 Then
                targetSheet.Cells(rowNum, 8).Value = lowHighParts(0) ' Low
                targetSheet.Cells(rowNum, 9).Value = lowHighParts(1) ' High
            End If
            
            ' Move to the next row
            rowNum = rowNum + 1
        End If
        On Error GoTo 0 ' Re-enable error handling
    Next metal
    
    ' Release memory
    Set request = Nothing
    Set html = Nothing
    Set targetSheet = Nothing
End Sub
 

Excel Facts

Does the VLOOKUP table have to be sorted?
No! when you are using an exact match, the VLOOKUP table can be in any order. Best-selling items at the top is actually the best.
Quite often with HTML data it's easier to just read the innerText of an element (which may contain multiple child elements), something like Dim vArray As Variant: vArray = Split(metal.innerText, vbCrLf) to get the separate column values, instead of trying to drill down and parse the HTML structure and its various elements (tags), however in your case this doesn't yield the separate column values.

Another way, again without referring to the HTML structure, is to read all the text nodes of a particular starting element, in your case each 'metal' row. The following function reads all the text nodes recursively and returns a string with each text node value separated by "|".

VBA Code:
Private Function Get_Text_Nodes(node As Object) As String
    
    Dim child As Object
    
    If node.NodeType = 3 Then 'NODE_TEXT
    
        Get_Text_Nodes = Get_Text_Nodes & node.NodeValue & "|"
    
    ElseIf CallByName(node, "HasChildNodes", VbMethod) Then
        
        'Traverse NODE_ELEMENT child nodes
        'Note: with early binding (add reference to Microsoft HTML Object Library; node As IHTMLDOMNode; child As IHTMLDOMNode), the node.HasChildNodes method
        'can be called directly instead of indirectly via the CallByName call.  See https://stackoverflow.com/questions/43662976/late-bind-an-ihtml-element
        
        For Each child In node.ChildNodes
            Get_Text_Nodes = Get_Text_Nodes & Get_Text_Nodes(child)
        Next
    End If
        
End Function

Call the above function like this to extract the separate column values for each metal:

VBA Code:
    ' Array of metals to filter
    metalNames = Array("Gold", "Silver")
    
    Dim rowTextNodesList As String, i As Long
    Dim rowCols As Variant
    Dim currentMetal As String
    
    ' Loop through each metal's data
    For Each metal In metals
                
        rowTextNodesList = Get_Text_Nodes(metal)
        Debug.Print rowTextNodesList
        rowCols = Split(rowTextNodesList, "|")
        
        currentMetal = rowCols(0)
        
        ' Check if it's Gold or Silver
        If UBound(Filter(metalNames, currentMetal)) > -1 Then

            ' Write metal name
            targetSheet.Cells(rowNum, 1).Value = currentMetal
                        
            ' Put values in sheet cells
            For i = 1 To UBound(rowCols) - 1
                targetSheet.Cells(rowNum, i + 1).Value = rowCols(i)
            Next
            
            ' Move to the next row
            rowNum = rowNum + 1
            
        End If

    Next
 
Upvote 0
Solution
Hello - appreciate the quick reply.

Just to clarify, I would replace the code that you have kindly provided with the code I already have, correct?
 
Upvote 0
My code replaces your For Each metal In metals loop and I included this part to show where it starts:

VBA Code:
    ' Array of metals to filter
    metalNames = Array("Gold", "Silver")

So replace this part of your code:
VBA Code:
    ' Array of metals to filter
    metalNames = Array("Gold", "Silver")
    
    ' Loop through each metal's data
    For Each metal In metals
:
:
:
    Next metal
with my code and insert the Get_Text_Nodes function below the End Sub.
 
Upvote 0
Sure do appreciate the clarification. I have added code as instructed and everything seems to work well. I did need to format the 'High' column as it was being extracted as a percentage.

It would be beneficial if the Debug output would clear itself at the beginning of each running of the subroutine as I wouldn't want some huge Debug contents accumulate, however, the Debug contents is cleared when exiting the spreadsheet. I've seen some posts regarding clearing the contents of Debug - some of which are elaborate not sure which one to use. Don't really want to use the 'hot key' method as I've seen that cause issues on the user interface side. If you have any recommendations/suggestions, would be interested in them.

I will test the code and then be back to mark your answer as the solution if all goes well.

In any case, thank you so much for the assistance. I can now see the tree, despite the forest : -)

Final working code is below:

VBA Code:
Sub ScrapeGoldAndSilverData()

  Dim request As Object
  Dim response As String
  Dim html As Object
  Dim metals As Object
  Dim metal As Object
  Dim targetSheet As Worksheet
  Dim rowNum As Integer
  Dim headerTitles As Variant
  Dim metalNames As Variant

  ' Create XMLHTTP request
  Set request = CreateObject("MSXML2.XMLHTTP")

  ' URL of the webpage to scrape
  request.Open "GET", "https://www.kitco.com/price/precious-metals", False
  request.send

  ' Get response text
  response = request.responseText

  ' Create a new HTML document
  Set html = CreateObject("HTMLFile")
  html.body.innerHTML = response

  ' Set the target sheet
  Set targetSheet = ThisWorkbook.Sheets("External Data")

  ' Define column headers
  headerTitles = Array("Metal", "Date", "Time(EST)", "Bid", "Ask", "Change", "Change(%)", "Low", "High")

  ' Write column headers
  For colNum = LBound(headerTitles) To UBound(headerTitles)
    targetSheet.Cells(20, colNum + 1).Value = headerTitles(colNum)
  Next colNum

  ' Start writing data from row 21, column 1 (A)
  rowNum = 21

  ' Find all the metals data under New York Spot Price section
  Set metals = html.getElementsByClassName("BidAskGrid_listify__1liIU")(0).getElementsByTagName("li")

  ' Array of metals to filter
  metalNames = Array("Gold", "Silver")

  Dim rowTextNodesList As String, i As Long
  Dim rowCols As Variant
  Dim currentMetal As String

  ' Loop through each metal's data
  For Each metal In metals

    rowTextNodesList = Get_Text_Nodes(metal)
    Debug.Print rowTextNodesList
    rowCols = Split(rowTextNodesList, "|")

    currentMetal = rowCols(0)

    ' Check if it's Gold or Silver
    If UBound(Filter(metalNames, currentMetal)) > -1 Then

      ' Write metal name
      targetSheet.Cells(rowNum, 1).Value = currentMetal

      ' Put values in sheet cells (excluding the last element, which is now handled separately)
      For i = 1 To UBound(rowCols) - 2
        targetSheet.Cells(rowNum, i + 1).Value = rowCols(i)
      Next

      ' Extract and format the 'High' value
      Dim highValue As String
      highValue = Replace(rowCols(UBound(rowCols) - 1), "%", "") ' Remove the percentage sign
      targetSheet.Cells(rowNum, 9).Value = highValue ' Write the numerical value to the 'High' column
      targetSheet.Cells(rowNum, 9).NumberFormat = "#,##0.00" ' Apply number formatting

      ' Move to the next row
      rowNum = rowNum + 1

    End If

  Next metal

  ' Release memory
  Set request = Nothing
  Set html = Nothing
  Set targetSheet = Nothing
End Sub

Private Function Get_Text_Nodes(node As Object) As String
   
    Dim child As Object
   
    If node.nodeType = 3 Then 'NODE_TEXT
   
        Get_Text_Nodes = Get_Text_Nodes & node.nodeValue & "|"
   
    ElseIf CallByName(node, "HasChildNodes", VbMethod) Then
       
        'Traverse NODE_ELEMENT child nodes
        'Note: with early binding (add reference to Microsoft HTML Object Library; node As IHTMLDOMNode; child As IHTMLDOMNode), the node.HasChildNodes method
        'can be called directly instead of indirectly via the CallByName call.  See https://stackoverflow.com/questions/43662976/late-bind-an-ihtml-element
       
        For Each child In node.childNodes
            Get_Text_Nodes = Get_Text_Nodes & Get_Text_Nodes(child)
        Next
    End If
       
End Function
 
Upvote 0
It would be beneficial if the Debug output would clear itself at the beginning of each running of the subroutine as I wouldn't want some huge Debug contents accumulate, however, the Debug contents is cleared when exiting the spreadsheet. I've seen some posts regarding clearing the contents of Debug - some of which are elaborate not sure which one to use. Don't really want to use the 'hot key' method as I've seen that cause issues on the user interface side. If you have any recommendations/suggestions, would be interested in them.
Just delete the Debug.Print line as it's only there to show you the format and value of the string returned by Get_Text_Nodes.

I did need to format the 'High' column as it was being extracted as a percentage.
The Debug output shows the High value isn't a percentage so that was probably caused by the cell's original format.
 
Upvote 0
Just delete the Debug.Print line as it's only there to show you the format and value of the string returned by Get_Text_Nodes.


The Debug output shows the High value isn't a percentage so that was probably caused by the cell's original format.
Code for your original response tested good. Marked the response as the solution.

Appreciate your input.

------------
 
Upvote 0

Forum statistics

Threads
1,223,880
Messages
6,175,153
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top