Pull Data From Website

mikeypsooner

New Member
Joined
May 16, 2017
Messages
33
I would like to pull the data from the tables on the webpage https://finance.yahoo.com/quote/VTI/holdings?p=VTI

Looking to get the Overall Portfolio Composition (%), Sector Weightings (%), Equity Holdings, Bond Ratings, Top 10 Holdings (14.91% of Total Assets), and Fund Overview in tables similar the one below.

Not to familiar with
getElementsByClassName, getElementsByTagName, etc. I general know how to locate the table information but not sure how to write the code to get getElementsByClassName, getElementsByTagName, etc. When I look at the HTML code I am not sure which one to grab for my specific data to get to a table. Maybe some useful vba and html documents to look at?

Any help is much appreciated.


Option Explicit

Sub GetStockData()

Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLRows As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Dim wksDest As Worksheet
Dim r As Long

Set wksDest = Sheet2
wksDest.Cells.Clear

Application.ScreenUpdating = False

IE.navigate "https://finance.yahoo.com/quote/GOOGL?p=GOOGL"
IE.Visible = False

With IE
Do While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Loop
End With

Set HTMLDoc = IE.document

With HTMLDoc.getElementsByClassName("D(ib) Fw(200) Mend(20px)")(0)
wksDest.Range("A1").Value = .Children(0).innerText
wksDest.Range("B1").Value = .Children(1).innerText
End With

Set HTMLRows = HTMLDoc.getElementsByClassName("D(ib) W(1/2) Bxz(bb)")(0).getElementsByTagName("tr")

r = 3
For Each HTMLRow In HTMLRows
wksDest.Cells(r, "A").Value = HTMLRow.Cells(0).innerText
wksDest.Cells(r, "B").Value = HTMLRow.Cells(1).innerText
r = r + 1
Next HTMLRow

Set HTMLRows = HTMLDoc.getElementsByClassName("D(ib) W(1/2) Bxz(bb)")(1).getElementsByTagName("tr")

r = 3
For Each HTMLRow In HTMLRows
wksDest.Cells(r, "D").Value = HTMLRow.Cells(0).innerText
wksDest.Cells(r, "E").Value = HTMLRow.Cells(1).innerText
r = r + 1
Next HTMLRow

Sheet2.Activate

Application.ScreenUpdating = True

MsgBox "Completed...", vbInformation

Set IE = Nothing
Set HTMLDoc = Nothing
Set HTMLRows = Nothing
Set HTMLRow = Nothing
Set wksDest = Nothing

End Sub
 
That is awesome and very fast!!!

Could you explain the IIF(switch....) and the data-reactid.... statement? Not sure I understand what it is doing.

Maybe you could add quick comments to explain the Sub titled - "Sub ParseYahooFinanceTable(html As String)" for each line?
 
Upvote 0

Excel Facts

Copy a format multiple times
Select a formatted range. Double-click the Format Painter (left side of Home tab). You can paste formatting multiple times. Esc to stop
Well IIf basically works like the If function in excel. Do you remember the images i pasted for the html source code for the part where the table starts...

This value is unique, there is only 1 instance of this string in the entire html code...

EEUQLsW.png


Do Ctrl+F on the source code and you will only find that once. That is a reliable value to use to jump right to the that first value you want to grab "Category"

Then I saw the associated value for category is right after this...


7N5sUto.png


I figured I could just use that and just edit the number and I could grab all my values but then I noticed there are 7 instances of this value in the source...

SlAQHdv.png


Without digging too deep I just decided to have my loop function two different ways alternating each loop. So that is why I declared a boolean (value that can be true or false) to use as a switch. So my loop can alternate between logic. So first I look for the beginning of the row and then on the next loop it grabs the associated value.


ipdZSoK.png


Anyways I just had to look at the html and identify unique values that would help me get to the index position of where i wanted to go... for this html you can see my parsing code was super easy but sometimes parsing can get messy depending how ugly the page is lol.

So that is what my Iif statement is doing... First loop I Prepend that span tag to the beginning and when i want to get the associated value I just do the search without the leading span tag.

I can add comments to the code if you need still, let me know.
 
Last edited:
Upvote 0
@cerfani

The WinHTTP object is very interesting

Not only is it very fast, it also has comprehensive error handling.

I'll need to take a closer look at it.

Nicely done!

Cheers!
 
Upvote 0
@cerfani

The WinHTTP object is very interesting

Not only is it very fast, it also has comprehensive error handling.

I'll need to take a closer look at it.

Nicely done!

Cheers!

thank you sir, I am not 100% sure but I assume there is a lot of overhead associated with instantiating an internetexplorer object. My assumption is it literally opens an invisible browser window and all that stuff needs to load. This just gets right to the request. :)
 
Upvote 0
thank you sir, I am not 100% sure but I assume there is a lot of overhead associated with instantiating an internetexplorer object. My assumption is it literally opens an invisible browser window and all that stuff needs to load. This just gets right to the request. :)

Yes, that's exactly right. Using Internet Explorer is very inefficient.
 
Upvote 0
How do I pull the entire HTML code similar to this

Code:
wksDest.Range("A1").Value = HTMLDoc.body.outerHTML

Just in case you're still interested in an answer to this question, HTMLDoc.body.outerHTML does in fact pull the entire HTML code. The problem is when you transfer the contents to a worksheet cell. A cell can only contain 32,767 characters. So you'll only get part of the HTML code. You can test this for yourself by adding this line...

PHP:
debug.Print instr(1, HTMLDoc.body.outerHTML, "<span data-reactid=""29"">")
 
Last edited:
Upvote 0
I am trying to get the 3rd column in the Fund Operations table but can parse it correctly. I tried pulling the first 2 columns but no luck.



Code:
Sub DoIt2()
    Dim responseText As String
    
    responseText = MakeGetRequest("https://finance.yahoo.com/quote/VTI/profile?p=VTI")




    Call Fund_Operations_Table(responseText)
End Sub
Sub Fund_Operations_Table(html As String)


    Dim index, tableIndex, removeIndex, row, col As Long
    Dim searchValue, parseValue As String
    Dim switch As Boolean
  
    row = 1
    col = 4
    switch = False
    index = 1
    tableIndex = 70


    Do
        'Searchs a specific string that is unique
        Debug.Print "data-reactid=""" & tableIndex & """>"
        searchValue = IIf(switch, "", "<span ") & "data-reactid=""" & tableIndex & """>"
        'Finds the index (i.e. position of string in the html code)
        index = InStr(index, html, searchValue)
        
        'The search value must be present or else a message box appears
        If index > 0 Then
            'This sets the index to the start of the word/phrase that is desired
            'Hint the searchValue is before the actual word that is wanted
            index = index + Len(searchValue)


            'This strips everything after category off of the html code
            'Finds the first charcter(s) after the desired word and removes the rest of the html code off
            removeIndex = InStr(index, html, "</")
            
            'Places word in the cell desired
            Cells(row, col).Value2 = Mid(html, index, removeIndex - index)
'                Debug.Print "searchvalue: " & searchValue
'                Debug.Print "index: " & index
'                Debug.Print "removeIndex: " & removeIndex
'                Debug.Print Mid(html, index, 3)
'                Debug.Print "------------"
                
            If switch Then
                row = row + 1
                col = col - 1
                tableIndex = tableIndex + 5
                switch = False
                
            Else
                col = col + 1
                tableIndex = tableIndex + 1
                switch = True
            End If
            
            
        Else
            MsgBox "Update your macro"
            Exit Sub
        End If
    Loop While tableIndex < 74
    
    Columns.AutoFit
End Sub


"><span data-reactid="70">Annual Report Expense Ratio (net)</span></span></span><span class="W(20%) D(b) Fl(start) Ta(e)" data-reactid="71">0.04%</span><span class="W(30%) D(b) Fl(start) Ta(e)" data-reactid="72">0.36%</span></div><div class="Bdbw(1px) Bdbc($screenerBorderGray) Bdbs(s) H(25px) Pt(10px)" data-reactid="73"><span class="W(50%) D(b) Fl(start) Ta(s)" data-reactid="74"><span class="Mend(5px) Whs(nw)" data-reactid="75"><span data-reactid="76">Holdings Turnover</span></span></span><span class="W(20%) D(b) Fl(start) Ta(e)" data-reactid="77">4.00%</span><span class="W(30%) D(b) Fl(start) Ta(e)" data-reactid="78">5,076.00%</span></div><div class="Bdbw(1px) Bdbc($screenerBorderGray) Bdbs(s) H(25px) Pt(10px)" data-reactid="79"><span class="W(50%) D(b) Fl(start) Ta(s)" data-reactid="80"><span class="Mend(5px) Whs(nw)" data-reactid="81"><span data-reactid="82">Total Net Assets</span></span></span><span class="W(20%) D(b) Fl(start) Ta(e)" data-reactid="83">48,848.44</span><span class="W(30%) D(b) Fl(start) Ta(e)" data-reactid="84">48,848.44</span></div></
 
Upvote 0
Not all of code posted below.

Code:
Sub DoIt2()
    Dim responseText As String
    
    responseText = MakeGetRequest("https://finance.yahoo.com/quote/VTI/profile?p=VTI")




    Call Fund_Operations_Table(responseText)
End Sub
Sub Fund_Operations_Table(html As String)


    Dim index, tableIndex, removeIndex, row, col As Long
    Dim searchValue, parseValue As String
    Dim switch As Boolean
  
    row = 1
    col = 4
    switch = False
    index = 1
    tableIndex = 70


    Do
        'Searchs a specific string that is unique
        Debug.Print "data-reactid=""" & tableIndex & """>"
        searchValue = IIf(switch, "", "<span ") & "data-reactid=""" & tableIndex & """>"
        'Finds the index (i.e. position of string in the html code)
        index = InStr(index, html, searchValue)
        
        'The search value must be present or else a message box appears
        If index > 0 Then
            'This sets the index to the start of the word/phrase that is desired
            'Hint the searchValue is before the actual word that is wanted
            index = index + Len(searchValue)


            'This strips everything after category off of the html code
            'Finds the first charcter(s) after the desired word and removes the rest of the html code off
            removeIndex = InStr(index, html, "</")
            
            'Places word in the cell desired
            Cells(row, col).Value2 = Mid(html, index, removeIndex - index)
'                Debug.Print "searchvalue: " & searchValue
'                Debug.Print "index: " & index
'                Debug.Print "removeIndex: " & removeIndex
'                Debug.Print Mid(html, index, 3)
'                Debug.Print "------------"
                
            If switch Then
                row = row + 1
                col = col - 1
                tableIndex = tableIndex + 5
                switch = False
                
            Else
                col = col + 1
                tableIndex = tableIndex + 1
                switch = True
            End If
            
            
        Else
            MsgBox "Update your macro"
            Exit Sub
        End If
    Loop While tableIndex < 74
    
    Columns.AutoFit
End Sub
 
Upvote 0
Code:
Sub ParseYahooFinanceTable(html As String)

    Dim index, tableIndex, removeIndex, row, col As Long
    Dim searchValue, parseValue As String
    Dim switch As Boolean


[COLOR=#006400]    'Initialize variables
    'I decided to Begin writing in A1... edit row and col to the number row or column you want
    'switch when declared is set to false by default but i just wanted it to be shown
    'so the source code i saw i am looking for similiar string sequences with a number changing
    'i made a long tableIndex to keep track of the value to grab... the html source code showed I should start at 29 and the last one I want is 55
    'also index is going to keep track of where i am in the html text so i can advance through the text collecting stuff as i go
    'i will start reading from character 1... you could even go past all the header stuff in the html and anything else that is in fixed positions...
    'the html will always be structured the same...InStr internally is going through characters of a character array... smaller arrays are better
    'you could default this code to start near the beginning of the table by hardcoding an approximate index instead of starting at 1... faster ;)
[/COLOR]    row = 1
    col = 1
    switch = False
    index = 1
    tableIndex = 29


[COLOR=#006400]    'so here i use a loop because i noticed a pattern that i could repeat...just look at the images i pasted in previous posts you will see it
[/COLOR]    Do
    
[COLOR=#006400]        'ok so remember the source code, i am going to alternate between searching for the beginning of the row which needs the span tag prepended
        'and then searching for the labels value which requires me to not prepend the span tag. I am doing string concatenation to build the search value
        'dynamic table index value AND to prepend either an empty string or the span tag
        'on the first iteration of this loop switch will be False and the span tag will be prepended... table index will be 29 ...
        'finding this will get you right before the "Category" label
[/COLOR]        searchValue = [COLOR=#ff0000]removed code see previous posts[/COLOR]
        
[COLOR=#006400]        'On the first iteration InStr will find where the search value is first found in the html
        'i will be at the index position that is the exact length away of what i searched for away from that first label value "Category"
[/COLOR]        index = InStr(index, html, searchValue)
        
[COLOR=#006400]        'i circled in green where index is pointing to on first iteration http://i.imgur.com/5Jq22w0.png
        'if the index is not found InStr will return 0 iirc and then you get notified to update your macro
[/COLOR]        If index > 0 Then
[COLOR=#006400]            'now that we know everything is good we incrment the index the length of what you searched for
            'You will be at the label value "Category" on the first iteration
[/COLOR]            index = index + Len(searchValue)
            
[COLOR=#006400]            'this is a visual of adding the length http://i.imgur.com/mA0lLoP.png
[/COLOR]

[COLOR=#006400]            'so after finding where the value you want starts, you must find where it ends
            'values in the table will alway end right before the start of those closing tags
            'we will search for those 2 characters that begin closing tags but not overwrite our index position
[/COLOR]            removeIndex = InStr(index, html, [COLOR=#ff0000]removed see previous posts[/COLOR]
            
[COLOR=#006400]            'you found this position now... http://i.imgur.com/MCISjUD.png
[/COLOR]            
[COLOR=#006400]            'here we will right the parsed value to the current cell, first iteration A1
            'removeIndex holds where that closing tag starts and index holds where your value starts...
            'the difference is the length of your value ;)
[/COLOR]            Cells(row, col).Value2 = Mid(html, index, removeIndex - index)


[COLOR=#006400]            'after parsing we set everything up for the next loop
            'the html code had a pattern that switched back n forth
            'table index needs to first iteration add 1 so we get to the label's associated value
            'then next iteration we add 4 to get to next row of the table, that's just what yahoo did
            'i dont know why but that's how their code works i guess
            'anyways we can just repeat this pattern and we keep jumping to the values that precede what we are looking for
            'we also need to change cells to write every loop so if you see, the first iteration adds one to column so we can write the value in column B same row
            '2nd iteration subtracts column and goes to new row, like a typewriter movement
[/COLOR]            If switch Then
                row = row + 1
                col = col - 1
                tableIndex = tableIndex + 4
                switch = False
            Else
                col = col + 1
                tableIndex = tableIndex + 1
                switch = True
            End If
        Else
            MsgBox "Update your macro"
            Exit Sub
        End If
    Loop While tableIndex < 56 [COLOR=#006400]'i saw the last value was at 56 so we stop looping after grabbing index 55[/COLOR]
    
[COLOR=#006400]    'make readable
[/COLOR]    Columns.AutoFit
    
[COLOR=#006400]    'fin :D
[/COLOR]End Sub

comments ;)

some images that visualize parsing (this would have been the first iteration of the loop)... links to images in comments

Index Found For Search Value:

5Jq22w0.png


Increment Index with length of search value:

mA0lLoP.png


Finding the removeIndex where the closing tag starts

MCISjUD.png
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,164
Messages
6,170,444
Members
452,326
Latest member
johnshaji

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top