Pull Data From Website

mikeypsooner

New Member
Joined
May 16, 2017
Messages
33
I would like to pull the data from the tables on the webpage https://finance.yahoo.com/quote/VTI/holdings?p=VTI

Looking to get the Overall Portfolio Composition (%), Sector Weightings (%), Equity Holdings, Bond Ratings, Top 10 Holdings (14.91% of Total Assets), and Fund Overview in tables similar the one below.

Not to familiar with
getElementsByClassName, getElementsByTagName, etc. I general know how to locate the table information but not sure how to write the code to get getElementsByClassName, getElementsByTagName, etc. When I look at the HTML code I am not sure which one to grab for my specific data to get to a table. Maybe some useful vba and html documents to look at?

Any help is much appreciated.


Option Explicit

Sub GetStockData()

Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLRows As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Dim wksDest As Worksheet
Dim r As Long

Set wksDest = Sheet2
wksDest.Cells.Clear

Application.ScreenUpdating = False

IE.navigate "https://finance.yahoo.com/quote/GOOGL?p=GOOGL"
IE.Visible = False

With IE
Do While .Busy Or .readyState <> READYSTATE_COMPLETE
DoEvents
Loop
End With

Set HTMLDoc = IE.document

With HTMLDoc.getElementsByClassName("D(ib) Fw(200) Mend(20px)")(0)
wksDest.Range("A1").Value = .Children(0).innerText
wksDest.Range("B1").Value = .Children(1).innerText
End With

Set HTMLRows = HTMLDoc.getElementsByClassName("D(ib) W(1/2) Bxz(bb)")(0).getElementsByTagName("tr")

r = 3
For Each HTMLRow In HTMLRows
wksDest.Cells(r, "A").Value = HTMLRow.Cells(0).innerText
wksDest.Cells(r, "B").Value = HTMLRow.Cells(1).innerText
r = r + 1
Next HTMLRow

Set HTMLRows = HTMLDoc.getElementsByClassName("D(ib) W(1/2) Bxz(bb)")(1).getElementsByTagName("tr")

r = 3
For Each HTMLRow In HTMLRows
wksDest.Cells(r, "D").Value = HTMLRow.Cells(0).innerText
wksDest.Cells(r, "E").Value = HTMLRow.Cells(1).innerText
r = r + 1
Next HTMLRow

Sheet2.Activate

Application.ScreenUpdating = True

MsgBox "Completed...", vbInformation

Set IE = Nothing
Set HTMLDoc = Nothing
Set HTMLRows = Nothing
Set HTMLRow = Nothing
Set wksDest = Nothing

End Sub
 

Excel Facts

Whats the difference between CONCAT and CONCATENATE?
The newer CONCAT function can reference a range of cells. =CONCATENATE(A1,A2,A3,A4,A5) becomes =CONCAT(A1:A5)
Try...

Code:
[COLOR=darkblue]Option[/COLOR] [COLOR=darkblue]Explicit[/COLOR]

[COLOR=darkblue]Sub[/COLOR] GetFundOverview()

    [COLOR=darkblue]Dim[/COLOR] IE [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]New[/COLOR] SHDocVw.InternetExplorer
    [COLOR=darkblue]Dim[/COLOR] HTMLDoc [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]New[/COLOR] MSHTML.HTMLDocument
    [COLOR=darkblue]Dim[/COLOR] HTMLEles [COLOR=darkblue]As[/COLOR] MSHTML.IHTMLElementCollection
    [COLOR=darkblue]Dim[/COLOR] HTMLEle [COLOR=darkblue]As[/COLOR] MSHTML.IHTMLElement
    [COLOR=darkblue]Dim[/COLOR] wksDest [COLOR=darkblue]As[/COLOR] Worksheet
    [COLOR=darkblue]Dim[/COLOR] r [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    
    [COLOR=darkblue]Set[/COLOR] wksDest = Sheet2
    wksDest.Cells.Clear
    
    Application.ScreenUpdating = [COLOR=darkblue]False[/COLOR]
    
    IE.navigate "https://finance.yahoo.com/quote/VTI/profile?p=VTI"
    IE.Visible = [COLOR=darkblue]False[/COLOR]
    
    [COLOR=darkblue]With[/COLOR] IE
        [COLOR=darkblue]Do[/COLOR] [COLOR=darkblue]While[/COLOR] .Busy [COLOR=darkblue]Or[/COLOR] .readyState <> READYSTATE_COMPLETE
            DoEvents
        [COLOR=darkblue]Loop[/COLOR]
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]With[/COLOR]
    
    [COLOR=darkblue]Set[/COLOR] HTMLDoc = IE.document
    
    [COLOR=darkblue]Set[/COLOR] HTMLEles = HTMLDoc.getElementsByClassName("Mb(25px)")
    
    [COLOR=green]'Fund Overview header[/COLOR]
    wksDest.Cells(1, "A").Value = HTMLEles(0).getElementsByTagName("h3")(0).innerText
    
    [COLOR=green]'Fund Overview table[/COLOR]
    r = 2
    [COLOR=darkblue]For[/COLOR] [COLOR=darkblue]Each[/COLOR] HTMLEle [COLOR=darkblue]In[/COLOR] HTMLEles(0).Children(1).Children
        Cells(r, "A").Value = HTMLEle.Children(0).innerText
        Cells(r, "B").Value = HTMLEle.Children(1).innerText
        r = r + 1
    [COLOR=darkblue]Next[/COLOR]
    
    [COLOR=green]'Fund Operations header[/COLOR]
    wksDest.Cells(1, "D").Value = HTMLEles(1).getElementsByTagName("h3")(0).innerText
    
    [COLOR=green]'Fund Operations table[/COLOR]
    r = 2
    [COLOR=darkblue]For[/COLOR] [COLOR=darkblue]Each[/COLOR] HTMLEle [COLOR=darkblue]In[/COLOR] HTMLEles(1).Children(1).Children
        Cells(r, "D").Value = HTMLEle.Children(0).innerText
        Cells(r, "E").Value = HTMLEle.Children(1).innerText
        Cells(r, "F").Value = HTMLEle.Children(2).innerText
        r = r + 1
    [COLOR=darkblue]Next[/COLOR]
    
    Sheet2.Activate
    
    Application.ScreenUpdating = [COLOR=darkblue]True[/COLOR]
    
    MsgBox "Completed...", vbInformation
    
    [COLOR=darkblue]Set[/COLOR] IE = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] HTMLDoc = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] HTMLEles = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] HTMLEle = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] wksDest = [COLOR=darkblue]Nothing[/COLOR]
    
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Sub[/COLOR]

Hope this helps!
 
Upvote 0
Hey Domenic,

Thanks I will test it out. Just out of curiousity what does the term "Children" represent? Also how do you know when to put the "(0)" at the end of getElementsByTagName
 
Upvote 0
First, here's a revised copy of the code in which I made some amendments (note that I also change the name of the sub from GetFundOver to GetProfile, since it pulls the Funds Operations table as well as Fund Overview)...

Code:
[COLOR=darkblue]Option[/COLOR] [COLOR=darkblue]Explicit[/COLOR]

[COLOR=darkblue]Sub[/COLOR] GetProfile()

    [COLOR=darkblue]Dim[/COLOR] IE [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]New[/COLOR] SHDocVw.InternetExplorer
    [COLOR=darkblue]Dim[/COLOR] HTMLDoc [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]New[/COLOR] MSHTML.HTMLDocument
    [COLOR=darkblue]Dim[/COLOR] HTMLEles [COLOR=darkblue]As[/COLOR] MSHTML.IHTMLElementCollection
    [COLOR=darkblue]Dim[/COLOR] HTMLEle [COLOR=darkblue]As[/COLOR] MSHTML.IHTMLElement
    [COLOR=darkblue]Dim[/COLOR] wksDest [COLOR=darkblue]As[/COLOR] Worksheet
    [COLOR=darkblue]Dim[/COLOR] r [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    
    [COLOR=darkblue]Set[/COLOR] wksDest = Sheet2
    wksDest.Cells.Clear
    
    Application.ScreenUpdating = [COLOR=darkblue]False[/COLOR]
    
    IE.navigate "https://finance.yahoo.com/quote/VTI/profile?p=VTI"
    IE.Visible = [COLOR=darkblue]False[/COLOR]
    
    [COLOR=darkblue]With[/COLOR] IE
        [COLOR=darkblue]Do[/COLOR] [COLOR=darkblue]While[/COLOR] .Busy [COLOR=darkblue]Or[/COLOR] .readyState <> READYSTATE_COMPLETE
            DoEvents
        [COLOR=darkblue]Loop[/COLOR]
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]With[/COLOR]
    
    [COLOR=darkblue]Set[/COLOR] HTMLDoc = IE.document
    
    [COLOR=darkblue]Set[/COLOR] HTMLEles = HTMLDoc.getElementsByClassName("Mb(25px)")
    
    [COLOR=green]'Fund Overview header[/COLOR]
    wksDest.Cells(1, "A").Value = HTMLEles(0).getElementsByTagName("h3")(0).innerText
    
    [COLOR=green]'Fund Overview table[/COLOR]
    r = 2
    [COLOR=darkblue]For[/COLOR] [COLOR=darkblue]Each[/COLOR] HTMLEle [COLOR=darkblue]In[/COLOR] HTMLEles(0).Children(1).Children
        wksDest.Cells(r, "A").Value = HTMLEle.Children(0).innerText
        wksDest.Cells(r, "B").Value = HTMLEle.Children(1).innerText
        r = r + 1
    [COLOR=darkblue]Next[/COLOR]
    
    [COLOR=green]'Fund Operations header[/COLOR]
    wksDest.Cells(1, "D").Value = HTMLEles(1).getElementsByTagName("h3")(0).innerText
    
    [COLOR=green]'Fund Operations table[/COLOR]
    r = 2
    [COLOR=darkblue]For[/COLOR] [COLOR=darkblue]Each[/COLOR] HTMLEle [COLOR=darkblue]In[/COLOR] HTMLEles(1).Children(1).Children
        wksDest.Cells(r, "D").Value = HTMLEle.Children(0).innerText
        wksDest.Cells(r, "E").Value = HTMLEle.Children(1).innerText
        wksDest.Cells(r, "F").Value = HTMLEle.Children(2).innerText
        r = r + 1
    [COLOR=darkblue]Next[/COLOR]
    
    wksDest.Activate
    
    Application.ScreenUpdating = [COLOR=darkblue]True[/COLOR]
    
    MsgBox "Completed...", vbInformation
    
    [COLOR=darkblue]Set[/COLOR] IE = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] HTMLDoc = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] HTMLEles = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] HTMLEle = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] wksDest = [COLOR=darkblue]Nothing[/COLOR]
    
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Sub[/COLOR]

Secondly, here's the code to get data from your first link...

Code:
[COLOR=darkblue]Option[/COLOR] [COLOR=darkblue]Explicit[/COLOR]

[COLOR=darkblue]Sub[/COLOR] GetHoldings()

    [COLOR=darkblue]Dim[/COLOR] IE [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]New[/COLOR] SHDocVw.InternetExplorer
    [COLOR=darkblue]Dim[/COLOR] HTMLDoc [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]New[/COLOR] MSHTML.HTMLDocument
    [COLOR=darkblue]Dim[/COLOR] HTMLEles [COLOR=darkblue]As[/COLOR] MSHTML.IHTMLElementCollection
    [COLOR=darkblue]Dim[/COLOR] HTMLEle [COLOR=darkblue]As[/COLOR] MSHTML.IHTMLElement
    [COLOR=darkblue]Dim[/COLOR] wksDest [COLOR=darkblue]As[/COLOR] Worksheet
    [COLOR=darkblue]Dim[/COLOR] r [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] c [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] i [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    [COLOR=darkblue]Dim[/COLOR] j [COLOR=darkblue]As[/COLOR] [COLOR=darkblue]Long[/COLOR]
    
    [COLOR=darkblue]Set[/COLOR] wksDest = Sheet2
    wksDest.Cells.Clear
    
    Application.ScreenUpdating = [COLOR=darkblue]False[/COLOR]
    
    IE.navigate "https://finance.yahoo.com/quote/VTI/holdings?p=VTI"
    IE.Visible = [COLOR=darkblue]False[/COLOR]
    
    [COLOR=darkblue]With[/COLOR] IE
        [COLOR=darkblue]Do[/COLOR] [COLOR=darkblue]While[/COLOR] .Busy [COLOR=darkblue]Or[/COLOR] .readyState <> READYSTATE_COMPLETE
            DoEvents
        [COLOR=darkblue]Loop[/COLOR]
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]With[/COLOR]
    
    [COLOR=darkblue]Set[/COLOR] HTMLDoc = IE.document
    
    [COLOR=darkblue]Set[/COLOR] HTMLEles = HTMLDoc.getElementsByClassName("Mb(25px)")
    
    r = 1
    c = 1
    [COLOR=darkblue]For[/COLOR] i = 0 [COLOR=darkblue]To[/COLOR] HTMLEles.Length - 1
        wksDest.Cells(r, c).Value = HTMLEles(i).getElementsByTagName("h3")(0).innerText
        r = r + 1
        [COLOR=darkblue]For[/COLOR] j = 0 [COLOR=darkblue]To[/COLOR] HTMLEles(i).Children(1).Children.Length - 1
            wksDest.Cells(r, c).Value = HTMLEles(i).Children(1).Children(j).Children(0).innerText
            wksDest.Cells(r, c + 1).Value = HTMLEles(i).Children(1).Children(j).Children(IIf(i = 1, 2, 1)).innerText
            r = r + 1
        [COLOR=darkblue]Next[/COLOR] j
        [COLOR=darkblue]If[/COLOR] i Mod 2 = 1 [COLOR=darkblue]Then[/COLOR]
            r = 1
            c = c + 3
        [COLOR=darkblue]Else[/COLOR]
            r = r + 2
        [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]If[/COLOR]
    [COLOR=darkblue]Next[/COLOR] i
    
    [COLOR=darkblue]Set[/COLOR] HTMLEle = HTMLDoc.getElementsByClassName("Mb(20px)")(1)
    
    [COLOR=darkblue]With[/COLOR] wksDest.UsedRange
        r = .Rows.Count + .Rows(1).Row + 2
    [COLOR=darkblue]End[/COLOR] [COLOR=darkblue]With[/COLOR]
    
    wksDest.Cells(r, "A").Value = HTMLEle.getElementsByTagName("h3")(0).innerText
    
    r = r + 1
    [COLOR=darkblue]For[/COLOR] i = 0 [COLOR=darkblue]To[/COLOR] HTMLEle.getElementsByTagName("tr").Length - 1
        [COLOR=darkblue]For[/COLOR] j = 0 [COLOR=darkblue]To[/COLOR] HTMLEle.getElementsByTagName("tr")(i).Cells.Length - 1
            wksDest.Cells(r, j + 1).Value = HTMLEle.getElementsByTagName("tr")(i).Cells(j).innerText
        [COLOR=darkblue]Next[/COLOR] j
        r = r + 1
    [COLOR=darkblue]Next[/COLOR] i
    
    wksDest.Activate
    
    Application.ScreenUpdating = [COLOR=darkblue]True[/COLOR]
    
    MsgBox "Completed...", vbInformation
    
    [COLOR=darkblue]Set[/COLOR] IE = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] HTMLDoc = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] HTMLEles = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] HTMLEle = [COLOR=darkblue]Nothing[/COLOR]
    [COLOR=darkblue]Set[/COLOR] wksDest = [COLOR=darkblue]Nothing[/COLOR]
    
[COLOR=darkblue]End[/COLOR] [COLOR=darkblue]Sub[/COLOR]

Note that getElementsByTagName returns a collection of elements, where the indexing starts at 0. The clue is that "Elements" ends with an "s". Whereas getElementById returns a single element. Again, here the clue is that "Element" doesn't end with an "s".
 
Upvote 0
How come when I enter the code below is doesn't grab the Fund Summary Header to the far right on the web page. Trying to understand it.

Code:
    'Fund Summary header
    wksDest.Cells(1, "H").Value = HTMLEles(2).getElementsByTagName("h3")(0).innerText
 
Upvote 0
I don't know why Fund Overview and Fund Summary get excluded from the collection retrieved by that class. At first I thought Internet Explorer might need more time to load the content. So I added a few lines of code to pause the macro for a number of seconds (ie. 30 seconds) just to make sure that all of the content would be loaded. But that didn't help. Then I tried setting the Visible property to True, just in case that would make a difference. But it didn't help either.
 
Upvote 0
Instead of loops can one pull the data by right clicking on element and coping the "Copy Selector", "Copy XPath", or "Copy element". The examples are below. I just selected Fund Summary Right Clicked and hit Inspect. Then I right clicked on new screen where the html information is and right click under copy. Examples are below. Can I grab them name via anyone of these with 1 line of code instead of looping?

"Copy Selector" - #Col2-2-QuoteModule-Proxy > div > div > h3 > span

"Copy XPath" - //*[@id="Col2-2-QuoteModule-Proxy"]/div/div/h3/span

"Copy element" - <span>Fund Summary</span>
 
Upvote 0
those html parsing api's are a mystery to me... i think it is much easier to just parse the html from a string using functions like InStr and Mid

the full html is at here I believe

Code:
HTMLDoc.outerHTML

then you build a loop to go through a table and you use string functions

if you study the html you can identify unique strings that precede the values you want to scrape and locate the index position and then if you add the length of what you search for you will be at the index of a value. If you want me to show you... then paste the html source code here and include it inside code tags.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,164
Messages
6,170,444
Members
452,326
Latest member
johnshaji

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top