Get data from yahoo finance

Excel Facts

Does the VLOOKUP table have to be sorted?
No! when you are using an exact match, the VLOOKUP table can be in any order. Best-selling items at the top is actually the best.
First of all, I certainly don't like parsing data from HTML pages but if there is no web service that will return proper data as JSON or XML, like Yahoo Finance recently, then there is no other way to solve the problem.

Basically, we can use offset and count query parameters to return data and combine the returned tables (after querying the HTML by using XPATH as you can see in the code below). The downside is the process will be really slow, because Yahoo finance returns max 250 items in one page (defined by count parameter), and you need to load 40 pages to get 9000+ items. I tested for 10 pages, and it takes some time. However, I am working on a virtual Windows machine, so your computer should work much faster than mine.

Another point: If the data is updated quicker than this code then an item might change its place between the page loads, then inaccurate data might be loaded. Let's say the code is calling page 5, and item A is the 250th item in this page. Then during the page 6 call, item A changed its position to be 251, which means that it became the first item in page 6. In this scenario, you'll basically get a duplicate item A on page 6 call, and likely miss another item as it would move to the page 5.

Other than that, feel free to play with the following code which will basically do what you need (but in the way I don't like). Open the Power Query interface, create a new blank query, open the Advanced Editor, and copy and paste the following code. Hope it helps.

Power Query:
let
    // Fuction that returns the requested page
    fnGetTable = (page as number) as table =>
        let
            // Max allowed item count per page
            perpage = 250,
            // Retrieve HTML code for the URL formed by using the page number
            Source = Web.BrowserContents("https://finance.yahoo.com/crypto/?offset=" & Text.From(page * perpage) &  "&count=" & Text.From(perpage)),
            // Parse the HTML - I don't like this part but it works
            // If Yahoo makes a small change in this table, the this code will not work anymore
            TableFromHtml = Html.Table(Source, {{"Column1", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12)"}, {"Column2", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11)"}, {"Column3", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10)"}, {"Column4", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(9), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10) + TD[colspan=""""]:not([rowspan]):nth-child(4):nth-last-child(9)"}, {"Column5", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(9) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(8), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10) + TD[colspan=""""]:not([rowspan]):nth-child(4):nth-last-child(9) + TD[colspan=""""]:not([rowspan]):nth-child(5):nth-last-child(8)"}, {"Column6", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(9) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(8) + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(7), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10) + TD[colspan=""""]:not([rowspan]):nth-child(4):nth-last-child(9) + TD[colspan=""""]:not([rowspan]):nth-child(5):nth-last-child(8) + TD[colspan=""""]:not([rowspan]):nth-child(6):nth-last-child(7)"}, {"Column7", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(9) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(8) + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(7) + TH:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(6), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10) + TD[colspan=""""]:not([rowspan]):nth-child(4):nth-last-child(9) + TD[colspan=""""]:not([rowspan]):nth-child(5):nth-last-child(8) + TD[colspan=""""]:not([rowspan]):nth-child(6):nth-last-child(7) + TD[colspan=""""]:not([rowspan]):nth-child(7):nth-last-child(6)"}, {"Column8", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(9) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(8) + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(7) + TH:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(8):nth-last-child(5), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10) + TD[colspan=""""]:not([rowspan]):nth-child(4):nth-last-child(9) + TD[colspan=""""]:not([rowspan]):nth-child(5):nth-last-child(8) + TD[colspan=""""]:not([rowspan]):nth-child(6):nth-last-child(7) + TD[colspan=""""]:not([rowspan]):nth-child(7):nth-last-child(6) + TD[colspan=""""]:not([rowspan]):nth-child(8):nth-last-child(5)"}, {"Column9", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(9) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(8) + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(7) + TH:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(8):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(9):nth-last-child(4), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10) + TD[colspan=""""]:not([rowspan]):nth-child(4):nth-last-child(9) + TD[colspan=""""]:not([rowspan]):nth-child(5):nth-last-child(8) + TD[colspan=""""]:not([rowspan]):nth-child(6):nth-last-child(7) + TD[colspan=""""]:not([rowspan]):nth-child(7):nth-last-child(6) + TD[colspan=""""]:not([rowspan]):nth-child(8):nth-last-child(5) + TD[colspan=""""]:not([rowspan]):nth-child(9):nth-last-child(4)"}, {"Column10", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(9) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(8) + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(7) + TH:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(8):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(9):nth-last-child(4) + TH:not([colspan]):not([rowspan]):nth-child(10):nth-last-child(3), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10) + TD[colspan=""""]:not([rowspan]):nth-child(4):nth-last-child(9) + TD[colspan=""""]:not([rowspan]):nth-child(5):nth-last-child(8) + TD[colspan=""""]:not([rowspan]):nth-child(6):nth-last-child(7) + TD[colspan=""""]:not([rowspan]):nth-child(7):nth-last-child(6) + TD[colspan=""""]:not([rowspan]):nth-child(8):nth-last-child(5) + TD[colspan=""""]:not([rowspan]):nth-child(9):nth-last-child(4) + TD[colspan=""""]:not([rowspan]):nth-child(10):nth-last-child(3)"}, {"Column11", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(9) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(8) + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(7) + TH:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(8):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(9):nth-last-child(4) + TH:not([colspan]):not([rowspan]):nth-child(10):nth-last-child(3) + TH:not([colspan]):not([rowspan]):nth-child(11):nth-last-child(2), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10) + TD[colspan=""""]:not([rowspan]):nth-child(4):nth-last-child(9) + TD[colspan=""""]:not([rowspan]):nth-child(5):nth-last-child(8) + TD[colspan=""""]:not([rowspan]):nth-child(6):nth-last-child(7) + TD[colspan=""""]:not([rowspan]):nth-child(7):nth-last-child(6) + TD[colspan=""""]:not([rowspan]):nth-child(8):nth-last-child(5) + TD[colspan=""""]:not([rowspan]):nth-child(9):nth-last-child(4) + TD[colspan=""""]:not([rowspan]):nth-child(10):nth-last-child(3) + TD[colspan=""""]:not([rowspan]):nth-child(11):nth-last-child(2)"}, {"Column12", "TABLE.W\(100\%\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(12) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(11) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(10) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(9) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(8) + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(7) + TH:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(8):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(9):nth-last-child(4) + TH:not([colspan]):not([rowspan]):nth-child(10):nth-last-child(3) + TH:not([colspan]):not([rowspan]):nth-child(11):nth-last-child(2) + TH:not([colspan]):not([rowspan]):nth-child(12):nth-last-child(1), TABLE.W\(100\%\) > * > TR > TD[colspan=""""]:not([rowspan]):nth-child(1):nth-last-child(12) + TD[colspan=""""]:not([rowspan]):nth-child(2):nth-last-child(11) + TD[colspan=""""]:not([rowspan]):nth-child(3):nth-last-child(10) + TD[colspan=""""]:not([rowspan]):nth-child(4):nth-last-child(9) + TD[colspan=""""]:not([rowspan]):nth-child(5):nth-last-child(8) + TD[colspan=""""]:not([rowspan]):nth-child(6):nth-last-child(7) + TD[colspan=""""]:not([rowspan]):nth-child(7):nth-last-child(6) + TD[colspan=""""]:not([rowspan]):nth-child(8):nth-last-child(5) + TD[colspan=""""]:not([rowspan]):nth-child(9):nth-last-child(4) + TD[colspan=""""]:not([rowspan]):nth-child(10):nth-last-child(3) + TD[colspan=""""]:not([rowspan]):nth-child(11):nth-last-child(2) + TD[colspan=""""]:not([rowspan]):nth-child(12):nth-last-child(1)"}}, [RowSelector="TABLE.W\(100\%\) > * > TR"]),
            // Set the headers
            Result = Table.PromoteHeaders(TableFromHtml, [PromoteAllScalars=true])
        in
            // Return the batch as a table
            Result,

    // Call the function above for each page to be loaded
    // List.Accumulate is a nice function for this purpose
    Combined = List.Accumulate(
        {0..10}, // For 9000+ records, this needs to be changed to 0..40 - it might load really slow, basically 40 page loads
        #table({}, {}), // Starting with an empty table
        (state, current) => state & fnGetTable(current) // Adding each table to the previous one during this loop
    ),

    // Finally change the column types. Again, when Yahoo decides to add new columns or change a column name, then this will fail
    Result =  Table.TransformColumnTypes(Combined,{{"Symbol", type text}, {"Name", type text}, {"Price (Intraday)", type number}, {"Change", type number}, {"% Change", Percentage.Type}, {"Market Cap", type text}, {"Volume in Currency (Since 0:00 UTC)", type text}, {"Volume in Currency (24Hr)", type text}, {"Total Volume All Currencies (24Hr)", type text}, {"Circulating Supply", type text}, {"52 Week Range", type text}, {"Day Chart", type text}})
in
    Result
 
Upvote 0
Solution
Thank you so much for the quick response and detailed explanation. I think PQ is not best alternative for this case. :)
Thanks again,

Sağlıklı günler dilerim.
 
Upvote 0
As long as the website returns HTML but JSON/XML, you have no other choice to deal with the HTML parsing. And actually, the PQ way is the best way to do that but slow in this case. I am not familiar with Yahoo Finance, I don't think they do but if they have a web service (API) then it would be certainly much faster and reliable.

Sağlıklı günler dilerim.
Size de.
 
Upvote 0

Forum statistics

Threads
1,223,905
Messages
6,175,297
Members
452,633
Latest member
DougMo

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top