Getting Data from a web site

mwollert

New Member
Joined
Jul 13, 2020
Messages
27
Office Version
  1. 365
Platform
  1. Windows
I have a date connection to a website to get a price history from a table. My connection has worked great for a year or so. Now the website has been updated and my connection stopped working. I originally thought I just needed to make a new connection tot he table. Tried to make a new connection but getting error.

The url for the page that contains the table is :

Share price history | Thrift Savings Plan

The table I want is at the bottom. When I create a connection and choose 'Document' it appears that there is no data in the table.

1594655699193.png


But I hit the refresh icon and the table appears :

1594655767922.png


I then choose 'Load to.." and choose 'Table' and 'Existing worksheet = $A$1' on the import data form but get an error "[Expression Error] The Column 'Date' of the table wasn't found."

Any suggestions on importing this data. I could just copy and paste it but really want it to be automated as it needs an update every business day.

Thanks
 

Excel Facts

How to fill five years of quarters?
Type 1Q-2023 in a cell. Grab the fill handle and drag down or right. After 4Q-2023, Excel will jump to 1Q-2024. Dash can be any character.
Welcome to MrExcel forums.

It's because the web page loads the data table using a separate JavaScript request to https://secure.tsp.gov/components/C...ddate=20200714&Lfunds=1&InvFunds=1&download=0

As you can see, the URL contains startdate and enddate query parameters, which I guess are 30 days ago and the current date. It is possible to update this date range in the Power Query so you always import the latest 30 days of data when you refresh the query.

Create a Power Query using that URL instead of your web page URL. Then Edit the Query and open the Advanced Editor and replace the query code with this:
Code:
let

    TimeNow = DateTime.LocalNow(),

    // Subtract 30 days from current date and format as yyyyMMdd
    StartDate = Date.ToText(Date.AddDays(DateTime.Date(TimeNow),-30),"yyyyMMdd"),

    EndDate = Date.ToText(DateTime.Date(TimeNow),"yyyyMMdd"),

    URL = Text.Combine({"https://secure.tsp.gov/components/CORS/getSharePricesRaw.html?startdate=",StartDate,"&enddate=",EndDate,"&Lfunds=1&InvFunds=1&download=0"}),

    Source = Csv.Document(Web.Contents(URL),[Delimiter=",", Columns=16, Encoding=28591, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {"L Income", type number}, {"L 2025", type number}, {"L 2030", type number}, {"L 2035", type number}, {"L 2040", type number}, {"L 2045", type number}, {"L 2050", type number}, {"L 2055", type number}, {"L 2060", type number}, {"L 2065", type number}, {"G Fund", type number}, {"F Fund", type number}, {"C Fund", type number}, {"S Fund", type number}, {"I Fund", type number}}),
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Changed Type",2),
    #"Sorted Rows" = Table.Sort(#"Removed Bottom Rows",{{"Date", Order.Descending}})
in
    #"Sorted Rows"
The query also contains steps to remove the bottom 2 rows which were blank and sort by descending Date column.
 
Upvote 0
Solution
Welcome to MrExcel forums.

It's because the web page loads the data table using a separate JavaScript request to https://secure.tsp.gov/components/C...ddate=20200714&Lfunds=1&InvFunds=1&download=0

As you can see, the URL contains startdate and enddate query parameters, which I guess are 30 days ago and the current date. It is possible to update this date range in the Power Query so you always import the latest 30 days of data when you refresh the query.

Create a Power Query using that URL instead of your web page URL. Then Edit the Query and open the Advanced Editor and replace the query code with this:
Code:
let

    TimeNow = DateTime.LocalNow(),

    // Subtract 30 days from current date and format as yyyyMMdd
    StartDate = Date.ToText(Date.AddDays(DateTime.Date(TimeNow),-30),"yyyyMMdd"),

    EndDate = Date.ToText(DateTime.Date(TimeNow),"yyyyMMdd"),

    URL = Text.Combine({"https://secure.tsp.gov/components/CORS/getSharePricesRaw.html?startdate=",StartDate,"&enddate=",EndDate,"&Lfunds=1&InvFunds=1&download=0"}),

    Source = Csv.Document(Web.Contents(URL),[Delimiter=",", Columns=16, Encoding=28591, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {"L Income", type number}, {"L 2025", type number}, {"L 2030", type number}, {"L 2035", type number}, {"L 2040", type number}, {"L 2045", type number}, {"L 2050", type number}, {"L 2055", type number}, {"L 2060", type number}, {"L 2065", type number}, {"G Fund", type number}, {"F Fund", type number}, {"C Fund", type number}, {"S Fund", type number}, {"I Fund", type number}}),
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Changed Type",2),
    #"Sorted Rows" = Table.Sort(#"Removed Bottom Rows",{{"Date", Order.Descending}})
in
    #"Sorted Rows"
The query also contains steps to remove the bottom 2 rows which were blank and sort by descending Date column.


Thanks for the in depth reply. I tried to connect to the new URL but I get an error that I'm not able to connect :

1594745191924.png


I was able to find a online you-tube that helped me get the data off the original site. It runs a bit slow but only need to update once a day so it is not too bad.

I know I am doing a few extra steps because there are only 2 tables I need to get. First is the column headings (which I don't really need), and the body table.

Here is the code I'm using that is working, but I think your code would be more efficient.
Code:
Sub ScrapeTSPSiteForPrices()
 Dim IE As New SHDocVw.InternetExplorer
 Dim HTMLDoc As MSHTML.HTMLDocument
 Dim HTMLTables As MSHTML.IHTMLElementCollection
 Dim HTMLTable As MSHTML.IHTMLElement
 Dim HTMLDiv As MSHTML.IHTMLElement
 Dim TableSection As MSHTML.IHTMLElement
 Dim TableRow As MSHTML.IHTMLElement
 Dim TableCell As MSHTML.IHTMLElement
 Dim RowText As String
 Dim RowNumber As Integer
 Dim ColNumber As Integer
 
 Application.ScreenUpdating = False
 Application.EnableEvents = False
 
 
 Worksheets("TSP Link").Range("A1:Q30").ClearContents
 
 IE.Visible = False
 IE.Navigate "https://www.tsp.gov/fund-performance/share-price-history/"

 Do While IE.readyState <> READYSTATE_COMPLETE Or IE.Busy
 Loop
 
 Set HTMLDoc = IE.Document
  
 Set HTMLDiv = HTMLDoc.getElementById("dynamic-share-price-table")
 Set HTMLTables = HTMLDiv.getElementsByTagName("table")
 
  
 For Each HTMLTable In HTMLTables
     
     RowNumber = -1
   
     For Each TableSection In HTMLTable.Children
       
         For Each TableRow In TableSection.Children
                    
            RowNumber = RowNumber + 1
            ColNumber = 0
            
            For Each TableCell In TableRow.Children
               ColNumber = ColNumber + 1
               Worksheets("TSP Link").Cells(RowNumber, ColNumber) = TableCell.innerText
               
             
            Next TableCell
               
        Next TableRow
   
   Next TableSection
   
Next HTMLTable

Application.ScreenUpdating = True
Application.EnableEvents = True
   
 
End Sub

Thank you for your assistance. If you could let me know how you got around the error, I would like to try your method out.
 
Upvote 0
Copy the underlying link, not the displayed link which the forum shortens with "..." in the middle.
 
Upvote 0
Welcome to MrExcel forums.

It's because the web page loads the data table using a separate JavaScript request to https://secure.tsp.gov/components/C...ddate=20200714&Lfunds=1&InvFunds=1&download=0

As you can see, the URL contains startdate and enddate query parameters, which I guess are 30 days ago and the current date. It is possible to update this date range in the Power Query so you always import the latest 30 days of data when you refresh the query.

Create a Power Query using that URL instead of your web page URL. Then Edit the Query and open the Advanced Editor and replace the query code with this:
Code:
let

    TimeNow = DateTime.LocalNow(),

    // Subtract 30 days from current date and format as yyyyMMdd
    StartDate = Date.ToText(Date.AddDays(DateTime.Date(TimeNow),-30),"yyyyMMdd"),

    EndDate = Date.ToText(DateTime.Date(TimeNow),"yyyyMMdd"),

    URL = Text.Combine({"https://secure.tsp.gov/components/CORS/getSharePricesRaw.html?startdate=",StartDate,"&enddate=",EndDate,"&Lfunds=1&InvFunds=1&download=0"}),

    Source = Csv.Document(Web.Contents(URL),[Delimiter=",", Columns=16, Encoding=28591, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {"L Income", type number}, {"L 2025", type number}, {"L 2030", type number}, {"L 2035", type number}, {"L 2040", type number}, {"L 2045", type number}, {"L 2050", type number}, {"L 2055", type number}, {"L 2060", type number}, {"L 2065", type number}, {"G Fund", type number}, {"F Fund", type number}, {"C Fund", type number}, {"S Fund", type number}, {"I Fund", type number}}),
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Changed Type",2),
    #"Sorted Rows" = Table.Sort(#"Removed Bottom Rows",{{"Date", Order.Descending}})
in
    #"Sorted Rows"
The query also contains steps to remove the bottom 2 rows which were blank and sort by descending Date column.


Hello again,

John W's solution to my problem has been working for nearly 2 years but the site has recently under went an update and my Power Query has stopped working.


The table is at :


and it is the table that has the date and prices for various funds.

I have been using the above Query but it seems to no longer find the table I am searching for.

Any ideas on getting this table into my (looking to put the table in my current workbook on the "TSP Link' sheet at $A$1) will be appreciated.
 
Upvote 0
If you use your browser's developer tools (press F12), the Network tab (Firefox) shows that the web page makes a JS request to:


Even though the start and end dates in the URL and query string don't relate exactly to the current date, the above URL returns the share price history for the last 29 days from the current date.

Therefore, create a Power Query web query with the above URL. But you must add, via the Advanced option, the following HTTP request header:

ParameterValue
User-AgentMozilla/5.0 (Windows NT 10.0; Win64; x64; Trident/7.0; rv:11.0) like Gecko

Without the above header the PQ connection attempt displays the error "Access to the resource is forbidden".

Next, select Document in the Navigator and click Transform Data. Delete the Navigation step and edit the Source step to replace "Web.Page" with "Csv.Document". Sort the Date column in descending order and now the data is in the correct format to Close & Load to your worksheet.

The full query, shown in the Advanced Editor is:
Power Query:
let
    Source = Csv.Document(Web.Contents("https://www.tsp.gov/data/getSharePricesRaw_startdate_20220424_enddate_20220524_Lfunds_1_InvFunds_1_download_0.html?startdate=2022-04-26&enddate=2022-05-26&Lfunds=1&InvFunds=1&download=0", [Headers=[#"User-Agent"="Mozilla/5.0 (Windows NT 10.0; Win64; x64; Trident/7.0; rv:11.0) like Gecko"]])),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {"L Income", type number}, {"L 2025", type number}, {"L 2030", type number}, {"L 2035", type number}, {"L 2040", type number}, {"L 2045", type number}, {"L 2050", type number}, {"L 2055", type number}, {"L 2060", type number}, {"L 2065", type number}, {"G Fund", type number}, {"F Fund", type number}, {"C Fund", type number}, {"S Fund", type number}, {"I Fund", type number}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Date", Order.Descending}})
in
    #"Sorted Rows"
You can now edit the query to customise it for the current date, using lines similar to post #2.
 
Upvote 0
If you use your browser's developer tools (press F12), the Network tab (Firefox) shows that the web page makes a JS request to:


Even though the start and end dates in the URL and query string don't relate exactly to the current date, the above URL returns the share price history for the last 29 days from the current date.

Therefore, create a Power Query web query with the above URL. But you must add, via the Advanced option, the following HTTP request header:

ParameterValue
User-AgentMozilla/5.0 (Windows NT 10.0; Win64; x64; Trident/7.0; rv:11.0) like Gecko

Without the above header the PQ connection attempt displays the error "Access to the resource is forbidden".

Next, select Document in the Navigator and click Transform Data. Delete the Navigation step and edit the Source step to replace "Web.Page" with "Csv.Document". Sort the Date column in descending order and now the data is in the correct format to Close & Load to your worksheet.

The full query, shown in the Advanced Editor is:
Power Query:
let
    Source = Csv.Document(Web.Contents("https://www.tsp.gov/data/getSharePricesRaw_startdate_20220424_enddate_20220524_Lfunds_1_InvFunds_1_download_0.html?startdate=2022-04-26&enddate=2022-05-26&Lfunds=1&InvFunds=1&download=0", [Headers=[#"User-Agent"="Mozilla/5.0 (Windows NT 10.0; Win64; x64; Trident/7.0; rv:11.0) like Gecko"]])),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {"L Income", type number}, {"L 2025", type number}, {"L 2030", type number}, {"L 2035", type number}, {"L 2040", type number}, {"L 2045", type number}, {"L 2050", type number}, {"L 2055", type number}, {"L 2060", type number}, {"L 2065", type number}, {"G Fund", type number}, {"F Fund", type number}, {"C Fund", type number}, {"S Fund", type number}, {"I Fund", type number}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Date", Order.Descending}})
in
    #"Sorted Rows"
You can now edit the query to customise it for the current date, using lines similar to post #2.

Thank you for the explanation and the code. It working just perfectly now.

Thanks again!
 
Upvote 0

Forum statistics

Threads
1,223,243
Messages
6,170,967
Members
452,371
Latest member
Frana

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top