using Power Query to gather XML data from multiple url strings everyday,

heathball

Board Regular
Joined
Apr 6, 2017
Messages
135
Office Version
  1. 365
Platform
  1. Windows
I need to download xml data from multiple xml url strings every day, into Excel.

there are approx 300 url strings.

Only the data component of each string (2017/5/6) changes every day, otherwise, every string is constant. The date component of each url string will always be tomorrows date. (tomorrow meaning the day after i am gathering the data)

Most of the url strings will not be active (return empty or non/active) on a given day, but there will always be between 20 and 180 that do contain data, So i just need to check all and the ones that do contain data can be transferred to Excel.

i can format the data after the transfer if necessary, or perhaps PQ can achieve it, but the main goal i am looking for is automation with the process of the changing the dates within the url.


I have been told that Power Query can fully automate the task of changing the date component of each xml URL string every day.

I have been looking at Power Query (its not simple) and i have a few broad questions i was hoping to answer before i continue this uphill journey.

1. Can PQ automate this task, or at least semi-automate it?
2. Can PQ achieve a certain format within excel from the xml data that is gathered. For example can the data be neatly set out in a table-like structure for easy uploading into a database?
3. Is PQ a sensible choice for this task? Is there a better option?
4. If PQ is a good choice for this task, is there a link to a tutorial that specifically covers the xml (or related) part of PQ? I cant seem to find the right information online.

thanks in advance
 

Excel Facts

How can you turn a range sideways?
Copy the range. Select a blank cell. Right-click, Paste Special, then choose Transpose.
1. Yes
2. Yes
3. Yes
4. There are plenty of places that you can get information about PQ.

To get you started you will need to utilise two Power Query Functions to access the data.

Web.Contents - https://msdn.microsoft.com/en-us/library/mt260892.aspx
Xml.Tables - https://msdn.microsoft.com/en-us/library/mt260874.aspx

This will Parse the Web Contents as an XML Table, for example:

Code:
let
    Source = Xml.Tables(Web.Contents("https://www.w3schools.com/xml/note.xml"))
in
    Source

We'd need more info to assist with the URL generation.

But toget tomorrows date in the yyyy-mm-dd format you can use:

Code:
tDate = Date.ToText(DateTime.Date(Date.AddDays(DateTime.LocalNow(), 1)), "yyyy-MM-dd"),
 
Last edited:
Upvote 0
thanks for your reply
if this is the actual web page Imgur: The most awesome images on the Internet
that the xml url [TABLE="width: 139"]
<tbody>[TR]
[TD="width: 139"]http://tatts.com/pagedata/racing/2017/4/5/BR7.xml offers, why am i getting only 1 row, and seemingly not much data at all so far - here is what i have .... Imgur: The most awesome images on the Internet
has it got to do with requesting only the data that is in table format? [/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
Links are fine.

You haven't drilled down far enough. PQ is just displaying the Top level.

Click on the Table in "Meeting"

I'll give a better answer when I'm by a PC.
 
Upvote 0
In the meantime here's something for you to look at.

This code is in a Query Called BR7

Code:
let
    Source = Xml.Tables(Web.Contents("http://tatts.com/pagedata/racing/2017/4/5/BR7.xml ")),
    Meeting = Source{0}[Meeting],
    Race = Meeting{0}[Race],
    #"Removed Other Columns" = Table.SelectColumns(Race,{"Runner"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Data", each ExpandAll([Runner])),
    #"Expanded Data" = Table.ExpandTableColumn(#"Added Custom", "Data", {"WinOdds.Attribute:Odds", "WinOdds.Attribute:Lastodds", "WinOdds.Attribute:LastCalcTime", "WinOdds.Attribute:CalcTime", "WinOdds.Attribute:Short", "PlaceOdds.Attribute:Odds", "PlaceOdds.Attribute:Lastodds", "PlaceOdds.Attribute:Short", "FixedOdds.Book.Attribute:BookStatus", "FixedOdds.Book.Attribute:SubEventId", "FixedOdds.Attribute:OfferId", "FixedOdds.Attribute:RunnerNo", "FixedOdds.Attribute:RaceNo", "FixedOdds.Attribute:MeetingCode", "FixedOdds.Attribute:RaceDayDate", "FixedOdds.Attribute:WinOdds", "FixedOdds.Attribute:PlaceOdds", "FixedOdds.Attribute:RetailWinOdds", "FixedOdds.Attribute:RetailPlaceOdds", "FixedOdds.Attribute:OfferName", "FixedOdds.Attribute:Status", "FixedOdds.Attribute:LateScratching", "FixedOdds.Attribute:Deduction", "FixedOdds.Attribute:PlaceDeduction", "Attribute:RunnerNo", "Attribute:RunnerName", "Attribute:Scratched", "Attribute:ScratchStatus", "Attribute:Rider", "Attribute:RiderChanged", "Attribute:Barrier", "Attribute:Handicap", "Attribute:Weight", "Attribute:LastResult", "Attribute:Rtng", "Attribute:JockeySilk", "Attribute:Form"}, {"WinOdds.Attribute:Odds", "WinOdds.Attribute:Lastodds", "WinOdds.Attribute:LastCalcTime", "WinOdds.Attribute:CalcTime", "WinOdds.Attribute:Short", "PlaceOdds.Attribute:Odds", "PlaceOdds.Attribute:Lastodds", "PlaceOdds.Attribute:Short", "FixedOdds.Book.Attribute:BookStatus", "FixedOdds.Book.Attribute:SubEventId", "FixedOdds.Attribute:OfferId", "FixedOdds.Attribute:RunnerNo", "FixedOdds.Attribute:RaceNo", "FixedOdds.Attribute:MeetingCode", "FixedOdds.Attribute:RaceDayDate", "FixedOdds.Attribute:WinOdds", "FixedOdds.Attribute:PlaceOdds", "FixedOdds.Attribute:RetailWinOdds", "FixedOdds.Attribute:RetailPlaceOdds", "FixedOdds.Attribute:OfferName", "FixedOdds.Attribute:Status", "FixedOdds.Attribute:LateScratching", "FixedOdds.Attribute:Deduction", "FixedOdds.Attribute:PlaceDeduction", "Attribute:RunnerNo", "Attribute:RunnerName", "Attribute:Scratched", "Attribute:ScratchStatus", "Attribute:Rider", "Attribute:RiderChanged", "Attribute:Barrier", "Attribute:Handicap", "Attribute:Weight", "Attribute:LastResult", "Attribute:Rtng", "Attribute:JockeySilk", "Attribute:Form"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Data",{"Runner"})
in
    #"Removed Columns"

Then in a Query called ExpandAll

Code:
let
    //Define function taking two parameters - a table and an optional column number 
    Source = (TableToExpand as table, optional ColumnNumber as number) =>
    let
     //If the column number is missing, make it 0
     ActualColumnNumber = if (ColumnNumber=null) then 0 else ColumnNumber,
     //Find the column name relating to the column number
     ColumnName = Table.ColumnNames(TableToExpand){ActualColumnNumber},
     //Get a list containing all of the values in the column
     ColumnContents = Table.Column(TableToExpand, ColumnName),
     //Iterate over each value in the column and then
     //If the value is of type table get a list of all of the columns in the table
     //Then get a distinct list of all of these column names
     ColumnsToExpand = List.Distinct(List.Combine(List.Transform(ColumnContents, 
                        each if _ is table then Table.ColumnNames(_) else {}))),
     //Append the original column name to the front of each of these column names
     NewColumnNames = List.Transform(ColumnsToExpand, each ColumnName & "." & _),
     //Is there anything to expand in this column?
     CanExpandCurrentColumn = List.Count(ColumnsToExpand)>0,
     //If this column can be expanded, then expand it
     ExpandedTable = if CanExpandCurrentColumn 
                         then 
                         Table.ExpandTableColumn(TableToExpand, ColumnName, 
                                ColumnsToExpand, NewColumnNames) 
                         else 
                         TableToExpand,
     //If the column has been expanded then keep the column number the same, otherwise add one to it
     NextColumnNumber = if CanExpandCurrentColumn then ActualColumnNumber else ActualColumnNumber+1,
     //If the column number is now greater than the number of columns in the table
     //Then return the table as it is
     //Else call the ExpandAll function recursively with the expanded table
     OutputTable = if NextColumnNumber>(Table.ColumnCount(ExpandedTable)-1) 
                        then 
                        ExpandedTable 
                        else 
                        ExpandAll(ExpandedTable, NextColumnNumber)
    in
     OutputTable
in
    Source

Source from Chris Webb's Blog: https://blog.crossjoin.co.uk/2014/05/21/expanding-all-columns-in-a-table-in-power-query/
 
Upvote 0
That looks like a very helpful resource. Thanks.
I see what you mean by drill down. The literal meaning. I didn't see that earlier.
with regards to the codes, they look fanstastic, once i work out where to place them.

I have my table on PQ. It looks awesome. Imgur: The most awesome images on the Internet

i just don't know why it does not load onto excel when i select "close and load."

Im also not clear on where the code is placed, or if something has to be added to it, in order to change the date component of many url xml strings. I assume its a function, but then there are parameters.
 
Upvote 0
Upvote 0
Here's a sample.

You simply have to enter the end of the URL ("MRFields", "BRFields", "SRFields") into the Table on sheet1 and sheet2 will return the results.

Well as far as I can guess anyway.
 
Upvote 0

Forum statistics

Threads
1,225,730
Messages
6,186,698
Members
453,369
Latest member
positivemind

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top