# Using Power Query to gather XML data from multiple URL strings every day



## heathball (Apr 6, 2017)

I need to download XML data from multiple XML URL strings every day, into Excel.

There are approx. 300 URL strings.

Only the date component of each string (e.g. 2017/5/6) changes every day; otherwise, every string is constant. The date component of each URL string will always be tomorrow's date (tomorrow meaning the day after I am gathering the data).

Most of the URL strings will not be active (they return empty or non-active) on a given day, but there will always be between 20 and 180 that do contain data. So I just need to check all of them, and the ones that do contain data can be transferred to Excel.

I can format the data after the transfer if necessary, or perhaps PQ can achieve it, but the main goal I am after is automating the process of changing the dates within the URLs.

I have been told that Power Query can fully automate the task of changing the date component of each XML URL string every day.

I have been looking at Power Query (it's not simple) and I have a few broad questions I was hoping to answer before I continue this uphill journey.

1. Can PQ automate this task, or at least semi-automate it?
2. Can PQ achieve a certain format within Excel from the XML data that is gathered? For example, can the data be neatly set out in a table-like structure for easy uploading into a database?
3. Is PQ a sensible choice for this task? Is there a better option?
4. If PQ is a good choice for this task, is there a link to a tutorial that specifically covers the XML (or related) part of PQ? I can't seem to find the right information online.

Thanks in advance


----------



## Comfy (Apr 6, 2017)

1. Yes
2. Yes
3. Yes
4. There are plenty of places that you can get information about PQ.

To get you started you will need to utilise two Power Query functions to access the data:

Web.Contents - https://msdn.microsoft.com/en-us/library/mt260892.aspx
Xml.Tables - https://msdn.microsoft.com/en-us/library/mt260874.aspx

Together these parse the web contents as an XML table, for example:


```
let
    Source = Xml.Tables(Web.Contents("https://www.w3schools.com/xml/note.xml"))
in
    Source
```

We'd need more info to assist with the URL generation.

But to get tomorrow's date in yyyy-MM-dd format you can use:


```
let
    tDate = Date.ToText(DateTime.Date(Date.AddDays(DateTime.LocalNow(), 1)), "yyyy-MM-dd")
in
    tDate
```
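
Putting the two together, a minimal sketch might look like this (the base URL and file name here are placeholders, not from your setup; swap in your real ones, and adjust the date format to whatever the URLs actually use):

```
let
    // Tomorrow's date, formatted for the URL (the exact format is an assumption)
    tDate = Date.ToText(DateTime.Date(Date.AddDays(DateTime.LocalNow(), 1)), "yyyy-MM-dd"),
    // Placeholder base URL -- substitute your own
    Url = "http://example.com/data/" & tDate & "/feed.xml",
    Source = Xml.Tables(Web.Contents(Url))
in
    Source
```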


----------



## heathball (Apr 7, 2017)

Thanks for your reply.

If this is the actual web page (Imgur screenshot) that the XML URL
http://tatts.com/pagedata/racing/2017/4/5/BR7.xml offers, why am I getting only 1 row, and seemingly not much data at all so far? Here is what I have (Imgur screenshot).

Has it got to do with requesting only the data that is in table format?


----------



## heathball (Apr 7, 2017)

I'm not sure if those Imgur links can be opened on this site.


----------



## Comfy (Apr 7, 2017)

Links are fine.

You haven't drilled down far enough. PQ is just displaying the top level.

Click on the Table in "Meeting"
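
In M terms, clicking that link just adds a navigation step to the query, something like this sketch:

```
let
    Source = Xml.Tables(Web.Contents("http://tatts.com/pagedata/racing/2017/4/5/BR7.xml")),
    // Clicking the Table link in the Meeting column generates this navigation step
    Meeting = Source{0}[Meeting]
in
    Meeting
```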

I'll give a better answer when I'm by a PC.


----------



## Comfy (Apr 7, 2017)

Is it the Runner information that you are interested in?


----------



## Comfy (Apr 7, 2017)

In the meantime here's something for you to look at.

This code is in a query called *BR7*:


```
let
    Source = Xml.Tables(Web.Contents("http://tatts.com/pagedata/racing/2017/4/5/BR7.xml")),
    Meeting = Source{0}[Meeting],
    Race = Meeting{0}[Race],
    #"Removed Other Columns" = Table.SelectColumns(Race,{"Runner"}),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Data", each ExpandAll([Runner])),
    #"Expanded Data" = Table.ExpandTableColumn(#"Added Custom", "Data", {"WinOdds.Attribute:Odds", "WinOdds.Attribute:Lastodds", "WinOdds.Attribute:LastCalcTime", "WinOdds.Attribute:CalcTime", "WinOdds.Attribute:Short", "PlaceOdds.Attribute:Odds", "PlaceOdds.Attribute:Lastodds", "PlaceOdds.Attribute:Short", "FixedOdds.Book.Attribute:BookStatus", "FixedOdds.Book.Attribute:SubEventId", "FixedOdds.Attribute:OfferId", "FixedOdds.Attribute:RunnerNo", "FixedOdds.Attribute:RaceNo", "FixedOdds.Attribute:MeetingCode", "FixedOdds.Attribute:RaceDayDate", "FixedOdds.Attribute:WinOdds", "FixedOdds.Attribute:PlaceOdds", "FixedOdds.Attribute:RetailWinOdds", "FixedOdds.Attribute:RetailPlaceOdds", "FixedOdds.Attribute:OfferName", "FixedOdds.Attribute:Status", "FixedOdds.Attribute:LateScratching", "FixedOdds.Attribute:Deduction", "FixedOdds.Attribute:PlaceDeduction", "Attribute:RunnerNo", "Attribute:RunnerName", "Attribute:Scratched", "Attribute:ScratchStatus", "Attribute:Rider", "Attribute:RiderChanged", "Attribute:Barrier", "Attribute:Handicap", "Attribute:Weight", "Attribute:LastResult", "Attribute:Rtng", "Attribute:JockeySilk", "Attribute:Form"}, {"WinOdds.Attribute:Odds", "WinOdds.Attribute:Lastodds", "WinOdds.Attribute:LastCalcTime", "WinOdds.Attribute:CalcTime", "WinOdds.Attribute:Short", "PlaceOdds.Attribute:Odds", "PlaceOdds.Attribute:Lastodds", "PlaceOdds.Attribute:Short", "FixedOdds.Book.Attribute:BookStatus", "FixedOdds.Book.Attribute:SubEventId", "FixedOdds.Attribute:OfferId", "FixedOdds.Attribute:RunnerNo", "FixedOdds.Attribute:RaceNo", "FixedOdds.Attribute:MeetingCode", "FixedOdds.Attribute:RaceDayDate", "FixedOdds.Attribute:WinOdds", "FixedOdds.Attribute:PlaceOdds", "FixedOdds.Attribute:RetailWinOdds", "FixedOdds.Attribute:RetailPlaceOdds", "FixedOdds.Attribute:OfferName", "FixedOdds.Attribute:Status", "FixedOdds.Attribute:LateScratching", "FixedOdds.Attribute:Deduction", "FixedOdds.Attribute:PlaceDeduction", "Attribute:RunnerNo", "Attribute:RunnerName", 
"Attribute:Scratched", "Attribute:ScratchStatus", "Attribute:Rider", "Attribute:RiderChanged", "Attribute:Barrier", "Attribute:Handicap", "Attribute:Weight", "Attribute:LastResult", "Attribute:Rtng", "Attribute:JockeySilk", "Attribute:Form"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Data",{"Runner"})
in
    #"Removed Columns"
```

Then in a query called *ExpandAll*:


```
let
    //Define function taking two parameters - a table and an optional column number 
    Source = (TableToExpand as table, optional ColumnNumber as number) =>
    let
     //If the column number is missing, make it 0
     ActualColumnNumber = if (ColumnNumber=null) then 0 else ColumnNumber,
     //Find the column name relating to the column number
     ColumnName = Table.ColumnNames(TableToExpand){ActualColumnNumber},
     //Get a list containing all of the values in the column
     ColumnContents = Table.Column(TableToExpand, ColumnName),
     //Iterate over each value in the column and then
     //If the value is of type table get a list of all of the columns in the table
     //Then get a distinct list of all of these column names
     ColumnsToExpand = List.Distinct(List.Combine(List.Transform(ColumnContents, 
                        each if _ is table then Table.ColumnNames(_) else {}))),
     //Append the original column name to the front of each of these column names
     NewColumnNames = List.Transform(ColumnsToExpand, each ColumnName & "." & _),
     //Is there anything to expand in this column?
     CanExpandCurrentColumn = List.Count(ColumnsToExpand)>0,
     //If this column can be expanded, then expand it
     ExpandedTable = if CanExpandCurrentColumn 
                         then 
                         Table.ExpandTableColumn(TableToExpand, ColumnName, 
                                ColumnsToExpand, NewColumnNames) 
                         else 
                         TableToExpand,
     //If the column has been expanded then keep the column number the same, otherwise add one to it
     NextColumnNumber = if CanExpandCurrentColumn then ActualColumnNumber else ActualColumnNumber+1,
     //If the column number is now greater than the number of columns in the table
     //Then return the table as it is
     //Else call the ExpandAll function recursively with the expanded table
     OutputTable = if NextColumnNumber>(Table.ColumnCount(ExpandedTable)-1) 
                        then 
                        ExpandedTable 
                        else 
                        ExpandAll(ExpandedTable, NextColumnNumber)
    in
     OutputTable
in
    Source
```

Source from Chris Webb's Blog: https://blog.crossjoin.co.uk/2014/05/21/expanding-all-columns-in-a-table-in-power-query/
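
As a minimal sketch of calling *ExpandAll* directly on the whole table (rather than per-row as in the *BR7* query above):

```
let
    Source = Xml.Tables(Web.Contents("http://tatts.com/pagedata/racing/2017/4/5/BR7.xml")),
    // Recursively expand every nested table column, left to right
    Expanded = ExpandAll(Source)
in
    Expanded
```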


----------



## heathball (Apr 7, 2017)

That looks like a very helpful resource, thanks.

I see what you mean by drill down. The literal meaning. I didn't see that earlier.

As for the code, it looks fantastic, once I work out where to place it.

I have my table in PQ. It looks awesome (Imgur screenshot).

I just don't know why it does not load into Excel when I select "Close and Load".

I'm also not clear on where the code is placed, or if something has to be added to it, in order to change the date component of many XML URL strings. I assume it's a function, but then there are parameters.


----------



## heathball (Apr 7, 2017)

These are some examples of the XML URL strings:

https://tatts.com/pagedata/racing/2017/4/8/MRFields.xml
https://tatts.com/pagedata/racing/2017/4/8/BRFields.xml
https://tatts.com/pagedata/racing/2017/4/8/SRFields.xml

To keep it simple, I have chosen a set of strings that number about 22, instead of the original plan of 300. Later, when I know what I am doing with PQ, I will gather the extra column that the other individual strings contain, and add it to this table.


----------



## Comfy (Apr 7, 2017)

Here's a sample.

You simply have to enter the end of the URL ("MRFields", "BRFields", "SRFields") into the Table on Sheet1, and Sheet2 will return the results.

Well, as far as I can guess, anyway.
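
Roughly, the driving query does something like this (a sketch; the table name "UrlCodes" and column name "Code" are assumptions, not necessarily what the workbook uses):

```
let
    // Table on Sheet1 holding the URL endings (table/column names are assumptions)
    Codes = Excel.CurrentWorkbook(){[Name = "UrlCodes"]}[Content],
    // Tomorrow's date in the yyyy/M/d form the URLs use
    Tomorrow = DateTime.Date(Date.AddDays(DateTime.LocalNow(), 1)),
    tDate = Text.From(Date.Year(Tomorrow)) & "/"
          & Text.From(Date.Month(Tomorrow)) & "/"
          & Text.From(Date.Day(Tomorrow)),
    // Fetch each feed; inactive URLs become null instead of raising an error
    WithData = Table.AddColumn(Codes, "Data", each
        try Xml.Tables(Web.Contents("https://tatts.com/pagedata/racing/" & tDate & "/" & [Code] & ".xml"))
        otherwise null),
    // Keep only the feeds that returned data
    Active = Table.SelectRows(WithData, each [Data] <> null)
in
    Active
```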


----------






## heathball (Apr 8, 2017)

OK, thanks.
Just one more question.
If I am to automate the process of changing the dates within the URLs each day,
should I look into using a combination of both functions and parameters, or just one of those?
Thanks


----------



## Comfy (Apr 8, 2017)

The workbook I posted already automates changing the URL.
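
To answer the functions-vs-parameters question anyway: a single function is enough, because tomorrow's date can be computed inside it rather than passed in as a parameter. A sketch (the function name is made up):

```
// A query named fnTomorrowUrl (name is hypothetical): builds tomorrow's URL for a given code
let
    fnTomorrowUrl = (code as text) as text =>
        let
            Tomorrow = DateTime.Date(Date.AddDays(DateTime.LocalNow(), 1)),
            tDate = Text.From(Date.Year(Tomorrow)) & "/"
                  & Text.From(Date.Month(Tomorrow)) & "/"
                  & Text.From(Date.Day(Tomorrow))
        in
            "https://tatts.com/pagedata/racing/" & tDate & "/" & code & ".xml"
in
    fnTomorrowUrl
```

Invoking fnTomorrowUrl("BRFields") then gives the full URL for tomorrow's BRFields feed, ready to pass to Web.Contents.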


----------



## Comfy (Apr 8, 2017)

Ahh, I didn't share the link.

I can only put that down to how late it was and the vino in my system.

https://1drv.ms/x/s!AoASJ-g9hQ69g9NdJtwv3DBzJQReEg


----------

