XL Pro
Board Regular
- Joined
- Apr 17, 2002
- Messages
- 249
- Office Version
- 365
- Platform
- Windows
Hi - just started learning Power Query / Get & Transform this month - what an incredible tool! My background is VBA, so getting data cleaned/shaped without writing any code is incredible!
Has anyone tried reading an XML Spreadsheet? Here's the header of my data:
Data is received monthly with 150K+ rows give or take. I used a small subset of about 2K rows to create my M Code, but when applied to the full dataset, I've waited over 30min without any results and just canceled the process.
Here's the M Code I have (It uses Ken Puls' Parameter Table to get a file- local, not on a shared drive)
Currently I get great performance if I open the file first and save as an Excel file. A bit better performance if I save it as a CSV. Obviously, with those two formats, it's already in table format, I don't have to traverse the XML.
I don't know if I'm parsing the XML correctly or efficiently. I'd really like to be able to use the XML file directly rather than taking the extra "open/save as" route.
Thanks for any tips/tricks.
Has anyone tried reading an XML Spreadsheet? Here's the header of my data:
HTML:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="date"><NumberFormat ss:Format="Short Date"/></Style>
<Style ss:ID="bold"><Font ss:Bold="1"/></Style>
<Style ss:ID="default"></Style>
</Styles>
<Worksheet ss:Name="Data">
<Table>
<Row ss:StyleID="bold"><Cell><Data ss:Type="String">Company</Data></Cell><Cell><Data ss:Type="String">Job</Data></Cell><Cell><Data ss:Type="String">Emp Num</Data></Cell><Cell><Data ss:Type="String">Emp Name</Data></Cell><Cell><Data ss:Type="String">SSN</Data></Cell><Cell><Data ss:Type="String">Period Date</Data></Cell><Cell><Data ss:Type="String">Skill</Data></Cell><Cell><Data ss:Type="String">Hour Type</Data></Cell><Cell><Data ss:Type="String">ST</Data></Cell><Cell><Data ss:Type="String">OT</Data></Cell><Cell><Data ss:Type="String">DT</Data></Cell><Cell><Data ss:Type="String">HW</Data></Cell><Cell><Data ss:Type="String">HP</Data></Cell><Cell><Data ss:Type="String">Skill Category</Data></Cell><Cell><Data ss:Type="String">Skill Category Name</Data></Cell><Cell><Data ss:Type="String">FD</Data></Cell><Cell><Data ss:Type="String">Calc No.</Data></Cell><Cell><Data ss:Type="String">Description</Data></Cell><Cell><Data ss:Type="String">Contribution</Data></Cell><Cell><Data ss:Type="String">VH</Data></Cell><Cell><Data ss:Type="String">Contribution Amount</Data></Cell><Cell><Data ss:Type="String">Has Dues</Data></Cell><Cell><Data ss:Type="String">Dues Amnt</Data></Cell><Cell><Data ss:Type="String">Has PAC</Data></Cell><Cell><Data ss:Type="String">PAC Amount</Data></Cell><Cell><Data ss:Type="String">Has 401k</Data></Cell><Cell><Data ss:Type="String">401K Amount</Data></Cell><Cell><Data ss:Type="String">401K Rate</Data></Cell><Cell><Data ss:Type="String">Premium</Data></Cell><Cell><Data ss:Type="String">Has ROTH</Data></Cell><Cell><Data ss:Type="String">ROTH Amount</Data></Cell><Cell><Data ss:Type="String">ROTH Rate</Data></Cell><Cell><Data ss:Type="String">Roth Premium</Data></Cell><Cell><Data ss:Type="String">Address</Data></Cell><Cell><Data ss:Type="String">Local</Data></Cell><Cell><Data ss:Type="String">Union Skill</Data></Cell><Cell><Data ss:Type="String">Skill Name</Data></Cell><Cell><Data ss:Type="String">Skill Class Prior Rate</Data></Cell><Cell><Data ss:Type="String">Skill Class Current Rate</Data></Cell><Cell><Data ss:Type="String">Skill Class Due</Data></Cell><Cell><Data ss:Type="String">Skill Class HRA</Data></Cell><Cell><Data ss:Type="String">Skill Class 401A</Data></Cell><Cell><Data ss:Type="String">Skill Credit Union Code</Data></Cell><Cell><Data ss:Type="String">Skill Credit Union Rate</Data></Cell><Cell><Data ss:Type="String">Skill 401k Code</Data></Cell><Cell><Data ss:Type="String">Skill 401k Rate</Data></Cell><Cell><Data ss:Type="String">Unassigned</Data></Cell><Cell><Data ss:Type="String">Company Name</Data></Cell><Cell><Data ss:Type="String">CO Addr1</Data></Cell><Cell><Data ss:Type="String">CO Addr2</Data></Cell><Cell><Data ss:Type="String">CO City</Data></Cell><Cell><Data ss:Type="String">CO State</Data></Cell><Cell><Data ss:Type="String">CO Zip</Data></Cell><Cell><Data ss:Type="String">Master Agreement</Data></Cell><Cell><Data ss:Type="String">Year</Data></Cell><Cell><Data ss:Type="String">Period</Data></Cell><Cell><Data ss:Type="String">Check</Data></Cell><Cell><Data ss:Type="String">Tran</Data></Cell><Cell><Data ss:Type="String">Key</Data></Cell><Cell><Data ss:Type="String">Union</Data></Cell><Cell><Data ss:Type="String">Gross</Data></Cell><Cell><Data ss:Type="String">Manual Check</Data></Cell></Row>
</Table>
</Worksheet>
</Workbook>
Here's the M Code I have (It uses Ken Puls' Parameter Table to get a file- local, not on a shared drive)
Code:
let
SolutionFile = fnGetParameter("File"),
Source = Xml.Tables(File.Contents(SolutionFile)),
#"Filtered Rows" = Table.SelectRows(Source, each not Text.Contains([Name], "Styles")),
Table = #"Filtered Rows"{0}[Table],
#"Filtered Rows1" = Table.SelectRows(Table, each ([Name] = "Table")),
Table1 = #"Filtered Rows1"{0}[Table],
Table2 = Table1{0}[Table],
#"Removed Columns" = Table.RemoveColumns(Table2,{"urn:schemas-microsoft-com:office:spreadsheet"}),
AddedIndex = Table.AddIndexColumn(#"Removed Columns", "Index", 0, 1),
#"Expanded Cell" = Table.ExpandTableColumn(AddedIndex, "Cell", {"Data"}, {"Cell.Data"}),
#"Expanded Cell.Data" = Table.ExpandTableColumn(#"Expanded Cell", "Cell.Data", {"Element:Text"}, {"Cell.Data.Element:Text"}),
#"Renamed Columns" = Table.RenameColumns(#"Expanded Cell.Data",{{"Cell.Data.Element:Text", "Data"}, {"Index", "Row"}}),
#"Added Index" = Table.AddIndexColumn(#"Renamed Columns", "Index", 0, 1),
#"Inserted Modulo" = Table.AddColumn(#"Added Index", "Inserted Modulo", each Number.Mod([Index], 63), type number),
#"Removed Columns1" = Table.RemoveColumns(#"Inserted Modulo",{"Index"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns1", {{"Inserted Modulo", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns1", {{"Inserted Modulo", type text}}, "en-US")[#"Inserted Modulo"]), "Inserted Modulo", "Data"),
#"Removed Columns2" = Table.RemoveColumns(#"Pivoted Column",{"Row"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Removed Columns2"),
Custom1 = Table.RowCount(#"Promoted Headers")
in
Custom1
I don't know if I'm parsing the XML correctly or efficiently. I'd really like to be able to use the XML file directly rather than taking the extra "open/save as" route.
Thanks for any tips/tricks.