I am using Power Query to compare column values in two tables, where each table has a Key column as well as around 20 dimension columns, which I want to check to make sure that the dimensions align between the two tables for each Key. For example
Table1
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Key
[/TD]
[TD]Dim1
[/TD]
[TD]Dim2
[/TD]
[TD]Dim3
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]xxx
[/TD]
[TD]yyy
[/TD]
[TD]zzz
[/TD]
[/TR]
[TR]
[TD]DEF
[/TD]
[TD]aaa
[/TD]
[TD]bbb
[/TD]
[TD]ccc
[/TD]
[/TR]
</tbody>[/TABLE]
Table2
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Key
[/TD]
[TD]Dim1
[/TD]
[TD]Dim2
[/TD]
[TD]Dim3
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]hhh
[/TD]
[TD]jjj
[/TD]
[TD]zzz
[/TD]
[/TR]
[TR]
[TD]DEF
[/TD]
[TD]aaa
[/TD]
[TD]bbb
[/TD]
[TD]ddd
[/TD]
[/TR]
</tbody>[/TABLE]
And I want a resulting table that gives me all instances where the dimensions do not match, like this
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Key
[/TD]
[TD]Dimension
[/TD]
[TD]Table1 Value
[/TD]
[TD]Table2 Value
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]Dim1
[/TD]
[TD]xxx
[/TD]
[TD]hhh
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]Dim2
[/TD]
[TD]yyy
[/TD]
[TD]jjj
[/TD]
[/TR]
[TR]
[TD]DEF
[/TD]
[TD]Dim3
[/TD]
[TD]ccc
[/TD]
[TD]ddd
[/TD]
[/TR]
</tbody>[/TABLE]
To accomplish this, I tried this:
My query is
It works, but is extremely slow (30+ minutes refresh time). Each of Table1 and Table2 have about 100,000 rows, and 20 dimension columns, so the unpivoted version of each would have around 2,000,000 rows.
This seems like it shouldn't be that difficult for PQ to handle, am I doing something wrong here that is hurting performance this much?
Thanks in advance.
Table1
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Key
[/TD]
[TD]Dim1
[/TD]
[TD]Dim2
[/TD]
[TD]Dim3
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]xxx
[/TD]
[TD]yyy
[/TD]
[TD]zzz
[/TD]
[/TR]
[TR]
[TD]DEF
[/TD]
[TD]aaa
[/TD]
[TD]bbb
[/TD]
[TD]ccc
[/TD]
[/TR]
</tbody>[/TABLE]
Table2
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Key
[/TD]
[TD]Dim1
[/TD]
[TD]Dim2
[/TD]
[TD]Dim3
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]hhh
[/TD]
[TD]jjj
[/TD]
[TD]zzz
[/TD]
[/TR]
[TR]
[TD]DEF
[/TD]
[TD]aaa
[/TD]
[TD]bbb
[/TD]
[TD]ddd
[/TD]
[/TR]
</tbody>[/TABLE]
And I want a resulting table that gives me all instances where the dimensions do not match, like this
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Key
[/TD]
[TD]Dimension
[/TD]
[TD]Table1 Value
[/TD]
[TD]Table2 Value
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]Dim1
[/TD]
[TD]xxx
[/TD]
[TD]hhh
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]Dim2
[/TD]
[TD]yyy
[/TD]
[TD]jjj
[/TD]
[/TR]
[TR]
[TD]DEF
[/TD]
[TD]Dim3
[/TD]
[TD]ccc
[/TD]
[TD]ddd
[/TD]
[/TR]
</tbody>[/TABLE]
To accomplish this, I tried this:
- Append tables together
- Unpivot dimensions
- Remove duplicates to get all unique combinations of Key, Dimension
- Merge the unique combinations to Table1 and Table2 to get the values in each table, and compare
My query is
Code:
let
Source = Table.Combine({Table1, Table2}),
#"Grouped Rows" = Table.Group(#"Source", List.Combine({ ColumnsToKeep_Key, {"Source"} }), {{"Volume", each List.Sum([Volume]), type number}, {"Revenue", each List.Sum([Revenue]), type number}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Grouped Rows", {"Key", "Volume", "Revenue"}, "Attribute", "Value"),
#"Table1 Table" = Table.SelectRows(#"Unpivoted Other Columns", each [Source] = "Table1"),
#"Table2 Table" = Table.SelectRows(#"Unpivoted Other Columns", each [Source] = "Table2"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Value", "Source"}),
#"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
#"Merged Table1" = Table.NestedJoin(#"Removed Duplicates", {"Key", "Attribute"}, #"Table1 Table", {"Key", "Attribute"}, "Table1_Table", JoinKind.LeftOuter),
#"Merged Table2" = Table.NestedJoin(#"Merged Table1", {"Key", "Attribute"}, #"Table2 Table", {"Key", "Attribute"}, "Table2_Table", JoinKind.LeftOuter),
#"Expanded Table1_Table" = Table.ExpandTableColumn(#"Merged Table2", "Table1_Table", {"Value"}, {"Table1_Table.Value"}),
#"Expanded Table2_Table" = Table.ExpandTableColumn(#"Expanded Table1_Table", "Table2_Table", {"Value"}, {"Table2_Table.Value"}),
#"Added Value Check" = Table.AddColumn(#"Expanded Table2_Table", "Value Check", each if [Table1_Table.Value] = [Table2_Table.Value] then "OK" else "Check"),
#"Filtered Rows" = Table.SelectRows(#"Added Value Check", each [Value Check] <> "OK"),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Value Check"}),
in
#"Removed Columns1"
It works, but is extremely slow (30+ minutes refresh time). Each of Table1 and Table2 have about 100,000 rows, and 20 dimension columns, so the unpivoted version of each would have around 2,000,000 rows.
This seems like it shouldn't be that difficult for PQ to handle, am I doing something wrong here that is hurting performance this much?
Thanks in advance.
Last edited: