PQ Performance in Comparing Two Tables

cr731

Well-known Member
Joined
Sep 17, 2010
Messages
611
I am using Power Query to compare column values in two tables, where each table has a Key column as well as around 20 dimension columns, which I want to check to make sure that the dimensions align between the two tables for each Key. For example

Table1
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Key
[/TD]
[TD]Dim1
[/TD]
[TD]Dim2
[/TD]
[TD]Dim3
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]xxx
[/TD]
[TD]yyy
[/TD]
[TD]zzz
[/TD]
[/TR]
[TR]
[TD]DEF
[/TD]
[TD]aaa
[/TD]
[TD]bbb
[/TD]
[TD]ccc
[/TD]
[/TR]
</tbody>[/TABLE]

Table2
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Key
[/TD]
[TD]Dim1
[/TD]
[TD]Dim2
[/TD]
[TD]Dim3
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]hhh
[/TD]
[TD]jjj
[/TD]
[TD]zzz
[/TD]
[/TR]
[TR]
[TD]DEF
[/TD]
[TD]aaa
[/TD]
[TD]bbb
[/TD]
[TD]ddd
[/TD]
[/TR]
</tbody>[/TABLE]


And I want a resulting table that gives me all instances where the dimensions do not match, like this

[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Key
[/TD]
[TD]Dimension
[/TD]
[TD]Table1 Value
[/TD]
[TD]Table2 Value
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]Dim1
[/TD]
[TD]xxx
[/TD]
[TD]hhh
[/TD]
[/TR]
[TR]
[TD]ABC
[/TD]
[TD]Dim2
[/TD]
[TD]yyy
[/TD]
[TD]jjj
[/TD]
[/TR]
[TR]
[TD]DEF
[/TD]
[TD]Dim3
[/TD]
[TD]ccc
[/TD]
[TD]ddd
[/TD]
[/TR]
</tbody>[/TABLE]


To accomplish this, I tried this:

  • Append tables together
  • Unpivot dimensions
  • Remove duplicates to get all unique combinations of Key, Dimension
  • Merge the unique combinations to Table1 and Table2 to get the values in each table, and compare

My query is

Code:
let
    Source = Table.Combine({Table1, Table2}),
    #"Grouped Rows" = Table.Group(#"Source", List.Combine({ ColumnsToKeep_Key, {"Source"} }), {{"Volume", each List.Sum([Volume]), type number}, {"Revenue", each List.Sum([Revenue]), type number}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Grouped Rows", {"Key", "Volume", "Revenue"}, "Attribute", "Value"),
    #"Table1 Table" = Table.SelectRows(#"Unpivoted Other Columns", each [Source] = "Table1"),
    #"Table2 Table" = Table.SelectRows(#"Unpivoted Other Columns", each [Source] = "Table2"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Value", "Source"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Columns"),
    #"Merged Table1" = Table.NestedJoin(#"Removed Duplicates", {"Key", "Attribute"}, #"Table1 Table", {"Key", "Attribute"}, "Table1_Table", JoinKind.LeftOuter),
    #"Merged Table2" = Table.NestedJoin(#"Merged Table1", {"Key", "Attribute"}, #"Table2 Table", {"Key", "Attribute"}, "Table2_Table", JoinKind.LeftOuter),
    #"Expanded Table1_Table" = Table.ExpandTableColumn(#"Merged Table2", "Table1_Table", {"Value"}, {"Table1_Table.Value"}),
    #"Expanded Table2_Table" = Table.ExpandTableColumn(#"Expanded Table1_Table", "Table2_Table", {"Value"}, {"Table2_Table.Value"}),
    #"Added Value Check" = Table.AddColumn(#"Expanded Table2_Table", "Value Check", each if [Table1_Table.Value] = [Table2_Table.Value] then "OK" else "Check"),
    #"Filtered Rows" = Table.SelectRows(#"Added Value Check", each [Value Check] <> "OK"),
    #"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Value Check"}),
in
    #"Removed Columns1"

It works, but is extremely slow (30+ minutes refresh time). Each of Table1 and Table2 have about 100,000 rows, and 20 dimension columns, so the unpivoted version of each would have around 2,000,000 rows.

This seems like it shouldn't be that difficult for PQ to handle, am I doing something wrong here that is hurting performance this much?



Thanks in advance.
 
Last edited:

Excel Facts

Convert text numbers to real numbers
Select a column containing text numbers. Press Alt+D E F to quickly convert text to numbers. Faster than "Convert to Number"
Try this:

Table2:

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Key", type text}, {"Dim1", type text}, {"Dim2", type text}, {"Dim3", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "TableName", each "Table2")
in
    #"Added Custom"

Table1:

Code:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Key", type  text}, {"Dim1", type text}, {"Dim2", type text}, {"Dim3", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "TableName", each "Table1"),
    #"Appended Query" = Table.Combine({#"Added Custom", Table2}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Appended Query", {"Key", "TableName"}, "Attribute", "Value"),
    #"Pivoted Column" = Table.Pivot(#"Unpivoted Columns", List.Distinct(#"Unpivoted Columns"[TableName]), "TableName", "Value"),
    #"Added Custom1" = Table.AddColumn(#"Pivoted Column", "ConditionSign", each if [Table1] =[Table2] then 0 else 1),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([ConditionSign] = 1)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"ConditionSign"})
in
    #"Removed Columns"
 
Upvote 0
Hi
It is a little shorter.
Code:
let    
    table1 = Excel.CurrentWorkbook(){[Name="table1"]}[Content],
    table2 = Excel.CurrentWorkbook(){[Name="table2"]}[Content],
    unpTable1 = Table.UnpivotOtherColumns(table1, {"Key"}, "Dimension", "Table1 Value"),
    unpTable2 = Table.UnpivotOtherColumns(table2, {"Key"}, "Dimension", "Table2 Value"),
    joined = Table.Join(unpTable1, {"Key", "Dimension"}, unpTable2, {"Key", "Dimension"}),
    result = Table.SelectRows(joined, each [#"Table1 Value"] <> [#"Table2 Value"])
in
    result
Regards,
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,898
Messages
6,175,272
Members
452,628
Latest member
dd2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top