I am struggling with a Power Query performance problem and have exhausted all of my ideas, so I am hoping someone here may have an idea.
I have two tables:
- Fact table containing sales data: approximately 200,000 rows. The relevant fields for this question include Item Number and Period (of sale).
- Item cost table containing Item Number, Period, and Cost. The cost table may not have a cost for every period for every item. Approximately 350,000 rows.
My goal is: in the Fact table, join the cost table to return the cost of the item for the period of sale. The challenge is, if a cost doesn’t exist for the item in that period, then it should return the cost for the most recent period (that is before the period of sale), looking back up to 6 months.
Here are examples:
Cost table
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Item Number
[/TD]
[TD="width: 208, bgcolor: transparent"]Period
[/TD]
[TD="width: 208, bgcolor: transparent"]Cost
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]A
[/TD]
[TD="width: 208, bgcolor: transparent"]201812
[/TD]
[TD="width: 208, bgcolor: transparent"]1
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]A
[/TD]
[TD="width: 208, bgcolor: transparent"]201811
[/TD]
[TD="width: 208, bgcolor: transparent"]2
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]A
[/TD]
[TD="width: 208, bgcolor: transparent"]201810
[/TD]
[TD="width: 208, bgcolor: transparent"]3
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]B
[/TD]
[TD="width: 208, bgcolor: transparent"]201809
[/TD]
[TD="width: 208, bgcolor: transparent"]4
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]B
[/TD]
[TD="width: 208, bgcolor: transparent"]201807
[/TD]
[TD="width: 208, bgcolor: transparent"]5
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]B
[/TD]
[TD="width: 208, bgcolor: transparent"]201805
[/TD]
[TD="width: 208, bgcolor: transparent"]6
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]C
[/TD]
[TD="width: 208, bgcolor: transparent"]201801
[/TD]
[TD="width: 208, bgcolor: transparent"]7
[/TD]
[/TR]
</tbody>[/TABLE]
Fact table
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Item Number
[/TD]
[TD="width: 156, bgcolor: transparent"]Period
[/TD]
[TD="width: 156, bgcolor: transparent"]Cost (Desired Result)
[/TD]
[TD="width: 156, bgcolor: transparent"]Explanation
[/TD]
[/TR]
[TR]
[TD="width: 156, bgcolor: transparent"]A
[/TD]
[TD="width: 156, bgcolor: transparent"]201812
[/TD]
[TD="width: 156, bgcolor: transparent"]1
[/TD]
[TD="width: 156, bgcolor: transparent"]Item has a cost in 201812, so return that
[/TD]
[/TR]
[TR]
[TD="width: 156, bgcolor: transparent"]B
[/TD]
[TD="width: 156, bgcolor: transparent"]201808
[/TD]
[TD="width: 156, bgcolor: transparent"]5
[/TD]
[TD="width: 156, bgcolor: transparent"]Item has no cost in 201808, so return most recent period that is before 201808 (which is 201807).
[/TD]
[/TR]
[TR]
[TD="width: 156, bgcolor: transparent"]C
[/TD]
[TD="width: 156, bgcolor: transparent"]201810
[/TD]
[TD="width: 156, bgcolor: transparent"]Null
[/TD]
[TD="width: 156, bgcolor: transparent"]No cost exists within the past 6 months for this item.
[/TD]
[/TR]
</tbody>[/TABLE]
My attempts include:
Attempt 1
Merge on item only and then add a column to filter the merged table as follows:
In this case, in my fact table, I created a helper column “Period 6 Month Back” which is the period representing 6 months ago. This would filter the cost table for only periods prior/equal to period of sale and greater than 6 months prior. I use Table.First to get just the most recent entry (the cost table is sorted period descending).
Performance using this approach was very poor (10-15 minute refresh). I tried using Table.Buffer in various places and nothing helped.
Attempt 2
Create 6 helper columns in fact table representing each month that a cost could potentially be returned, merge to the cost table 6 times for each period, expand all results, and then return first non null.
This performed similarly poorly to attempt 1.
Is there any solution to this that would have better performance or is Power Query just not the right tool for this?
Thanks
I have two tables:
- Fact table containing sales data: approximately 200,000 rows. The relevant fields for this question include Item Number and Period (of sale).
- Item cost table containing Item Number, Period, and Cost. The cost table may not have a cost for every period for every item. Approximately 350,000 rows.
My goal is: in the Fact table, join the cost table to return the cost of the item for the period of sale. The challenge is, if a cost doesn’t exist for the item in that period, then it should return the cost for the most recent period (that is before the period of sale), looking back up to 6 months.
Here are examples:
Cost table
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Item Number
[/TD]
[TD="width: 208, bgcolor: transparent"]Period
[/TD]
[TD="width: 208, bgcolor: transparent"]Cost
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]A
[/TD]
[TD="width: 208, bgcolor: transparent"]201812
[/TD]
[TD="width: 208, bgcolor: transparent"]1
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]A
[/TD]
[TD="width: 208, bgcolor: transparent"]201811
[/TD]
[TD="width: 208, bgcolor: transparent"]2
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]A
[/TD]
[TD="width: 208, bgcolor: transparent"]201810
[/TD]
[TD="width: 208, bgcolor: transparent"]3
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]B
[/TD]
[TD="width: 208, bgcolor: transparent"]201809
[/TD]
[TD="width: 208, bgcolor: transparent"]4
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]B
[/TD]
[TD="width: 208, bgcolor: transparent"]201807
[/TD]
[TD="width: 208, bgcolor: transparent"]5
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]B
[/TD]
[TD="width: 208, bgcolor: transparent"]201805
[/TD]
[TD="width: 208, bgcolor: transparent"]6
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]C
[/TD]
[TD="width: 208, bgcolor: transparent"]201801
[/TD]
[TD="width: 208, bgcolor: transparent"]7
[/TD]
[/TR]
</tbody>[/TABLE]
Fact table
[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Item Number
[/TD]
[TD="width: 156, bgcolor: transparent"]Period
[/TD]
[TD="width: 156, bgcolor: transparent"]Cost (Desired Result)
[/TD]
[TD="width: 156, bgcolor: transparent"]Explanation
[/TD]
[/TR]
[TR]
[TD="width: 156, bgcolor: transparent"]A
[/TD]
[TD="width: 156, bgcolor: transparent"]201812
[/TD]
[TD="width: 156, bgcolor: transparent"]1
[/TD]
[TD="width: 156, bgcolor: transparent"]Item has a cost in 201812, so return that
[/TD]
[/TR]
[TR]
[TD="width: 156, bgcolor: transparent"]B
[/TD]
[TD="width: 156, bgcolor: transparent"]201808
[/TD]
[TD="width: 156, bgcolor: transparent"]5
[/TD]
[TD="width: 156, bgcolor: transparent"]Item has no cost in 201808, so return most recent period that is before 201808 (which is 201807).
[/TD]
[/TR]
[TR]
[TD="width: 156, bgcolor: transparent"]C
[/TD]
[TD="width: 156, bgcolor: transparent"]201810
[/TD]
[TD="width: 156, bgcolor: transparent"]Null
[/TD]
[TD="width: 156, bgcolor: transparent"]No cost exists within the past 6 months for this item.
[/TD]
[/TR]
</tbody>[/TABLE]
My attempts include:
Attempt 1
Merge on item only and then add a column to filter the merged table as follows:
Code:
Table.First(Table.SelectRows([Cost],(x) => x[Period] <= [Period] and x[Period] >= [Period 6 Month Back]))
In this case, in my fact table, I created a helper column “Period 6 Month Back” which is the period representing 6 months ago. This would filter the cost table for only periods prior/equal to period of sale and greater than 6 months prior. I use Table.First to get just the most recent entry (the cost table is sorted period descending).
Performance using this approach was very poor (10-15 minute refresh). I tried using Table.Buffer in various places and nothing helped.
Attempt 2
Create 6 helper columns in fact table representing each month that a cost could potentially be returned, merge to the cost table 6 times for each period, expand all results, and then return first non null.
This performed similarly poorly to attempt 1.
Is there any solution to this that would have better performance or is Power Query just not the right tool for this?
Thanks