Finding most recent entry with criteria in Power Query

cr731

Well-known Member
Joined
Sep 17, 2010
Messages
611
I am struggling with a Power Query performance problem and have exhausted all of my ideas, so I am hoping someone here may have an idea.

I have two tables:
- Fact table containing sales data: approximately 200,000 rows. The relevant fields for this question include Item Number and Period (of sale).
- Item cost table containing Item Number, Period, and Cost. The cost table may not have a cost for every period for every item. Approximately 350,000 rows.

My goal is: in the Fact table, join the cost table to return the cost of the item for the period of sale. The challenge is, if a cost doesn’t exist for the item in that period, then it should return the cost for the most recent period (that is before the period of sale), looking back up to 6 months.

Here are examples:

Cost table

[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Item Number

[/TD]
[TD="width: 208, bgcolor: transparent"]Period
[/TD]
[TD="width: 208, bgcolor: transparent"]Cost
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]A

[/TD]
[TD="width: 208, bgcolor: transparent"]201812
[/TD]
[TD="width: 208, bgcolor: transparent"]1
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]A

[/TD]
[TD="width: 208, bgcolor: transparent"]201811
[/TD]
[TD="width: 208, bgcolor: transparent"]2
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]A

[/TD]
[TD="width: 208, bgcolor: transparent"]201810
[/TD]
[TD="width: 208, bgcolor: transparent"]3
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]B
[/TD]
[TD="width: 208, bgcolor: transparent"]201809
[/TD]
[TD="width: 208, bgcolor: transparent"]4
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]B
[/TD]
[TD="width: 208, bgcolor: transparent"]201807
[/TD]
[TD="width: 208, bgcolor: transparent"]5
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]B
[/TD]
[TD="width: 208, bgcolor: transparent"]201805
[/TD]
[TD="width: 208, bgcolor: transparent"]6
[/TD]
[/TR]
[TR]
[TD="width: 208, bgcolor: transparent"]C
[/TD]
[TD="width: 208, bgcolor: transparent"]201801
[/TD]
[TD="width: 208, bgcolor: transparent"]7

[/TD]
[/TR]
</tbody>[/TABLE]


Fact table

[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Item Number

[/TD]
[TD="width: 156, bgcolor: transparent"]Period
[/TD]
[TD="width: 156, bgcolor: transparent"]Cost (Desired Result)
[/TD]
[TD="width: 156, bgcolor: transparent"]Explanation
[/TD]
[/TR]
[TR]
[TD="width: 156, bgcolor: transparent"]A
[/TD]
[TD="width: 156, bgcolor: transparent"]201812
[/TD]
[TD="width: 156, bgcolor: transparent"]1
[/TD]
[TD="width: 156, bgcolor: transparent"]Item has a cost in 201812, so return that
[/TD]
[/TR]
[TR]
[TD="width: 156, bgcolor: transparent"]B
[/TD]
[TD="width: 156, bgcolor: transparent"]201808
[/TD]
[TD="width: 156, bgcolor: transparent"]5
[/TD]
[TD="width: 156, bgcolor: transparent"]Item has no cost in 201808, so return most recent period that is before 201808 (which is 201807).
[/TD]
[/TR]
[TR]
[TD="width: 156, bgcolor: transparent"]C
[/TD]
[TD="width: 156, bgcolor: transparent"]201810
[/TD]
[TD="width: 156, bgcolor: transparent"]Null
[/TD]
[TD="width: 156, bgcolor: transparent"]No cost exists within the past 6 months for this item.
[/TD]
[/TR]
</tbody>[/TABLE]


My attempts include:

Attempt 1

Merge on item only and then add a column to filter the merged table as follows:

Code:
Table.First(Table.SelectRows([Cost],(x) => x[Period] <= [Period] and x[Period] >= [Period 6 Month Back]))

In this case, in my fact table, I created a helper column “Period 6 Month Back” which is the period representing 6 months ago. This would filter the cost table for only periods prior/equal to period of sale and greater than 6 months prior. I use Table.First to get just the most recent entry (the cost table is sorted period descending).

Performance using this approach was very poor (10-15 minute refresh). I tried using Table.Buffer in various places and nothing helped.

Attempt 2

Create 6 helper columns in fact table representing each month that a cost could potentially be returned, merge to the cost table 6 times for each period, expand all results, and then return first non null.

This performed similarly poorly to attempt 1.

Is there any solution to this that would have better performance or is Power Query just not the right tool for this?

Thanks
 

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.

Forum statistics

Threads
1,223,903
Messages
6,175,284
Members
452,630
Latest member
OdubiYouth

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top