DAX Iterator Efficiency: Formula or Measure in Second Argument of AVERAGEX?

mgirvin

Well-known Member
Joined
Dec 15, 2005
Messages
1,248
Office Version
  1. 365
Platform
  1. Windows
Note: I recently posted my DAX Formula question at the Microsoft Power BI Forums (https://community.powerbi.com/t5/Forums/ct-p/Forums) and had a really bad experience (my post was marked as spam and I could not get help with the issue)... So I am going to try here at mrexcel.com, where I have had great experiences over the years : )

DAX Iterator Efficiency: Formula or Measure in Second Argument of AVERAGEX?

Dear Team,

I have a Fact Table named fTransactions with 3,360 rows.

I have created this Measure to sum the Transactional Line Item Revenue from a Fact Table:

Measure [1] is as follows:

Total Revenue :=
SUMX (
fTransactions,
ROUND ( RELATED ( dProduct[RetailPrice] ) * fTransactions[Units], 2 )
)

Next, I would like to create a Measure to average the Transactional Line Item Revenue from a Fact Table. But my question is: “Should I use a Measure in the second argument of AVERAGEX or repeat the formula?” The two measures I am considering are listed here:

Measure [2] is as follows:

Ave Transactional Rev 2nd is Formula :=
AVERAGEX (
fTransactions,
ROUND ( RELATED ( dProduct[RetailPrice] ) * fTransactions[Units], 2 )
)

Measure [3] is as follows:

Ave Transactional Rev 2nd is Measure :=
AVERAGEX ( fTransactions, [Total Revenue] )

I am asking this question because it seems to me that in Measure [3] there are two iterations over the Fact Table, and maybe this is inefficient?

When I run the two formulas in DAX Studio to time and look at query plan (I am a novice at reading all this), I can see:

Measure [2]
  1. Physical Query Plan has 16 lines
  2. The largest number of records in the Physical Plan is 1
  3. For the Server Timings: FE = 1 ms and SE = 1 ms
  4. In the Server Query Tab I see one Query

Measure [3]
  1. Physical Query Plan has 19 lines
  2. The largest number of records in the Physical Plan is 3,360
  3. For the Server Timings: FE = 2 ms and SE = 2 ms
  4. In the Server Query Tab I see two Queries


There are other details in DAX Studio too. But it seems to me like the DAX Studio information is suggesting that Measure [3] with the Measure in the second argument of AVERAGEX has to work harder.

Any ideas that can help me to understand more completely which version to use for calculating the average of the Transactional Line Item Revenue? Measure [2] or Measure [3]?
 

Excel Facts

What is the last column in Excel?
Excel columns run from A to Z, AA to AZ, AAA to XFD. The last column is XFD.
Okay, I guess I answered my own question. In the first post I was comparing Measure [2] and [3] using a 3000 row table. I just tried the comparison on a 2 million row table, and the DAX Studio information clearly shows that the Measure [3] takes much longer to calculate and has to internally materialize larger tables. So it looks like Measure [3] is having to work harder. Further, if I do not want the risk of double counting because of the Contact Transition converting the Row Context into Filter Context when there are duplicates or no primary key, then it would make sense to use Measure [2], the one with the formula in the second argument rather than the Measure.
 
Last edited:
Upvote 0
Hi Mike. The general rule is to try not to iterate a fact table if you have other options (edit:and definitely avoid unnecessary context transitions). In measure 1 you would be much better off with something like

SUMX(dProducts,dProduct[price] * calculate(sum(trans[units])))

this formula only needs to iterate through the product table (which is presumably smaller than the fact table). You need the CALCULATE to trigger context transition for each row of the product table.

In fact, if you have many products that are the same price, it is even better to write something like this

SUMX(values(dProducts[price]),dProduct[price] * calculate(sum(trans[units])))

this formula only iterates through the unique prices. If there is a relatively small number of unique prices compared to the number of products, then this will be more efficient.

Regarding your other questions about measures 2 and 3, you should definitely avoid 3 and favour 2. The reason is that the implicit calculate around measure 3 causes context transition for every iteration of the outer AVERAGEX. But once again you should avoid iterating over a fact table if you can. So how about something like this?

SUMX(values(dProducts[price]),dProduct[price] * calculate(sum(trans[units])))/COUNTROWS(fTransactions)


Rgds

Matt
 
Last edited:
Upvote 0
Dear Matt Allington,

Thank you very much for the hot tips! I just have never dealt with more than 10 million rows of data and so I had not thought to iterate over the product table like you suggest when trying to calculate total revenue or average transactional revenue.

For Total Revenue, I tried these two Measures:

[1] Total Gross Revenue Iterate Over Fact := SUMX(fTransactions,RELATED(dProduct[RetailPrice])*fTransactions[UnitsSold])

[2] Total Gross Revenue Iterate Over Product := SUMX(dProduct,dProduct[RetailPrice]*CALCULATE(SUM(fTransactions[UnitsSold])))


On a small data set (3000 rows) there was not much difference in DAX Studio time and the xmSQL code was Exactly The Same!!!! But, on a 2 million row data set, there was a slight time advantage to Measure [2] , and the xmSQL code was not the same.


For Average Transactional Revenue, I tried these three Measures:

[3] Ave Gross Transaction Rev No AVEX := SUMX(dProduct,dProduct[RetailPrice]*CALCULATE(SUM(fTransactions[UnitsSold])))/COUNTROWS(fTransactions)

[4] Ave Gross Transaction Rev AVEX F := AVERAGEX(fTransactions,RELATED(dProduct[RetailPrice])*fTransactions[UnitsSold])

[5] Ave Gross Transaction Rev AVEX M := AVERAGEX(fTransactions,[Total Gross Revenue Iterate Over Fact])


for the 2 million row data set, Measure [3] was consistently twice as fast as Measure [4] , and Measure [5] took many, many times longer than either of the other formulas!


Thanks so much for the help Matt Allington : )

Sincerely, Mike Girvin
 
Upvote 0
No worries Mike. Marco told me that if the timing test returns a result of less than 100ms, then you would not put too much care into the results, as such fast queries can vary in length significantly on different execution runs. It is only really in large tables that you see material reproducible differences.

Did you try the VALUES(product[price]) variant? I would be interested in the comparison to measure 3
 
Upvote 0
I did not try VALUES because there were no duplicate prices. I will have to try a data set with duplicate prices soon, to see how to goes.

Thanks again, Matt Allington : )
 
Upvote 0

Forum statistics

Threads
1,225,741
Messages
6,186,761
Members
453,370
Latest member
juliewar

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top