# SUMX: Count unique records in a table (ignoring blanks blank rows)



## circledchicken

Hi PowerPivot experts,

I am beginning to learn how to use PowerPivot and this question is based on the last 'Useful' example in this post:
SUMX() – The 5-point palm, exploding fxn technique « PowerPivotPro

I have a table named Table1 that looks like this:


AB1ProductStore2Apples13Oranges4Pears15Apples16728Apples1

<tbody>

</tbody>*Excel 2010*

I have two measures defined as follows:


Count of Stores:
=DISTINCTCOUNT(Table1[Store])
UniqueCombinations:
=SUMX(DISTINCT(Table1[Product]), [Count of Stores])
In my PivotTable, the only element is the 'UniqueCombinations' measure in the Values section. Currently the value of the measure is 5 corresponding to these rows:


AB1ProductStore2Apples13Oranges4Pears1672

<tbody>

</tbody>My question is, how do I adjust the measures to exclude any rows where *all* the corresponding columns are blank? I would like my UniqueCombinations measure to return 4 in this case (excluding row 6).

I know I could create a calculated column for example and then count non-blanks on that (using something like =COUNTAX( DISTINCT( Table1[CalcColumn] ), Table1[CalcColumn] ). However I would like to learn how to do this using purely measures if possible.

Many thanks for your time.


----------



## miguel.escobar

=CALCULATE(DISTINCTCOUNT(Table1[Product]), FILTER(Table1,[Product] <> BLANK()))

The formula above its basically saying that I want to have a distinct count of the items in column name Product and then there's a filter that states that I don't want to count the blanks, just the ones that are non-blank..

Below my Example table:

ProductStoreApples1Oranges2Pears22

<colgroup><col><col></colgroup><tbody>

</tbody>
Hope this helps.

Note: there are other ways of calculating this but that's the first solution that I could come up with.

Best!
Miguel


----------



## miguel.escobar

My bad...I needed to re-read what you were actually aiming at. The correct formula is:
=CALCULATE(DISTINCTCOUNT([Product]),FILTER(Table1,
          COUNTA([Store]) <> 0 && COUNTA([Product]) <> 0)
)


It's saying that I'd like to calculate a distinct count of the column [Product] but first filtering the table and only counting the records that are not completely blank hence the
COUNTA([Store]) <> 0 && COUNTA([Product]) <> 0

This should do the job.

Best!
Miguel


----------



## miguel.escobar

MAN! you really had me thinking on this one! =D, I really enjoyed this one.

I'm working on the final cut


----------



## miguel.escobar

=IF(CALCULATE(COUNTBLANK([Product]),FILTER(Table1,[Store]<>BLANK()))>=1,1,0)+ CALCULATE(DISTINCTCOUNT([Product]),FILTER(Table1,[Product]<>BLANK()))

(blanks on product but that are not blanks on store + nonblanks on product)

take in consideration that once the:
blanks on product but that are not blanks on store

case appears, I use the IF to just sum 1 and not count all of those cases.

Miguel


----------



## circledchicken

miguel.escobar said:


> =IF(CALCULATE(COUNTBLANK([Product]),FILTER(Table1,[Store]<>BLANK()))>=1,1,0)+ CALCULATE(DISTINCTCOUNT([Product]),FILTER(Table1,[Product]<>BLANK()))
> 
> (blanks on product but that are not blanks on store + nonblanks on product)
> 
> take in consideration that once the:
> blanks on product but that are not blanks on store
> 
> case appears, I use the IF to just sum 1 and not count all of those cases.
> 
> Miguel


Hi Miguel

Thankyou for your help with this, much appreciated (and I'm glad you enjoyed it )!

I need to learn more about how CALCULATE works, but I don't think that last version is completely correct, because for a case like this:


AB1ProductStore2Apples13Oranges4Pears15Apples162738Apples1

<tbody>

</tbody>
It would return 4, but I need it to return 5 corresponding to the following:


AB1ProductStore2Apples13Oranges4Pears16273

<tbody>

</tbody>


----------



## miguel.escobar

At last! I finally get it.

You want to have a distinct count of the actual combination of column A(Product) and Column B(Store).
=COUNTROWS(SUMMARIZE(Table1,[Product],[Store],"String", CONCATENATE(Table1[Product],Table1[Store])))-IF(AND(COUNTBLANK([Store])>=1,COUNTBLANK([Product])>=1),1,0)

That's quite easy actually. The approach that I like to use is the SUMMARIZE one which is basically creating a select query within my tables to give me another tables that looks like this

Product (all grouped) - Store (All grouped) - Concatenation of both

what we mean by grouped it's that its only showing the distinct labels for those instance. 
Once we actually have the SUMMARIZE thing going on, all we have to do it's have a COUNTROWS of those and since you don't want to count the instances where both columns are blank that's why I added the subtraction with
IF(AND(COUNTBLANK([Store])>=1,COUNTBLANK([Product])>=1),1,0)

where, in the event that there's an instance where both columns are blank, it will subtract 1 from the count of the total rows in the summarize.

Let me know if this formula works for you.

Miguel


----------



## circledchicken

Hi Miguel,

Thanks again for having a look. Unfortunately, it still doesn't work quite as expected. 

For example, with the table below the measure should return 5 (rows 2,3,4,6 and 7), but instead it returns 4:


AB1ProductStore2Apples13Oranges4Pears15Apples16Banana728Apples1

<thead>

</thead><tbody>

</tbody>
I only asked this question because I'm beginning to learn how to use PowerPivot - I have no real need for a solution at the moment so please feel free to ignore it if you want! 

Thanks for all the help you have offered so far though, much appreciated.


----------



## miguel.escobar

it's personal now...this formula won't beat me 
The summarize function is doing the job but the other part of the formula is not and its because we don't wanna count the blank+blank combination.


----------



## AlbertoFerrari

Hi eveybody, would you like to try out this formula?


		Code:
__


=SUMX (
    VALUES (Table1[Product]),
    IF (
        ISBLANK ( Table1[Product] ),
        CALCULATE (
            DISTINCTCOUNT ( Table1[Store] ),
            NOT ( ISBLANK ( Table1[Store] ))
        ),
        CALCULATE (
            DISTINCTCOUNT ( Table1[Store] )
        )
    )
)

Seems to compute the right value, and it is pretty easy to understand, even if it uses the CALCULATE inside SUMX, which is the only tricky part.

Alberto


----------



## circledchicken

Hi PowerPivot experts,

I am beginning to learn how to use PowerPivot and this question is based on the last 'Useful' example in this post:
SUMX() – The 5-point palm, exploding fxn technique « PowerPivotPro

I have a table named Table1 that looks like this:


AB1ProductStore2Apples13Oranges4Pears15Apples16728Apples1

<tbody>

</tbody>*Excel 2010*

I have two measures defined as follows:


Count of Stores:
=DISTINCTCOUNT(Table1[Store])
UniqueCombinations:
=SUMX(DISTINCT(Table1[Product]), [Count of Stores])
In my PivotTable, the only element is the 'UniqueCombinations' measure in the Values section. Currently the value of the measure is 5 corresponding to these rows:


AB1ProductStore2Apples13Oranges4Pears1672

<tbody>

</tbody>My question is, how do I adjust the measures to exclude any rows where *all* the corresponding columns are blank? I would like my UniqueCombinations measure to return 4 in this case (excluding row 6).

I know I could create a calculated column for example and then count non-blanks on that (using something like =COUNTAX( DISTINCT( Table1[CalcColumn] ), Table1[CalcColumn] ). However I would like to learn how to do this using purely measures if possible.

Many thanks for your time.


----------



## miguel.escobar

Welcome to the party, Alberto!

I was aiming at a formula without the need for the SUMX but the iteration would need to be row by row and the easiest way to do it seems to be the sumx


		Code:
__


=SUMX(
                    SUMMARIZE(Table1,[Product],[Store],"Sum", 
                                    IF(CONCATENATE(Table1[Product],Table1[Store])=BLANK(),
                                                             BLANK(),1)
                                                             )
                     ,[Sum]
                    )


that should read as follows, I'd like to have a SUMX of the new table made with the SUMMARIZE on the colum "Sum" and this column has either the value 1 on the every record or 0 in the event that the combination of product&store is blank.


----------



## AlbertoFerrari

Miguel,
From a performance point of view, putting the IF inside SUMMARIZE would result in the formula engine kicking in during evaluation and do a lot of work, because the storage engine cannot resolve IF expression. Moreover, using CONCATENATE you are preventing the optimizer to understand what you are doing, using a simpler boolean condition would make the optimizer life much easier. 
I have not looked at the query plan and, in my previous formula, I was more concerned with clarity than performance, but I guess that the top performer for such a formula would be something like this:


		Code:
__


=SUMX (
    FILTER (
        CROSSJOIN (
            VALUES (Table1[Product]),
            VALUES (Table1[Store])
        ),
        NOT (
            ISBLANK ( Table1[Store] ) && 
            ISBLANK ( Table1[Product] )
        )
    ),
    CALCULATE (
        DISTINCTCOUNT ( Table1[Store] )
    )
)

This query does not use formula engine, apart from the filter on the CROSSJOIN, which should result in a small table, and relies on code that can be executed in VertiPaq. This has a significant effect on the usage of cache too. The drawback of this code is that it is much harder to read, until you used DAX for a while. I have to say that I read much better this latter formula than the previous one.
As I said, I have not checked the query plan, so I am not 100% sure about performance, but my guess is that this latter formula would perform much better than all of the previous ones.

Alberto


----------



## miguel.escobar

Alberto,

That formula just throw this party out the window! REALLY good approach and yes, I agree with you that this last formula it's much easier to read than the previous one and it's basically doing what I wanted to do with the summarize, which is joining both columns and make a distinct count of those without making a count of the result where store&product is blank.

Thanks, Alberto!
Miguel


----------



## Laurent C

My spider-sense triggers whenever I see a DISTINCTCOUNT within a SUMX.
If we only had these rows
Product Store
Apple   1
Pears   1
The sum of the distinct number of stores for product would be 2, even though there would only one store. EDIT: Would such a measure make sense?
@Alberto: your second formula does not quite match the first one you provided. Whereas it provides the same grand-totals and sub-totals, the result will be 1 for the case Product and Store are both blank.


----------



## AlbertoFerrari

Laurent, I agree with you that the formula does not seem to have a lot of sense, but it looks like what was requested. 
And... yes, the border case of everything blank might be different, thanks for noting this. Anyway, I wanted to show a technique, then you need to adapt it to suite your specific needs and there are always big chances that results from my formulas are not perfect, because i seldom test them when writing answers. The interesting part of the formula is the filtering of a CROSSJOIN. The first time I saw this used by the dev team members I was amazed by the power of this feature.
A formula that has more sense is the following (not tested!):



		Code:
__


=CALCULATE (
    DISTINCTCOUNT ( Table1[Store] ),
    FILTER (
        CROSSJOIN (
            VALUES (Table1[Product]),
            VALUES (Table1[Store])
        ),
        NOT (
            ISBLANK ( Table1[Store] ) && 
            ISBLANK ( Table1[Product] )
        )
    )
)


But it computes a different number... I don't really know the business rule underneath the request, it all depends from what the original user wanted. 

Finally, it is worth for anybody using FILTER over a CROSSJOIN, to learn the KEEPFILTERS function too. It is a really hard function to digest, in my personal opinion it is the hardest function in DAX, not because of its semantic but because you need to learn when to use it. I wrote a blog post some months ago (Alberto Ferrari : KEEPFILTERS: a new DAX feature to correctly compute over arbitrary shaped sets) about KEEPFILTERS, but it only covers what I learned at that point. There are more considerations that I still had not time to put in a blog post but, roughly speaking, whenever you create a filter on more than one column and then you iterate over one of the columns, context transition can really drive you crazy and KEEPFILTERS is your (only) friend at that point.

Alberto


----------



## circledchicken

Wow! miguel.escobar, AlbertoFerrari and Laurent C - you are awesome! 

Thankyou so much for your replies, comments and suggestions. I barely understand the DAX language at the moment and your explanations are very valuable and much appreciated.



AlbertoFerrari said:


> But it computes a different number... I don't really know the business rule underneath the request, it all depends from what the original user wanted.


As per post 8, I was just trying to learn and experimenting with PowerPivot, and this was a problem I was unsure of how to solve. The underlying idea was basically:

I want a measure that returns the count of unique records in a table (in this case the two-column Product and Store table), ignoring any cases where *both* columns are blank. I wanted to do this with a measure, but some simple equivalent ways of expressing what I was trying to get are:

Using Advanced Filter for unique records only on the whole table in normal Excel, and then counting the resulting unique entries, excluding any resulting row where both column entries were blank.
Creating a new column where both columns in the table are concatenated, then doing a unique count on that excluding any resulting blank row.
I know there shouldn't be duplicate records in the dataset in the first place and I should get rid of them first, but was just wondering if there was a way of getting a DAX measure to do this count.

Alberto - both your suggestions (post #10 and post #12) seem to do exactly what I had asked. miguel - your latest one also seems to do exactly what I had asked.

Laurent - I couldn't quite figure out the case where Alberto's second formula (post #12) would deliver a different result from the first (post #10). I tried it on my example table, with a single row where the Product and Store columns were both blank and both of the suggested formulas returned nothing as expected. They also both returned the correct amounts for various other sample entries in my table.

Many thanks again to you all for your time and for helping me!


----------



## miguel.escobar

some people go by the motto
"Excel is Fun" - Excelisfun

but nowadays, people are using more:
"Powerpivot is FUN!" - Powerpivot users

I hope that our post encourage you to use more Powerpivot and you can be certain that this community is extremely active.


----------



## Laurent C

If you put Product and Store on rows the formula in post #10 will return 1 for the combination (blank, blank). The formula in post #12 returns blank.
As already said totals and sub-totals look the same, so they only differ at the detail level.
The following part in formula from post #10 seems to cause the unexpected results:
CALCULATE (
            DISTINCTCOUNT ( Table1[Store] );
            NOT ( ISBLANK ( Table1[Store] ))
        )
It will return the expected result (blank) when applied to the combination (Oranges, blank), but will return 1 when applied to the (blank, blank) combination.
Strangely, previous formula and the following one will both return the same result (1), whenever Product is blank:
CALCULATE (
            DISTINCTCOUNT ( Table1[Store] );
            ISBLANK ( Table1[Store] )
        )
EDIT: the last statement is not exactly true, if you had a 3rd column (like a RowID) you could have different results depending on whether this third column is in the pivot table or not. Glitch?


----------



## Laurent C

So, I made some more testing and found out what the reason is. If you had another row in your data with Product = blank and Store = 3, the following measure would return the result 2
CALCULATE (
 DISTINCTCOUNT ( Table1[Store] );
 NOT ( ISBLANK ( Table1[Store] ))
 )
This clearly indicates what the problem is: NOT ( ISBLANK ( Table1[Store] )) will overwrite the current context filter not just add to it.


----------



## AlbertoFerrari

Pretty cool finding, you're right.

This should make the trick.

CALCULATE (
    DISTINCTCOUNT ( Table1[Store] );
    NOT ( ISBLANK ( Table1[Store] )),
    VALUES ( Table1[Store )
 )


I still believe the other formula is much better, at least it does not suffer from these context interactions.


----------



## circledchicken

Hi PowerPivot experts,

I am beginning to learn how to use PowerPivot and this question is based on the last 'Useful' example in this post:
SUMX() – The 5-point palm, exploding fxn technique « PowerPivotPro

I have a table named Table1 that looks like this:


AB1ProductStore2Apples13Oranges4Pears15Apples16728Apples1

<tbody>

</tbody>*Excel 2010*

I have two measures defined as follows:


Count of Stores:
=DISTINCTCOUNT(Table1[Store])
UniqueCombinations:
=SUMX(DISTINCT(Table1[Product]), [Count of Stores])
In my PivotTable, the only element is the 'UniqueCombinations' measure in the Values section. Currently the value of the measure is 5 corresponding to these rows:


AB1ProductStore2Apples13Oranges4Pears1672

<tbody>

</tbody>My question is, how do I adjust the measures to exclude any rows where *all* the corresponding columns are blank? I would like my UniqueCombinations measure to return 4 in this case (excluding row 6).

I know I could create a calculated column for example and then count non-blanks on that (using something like =COUNTAX( DISTINCT( Table1[CalcColumn] ), Table1[CalcColumn] ). However I would like to learn how to do this using purely measures if possible.

Many thanks for your time.


----------



## Laurent C

Agreed


----------



## circledchicken

miguel.escobar said:


> I hope that our post encourage you to use more Powerpivot and you can be certain that this community is extremely active.


Yes, that's good to know and thank you all once again for your help!


----------

