# Filtering a column based on the values in a different column in the current context



## ruve1k (Jan 25, 2012)

It's great to see this new section of the Mr. Excel message board!
I was wondering if any of the Pros could take a look at this thread and give some insight and hopefully suggest something more efficient.
Thanks!


----------



## powerpivotpro (Jan 25, 2012)

I will need the question reformulated to speak "pivot-ese" since I don't know SQL 

Something along the lines of "I have a pivot that looks like this and I want to add a measure that returns the following values."


----------



## Jon von der Heyden (Jan 26, 2012)

Hi Rob

The way I interpret this is a bit like how autofilter works in Excel 2007/2010.  I.e. previous we could only filter by up to 2 criteria, and apply and operator AND/OR.  Now we can pass an entire array of filter values to the filter.

So say you have a field that contains the following:

<b>Excel 2010</b><table cellpadding="2.5px" rules="all" style=";background-color: #FFFFFF;border: 1px solid;border-collapse: collapse; border-color: #BBB"><colgroup><col width="25px" style="background-color: #DAE7F5" /><col /><col /></colgroup><thead><tr style=" background-color: #DAE7F5;text-align: center;color: #161120"><th></th><th>A</th><th>B</th></tr></thead><tbody><tr ><td style="color: #161120;text-align: center;">1</td><td style="background-color: #BFBFBF;;">FIELD_1</td><td style="background-color: #BFBFBF;;">FIELD_2</td></tr><tr ><td style="color: #161120;text-align: center;">2</td><td style=";">a</td><td style=";">Richard</td></tr><tr ><td style="color: #161120;text-align: center;">3</td><td style=";">b</td><td style=";">Peter</td></tr><tr ><td style="color: #161120;text-align: center;">4</td><td style=";">c</td><td style=";">Colin</td></tr><tr ><td style="color: #161120;text-align: center;">5</td><td style=";">d</td><td style=";">Rory</td></tr><tr ><td style="color: #161120;text-align: center;">6</td><td style=";">e</td><td style=";">Barry</td></tr><tr ><td style="color: #161120;text-align: center;">7</td><td style=";">f</td><td style=";">Andy</td></tr><tr ><td style="color: #161120;text-align: center;">8</td><td style=";">g</td><td style=";">Rob</td></tr><tr ><td style="color: #161120;text-align: center;">9</td><td style=";">h</td><td style=";">Mike</td></tr><tr ><td style="color: #161120;text-align: center;">10</td><td style=";">i</td><td style=";">Joe</td></tr><tr ><td style="color: #161120;text-align: center;">11</td><td style=";">j</td><td style=";">Domenic</td></tr></tbody></table><p style="width:3.6em;font-weight:bold;margin:0;padding:0.2em 0.6em 0.2em 0.5em;border: 1px solid #BBB;border-top:none;text-align: center;background-color: #DAE7F5;color: #161120">Sheet1</p><br /><br />

I understand that the OP would like to pass a range or array of filter values, such as say a,c,e,g,i.

The result should look like:
<b>Excel 2010</b><table cellpadding="2.5px" rules="all" style=";background-color: #FFFFFF;border: 1px solid;border-collapse: collapse; border-color: #BBB"><colgroup><col width="25px" style="background-color: #DAE7F5" /><col /><col /></colgroup><thead><tr style=" background-color: #DAE7F5;text-align: center;color: #161120"><th></th><th>A</th><th>B</th></tr></thead><tbody><tr ><td style="color: #161120;text-align: center;">1</td><td style="background-color: #BFBFBF;;">FIELD_1</td><td style="background-color: #BFBFBF;;">FIELD_2</td></tr><tr ><td style="color: #161120;text-align: center;">2</td><td style=";">a</td><td style=";">Richard</td></tr><tr ><td style="color: #161120;text-align: center;">3</td><td style=";">c</td><td style=";">Colin</td></tr><tr ><td style="color: #161120;text-align: center;">4</td><td style=";">e</td><td style=";">Barry</td></tr><tr ><td style="color: #161120;text-align: center;">5</td><td style=";">g</td><td style=";">Rob</td></tr><tr ><td style="color: #161120;text-align: center;">6</td><td style=";">i</td><td style=";">Joe</td></tr></tbody></table><p style="width:3.6em;font-weight:bold;margin:0;padding:0.2em 0.6em 0.2em 0.5em;border: 1px solid #BBB;border-top:none;text-align: center;background-color: #DAE7F5;color: #161120">Sheet1</p><br /><br />


----------



## ruve1k (Jan 26, 2012)

Here's what the table looks like.

<table style="width: 277px; height: 276px;" border="1" bordercolor="#999999" cellspacing="0"> <tbody><tr><td bgcolor="#C0C0C0"> 
</td> <td bgcolor="#C0C0C0" align="center">*A*</td><td bgcolor="#C0C0C0" align="center">*B*</td><td bgcolor="#C0C0C0" align="center">*C*</td></tr> <tr><td bgcolor="#C0C0C0" align="center">*1*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">*Field_1 *</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">*Field_2 *</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="left">*Amount *</td></tr> <tr><td bgcolor="#C0C0C0" align="center">*2*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">a </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="right">
</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="right">5 </td></tr> <tr><td bgcolor="#C0C0C0" align="center">*3*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">b </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="right">
</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="right">9 </td></tr> <tr><td bgcolor="#C0C0C0" align="center">*4*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">c </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="right">
</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="right">7 </td></tr> <tr><td bgcolor="#C0C0C0" align="center">*5*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">d </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">a </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="right">5 </td></tr> <tr><td bgcolor="#C0C0C0" align="center">*6*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">e </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">b </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="right">4 </td></tr> <tr><td bgcolor="#C0C0C0" align="center">*7*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">f </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">c </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="right">6 </td></tr> <tr><td bgcolor="#C0C0C0" align="center">*8*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">g </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">d </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="right">10 </td></tr> <tr><td bgcolor="#C0C0C0" align="center">*9*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">h </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">e </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="right">2 </td></tr> <tr><td bgcolor="#C0C0C0" align="center">*10*</td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">i </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="center" width="192" align="left">f </td><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="192" align="right">4 </td></tr> </tbody></table>
First I created a simple measure:
Measure_1:=SUM(myTable[Amount])

Now for Measure_2, I need a sum of the [Amount] column where Field_1 is filtered on the values in Field_2 in the current context. 
For example: If the context of the cell in my PT is where Field_1 contains {d,e,f} then Measure_1 = 5+4+6 = 15. Now in this context the values in Field_2 are {a,b,c} so I need Measure_2 to return the sum of [Amount] from the rows where Field_1 = {a,b,c} i.e. 5+9+7 = 21.

In my post on MSDN PowerpIvot forum  I came up with something that works but unfortunately it is painfully slow. Here it is for reference.


```
Measure_2:
=SUMX(VALUES(myTable[Field_2]), 
    CALCULATE(myTable[Measure_1],
             ALL(myTable),
             myTable[Field_1]=EARLIER(myTable[Field_2])
    )
 )
```
My idea was that the same way that after using ALL(myTable) you can reinstate the context filter of a certain column by including VALUES(myTable[myColumn]) as a filter expression, I instead wanted to apply VALUES(Field_2) as a table filter expression against Filed_1 (a different column). It seemed that this would be orders of magnitude more efficient than iterating over every value in Field_2. I'm not really sure how I can do this though.


----------



## ruve1k (Jan 26, 2012)

ruve1k said:


> My idea was that the same way ...


That should've been :"My alternate idea that I never figured out was..."


----------



## David Churchward (Jan 26, 2012)

This might work for you.  Create a table that holds all of the distinct values for Field_1 (I've called the table Field1 - sorry if that's confusing - didn't think!).  In theory, this will hold the distinct values for Field_2 as well since values in Field_2 are ultimately going to filter Field_1.  Create a relationship between myTable and the Field1 table, linking the two Field_1 columns.  With that in place, the following measure should work:


IF(COUNTROWS(VALUES(Field1[Field_1]))=1,
CALCULATE(
SUM(myTable[Amount]),
ALL(myTable),
Field1[Field_1]=VALUES(myTable[Field_2])
           )
     )

Please let me know if it works for you and if it is any quicker.


----------



## AlbertoFerrari (Jan 26, 2012)

ruve1k said:


> That should've been :"My alternate idea that I never figured out was..."


 
Ruve1k, your data model is a variation of the many to many pattern, in reality you have a many to many relationship between the two fields but you have stored it into the fact table, which is not a good idea because you need to traverse the fact table to get the relationship between the two fields.


I think the best option would be to create a table holding all the distinct combinations of Field1 and Field2 and an ID, slightly change the data model by adding the ID to the fact table and relate it with the new dimension. 

Then substitute the iteration over the fact table with an iteration over the distinct table and a single step over the fact table.
I have used this technique to solve many to many patterns and it is very fast with a 4 Billion rows fact table (not in PowerPivot, of course, I am speaking about SSAS here). I tried with PowerPivot with 100M rows, and it works pretty well.


Alberto


----------



## David Churchward (Jan 26, 2012)

Sorry - you probably want a total on that too. Let's call the measure in my last post "LastPostMeasure", then your new measure is:

=SUMX(myTable,[LastPostMeasure])


----------



## ruve1k (Jan 26, 2012)

David, 
thanks for your suggestion but in my case each cell's context contains many values from Field_2 (and Filed_1). Essentially, the solution I proposed on MSDN does what you're suggesting but I iterate over every value of Field_2 that is in my context.

Alberto, 
Am I iterating over the fact table? I though I was only iterating over the distinct values in Fileld_2 with the SUMX function. Could you please clarify the steps of iteration in my formula?
Thanks!


----------



## AlbertoFerrari (Jan 26, 2012)

ruve1k said:


> David,
> thanks for your suggestion but in my case each cell's context contains many values from Field_2 (and Filed_1). Essentially, the solution I proposed on MSDN does what you're suggesting but I iterate over every value of Field_2 that is in my context.
> 
> Alberto,
> ...


 
Yep, sorry, I was not very precise.
You are iterating over VALUES (Field1) and, for each value, you make a roundtrip to the fact table. Suppose you have 10 values for Field1, you make 10 scans of the fact table.
Due to the way the Vertipaq engine works, each trip over the fact table costs a lot (supposing it is a big one), thus you need to reduce them.
If you have a Field12 table containing Field1, Field2 and an ID, you can do the following:

=CALCULATE (
    SUM (MyTable[Amount]),
    FILTER (
        ALL (Field12),
        COUNTROWS (
            FILTER (
                VALUES (Field12[Field_2]),
                EARLIER (Field12[Field_1]) = Field12[Field_2]
            )
        ) > 0
    )
)

This formula makes a single trip over the fact table (SUM), iterating over the Field12 many times but, supposing the Field12 table is much smaller, it should perform better.
You should recognize the many to many pattern here even if the presence of EARLIER and a double iteration over Field12 makes the formula somehow hard to read.

Alberto


----------



## ruve1k (Jan 25, 2012)

It's great to see this new section of the Mr. Excel message board!
I was wondering if any of the Pros could take a look at this thread and give some insight and hopefully suggest something more efficient.
Thanks!


----------



## powerpivotpro (Jan 26, 2012)

Guys you are scaring the new people 

If you are reading this and are new to PowerPivot, rest assured that what's being discussed here is VERY advanced relative to what you need to know.

Hell, even I don't understand Alberto half the time


----------



## David Churchward (Jan 26, 2012)

ruve1k - I'm intrigued about the context situation.  I don't know your dataset, but I'm interested to find out why context wouldn't hold.  Without building the model, it's hard to "hit the wall" but if you've already done so, I would be interested to know more.

Great solution Alberto.  I was of the belief that EARLIER was recursive and could therefore be as costly as SUMX (although I guess it depends what operations it's performing).  Am I wrong or is it simply an "it depends" situation?

Great discussion ruve1k


----------



## AlbertoFerrari (Jan 26, 2012)

powerpivotpro said:


> Guys you are scaring the new people
> 
> If you are reading this and are new to PowerPivot, rest assured that what's being discussed here is VERY advanced relative to what you need to know.
> 
> Hell, even I don't understand Alberto half the time


 
 Rob, believe it or not, I don't understand Excel questions much more than half the time.

Anwyay, I agree, this is much more advanced than what is normally needed... I tend to like complex discussions, my fault.
Moreover, with "normal" table size I think Ruve's first solution should perform pretty well, under 10 million rows the difference should be very very low.


----------



## AlbertoFerrari (Jan 26, 2012)

David Churchward said:


> ruve1k - I'm intrigued about the context situation. I don't know your dataset, but I'm interested to find out why context wouldn't hold. Without building the model, it's hard to "hit the wall" but if you've already done so, I would be interested to know more.
> 
> Great solution Alberto. I was of the belief that EARLIER was recursive and could therefore be as costly as SUMX (although I guess it depends what operations it's performing). Am I wrong or is it simply an "it depends" situation?
> 
> Great discussion ruve1k


 
EARLIER is very fast, it just need to go back one level in the stack of row contexts. A better (and cleaner) solution would be to completely normalize the model, going into a pure many-to-many pattern, in that case you can follow the canonical pattern without the need to use EARLIER.
Moreover, with PowerPivot 2.0, if you leverage the m2m pattern, you can use SUMMARIZE, making the formula much faster.

Alberto


----------



## ruve1k (Jan 26, 2012)

AlbertoFerrari said:


> Moreover, with "normal" table size I think Ruve's first solution should perform pretty well, under 10 million rows the difference should be very very low.


Alberto, 
My data set has only 100K rows. 
Should performance be significantly worse if the measure was a distinct count rather than a sum?
E.g.

```
=SUMX(VALUES(myTable[Field_2]), 
    CALCULATE([B][COLOR=Red][DistinctCountMeasure][/COLOR][/B],
             ALL(myTable),
             myTable[Field_1]=EARLIER(myTable[Field_2])
    )
 )
```
...because my formula with a sum measure inside CALCULATE is not giving me trouble. It's actually a version with a distinct count inside the CALCULATE that is really slow.


----------



## AlbertoFerrari (Jan 26, 2012)

ruve1k said:


> Alberto,
> My data set has only 100K rows.
> Should performance be significantly worse if the measure was a distinct count rather than a sum?
> E.g.
> ...


 
Well... no. Distinct count are very fast in Vertipaq. If your formula is just COUNTROWS (DISTINCT (...)) it should be fast.
If, on the other hand, the formula is more complex... well, it would be much better if you post the complete formula, otherwise any consideration would be pointless.

Alberto


----------



## powerpivotpro (Jan 26, 2012)

Note that Distinct Counts are much slower in PowerPivot v1 than they will be in V2:

http://www.powerpivotpro.com/2011/07/distinct-count-in-powerpivot-v2-much-faster/


----------



## ruve1k (Jan 26, 2012)

I am actually just using a simple DISTINCTCOUNT([Column]) in the SQL Server 2012 RC0 release and I find that performance is much slower than the same formula with a SUM.


----------



## ruve1k (Jan 26, 2012)

David,
Now that I had a few minutes to breath, I reread your suggestion about the single-column table of distinct values from Filed_1. It sounded like a good idea but when I tried it I didn't see any material impact on the sluggish performance. 
Now I'm just wondering conceptually if it make any difference when applying a filter expression to a measure, whether the filter expression is applied to the Field_1 on the fact table or to Field_1 on a related dim table.

Hopefully over the weekend I'll have chance to work on Alberto's solution and see how that performs for me.


----------



## David Churchward (Jan 27, 2012)

Hi ruve1k.  That's interesting.  I usually find a reasonable performance gain using a related dim table, although more so on huge fact tables where the number of distinct dimension values is relatively small.

I would be interested to hear how you get on with Alberto's solution.  Sounds like you've got a fun weekend coming up!

Sorry I couldn't be of more help.  If I get chance, I'll try a few options on some of my datasets and see if I come up with anything.


----------



## ruve1k (Jan 25, 2012)

It's great to see this new section of the Mr. Excel message board!
I was wondering if any of the Pros could take a look at this thread and give some insight and hopefully suggest something more efficient.
Thanks!


----------



## Jon von der Heyden (Jan 27, 2012)

powerpivotpro said:


> Guys you are scaring the new people
> 
> If you are reading this and are new to PowerPivot, rest assured that what's being discussed here is VERY advanced relative to what you need to know.
> 
> Hell, even I don't understand Alberto half the time



I find it rather inspiring!


----------



## David Churchward (Jan 27, 2012)

Jon von der Heyden said:


> I find it rather inspiring!


 
I've been desperately resisting setting up a discussion "Who can confuse Rob the most?" and then handing over to Alberto!


----------



## AlbertoFerrari (Jan 31, 2012)

ruve1k said:


> I am actually just using a simple DISTINCTCOUNT([Column]) in the SQL Server 2012 RC0 release and I find that performance is much slower than the same formula with a SUM.


 
Sorry for not answering, it seems I did not receive mails for the update of the forum.
It would be of GREAT help if you can send me the workbook. I have used Distinct Counts over billions of rows without any problem thus it seems there's something well hidden in the formula that creates problems, I would like to check it on my PC, if possible.


----------

