# Attempting to calculate product of previous values



## Warleggan (Feb 8, 2012)

Hi all,

A more complex question.

I have a table (patients) that either drop out of the research (cured / died => status=0) in a certain interval OR develop a specific disease (status=1). Of course, the table holds content on which disease and which patient, whether it is first time, and so on. But that's not important for the question.

So, imagine the following (very simply) table:

INTERVAL   STATUS
1              1
1              0
1              1
1              0
1              1
1              1
2              1
2              1
2              1
2              0
2              0
2              1

Now, I need to survival chance.

Total N = 12. In interval 1, 2 patients drop out, 4 are registered diseases. So, the survival chance is calculated as follows:

For interval 1, since not all drop-out at the same time, we average this over the year. So, a corrected N' is N - 0.5*dropout = 12 - 0.5*2 = 11

For interval 2, start N is 12 - 6 (everybody from time interval 2) - 0.5 * dropout = 5.

The survival change for interval i then is defined as: (1 - #with_status_1/N') * product_previous_chances.

So, for interval 1, the value is 1 - (4/11) = 0,636
For time interval 2, the value is (1 - (4/5)) * 0,636 = 0,127

I've created a set of measures that calculate these. I want to zoom in on the survival chance.

I now retrieve earlier values with the following DAX formula (example for retrieve 1 interval earlier value).

CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table[interval]= VALUES(Table[interval]) - 1))

That latter -1 identifies the previous value.

However, my range of intervals is 10, I have sets per interval of about 200,000 patients, and you can imagine performance drops dramatically. For 2 intervals the measure (little but simplified) becomes:

CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table[interval]= VALUES(Table[interval])))
*
CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table[interval]= VALUES(Table[interval]) - 1))

Question: can this be done smarter? Maybe using the EARLIER function, although I fail to see how I can do a product (instead of sum, count).

Any help is greatly appreciated, sorry for the long post.

Kind regards,
Edward


----------



## powerpivotpro (Feb 8, 2012)

Funny, just last night I was asking my former colleagues at Microsoft if a PRODUCTX() function is something they could provide us in the future, as I was facing a similar problem.

I'm honestly not sure if that would solve your particular problem though.  I'm going to try to recruit someone else to drop in here and answer you - either someone from MS, or maybe the Italians


----------



## AlbertoFerrari (Feb 8, 2012)

powerpivotpro said:


> Funny, just last night I was asking my former colleagues at Microsoft if a PRODUCTX() function is something they could provide us in the future, as I was facing a similar problem.
> 
> I'm honestly not sure if that would solve your particular problem though. I'm going to try to recruit someone else to drop in here and answer you - either someone from MS, or maybe the Italians


 
By using properties of logarithm, you can create a MULTIPLYX behavior. Check that: http://en.wikipedia.org/wiki/Logarithm

MULTIPLYX (MyTable, MyColumn) can be expressed as 

```
EXP( SUMX ( MyTable, Log(1 + MyTable[Column]) ) )
```


does it help?

Alberto
--------------------------------------------------
www.powerpivotworkshop.com

<?xml:namespace prefix = o ns = "urn:schemas-microsoft-comfficeffice" /><o></o>


----------



## ruve1k (Feb 8, 2012)

Alberto, I think  you would need to use *LN *instead of *LOG*, like so:


```
EXP( SUMX ( MyTable, LN(MyTable[MyColumn]) ) )
```
To illustrate how this works, say your table looks like this:
<table border="1" bordercolor="#999999" cellspacing="0"><tbody><tr><td rowspan="1" colspan="1" bgcolor="#DAEEF3" height="25.5" valign="bottom" width="111" align="left">*MyColumn *</td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">1 </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">2 </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">3 </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">4 </td></tr> </tbody></table>
Essentially, here is how the formula progresses mathematically:
EXP( SUMX ( MyTable, LN(MyTable[MyColumn]) ) )
EXP( LN(1) + LN(2) + LN(3) + LN(4) )
EXP( LN(1*2*3*4) )
EXP( LN(24) )
24
Which equals 1*2*3*4

This is the relevant article:
http://en.wikipedia.org/wiki/Natural_logarithm


----------



## ruve1k (Feb 8, 2012)

Edward,
In regards to your original question, since it is a very complicated multi-step calculation I would recommend that you walk us through your more elementary measures first (show the actual formulas) so that we can get a better sense of what you're trying to do and how you're doing it.  I think this will make it easier for others to try their hand at your problem.


----------



## Warleggan (Feb 8, 2012)

EXP ( LN () )

Hadn't thought of that. Will try some things.


----------



## Warleggan (Feb 8, 2012)

Here is some more explanation of the measures I defined. Please note my intervals are all of the same length (a year). My pivot typically shows survival percentages for all intervals (at most 10 years).

Measure #Rows (get the number of rows regardless of status):
=CALCULATE(COUNT(Table[status]), ALL(Table[status]))

Helper measure #Values(interval) (get the number of values):
=COUNTROWS(VALUES(Table[interval]))

Measure N (the total number of rows in the pivot):
=CALCULATE(COUNT(Table[status]), ALL(Table[status], Table[ctime]))

So far, rather straightforward.

Now, for the N' (corrected N in a specific interval):
=Table[N]-IF([#Values(interval)]=1, CALCULATE(SUMX(VALUES(Table[interval]), Table[Count of status]), Table[interval] < VALUES(Table[interval]) && Table[interval] > VALUES(Table[interval]) - 10), 0)-(
[#Rows]-Table[Count of status])*0.5-(IF([#Values(interval)]=1, CALCULATE(SUMX(VALUES(Table[interval]), 
[#Rows]-Table[Count of status]), Table[interval] < VALUES(Table[interval]) && Table[interval] > VALUES(Table[interval]) - 10), 0))

So, for one of the intervals, the above measure calculates start N - #rows in previous intervals - 0.5*[deaths in this interval]. There might be a more elegant way to do this, but this one does not cause the problem (yet).

Then, calculate a chance for a specific interval, measure p:
=AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N'])

Finally, the one that drains performance I need to find a much better solution for. Now complete, the measure (actually chance that a patient develops a second (or more) disease after time):

=1-(IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval]))=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval]))), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-1)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-1)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-2)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-2)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-3)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-3)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-4)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-4)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-5)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-5)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-6)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-6)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-7)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-7)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-8)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-8)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-9)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-9)), 0))

Hope this helps explaining what I'm trying to do. It is a form of survival analysis (interested in first time survivors that develop a second disease).

Since there are several other slicers (e.g. age first disease, gender, ethnicity, disease type, start year of investigation), the measures need to be flexible to the selections made (whatever the rows the pivot is based on).

I'd actually like to compare two sets of selections in one graph (e.g. male vs female haven't discovered how to do that yet either), but currently performance is stopping me to move on.

Cheers,
Edward


----------



## mrhopko (Feb 10, 2012)

Hi

This one looked fun so I thought I'd give it a go. You're probably there already but here was my solution:

My N' (Ndash) is a calculated column and looks like this:


COUNT(Table1[Interval])-COUNTX(FILTER(Table1,Table1[Interval] < EARLIER(Table1[Interval])),Table1[Interval])-0.5*CALCULATE(COUNTX(Table1,Table1[Interval]),FILTER(Table1,Table1[Interval]=EARLIER(Table1[Interval])),Table1[Status]=0)
<?xml:namespace prefix = o ns = "urn:schemas-microsoft-comfficeffice" /><o></o>
<o>My Chance is a measure and looks like this</o>
<o></o>
<o>=CALCULATE(EXP(SUMX(VALUES(Table1[Ndash]),LN(Table1[Ndash]))),FILTER(ALL(Table1[Interval]),CALCULATE(MAX(Table1[Interval]))<=CALCULATE(MAX(Table1[Interval]),VALUES(Table1[Interval]))))

I have no idea if that will work with your actual table. 

I'd love to see what solution you came up with if it's any different

MrHopko</o>


----------

