Attempting to calculate product of previous values

Warleggan · Feb 8, 2012

Hi all,

A more complex question.

I have a table (patients) that either drop out of the research (cured / died => status=0) in a certain interval OR develop a specific disease (status=1). Of course, the table holds content on which disease and which patient, whether it is first time, and so on. But that's not important for the question.

So, imagine the following (very simply) table:

INTERVAL STATUS
1 1
1 0
1 1
1 0
1 1
1 1
2 1
2 1
2 1
2 0
2 0
2 1

Now, I need to survival chance.

Total N = 12. In interval 1, 2 patients drop out, 4 are registered diseases. So, the survival chance is calculated as follows:

For interval 1, since not all drop-out at the same time, we average this over the year. So, a corrected N' is N - 0.5*dropout = 12 - 0.5*2 = 11

For interval 2, start N is 12 - 6 (everybody from time interval 2) - 0.5 * dropout = 5.

The survival change for interval i then is defined as: (1 - #with_status_1/N') * product_previous_chances.

So, for interval 1, the value is 1 - (4/11) = 0,636
For time interval 2, the value is (1 - (4/5)) * 0,636 = 0,127

I've created a set of measures that calculate these. I want to zoom in on the survival chance.

I now retrieve earlier values with the following DAX formula (example for retrieve 1 interval earlier value).

CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table[interval]= VALUES(Table[interval]) - 1))

That latter -1 identifies the previous value.

However, my range of intervals is 10, I have sets per interval of about 200,000 patients, and you can imagine performance drops dramatically. For 2 intervals the measure (little but simplified) becomes:

CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table[interval]= VALUES(Table[interval])))
*
CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table[interval]= VALUES(Table[interval]) - 1))

Question: can this be done smarter? Maybe using the EARLIER function, although I fail to see how I can do a product (instead of sum, count).

Any help is greatly appreciated, sorry for the long post.

Kind regards,
Edward

powerpivotpro · Feb 8, 2012

Funny, just last night I was asking my former colleagues at Microsoft if a PRODUCTX() function is something they could provide us in the future, as I was facing a similar problem.

I'm honestly not sure if that would solve your particular problem though. I'm going to try to recruit someone else to drop in here and answer you - either someone from MS, or maybe the Italians

AlbertoFerrari · Feb 8, 2012

powerpivotpro said:
Funny, just last night I was asking my former colleagues at Microsoft if a PRODUCTX() function is something they could provide us in the future, as I was facing a similar problem.

I'm honestly not sure if that would solve your particular problem though. I'm going to try to recruit someone else to drop in here and answer you - either someone from MS, or maybe the Italians

By using properties of logarithm, you can create a MULTIPLYX behavior. Check that: http://en.wikipedia.org/wiki/Logarithm

MULTIPLYX (MyTable, MyColumn) can be expressed as

Rich (BB code):

EXP( SUMX ( MyTable, Log(1 + MyTable[Column]) ) )

does it help?

Alberto
--------------------------------------------------
www.powerpivotworkshop.com

<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com

ffice

ffice" /><o

></o

>

ruve1k · Feb 8, 2012

Alberto, I think you would need to use LN instead of LOG, like so:

Code:

EXP( SUMX ( MyTable, LN(MyTable[MyColumn]) ) )

To illustrate how this works, say your table looks like this:
<table border="1" bordercolor="#999999" cellspacing="0"><tbody><tr><td rowspan="1" colspan="1" bgcolor="#DAEEF3" height="25.5" valign="bottom" width="111" align="left">MyColumn </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">1 </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">2 </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">3 </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">4 </td></tr> </tbody></table>
Essentially, here is how the formula progresses mathematically:
EXP( SUMX ( MyTable, LN(MyTable[MyColumn]) ) )
EXP( LN(1) + LN(2) + LN(3) + LN(4) )
EXP( LN(1*2*3*4) )
EXP( LN(24) )
24
Which equals 1*2*3*4

This is the relevant article:
http://en.wikipedia.org/wiki/Natural_logarithm

ruve1k · Feb 8, 2012

Edward,
In regards to your original question, since it is a very complicated multi-step calculation I would recommend that you walk us through your more elementary measures first (show the actual formulas) so that we can get a better sense of what you're trying to do and how you're doing it. I think this will make it easier for others to try their hand at your problem.

Warleggan · Feb 8, 2012

EXP ( LN () )

Hadn't thought of that. Will try some things.

Warleggan · Feb 8, 2012

Here is some more explanation of the measures I defined. Please note my intervals are all of the same length (a year). My pivot typically shows survival percentages for all intervals (at most 10 years).

Measure #Rows (get the number of rows regardless of status):
=CALCULATE(COUNT(Table[status]), ALL(Table[status]))

Helper measure #Values(interval) (get the number of values):
=COUNTROWS(VALUES(Table[interval]))

Measure N (the total number of rows in the pivot):
=CALCULATE(COUNT(Table[status]), ALL(Table[status], Table[ctime]))

So far, rather straightforward.

Now, for the N' (corrected N in a specific interval):
=Table[N]-IF([#Values(interval)]=1, CALCULATE(SUMX(VALUES(Table[interval]), Table[Count of status]), Table[interval] < VALUES(Table[interval]) && Table[interval] > VALUES(Table[interval]) - 10), 0)-(
[#Rows]-Table[Count of status])*0.5-(IF([#Values(interval)]=1, CALCULATE(SUMX(VALUES(Table[interval]),
[#Rows]-Table[Count of status]), Table[interval] < VALUES(Table[interval]) && Table[interval] > VALUES(Table[interval]) - 10), 0))

So, for one of the intervals, the above measure calculates start N - #rows in previous intervals - 0.5*[deaths in this interval]. There might be a more elegant way to do this, but this one does not cause the problem (yet).

Then, calculate a chance for a specific interval, measure p:
=AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N'])

Finally, the one that drains performance I need to find a much better solution for. Now complete, the measure (actually chance that a patient develops a second (or more) disease after time):

=1-(IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval]))=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval]))), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-1)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-1)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-2)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-2)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-3)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-3)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-4)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-4)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-5)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-5)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-6)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-6)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-7)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-7)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-8)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-8)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-9)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-9)), 0))

Hope this helps explaining what I'm trying to do. It is a form of survival analysis (interested in first time survivors that develop a second disease).

Since there are several other slicers (e.g. age first disease, gender, ethnicity, disease type, start year of investigation), the measures need to be flexible to the selections made (whatever the rows the pivot is based on).

I'd actually like to compare two sets of selections in one graph (e.g. male vs female haven't discovered how to do that yet either), but currently performance is stopping me to move on.

Cheers,
Edward

mrhopko · Feb 10, 2012

Hi

This one looked fun so I thought I'd give it a go. You're probably there already but here was my solution:

My N' (Ndash) is a calculated column and looks like this:

COUNT(Table1[Interval])-COUNTX(FILTER(Table1,Table1[Interval] < EARLIER(Table1[Interval])),Table1[Interval])-0.5*CALCULATE(COUNTX(Table1,Table1[Interval]),FILTER(Table1,Table1[Interval]=EARLIER(Table1[Interval])),Table1[Status]=0)
<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com

ffice

ffice" /><o

></o

>
<o

>My Chance is a measure and looks like this</o

>
<o

></o

>
<o

>=CALCULATE(EXP(SUMX(VALUES(Table1[Ndash]),LN(Table1[Ndash]))),FILTER(ALL(Table1[Interval]),CALCULATE(MAX(Table1[Interval]))<=CALCULATE(MAX(Table1[Interval]),VALUES(Table1[Interval]))))

I have no idea if that will work with your actual table.

I'd love to see what solution you came up with if it's any different

MrHopko</o

>

Attempting to calculate product of previous values

Warleggan

New Member

Excel Facts

powerpivotpro

Board Regular

AlbertoFerrari

New Member

ruve1k

Board Regular

ruve1k

Board Regular

Warleggan

New Member

Warleggan

New Member

mrhopko

Board Regular

Similar threads

Forum statistics

Share this page

Attempting to calculate product of previous values

New Member

Excel Facts

Board Regular

New Member

Board Regular

Board Regular

New Member

New Member

Board Regular

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock