Attempting to calculate product of previous values

Warleggan

New Member
Joined
Feb 1, 2012
Messages
5
Hi all,

A more complex question.

I have a table (patients) that either drop out of the research (cured / died => status=0) in a certain interval OR develop a specific disease (status=1). Of course, the table holds content on which disease and which patient, whether it is first time, and so on. But that's not important for the question.

So, imagine the following (very simply) table:

INTERVAL STATUS
1 1
1 0
1 1
1 0
1 1
1 1
2 1
2 1
2 1
2 0
2 0
2 1

Now, I need to survival chance.

Total N = 12. In interval 1, 2 patients drop out, 4 are registered diseases. So, the survival chance is calculated as follows:

For interval 1, since not all drop-out at the same time, we average this over the year. So, a corrected N' is N - 0.5*dropout = 12 - 0.5*2 = 11

For interval 2, start N is 12 - 6 (everybody from time interval 2) - 0.5 * dropout = 5.

The survival change for interval i then is defined as: (1 - #with_status_1/N') * product_previous_chances.

So, for interval 1, the value is 1 - (4/11) = 0,636
For time interval 2, the value is (1 - (4/5)) * 0,636 = 0,127

I've created a set of measures that calculate these. I want to zoom in on the survival chance.

I now retrieve earlier values with the following DAX formula (example for retrieve 1 interval earlier value).

CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table[interval]= VALUES(Table[interval]) - 1))

That latter -1 identifies the previous value.

However, my range of intervals is 10, I have sets per interval of about 200,000 patients, and you can imagine performance drops dramatically. For 2 intervals the measure (little but simplified) becomes:

CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table[interval]= VALUES(Table[interval])))
*
CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table[interval]= VALUES(Table[interval]) - 1))

Question: can this be done smarter? Maybe using the EARLIER function, although I fail to see how I can do a product (instead of sum, count).

Any help is greatly appreciated, sorry for the long post.

Kind regards,
Edward
 

Excel Facts

Back into an answer in Excel
Use Data, What-If Analysis, Goal Seek to find the correct input cell value to reach a desired result
Funny, just last night I was asking my former colleagues at Microsoft if a PRODUCTX() function is something they could provide us in the future, as I was facing a similar problem.

I'm honestly not sure if that would solve your particular problem though. I'm going to try to recruit someone else to drop in here and answer you - either someone from MS, or maybe the Italians :)
 
Upvote 0
Funny, just last night I was asking my former colleagues at Microsoft if a PRODUCTX() function is something they could provide us in the future, as I was facing a similar problem.

I'm honestly not sure if that would solve your particular problem though. I'm going to try to recruit someone else to drop in here and answer you - either someone from MS, or maybe the Italians :)

By using properties of logarithm, you can create a MULTIPLYX behavior. Check that: http://en.wikipedia.org/wiki/Logarithm

MULTIPLYX (MyTable, MyColumn) can be expressed as
Rich (BB code):
EXP( SUMX ( MyTable, Log(1 + MyTable[Column]) ) )


does it help?

Alberto
--------------------------------------------------
www.powerpivotworkshop.com

<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>
 
Upvote 0
Alberto, I think you would need to use LN instead of LOG, like so:
Code:
EXP( SUMX ( MyTable, LN(MyTable[MyColumn]) ) )
To illustrate how this works, say your table looks like this:
<table border="1" bordercolor="#999999" cellspacing="0"><tbody><tr><td rowspan="1" colspan="1" bgcolor="#DAEEF3" height="25.5" valign="bottom" width="111" align="left">MyColumn </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">1 </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">2 </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">3 </td></tr> <tr><td rowspan="1" colspan="1" bgcolor="#FFFFFF" height="25.5" valign="bottom" width="111" align="right">4 </td></tr> </tbody></table>
Essentially, here is how the formula progresses mathematically:
EXP( SUMX ( MyTable, LN(MyTable[MyColumn]) ) )
EXP( LN(1) + LN(2) + LN(3) + LN(4) )
EXP( LN(1*2*3*4) )
EXP( LN(24) )
24
Which equals 1*2*3*4

This is the relevant article:
http://en.wikipedia.org/wiki/Natural_logarithm
 
Last edited:
Upvote 0
Edward,
In regards to your original question, since it is a very complicated multi-step calculation I would recommend that you walk us through your more elementary measures first (show the actual formulas) so that we can get a better sense of what you're trying to do and how you're doing it. I think this will make it easier for others to try their hand at your problem.
 
Upvote 0
Here is some more explanation of the measures I defined. Please note my intervals are all of the same length (a year). My pivot typically shows survival percentages for all intervals (at most 10 years).

Measure #Rows (get the number of rows regardless of status):
=CALCULATE(COUNT(Table[status]), ALL(Table[status]))

Helper measure #Values(interval) (get the number of values):
=COUNTROWS(VALUES(Table[interval]))

Measure N (the total number of rows in the pivot):
=CALCULATE(COUNT(Table[status]), ALL(Table[status], Table[ctime]))

So far, rather straightforward.

Now, for the N' (corrected N in a specific interval):
=Table[N]-IF([#Values(interval)]=1, CALCULATE(SUMX(VALUES(Table[interval]), Table[Count of status]), Table[interval] < VALUES(Table[interval]) && Table[interval] > VALUES(Table[interval]) - 10), 0)-(
[#Rows]-Table[Count of status])*0.5-(IF([#Values(interval)]=1, CALCULATE(SUMX(VALUES(Table[interval]),
[#Rows]-Table[Count of status]), Table[interval] < VALUES(Table[interval]) && Table[interval] > VALUES(Table[interval]) - 10), 0))

So, for one of the intervals, the above measure calculates start N - #rows in previous intervals - 0.5*[deaths in this interval]. There might be a more elegant way to do this, but this one does not cause the problem (yet).

Then, calculate a chance for a specific interval, measure p:
=AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N'])

Finally, the one that drains performance I need to find a much better solution for. Now complete, the measure (actually chance that a patient develops a second (or more) disease after time):

=1-(IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval]))=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval]))), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-1)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-1)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-2)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-2)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-3)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-3)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-4)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-4)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-5)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-5)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-6)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-6)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-7)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-7)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-8)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-8)), 0)
*
IF([#Values(interval)]=1, IF(CALCULATE(Table[p], Table[interval] = VALUES(Table[interval])-9)=0,1,CALCULATE(AVERAGEX(VALUES(Table[interval]), 1-Table[Count of status]/Table[N']), Table_prev1997[interval] = VALUES(Table[interval])-9)), 0))

Hope this helps explaining what I'm trying to do. It is a form of survival analysis (interested in first time survivors that develop a second disease).

Since there are several other slicers (e.g. age first disease, gender, ethnicity, disease type, start year of investigation), the measures need to be flexible to the selections made (whatever the rows the pivot is based on).

I'd actually like to compare two sets of selections in one graph (e.g. male vs female haven't discovered how to do that yet either), but currently performance is stopping me to move on.

Cheers,
Edward
 
Last edited:
Upvote 0
Hi

This one looked fun so I thought I'd give it a go. You're probably there already but here was my solution:

My N' (Ndash) is a calculated column and looks like this:


COUNT(Table1[Interval])-COUNTX(FILTER(Table1,Table1[Interval] < EARLIER(Table1[Interval])),Table1[Interval])-0.5*CALCULATE(COUNTX(Table1,Table1[Interval]),FILTER(Table1,Table1[Interval]=EARLIER(Table1[Interval])),Table1[Status]=0)
<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>
<o:p>My Chance is a measure and looks like this</o:p>
<o:p></o:p>
<o:p>=CALCULATE(EXP(SUMX(VALUES(Table1[Ndash]),LN(Table1[Ndash]))),FILTER(ALL(Table1[Interval]),CALCULATE(MAX(Table1[Interval]))<=CALCULATE(MAX(Table1[Interval]),VALUES(Table1[Interval]))))

I have no idea if that will work with your actual table.

I'd love to see what solution you came up with if it's any different

MrHopko</o:p>
 
Upvote 0

Forum statistics

Threads
1,223,250
Messages
6,171,036
Members
452,374
Latest member
keccles

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top