# Tricky Power Pivot Calculation, summing the occurrence of different system states at different times



## GundayMonday (May 13, 2013)

Hello,

I have a weird problem and I'm not even sure if you can solve it using Power Pivot's functionality.

I'm looking at how ~500 unique IDs are changing over time within a system. The IDs can enter one of three different states within the system (let's call them A, B, and C for simplicity). Every row in my data starts with a time stamp telling what state the unique Id is in at that time. I'm trying to create a column that tells me how many times a specific ID has entered a specific state up to the current time.

So my data looks like this:


TimestampUnique IDState01123A01124B02123A02124C03123B03124B04123A

<tbody>

</tbody>
I want to return a column that counts the number of times a specific ID has newly entered each state:

TimestampUnique IDStateA totalB totalC total01123A10001124B01002123A10002124C01103123B11003124B02104123A210

<tbody>

</tbody>
My pseudo code would look like:

At each time stamp, check and see if the state of each ID has changed. Ignore any time stamp > than the current row.
If it has changed, add 1 to whatever the new state total is for that ID.
If it hasn't changed, keep counts from the last time step.

Is there a way to do this in PowerPivot? Excel I can do it no problem, but my data set has ~9million rows in it . I've been googling for the past few hours and trying out all these weird DAX functions, but can't find anything that works.

Can anyone help me figure this out?


----------



## Jacob Barnett (May 13, 2013)

It's probably a bit of an 'Excel mindset' but the first thing I would do is add a calculated column to determine whether or not the current state if the same as the previous for that [Unique ID]. I'll give you 2 options - would be very interested to know if you have any performance issues and which works better. Theoretically the only impact should be on refresh times.

=if(LOOKUPVALUE(Table1[State],
                                  Table1[Timestamp],
                                  CALCULATE(MAX(Table1[Timestamp]), FILTER(ALL(Table1),table1[Timestamp]<earlier(table1[timestamp]))),
                                  Table1[Unique ID], Table1[Unique ID]
                                  )<>Table1[State],
         "New",
         BLANK()
       )

<earlier(table1[timestamp])&&
=IF(CALCULATE(VALUES(Table1[State]), 
                 TOPN(1,
                                 FILTER(ALL(Table1),table1[Timestamp]<earlier(table1[timestamp])&&
                                                                         table1[unique ID] = earlier(table1[unique ID])
                                              ),
                                 Table1[Timestamp]
                               )
                            )<>Table1[State],
         "New",
         BLANK()
       )


Then using that as a flag I can write a relatively straightforward measure. You will probably need a separate Dates/Timestamp table to get decent performance out of a relatively large dataset. The dates table should have a column containing unique, contiguous dates/time units for the required period and be related to the main table.


```
CALCULATE(COUNTROWS(Table1),
                             FILTER(ALL(Timestamp),Timestamp[Timestamp]<=MAX(Timestamp[Timestamp])),
                             Table1[A]="New"
                  )
```

Hope this gets you on the way, I uploaded my quick test model here if it helps: https://docs.google.com/file/d/0Bz5yMU2oooW2S1RlTi1ONzVhZU0/edit?usp=sharing
</earlier(table1[timestamp])&&
</earlier(table1[timestamp])&&
</earlier(table1[timestamp]))),


----------



## GundayMonday (May 15, 2013)

<earlier(table1[timestamp]))),
Definitely an excel mindset thing </earlier(table1[timestamp]))),
. Thank you very much! I'm trying this out now on my non-generic data. Extremely useful to have example code to base similar problems off of too!


----------



## GundayMonday (May 16, 2013)

My data set is currently just shy of 10 million rows. Your first suggestion takes about 30s to calculate and works great. I tried the second one a few times but it never finished (I gave up after ~10 mins). First suggestion works great though .


----------



## GundayMonday (May 16, 2013)

<earlier(table1[timestamp]))),
Okay, I have a follow up question now. Jacob's solution totally works, but won't scale the way my dataset's growing with a simple excel pivot table (I'm going to have way more rows than excel can display because of way too many unique IDs).

My thought on a solution is to build some new categories for different states in my power pivot dataset, based on the number of times a unique ID has entered a specific state. Let me explain with a data example:


TimestampUnique IDStateNew to State Dummy?01123A101124B102123A002124C103123B103124B104123A1

<tbody>

</tbody>
Same data as before, but now I have an extra column (thanks to Jacob!) that tells me if the Unique ID has newly entered the State for the row's specific timestamp. I'd like to use this dummy variable to figure out how many times a unique ID has newly entered a specific state for all timestamps <= the current row's timestamp. Psuedo-code would look like this:

For a specific unique ID, and a specific state, and all timestamps <= timestamp of current row, sum 'New to State Dummy?'. Would look something like this:

TimestampUnique IDStateNew To State Dummy?New To State Sum01123A1101124B1102123A0102124C1103123B1103124B1204123A12

<tbody>

</tbody>
Having that new column would let me use the CONCATENATE function to combine my 'State' and my 'New To State Sum' columns to get new categories like A1, A2, B1, B2, etc. Then I could use a pivot table to visualize everything without over loading excel's row limit! (There might be a more elegant way to do this, I'm all ears if you know of it )

I can't figure out a way to get that 'New To State Sum' to work. I think I'm messing up some syntax in my DAX functions. There was a nice thread with a similar problem a few months back, but I can't get their proposed solutions to work on my dataset:
http://www.mrexcel.com/forum/powerp...records-table-ignoring-blanks-blank-rows.html

Any insight into how to tackle this?</earlier(table1[timestamp]))),


----------

