I am struggling with standard deviation in a sumproduct() output.
I have developed a matrix that tracks stuff by hour of day and day of week over the course of a great length of time. This matrix is specific and by design.
Right now, I am able to achieve a total count, and an average count (details below). I need to figure out if I can leverage the same formulas to get standard deviation, min, and max.
I do have this answer elsewhere, however, I did it "the hard way" by creating massive tables and datasets to perform DAVERAGE() and DSTDEV()'s against.
Here's where I'm at with my "to-be"
The matrix is driven by a dataset that is about 38,000 rows.
So for my total count matrix, I have:
[TABLE="class: grid, width: 538"]
<colgroup><col span="2"><col span="6"></colgroup><tbody>[TR]
[TD][/TD]
[TD="align: right"]2 X24[/TD]
[TD="align: right"]3[/TD]
[TD="align: right"]4[/TD]
[TD="align: right"]5[/TD]
[TD="align: right"]6[/TD]
[TD="align: right"]7[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD][/TD]
[TD]M[/TD]
[TD]T[/TD]
[TD]W[/TD]
[TD]Th[/TD]
[TD]F[/TD]
[TD]S[/TD]
[TD]Su[/TD]
[/TR]
[TR]
[TD]Time[/TD]
[TD]M[/TD]
[TD]T[/TD]
[TD]W[/TD]
[TD]Th[/TD]
[TD]F[/TD]
[TD]S[/TD]
[TD]Su[/TD]
[/TR]
[TR]
[TD="align: right"]12:00 AM
W27[/TD]
[TD="align: right"]50[/TD]
[TD="align: right"]46[/TD]
[TD="align: right"]41[/TD]
[TD="align: right"]54[/TD]
[TD="align: right"]54[/TD]
[TD="align: right"]6[/TD]
[TD="align: right"]3[/TD]
[/TR]
[TR]
[TD="align: right"]5:00 AM[/TD]
[TD="align: right"]264[/TD]
[TD="align: right"]255[/TD]
[TD="align: right"]257[/TD]
[TD="align: right"]291[/TD]
[TD="align: right"]247[/TD]
[TD="align: right"]4[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD="align: right"]6:00 AM[/TD]
[TD="align: right"]537[/TD]
[TD="align: right"]598[/TD]
[TD="align: right"]597[/TD]
[TD="align: right"]640[/TD]
[TD="align: right"]513[/TD]
[TD="align: right"]9[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD="align: right"]7:00 AM[/TD]
[TD="align: right"]726[/TD]
[TD="align: right"]807[/TD]
[TD="align: right"]783[/TD]
[TD="align: right"]689[/TD]
[TD="align: right"]744[/TD]
[TD="align: right"]7[/TD]
[TD="align: right"]9[/TD]
[/TR]
[TR]
[TD="align: right"]8:00 AM[/TD]
[TD="align: right"]785[/TD]
[TD="align: right"]865[/TD]
[TD="align: right"]795[/TD]
[TD="align: right"]818[/TD]
[TD="align: right"]742[/TD]
[TD="align: right"]10[/TD]
[TD="align: right"]7[/TD]
[/TR]
[TR]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[/TR]
</tbody>[/TABLE]
(snipped for the sake of brevity)
(apologies for the garbage/duplicate rows)
In each ## cell, the following formula exists:
=SUMPRODUCT(--(WEEKDAY(Open)=X$24),(HOUR(Open)>=HOUR($W27))*(HOUR(Open)<HOUR($W28)))
"Open" is the column within my raw data which contain a date. Lots and lots of data here to be counted, averaged, and hopefully STDEV'd!
X24 is "2" (above the M)
W27 is "12:00AM"
In order to get an average, I need to take into account the fact that my data source only records when events happen, and not when they don't. e.g. an event at 6:30AM may occur one week and be counted, but not occur the following week. If my sample size were 2, I'd need that total number to calculate my average.
Therefore, to calculate averages, I update that formula to:
=(SUMPRODUCT(--(WEEKDAY(Open)=X$24),(HOUR(Open)>=HOUR($W27))*(HOUR(Open)<HOUR($W28))))/(DAYS(MAX(Open),MIN(Open))/7)
So far, this works fine, and I have validated it against "the long way" which was mentioned above.
My Dilemma:
Where I'm getting stuck is figuring out how/where to apply the standard deviation equation in the SUMPRODUCT formula, or if this is even possible based on how I've derived the total and average counts.
If you really want to play with what's in "Open" you can do a =randbetween(42370375,43465625)/1000 for a few thousand rows to simulate what I've got.
If I may clarify what I've stated above please let me know.
Thank you in advance!
I have developed a matrix that tracks stuff by hour of day and day of week over the course of a great length of time. This matrix is specific and by design.
Right now, I am able to achieve a total count, and an average count (details below). I need to figure out if I can leverage the same formulas to get standard deviation, min, and max.
I do have this answer elsewhere, however, I did it "the hard way" by creating massive tables and datasets to perform DAVERAGE() and DSTDEV()'s against.
Here's where I'm at with my "to-be"
The matrix is driven by a dataset that is about 38,000 rows.
So for my total count matrix, I have:
[TABLE="class: grid, width: 538"]
<colgroup><col span="2"><col span="6"></colgroup><tbody>[TR]
[TD][/TD]
[TD="align: right"]2 X24[/TD]
[TD="align: right"]3[/TD]
[TD="align: right"]4[/TD]
[TD="align: right"]5[/TD]
[TD="align: right"]6[/TD]
[TD="align: right"]7[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD][/TD]
[TD]M[/TD]
[TD]T[/TD]
[TD]W[/TD]
[TD]Th[/TD]
[TD]F[/TD]
[TD]S[/TD]
[TD]Su[/TD]
[/TR]
[TR]
[TD]Time[/TD]
[TD]M[/TD]
[TD]T[/TD]
[TD]W[/TD]
[TD]Th[/TD]
[TD]F[/TD]
[TD]S[/TD]
[TD]Su[/TD]
[/TR]
[TR]
[TD="align: right"]12:00 AM
W27[/TD]
[TD="align: right"]50[/TD]
[TD="align: right"]46[/TD]
[TD="align: right"]41[/TD]
[TD="align: right"]54[/TD]
[TD="align: right"]54[/TD]
[TD="align: right"]6[/TD]
[TD="align: right"]3[/TD]
[/TR]
[TR]
[TD="align: right"]5:00 AM[/TD]
[TD="align: right"]264[/TD]
[TD="align: right"]255[/TD]
[TD="align: right"]257[/TD]
[TD="align: right"]291[/TD]
[TD="align: right"]247[/TD]
[TD="align: right"]4[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD="align: right"]6:00 AM[/TD]
[TD="align: right"]537[/TD]
[TD="align: right"]598[/TD]
[TD="align: right"]597[/TD]
[TD="align: right"]640[/TD]
[TD="align: right"]513[/TD]
[TD="align: right"]9[/TD]
[TD="align: right"]1[/TD]
[/TR]
[TR]
[TD="align: right"]7:00 AM[/TD]
[TD="align: right"]726[/TD]
[TD="align: right"]807[/TD]
[TD="align: right"]783[/TD]
[TD="align: right"]689[/TD]
[TD="align: right"]744[/TD]
[TD="align: right"]7[/TD]
[TD="align: right"]9[/TD]
[/TR]
[TR]
[TD="align: right"]8:00 AM[/TD]
[TD="align: right"]785[/TD]
[TD="align: right"]865[/TD]
[TD="align: right"]795[/TD]
[TD="align: right"]818[/TD]
[TD="align: right"]742[/TD]
[TD="align: right"]10[/TD]
[TD="align: right"]7[/TD]
[/TR]
[TR]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[TD="align: right"][/TD]
[/TR]
</tbody>[/TABLE]
(snipped for the sake of brevity)
(apologies for the garbage/duplicate rows)
In each ## cell, the following formula exists:
=SUMPRODUCT(--(WEEKDAY(Open)=X$24),(HOUR(Open)>=HOUR($W27))*(HOUR(Open)<HOUR($W28)))
"Open" is the column within my raw data which contain a date. Lots and lots of data here to be counted, averaged, and hopefully STDEV'd!
X24 is "2" (above the M)
W27 is "12:00AM"
In order to get an average, I need to take into account the fact that my data source only records when events happen, and not when they don't. e.g. an event at 6:30AM may occur one week and be counted, but not occur the following week. If my sample size were 2, I'd need that total number to calculate my average.
Therefore, to calculate averages, I update that formula to:
=(SUMPRODUCT(--(WEEKDAY(Open)=X$24),(HOUR(Open)>=HOUR($W27))*(HOUR(Open)<HOUR($W28))))/(DAYS(MAX(Open),MIN(Open))/7)
So far, this works fine, and I have validated it against "the long way" which was mentioned above.
My Dilemma:
Where I'm getting stuck is figuring out how/where to apply the standard deviation equation in the SUMPRODUCT formula, or if this is even possible based on how I've derived the total and average counts.
If you really want to play with what's in "Open" you can do a =randbetween(42370375,43465625)/1000 for a few thousand rows to simulate what I've got.
If I may clarify what I've stated above please let me know.
Thank you in advance!