# Weighted Standard Deviation



## CHill97402 (Sep 24, 2014)

I need to create a weighted standard deviation measure for a data set, and I am having great difficulty working out how to do this in DAX.

We can assume a simple data structure, with the [Val] and [Wgts] columns holding the Values and Weights respectively.

Within Excel, assuming 'Vals' is a named range holding the values, and 'Wgts' is a named range holding the weights, the formula would be:

=SQRT(SUMPRODUCT(Wgts,((Vals-SUMPRODUCT(Vals,Wgts)/SUM(Wgts))^2))/SUM(Wgts)*COUNT(Vals)/(COUNT(Vals)-1))
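
For reference, the Excel formula above can be read as the expression below, where $x_i$ are the values, $w_i$ are the weights, and $n$ is COUNT(Vals). The symbols are just labels chosen here for illustration:

$$
\bar{x}_w = \frac{\sum_{i} w_i x_i}{\sum_{i} w_i},
\qquad
s_w = \sqrt{\frac{\sum_{i} w_i \,(x_i - \bar{x}_w)^2}{\sum_{i} w_i}\cdot\frac{n}{n-1}}
$$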

How would I transcribe this into DAX?


----------



## GDRIII (Sep 25, 2014)

Read this one

Standard Deviation Demystified in Power Pivot « PowerPivotPro


----------



## CHill97402 (Sep 25, 2014)

GDRIII said:


> Read this one
> 
> Standard Deviation Demystified in Power Pivot « PowerPivotPro



Thanks for the reply, GDRIII. However, that post and the STDEVX functions in DAX do not handle weighting of the data. I need to generate *weighted* standard deviations.


----------



## GDRIII (Sep 25, 2014)

Oh well. That's beyond any powers I have, but you did remind me of this one. It's a good piece on weighted averages. Maybe you can glean some inspiration:

You’re “Poisson” Running Through My Veins: A Truly Epic Guest Post on Call Centers and Erlang C « PowerPivotPro


----------



## CHill97402 (Sep 29, 2014)

Thanks again, GDRIII. I read through that article earlier, but I'm not sure there is much I can use for this application.

I did manage to build a measure that yields the correct value, but it is incredibly slow to calculate, most likely because it effectively recomputes the weighted mean for every row of data in order to calculate the variance. The formula is:

YSD (Q):=SQRT(SUMX(ADDCOLUMNS(ADDCOLUMNS(ADDCOLUMNS(Q,"Mean",Calculate([YMean (Q)],All(Q),Q[Q]=Earlier(Q[Q]),Q[nIter]=Earlier(Q[nIter]))),"Delta",(Q[YMean] - [Mean])^2),"wDelta",[Delta] * SWITCH([Weights],1,1,2,[Weight],3,[WW_Weight])),AVERAGE([wDelta]))/[Orgs (Q)])

A few things to call out on the data structure:

Q is the main table in question here. It has a simple structure of:

- Q[Q] (string): the question number
- Q[Iter] (numeric): the iteration number, since some questions have multiple iterations
- Q[Resp] (numeric): a numeric response to the question
- Q[Weight] (numeric): one of the weights
- Q[WW_Weight] (numeric): another set of weights
- Q[YMean] (numeric): an alternative response, and the value the measure above actually computes the SD on

[Weights] is a simple measure linked to a slicer that selects which weighting scheme to use (unweighted, Q[Weight], or Q[WW_Weight]).

Each row of Q is a single respondent's response to one question of the survey. To isolate a particular question, a filter on Q[Q] and Q[Iter] is needed. That is the point of EARLIER() in the formula: it restores the filters set by the slicers on the output pivot/CUBE functions.

Unfortunately, I need to generate a couple of hundred of these values on each worksheet, and because the weighted mean is recalculated for every row, performance is prohibitively slow. There has to be a less 'loopy' solution!
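
One possible way to avoid the per-row mean is the algebraic identity Σw(x−m)² = Σwx² − (Σwx)²/Σw, which holds when m is the weighted mean Σwx/Σw. With it, the weighted variance needs only a few plain aggregations over the current filter context. The following is a minimal, untested sketch against the simple structure from the first post, not the Q table: the table name `Data` and the measure name are assumptions made for illustration, while [Val] and [Wgts] are the columns described there.

```dax
-- Sketch only: 'Data' is a hypothetical table name; [Val] and [Wgts] are the
-- value and weight columns from the first post.
Weighted SD :=
SQRT (
    DIVIDE (
        -- Sum of w * x^2 ...
        SUMX ( Data, Data[Wgts] * Data[Val] ^ 2 )
            -- ... minus (sum of w * x)^2 / sum of w
            - DIVIDE ( SUMX ( Data, Data[Wgts] * Data[Val] ) ^ 2, SUM ( Data[Wgts] ) ),
        SUM ( Data[Wgts] )
    )
        -- n / (n - 1) correction, as in the Excel formula
        * DIVIDE ( COUNT ( Data[Val] ), COUNT ( Data[Val] ) - 1 )
)
```

Because a measure is evaluated in whatever filter context the pivot or slicers supply, this form should not need EARLIER() to re-apply the question and iteration filters. Adapting it to Q would mean swapping in Q[YMean] and the weight column chosen by [Weights]. One caveat: when the values are large relative to their spread, the subtraction can lose precision and may even produce a slightly negative result, so it is worth checking the output against the Excel formula on a sample.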


----------

