Standard deviation without outliers

crozfader

Board Regular
Joined
Mar 23, 2011
Messages
86
I'm trying to do the following


I have a series
0 0 0 0 0 0 0 0 0 0 0 96 24 120 120 1 120 240 258 240 278 423 77 3


I want to eliminate outliers and calculate a new mean and standard deviation.

Using the following I was able to calculate the new mean without the outlier (in this case there is only one outlier => 423)

=SUMPRODUCT((V3:AS3<CP3+1.5*CN3)*(V3:AS3>CO3-1.5*CN3)*(V3:AS3))/(24-CQ3)

Where V3:AS3 contains the range above, CN3 is the Inter-Quartile range, CP3 is the third quartile, and CO3 is the first quartile.

For some reason if I try the following array formula, it fails for distributions that contain outliers and works for distributions that don't have outliers...

{=STDEV(IF(OR(V2:AS2>CP2+1.5*CN2,V2:AS2<CO2-1.5*CN2),"",V2:AS2)) }

What I'm trying to do here is return V2:AS2 only if they fit within the ranges defined above, else return ""... I have used similar tricks for others applications without any problem. What am I doing wrong?


For this same distribution, if I just want to count the number of outliers I use:

{=+SUM(IF(OR(V2:AS2<CO2-1.5*CN2,V2:AS2>CP2+1.5*CN2),1,0))}

In this case, the outlier count is 1 (which is obviously the 423 observation)

Any ideas?
 

Excel Facts

How can you automate Excel?
Press Alt+F11 from Windows Excel to open the Visual Basic for Applications (VBA) editor.
What exactly would you classify as an outlier? Is there a certain criteria that you're trying to throw out? If it's mentioned above, I'm missing it.

Also, make sure you have a logical reason for why you want to remove these data points (or in this case, data point).
 
Upvote 0
Q1 - first quartile of the distribution
Q3 - third quartile
IQR = Q3 - Q1

An observation is considered an outlier if Observation < Q1 - 1.5*IQR or Observation > Q3 + 1.5*IQR
 
Upvote 0
Here's how I would do it (with an outlier being greater than 3 stdev from the mean):

Data A1:A16
[TABLE="width: 64"]
<tbody>[TR]
[TD="width: 64, align: right"]1[/TD]
[/TR]
[TR]
[TD="align: right"]2[/TD]
[/TR]
[TR]
[TD="align: right"]3[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]5[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]2[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]3[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]3[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]4[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]5[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]6[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]7[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]8[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]3[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]2[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]1[/TD]
[/TR]
[TR]
[TD="class: xl74, align: right"]80[/TD]
[/TR]
</tbody>[/TABLE]

=STDEV(IF(A1:A16>AVERAGE(A1:A16)+3*STDEV(A1:A16),"",A1:A16))
^^^Entered as an Array (ctrl+shift+enter)

The above only takes into accounts for data GREATER than 3*stdev from the sample mean. But judging from your formulas and knowledge above, I'm sure you can add in the other side.

Also, you can change the (>avg+3*stdev) to your quartile criteria. Hope this helps!

To add in the other side:
=stdev(if(observations>avg+3*stdev,"",if(observations<avg-3*stdev,"",observations)))
<avg-3*stdev,"",observations)))
^^array</avg-3*stdev,"",observations)))
 
Last edited:
Upvote 0
As per pplstuff you can not use OR or AND in an array.

If CN3 is your Inter-Quartile range try.
Excel Workbook
VWXYZAAABACADAEAFAGAHAIAJAKALAMANAOAPAQARAS
10000000000096241201201120240258240278423773
2
3StDEV97.66633
Sheet
 
Upvote 0
Thanks pplstuff and AhoyNC!!! much appreciated.


Here's the solution I came up with, it is very sloppy but works:

=+STDEV(IF(ISERROR(IF(V3:AS3>CP3+3*CN3,"",V3:AS3)+ IF(V3:AS3<CO3-3*CN3,"",V3:AS3)),"",0.5*IF(V3:AS3>CP3+3*CN3,"",V3:AS3)+ 0.5*IF(V3:AS3<CO3-3*CN3,"",V3:AS3)))

I basically just add the two ranges and multiply by 0.5 because they each return V3:AS3...

Your solution works as well and is well more elegant...

The nested ifs in your formula acts as an "AND", is there a way to create "Or" through a similar trick (just curious)?

{=STDEV(IF($V$1:$AS$1>=$CN$3-(1.5*$CN$3),IF($V$1:$AS$1<=$CN$3+(1.5*$CN$3),V1:AS1)))}

Thank you very much
 
Upvote 0
oups here's my full formula

=+STDEV(IF(ISERROR(IF(V3:AS3>CR3+3*CP3,"",V3:AS3)+ IF(V3:AS3<CQ3-3*CP3,"",V3:AS3)),"",0.5*IF(V3:AS3>CR3+3*CP3,"",V3:AS3)+ 0.5*IF(V3:AS3<CQ3-3*CP3,"",V3:AS3)))
 
Upvote 0
Sometimes when you use > and < here the board software interprets those as HTML tags and the formulas get corrupted, try leaving a space at the open end of < or > to counteract that......

In array formulas you can use * to simulate AND and + to simulate OR so for an "OR" version try

=STDEV(IF(($V$1:$AS$1>=$CN$3-1.5*$CN$3)+($V$1:$AS$1<=$CN$3+1.5*$CN$3),V1:AS1))
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,526
Messages
6,172,833
Members
452,483
Latest member
Johnstone

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top