Removing Outliers with standard deviation

ineedhelpp

New Member
Joined
Oct 20, 2019
Messages
4
Hello,

I am working on a project and told to remove all outliers by using the ? +- 3? approach. I am wondering how i should go about this? do i need to calculate the mean and all quartiles to remove the outliers or is there a single excel function that can remove the data outside 3 standard deviations?

Thank you for any help!
 

Excel Facts

Did you know Excel offers Filter by Selection?
Add the AutoFilter icon to the Quick Access Toolbar. Select a cell containing Apple, click AutoFilter, and you will get all rows with Apple
I guess you could run a macro to delete/remove data. If there are less than 30 data points, I normally use sample standard deviation and average. Do that first in two cells and then do a simple =IF().

I normally set extreme outliers if 3 or more standard deviations which is a z rating of 0. e.g.
Code:
Function f3SDOutlier(x As Variant, xRange As Variant) As Boolean
  Dim sStdDev As Double, xBar As Double
  xBar = WorksheetFunction.Average(xRange)
  sStdDev = WorksheetFunction.StDev(xRange)
  f3SDOutlier = fRating(x, xBar, sStdDev) = 0
End Function

Function fRating(x As Variant, mean As Variant, sd As Variant) As Integer
  Dim z As Double, s As Integer, r As Integer
  Application.Volatile False
  On Error GoTo ExitFunction
  z = (x - mean) / sd
  s = 1
  If z < 0 Then s = -1
  If x = "" Or IsEmpty(x) Then z = 4
  Select Case True
    Case z < 1
      r = s * 5
    Case z >= 1 And z < 1.5
      r = s * 4
    Case z >= 1.5 And z < 2
      r = s * 3
    Case z >= 2 And z < 2.5
      r = s * 2
    Case z >= 2.5 And z < 3
      r = s
    Case Else
      r = 0
  End Select
  fRating = r
ExitFunction:
End Function
It is easy enough to write a formula array function from there. I normally mark the data in red and then remove them manually if appropriate.

The quartile method is more elaborate.
 
Last edited:
Upvote 0
I am working on a project and told to remove all outliers by using the u +- 3s approach

Of course, you should do what you are told to do. But unless your data is (expected to be) normally distributed, identifying outliers by +/-sd might not be the right thing to do.

More generally, I prefer to use the IQR method. But I choose +/-1.7IQR in order to approximate +/-3sd for normal distributions.


do i need to calculate the mean and all quartiles to remove the outliers or is there a single excel function that can remove the data outside 3 standard deviations?

Ironically, you do not need to calculate "all quartiles". That would be for the IQR method (and only the 25%ile and 75%ile, and not the mean).

But you do need to calculate the mean (AVERAGE) and standard deviation (STDEVP or STDEV.P) for the +/-3sd method.

Although you could "remove" outliers, it might be sufficient to ignore them in your calculations.

For example, if U1 is =AVERAGE(A1:A1000) and S1 is =STDEVP(A1:A1000), where A1:A1000 is all of your data, the mean and standard deviation of the data "without" (ignoring) outliers are the following array-entered formulas (press ctrl+shift+Enter instead of just Enter):

=AVERAGE(IF(ABS(STANDARDIZE(A1:A1000,U1,S1))<=3, A1:A1000))

=STDEVP(IF(ABS(STANDARDIZE(A1:A1000,U1,S1))<=3, A1:A1000))

You might choose to use STDEV instead of STDEVP in the last formula.

PS.... Again, you should do whatever you are told to do. But in actual practice, most statisticians suggest that outliers should be identified, but only removed or ignored after determining that they represent errors in the data, not merely extremes.
 
Upvote 0
In a blank sheet, you can run the first 3 subs. They create the data in column A, show non-outliers in column B, and mark column A outliers in red.
Code:
Sub ColA()  'Random data in column A
  Dim a, i As Long
  i = 10
  a = RndIntPick(1, 100, i)
  ReDim Preserve a(1 To i + 2)
  a(i + 1) = -1000
  a(i + 2) = 120
  Range("A2", Range("A2").End(xlDown)).ClearContents
  Range("A2").Resize(UBound(a)) = WorksheetFunction.Transpose(a)
End Sub

Sub ColB()  'Column A data with outliers removed
  Dim a
   Range("B2", Range("B2").End(xlDown)).ClearContents
  a = NotOutlier3sd(Range("A2", Range("A2").End(xlDown)))
  Range("B2").Resize(UBound(a)) = WorksheetFunction.Transpose(a)
End Sub

Sub Test_Mark3sdoutliersred()
  Mark3sdOutliersRed Range("A2", Range("A2").End(xlDown))
End Sub

Sub Mark3sdOutliersRed(xRange As Range)
  Dim xSStdDev As Double, xMean As Double, v, i As Long
  Dim r As Range, c As Range
  
  xRange.Interior.Color = xlNone
  
  xMean = WorksheetFunction.Average(xRange)
  xSStdDev = WorksheetFunction.StDev(xRange)
  ReDim a(1 To xRange.Count)
  For i = 1 To UBound(a)
    If xRange(i) > xMean + 3 * xSStdDev Or xRange(i) < xMean - 3 * xSStdDev Then
      xRange(i).Interior.Color = vbRed
    End If
  Next i
End Sub

Function NotOutlier3sd(xRange As Range)
  Dim xSStdDev As Double, xMean As Double, v, i As Long, j As Long, a
  
  xMean = WorksheetFunction.Average(xRange)
  xSStdDev = WorksheetFunction.StDev(xRange)
  ReDim a(1 To xRange.Count)
  For i = 1 To UBound(a)
    If xRange(i) <= xMean + 3 * xSStdDev And xRange(i) >= xMean - 3 * xSStdDev Then
      j = j + 1
      a(j) = xRange(i)
    End If
  Next i
  ReDim Preserve a(1 To j)
  NotOutlier3sd = a
End Function

Function RndIntPick(first As Long, last As Long, _
    noPick As Long, Optional bSort As Boolean = False) As Variant
    Dim i As Long, r As Long, Temp As Long, k As Long
    ReDim iArr(first To last) As Variant
    Dim a() As Variant
     
    For i = first To last
        iArr(i) = i
    Next i
     
    Randomize
    For i = 1 To noPick
        r = Int(Rnd() * (last - first + 1 - (i - 1))) + (first + (i - 1))
        Temp = iArr(r)
        iArr(r) = iArr(first + i - 1)
        iArr(first + i - 1) = Temp
    Next i
     
    ReDim Preserve iArr(first To first + noPick - 1)
    ReDim a(1 To noPick)
    For r = 1 To noPick
        a(r) = iArr(LBound(iArr) + r - 1)
    Next r
     
    If bSort = True Then
        RndIntPick = ArrayListSort(a())
    Else
        RndIntPick = a()
    End If
End Function
 
Function ArrayListSort(sn As Variant, Optional bAscending As Boolean = True) As Variant
    With CreateObject("System.Collections.ArrayList")
        Dim cl As Variant
        For Each cl In sn
            .Add cl
        Next
         
        .Sort 'Sort ascendending
        If bAscending = False Then .Reverse 'Sort and then Reverse to sort descending
        ArrayListSort = .toarray()
    End With
End Function
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,228
Messages
6,170,876
Members
452,363
Latest member
merico17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top