How to Remove Outliers from a data set

secret292

New Member
Joined
Sep 26, 2013
Messages
12
I understand how to identify an outlier but how i do remove them?
I wish to average data with excluding the outliers.
The manual method of deleting each outlier individually will take too long for my data set of 10,000+ data points.

Thanks,

Just a Uni student completing an experimental project
 

Excel Facts

Create a Pivot Table on a Map
If your data has zip codes, postal codes, or city names, select the data and use Insert, 3D Map. (Found to right of chart icons).
If you are using Excel 2007 or a later version, look at the AVERAGEIF function in Excel Help.
 
Upvote 0
This code will replace the outlier (assumes data in Column A) with the text "Outlier". I've used a test to see if the data is outside a 3 sigma band to identify an outlier.
You can modify this to delete the data but most statistics functions have a way to ignore text.

I recommend you try it on a COPY of your data first.

Code:
Sub outliers()
Dim dblAverage As Double, dblStdDev As Double
Dim rTest As Range, Rng As Range
Set rTest = Range("A1:A" & Cells(Rows.Count, 1).End(xlUp).Row)
dblAverage = WorksheetFunction.Average(rTest)
dblStdDev = WorksheetFunction.StDev(rTest)
For Each Rng In rTest
    If Rng > dblAverage + 3 * dblStdDev Or Rng < dblAverage - 3 * dblStdDev Then
        Rng.Value = "Outlier" 'or delete the data with Rng.clearcontents
    End If
Next
End Sub
 
Upvote 0
Thanks! that helps and unfortunately my data is not only in column A how would I change this code to include the other columns? I tried looking online and changing it to "A1:D10" for example but it crashes and "A1,:A10,C1:C10" doesn't work either :(
Thanks for your help. Really is a life saver
 
Upvote 0
This code will replace the outlier (assumes data in Column A) with the text "Outlier". I've used a test to see if the data is outside a 3 sigma band to identify an outlier.
You can modify this to delete the data but most statistics functions have a way to ignore text.

I recommend you try it on a COPY of your data first.

Code:
Sub outliers()
Dim dblAverage As Double, dblStdDev As Double
Dim rTest As Range, Rng As Range
Set rTest = Range("A1:A" & Cells(Rows.Count, 1).End(xlUp).Row)
dblAverage = WorksheetFunction.Average(rTest)
dblStdDev = WorksheetFunction.StDev(rTest)
For Each Rng In rTest
    If Rng > dblAverage + 3 * dblStdDev Or Rng < dblAverage - 3 * dblStdDev Then
        Rng.Value = "Outlier" 'or delete the data with Rng.clearcontents
    End If
Next
End Sub

Thanks! that helps and unfortunately my data is not only in column A how would I change this code to include the other columns? I tried looking online and changing it to "A1:D10" for example but it crashes and "A1,:A10,C1:C10" doesn't work either :(
Thanks for your help. Really is a life saver
 
Upvote 0
What is your definition of an outlier and what is the range the formula must consider?

I am happy to use the 3 sigma band from the mean = definition

Range => rows = 2 to 43162 and column is M to AJ ( I have a lot of data to go through as I am recording data at 1 minute intervals)

Below is a small example of my data
AMNOP
1
DateStrain1AStrain1AStrain1BStrain1B
2
1/04/2013 12:00:00 AM
494-3464519000
3
1/04/2013 12:01:00 AM494-3454179005
4
1/04/2013 12:02:00 AM12000-3454209004
5
1/04/2013 12:03:00 AM495-342430-100

<tbody>
</tbody>

What i currently do is use a averageifs function on a separate sheet to average data in a column from specific time intervals (i.e. day 00:00 - 23:59). The averaging at the moment includes outliers and so i would like to average my data without outliers.

This is pretty much my issue.

Thanks for the help!

<tbody>
</tbody>
 
Last edited:
Upvote 0
What is the range to average? And do you want the averaging per interval? If so, what is the range we can read off the time?
 
Upvote 0
I would guess this to be biaxial strain gauge data (am I correct?). if so you want outliers identified from each data column, not the total data cloud.
Try the following:

Code:
Sub outliers2()
Dim dblAverage As Double, dblStdDev As Double
Dim rData As Range, rTest As Range, Rng As Range
Set rData = Range("M2:AJ" & Cells(Rows.Count, 13).End(xlUp).Row)
For Each rTest In rData.Columns
    dblAverage = WorksheetFunction.Average(rTest)
    dblStdDev = WorksheetFunction.StDev(rTest)
    For i = 1 To rTest.Cells.Count
        Set Rng = rTest.Cells(i, 1)
        Debug.Print Rng.Address
        If Rng > dblAverage + 3 * dblStdDev Or Rng < dblAverage - 3 * dblStdDev Then
            Rng.Value = "Outlier" 'or delete the data with Rng.clearcontents
        End If
    Next i
Next
End Sub
 
Upvote 0
What is the range to average? And do you want the averaging per interval? If so, what is the range we can read off the time?

The way I have set up my data is that i have broken it up into months.
So i have separate excel files for each month (May June July..etc)
Within each excel file I have my raw data on sheet 1 called "April" and average data on sheet 2 called "Average" where each cell contains average data per day.
So for example cell M2 contains averaged data for 1st of April from 0000hrs to 2359hrs and so for for cell M3 contains average data of 2nd of June from "April" raw data.

Below is the formula i use and the underlined parts are just modification factors for the equipment used (strain gauges)
=((AVERAGEIFS(April!M:M,April!A:A,">01/04/2013 00:00",April!A:A,"<01/04/2013 23:59"))-674)*2/2.12

My idea is to eliminate the the outliers so that the averaging function does not include them.
If you recommend a better approach I am more than willing to learn.

Thank you for your help
 
Upvote 0

Forum statistics

Threads
1,221,490
Messages
6,160,133
Members
451,622
Latest member
xmrwnx89

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top