Outlier detection and removal from ranges

RawlinsCross

Active Member
Joined
Sep 9, 2016
Messages
437
Good day,

I have a two ranges (for X and Y data) that I graph on a userform using the traditional method of graphing, exporting to image, and loading the image into a picture control. Oftentimes, the data in the Y data has a few errors in terms of outliers (X data is just dates). I'd like to have a button that reruns the graphing but takes the existing range variables (rXValues and rYValues) and removes the outliers in rYValues and removes the corresponding data in the rXValues. So I'm thinking a function that takes in the two range variables, performs the outlier detection and data removal, and returns the new corrected range variables? I already have the graphing code but am struggling to start the function.

For outlier criteria, I would think the IQR (interquartile range) should be sufficient:
1. Calculate the IQR for the data
2. Multiply the IQR by 1.5
3. Add this value to the value of the third quartile (Q3)
4. Subtract this value to the value of the first quartile (Q1)

Just not sure on how to manipulate the ranged variables where they are paired. Any thoughts?
 

Excel Facts

Select all contiguous cells
Pressing Ctrl+* (asterisk) will select the "current region" - all contiguous cells in all directions.
Just an update. I have decided to just take the rYValues and treat the data in that one column of data. I calculate the threshold where data is considered an outlier (as above) and replace that value with #n/a. The code works but I can't convert the cleaned array back to a range which can be used downstream of the function. Any thoughts?

Code:
Public Function RemoveOutliers(rDirtyRange As Range) As Range
Dim dQuart1, dQuart3, dIQR, dScaler, dOutlierMax As Double
Dim vCleanArray() As Variant
Dim i As Integer
Dim TempRange As Range

dQuart1 = Application.WorksheetFunction.Quartile(rDirtyRange, 1)
dQuart3 = Application.WorksheetFunction.Quartile(rDirtyRange, 3)
dIQR = dQuart3 - dQuart1
dScaler = 1.5 * dIQR
If dScaler = 0 Then
    dOutlierMax = dQuart3
Else
    dOutlierMax = dQuart3 + dScaler
End If
vCleanArray = rDirtyRange

'Index the array and replace values above dOutlierMax with NA()

For i = LBound(vCleanArray) To UBound(vCleanArray)
    If vCleanArray(i, 1) > dOutlierMax Then vCleanArray(i, 1) = CVErr(xlErrNA)
Next i
For i = LBound(vCleanArray) To UBound(vCleanArray)
    Debug.Print vCleanArray(i, 1)
Next i
  
'vClean array is correct so the code works up to here.
'Now all I need to do is to convert the vCleanArray back into a Range that I can return as the "RemoveOutliers" Range
'Clear any data that may be in the range D116:E145

Range("D116:E1450").Clear

'Copy the non-duplicate coordinates to D116:E116 & below

Set TempRange = Range("D116").Resize(UBound(vCleanArray), 1)
TempRange = Application.WorksheetFunction.Transpose(vCleanArray)
RemoveOutliers = TempRange

End Function
 
Upvote 0
Finally got it to work. For variant arrays, I had to physically cycle through the area and assign each value to a range cell. Not sure if this is the most efficient approach so would love feedback. So basically the steps in this function:

1. Function accepts a range which may or may not have extreme outliers
2. First and third quartiles are calculated
3. Interquartile range is calculated (Q3-Q1)
4. Threshold value (dScaler) is calculated
5. Minor if statement if data is uniform - effects the threshold value
6. Pass the range into a variant array
7. Index the array and replace values above the threshold with blank values
8. Copy the new cleaned array to a range on worksheet (numbers physically get copied into cells)
9. Set the function return to this range

Code:
Public Function RemoveOutliers(rDirtyRange As Range) As Range

Dim dQuart1, dQuart3, dIQR, dScaler, dOutlierMax As Double
Dim vCleanArray() As Variant
Dim i As Integer
Dim TempRange As Range
Dim myCell As Range

'Calculate the Quartiles
dQuart1 = Application.WorksheetFunction.Quartile(rDirtyRange, 1)
dQuart3 = Application.WorksheetFunction.Quartile(rDirtyRange, 3)
dIQR = dQuart3 - dQuart1
dScaler = 6 * dIQR

'Somtimes dScaler would return 0 and needed adjustment
If dScaler = 0 Then
    dOutlierMax = dQuart3
Else
    dOutlierMax = dQuart3 + dScaler
End If

'Pass the range into a variant array
vCleanArray = rDirtyRange

'Index the array and replace values above dOutlierMax with blank values
For i = LBound(vCleanArray) To UBound(vCleanArray)
    If vCleanArray(i, 1) > dOutlierMax Then vCleanArray(i, 1) = ""
Next i
  
'Clear any data that may be in the range D116:E1450
Range("D116:E1450").Clear

'Copy the non-duplicate coordinates to D116 & below
Set TempRange = Range("D116").Resize(UBound(vCleanArray), 1)
i = 1
For Each myCell In TempRange
    myCell = vCleanArray(i, 1)
    i = i + 1
Next myCell

Set RemoveOutliers = TempRange
End Function
 
Upvote 0

Forum statistics

Threads
1,223,231
Messages
6,170,884
Members
452,364
Latest member
springate

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top