outlier per column

lferreira

New Member
Joined
Aug 14, 2018
Messages
5
I have a macro that is building an array. The array has 400 columns and 1000 rows. I need outliers identified from each data column, not the total data cloud. I have to post the data in the worksheet first and then do the calculation. I'm using the code below. The code calculates the values in the worksheet. But it takes too long to calculate. I need to speed up the code, but I can not find the correct syntax. I'd like to first calculate in the array and then throw them in the worksheet. Is it possible? Thx for support.

Sub outliers()
Dim mAvg As Double, mStdD As Double
Dim rData As Range, rCell As Range, Rng As Range
Set rData = Range("B2:OK1001")
For Each rCell In rData.Columns
mAvg = WorksheetFunction.Average(rCell )
mStdD = WorksheetFunction.StDev(rCell )
For i = 1 To rCell .Cells.Count
Set Rng = rCell .Cells(i, 1)
If Rng <> "-" Then
If Rng <> "" Then
If Rng > mAvg + 1 * mStdD Or Rng < mAvg - 1 * mStdD Then
Rng.Value = "Outlier"
End If: End If: End If
Next i: Next
End Sub
 

Excel Facts

Can you sort left to right?
To sort left-to-right, use the Sort dialog box. Click Options. Choose "Sort left to right"
Hi,

Try to use the following code:

Code:
Option Explicit


Sub outliers()
    Dim mAvg                As Double
    Dim mStdD               As Double
    Dim rData               As Range
    Dim rColumn             As Range
    Dim rCell               As Range
    Dim i                   As Long
    Dim lngLastRow          As Long
    
    Set rData = Range("B2:D101")
    
    For Each rColumn In rData.Columns
        mAvg = WorksheetFunction.Average(rColumn)
        mStdD = WorksheetFunction.StDev(rColumn)
        For Each rCell In rColumn.Cells
            If rCell > mAvg + 1 * mStdD Or rCell < mAvg - 1 * mStdD Then
                'rCell.Font.Color = RGB(255, 0, 0) 'uncomment this line to change font color to red for all outliers
                rCell.Value = "Outlier"
            End If
        Next rCell
    Next rColumn
    
End Sub

Let me know if the performance has improved.
 
Upvote 0
+ sorry, of course you also need to change your Range to "B2:OK1001"
 
Upvote 0
thanks for your attention, JustynaMK. Yes, the code got faster. But not much. Already using up your code, Thx. But I have a question: Is there a way to do the calculation inside the array, and then throw the already calculated data in the worksheet? (...Instead of setting the range I set the array...)
 
Upvote 0
Hi,

How about simply turning off Screen Updating and Auto Calc - I noticed a vast improvement on my side.

Code:
Sub outliers()
    Dim mAvg                As Double
    Dim mStdD               As Double
    Dim rData               As Range
    Dim rColumn             As Range
    Dim rCell               As Range
    Dim i                   As Long
    Dim lngLastRow          As Long
    
    Set rData = Range("B2:OK1001")
    
[COLOR=#ff0000]    With Application[/COLOR]
[COLOR=#ff0000]        .Calculation = xlManual[/COLOR]
[COLOR=#ff0000]        .ScreenUpdating = False[/COLOR]
[COLOR=#ff0000]    End With[/COLOR]
    
    For Each rColumn In rData.Columns
        mAvg = WorksheetFunction.Average(rColumn)
        mStdD = WorksheetFunction.StDev(rColumn)
        For Each rCell In rColumn.Cells
            If rCell > mAvg + 1 * mStdD Or rCell < mAvg - 1 * mStdD Then
                'rCell.Font.Color = RGB(255, 0, 0) 'uncomment this line to change font color to red for all outliers
                rCell.Value = "Outlier"
            End If
        Next rCell
    Next rColumn
    
[COLOR=#ff0000]    With Application[/COLOR]
[COLOR=#ff0000]        .ScreenUpdating = True[/COLOR]
[COLOR=#ff0000]        .Calculation = xlCalculationAutomatic[/COLOR]
[COLOR=#ff0000]    End With[/COLOR]
    
End Sub
 
Upvote 0

Hi, JustynaMK, it's a great tip.
I already use this setting as default in my VBE. I tried to use defined types (TYPE, END TYPE), i tried to use collections, but it did not work.
His
code structure
("for each" tip inside another "for each") was very valuable.
and the font color was also pretty cool. thank you again.
I think I'll leave the code as it is.
.. It's great.
 
Upvote 0
Hi

Does this work?
Code:
Public Sub Outlier()
    Dim rngData             As Excel.Range
    
    Const strFormula        As String = "if(row(),if({{address}} > average({{address}})+1*stdev({{address}}),""outlier"",if({{address}} < average({{address}})-1*stdev({{address}}),""outlier"",{{address}})))"
    
    Set rngData = Range("B2:OK1001")
    With rngData
        .Value = Evaluate(Replace$(strFormula, "{{address}}", .Address))
    End With
End Sub
 
Upvote 0
Actually, we need to loop the columns with this approach:
Code:
Public Sub Outlier()
    Dim rngData             As Excel.Range
    Dim rngColumn           As Excel.Range
    Dim lngCalc             As XlCalculation
    
    Const strFormula        As String = "if(row(),if({{address}} > average({{address}})+1*stdev({{address}}),""outlier"",if({{address}} < average({{address}})-1*stdev({{address}}),""outlier"",{{address}})))"
    
    With Application
        lngCalc = .Calculation
        .Calculation = xlCalculationManual
        .ScreenUpdating = False
    End With
    
    Set rngData = Range("B2:OK1001")
    
    For Each rngColumn In rngData.Columns
        With rngColumn
            .Value = Evaluate(Replace$(strFormula, "{{address}}", .Address))
        End With
    Next rngColumn
    
    With Application
        .Calculation = lngCalc
        .ScreenUpdating = False
    End With
End Sub
 
Last edited:
Upvote 0
hello Jon. Thank you for your code. In fact the code already works. Let me explain better:
I have a code that captures data online, and this code accumulates and stores the data inside an array. And every 2 minutes the data are transferred to the worksheet. And then the statistics are calculated.
The question is: I would like to calculate the statistics of the data before transferring them to the worksheet.

Obs. sorry my poor english...rs

best regards,
Leonardo
 
Upvote 0
Hi Leonardo

Thank you, I understand. I hope that the improvement offered by Justyna or I will offer marked performance improvement. I suspect the main culprits were:

1. Calculation being invoked multiple times because you affect each cell in each column, thus triggering calculation; and/or;
2. The fact that you loop each cell in each column.

In terms of processing within the array - I will need to see the code that houses the results in the array. Can you share that?
 
Upvote 0

Forum statistics

Threads
1,223,909
Messages
6,175,310
Members
452,634
Latest member
cpostell

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top