Identifying outliers

tiredofit

Well-known Member
Joined
Apr 11, 2013
Messages
1,924
Office Version
  1. 365
  2. 2019
Platform
  1. Windows
This is my data:

Code:
300, 300, 300, 300, 300, 20300, 20300

Looking at it, I would say the last two values, 20300, are outliers.


However, following this method:

Code:
https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/box-whisker-plots/a/identifying-outliers-iqr-rule

results in NO outliers beng identified.

Is the algorithm correct?

Thanks
 

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand
I have transposed the algorithm into excel myself, step by step, and it seems correct. Are you looking for high outliers as well as low outliers?
 
Upvote 0
You would be right to question why are the two values of 20300 so high when the rest of the data is 300. However if you use one of the standard formulas for outliers then they would not show as being an outlier.

One common formula for outliers is:
Lower Limit = Quartile1 - (1.5)(Interquartile range)
Upper Limit = Quartile 3 + ((1.5)(Interquartile range)

So, as you can see from your data the value would need to be greater than 25300 to be an outlier. However as stated above I would question this data.
 
Last edited:
Upvote 0
I have transposed the algorithm into excel myself, step by step, and it seems correct. Are you looking for high outliers as well as low outliers?
Yes I am.

You would be right to question why are the two values of 20300 so high when the rest of the data is 300. However if you use one of the standard formulas for outliers then they would not show as being an outlier.

One common formula for outliers is:
Lower Limit = Quartile1 - (1.5)(Interquartile range)
Upper Limit = Quartile 3 + ((1.5)(Interquartile range)

So, as you can see from your data the value would need to be greater than 25300 to be an outlier. However as stated above I would question this data.

As far as I know, there does not seem to be universally accepted method which works every time.

After all, if I have data such as:

Code:
300, 300, 300, 300, 300.000001

would you say the last value is an outlier?
 
Upvote 0
I think you have to understand the context of the data that's being reported. I would question both of your data sets. Is it reasonable that reported data would go from 300 to 20300? In the second example how did that one data point become accurate down to 6 decimal points? The person doing the analysis needs to have a good understanding of the data they are working with and use some common sense when something looks out of place and how to proceed with the analysis.
 
Upvote 0
I understand your point but I've been asked "blindly" to identify the outliers, regardless of how the data originated, so am seeking a "one-size-fits-all" approach.
 
Upvote 0
I think you first need to settle upon what "generally accepted" method you are going to use to identify outliers, and go from there.
In your first example, you only have 7 data points, and 2 of them, while far above the others, match each other.
The small sample size does not give a very high confidence level that those are outliers, especially since you have 2 of them (which represents over 25% of your data) that are exactly the same.

So, I think the best approach is to settle upon the method you are going to use to identify outliers, and then the results speak for themselves (whether or not you agree with the results is not really of consequence, as long as you have chosen an acceptable method and applied it correctly).
 
Upvote 0

Forum statistics

Threads
1,223,893
Messages
6,175,242
Members
452,623
Latest member
russelllowellpercy

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top