Lognormal Distribution being applied to IQR

psrs0810

Well-known Member
Joined
Apr 14, 2009
Messages
1,109
Can Lognormal Distribution be applied to an IQR calculation? If so,
how?

I have a set of values:
239
6,852.00
9,383.00
10,411.00
11,874.00
13,190.00
16,705.00
16,831.00
18,902.00
19,947.00

When calculating a 3IQR, it would include all of the values.
Q1 (25%) 9,640.00
Q3 (75%) 16,799.50
IQR (Interquartile Range) 7,159.50
IQR x 3 21,478.50
Q1 - (IQRx3) -11,838.50
Q3 - (IQRx3) 38,278.00

I am not understanding how a lognormal distribution works.
from what I have been told, when applying the lognormal
distribution, it would remove the value = 239


I have tried using the excel =LOGNORMDIST. it is asks for:
X - which I am not sure which value I am to use. I used 239
Mean - I calculated as 12,433.40
StDev - I calculated as 6,055.47
This gave me a Lognormal Dist of: .02
If that is correct, how does it get applied to the IQR?
And why would you apply a calculation with a standard deviation that
uses the mean, when the IQR is using the median?

Thank you assistance
 

Excel Facts

Which Excel functions can ignore hidden rows?
The SUBTOTAL and AGGREGATE functions ignore hidden rows. AGGREGATE can also exclude error cells and more.
I do not recall using the LOGNORMDIST function but this is how I would start to go about this problem.

The InterQuartile Range should be a measure of statistical dispersion in a normal distribution and involves finding the middle fifty (%).

It appears that it can still be calculated for the lognormal, but what does it mean, what is the point, and what are you getting at?

Here is an example:
Excel Workbook
ABCDE
1STDin3Start10.37017483
2AVGin7End10
3Select40# Bins10
4LogNormDistNormalizeCumulativeBin
50.009815330.026515390.026515391
60.017764190.047988650.074504042
70.024584160.066412290.140916323
80.030656240.082815580.22373194
90.03617940.097735980.321467885
100.041274830.111500910.432968786
110.046023560.124329250.557298047
120.050483230.136376730.693674778
130.054696750.147759240.8414349
140.058697140.158566110
...

Copy These Doown The Column
Cell Formulas
RangeFormula
E1=SUM(A5:INDEX(A:A,D3+4))
Excel Workbook
ABCD
50.009815330.026515390.026515391
...
Cell Formulas
RangeFormula
A5=LOGNORMDIST(D5,B$2,B$1)
B5=A5/E$1
C5=SUM(B5:B$5)
D5=D$1+(ROW()-5)*(D$2-D$1)/(D$3-1)

Column D can be plotted on the X axis vs the PDF in Column B on the Y axis.
What is Column D plotted against Column C the cumulative?
This will approximate the integral which can give the IQR.
Set bins to a larger number, 100 for instance.
Then find the closest bin numbers between 0.25 and 0.75 and take difference for the middle fifty.

Your 239 would not be part of the middle fifty.
How can you get a middle fifty unless you have bin numbers associated with your data?
 
Upvote 0
JackBean

To answer your first set of questions:
It appears that it can still be calculated for the lognormal, but what does it mean, what is the point, and what are you getting at?

the main reason I am asking is, the lognormal distribution is to help remove any outliers - as you solved at the bottom of your response.

I am not familar how the Lognormal Distribution works.
So I am trying to figure out how it gets applied when removing outliers.
I am also trying to follow your details. So the Lognormal Distribution only gets applied to the IQR?
If so, whats the point? Wouldn't that only change the IQR by a little?

I have a few more questions:
What are: STDin are AVGin
And I am not familar with bin numbers. can you explain?

You have been a huge help thus far - thanks
 
Last edited:
Upvote 0
The STDin is the standard deviation.
The AVGin is the mean or average.

I have noticed in the Excel Help for LOGNORMDIST that it is actually the cumulative function.
So what was previously posted is INCORRECT and this will have to be redone.

From wiki:
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
In probability theory, a log-normal distribution is a probability distribution of a random variable whose logarithm is normally distributed. If Y is a random variable with a normal distribution, then X = exp(Y) has a log-normal distribution; likewise, if X is log-normally distributed, then Y = log(X) is normally distributed.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

How are you calculating these numbers?
Do you have a range of x over which you are generating the lognormal distribution?
What is your AVG and STD of the normal distribution?
 
Upvote 0
The 10 values that I have are coming from a report.
From what I understand, those are the only numbers that are being used for calculation purposes.

I want to know how the report is working to remove outliers.
Based on what I was provided, they are using 3IQR and if a number is negative, they are applying a lognormal distribution.

I did request from the company how the calculation is working and being applied, but I don't have a response just yet.

I am trying to figure out if applying a Lognormal Distribution even makes sense.
 
Upvote 0
So apparently the negative numbers are not wanted and so the data is transformed into a lognormal distribution.

If you have 10 values you may have trouble selecting a middle fifty. That could leave 2 on the low end, and 3 on the high end thrown out, or vice versa.

The problem with deciding on where the breakpoint for 25 and 75 is, when basing this on sample data, is that the sample of data is so small. These breakpoints can shift with each new set of sample data. If the sample is large, then the data set will more closely approximate the Probability Distribution Function (PDF).

If this data is generated by a process, then how do you know what the PDF is?

Excel has a LOGNORMDIST (this is cumulative), and LOGINV "Returns the inverse of the lognormal cumulative distribution function".

I would prefer to use the PDF (see wikipedia "log-normal distribution"), in Column A below starting in Cell A5.

Keep in mind that the entire PDF is not being modeled, but the # of rows can be extended to model as much of the PDF as is desired.

This creates a log-normal distribution as a histogram.
The PDF in Column A is normalized in Column B.
The Cumulative in Column C means, for bins, that for Row 5, any results between 0 and 0.06355 would go in that bin.
This also generates a random selection of results for the log-normal distribution in Column E from the count of the results in Column F.
Excel Workbook
ABCDEF
1STD0.5Start1STDout1.74642492
2AVG1End20AVGout3.5
3Select40Bins20NormSum1.201415586
4LogNormDistNormalizeCumulativeBinCountDraw
50.0763547570.0635539920114
60.3304645660.2750626590.0635539922146
70.3194610720.2659038850.3386166523102
80.2093058310.1742160110.604520537472
90.1200420670.0999171880.778736548534
100.0657424270.0547208040.878653736633
110.0356212810.0296494250.93337454718
120.0194001630.0161477540.963023965812
130.0106982980.0089047440.979171719905
140.0059932990.0049885310.9880764631003
150.0034151460.0028426020.9930649941102
160.001979890.0016479650.9959075961203
170.0011674040.0009716910.997555561305
180.0006996630.0005823660.9985272511401
190.0004259190.0003545140.9991096171504
200.0002631460.000219030.9994641311602
210.0001648760.0001372350.9996831611704
220.0001046848.71341E-050.9998203961802
236.73054E-055.60217E-050.999907531906
244.37891E-053.64479E-050.9999635522002
252
263
272
287
294
302
313
324
332
343
352
362
373
383
393
405
412
426
433
444
...
Cell Formulas
RangeFormula
A5=1/(D5*B$1*(2*PI()))^0.5*EXP(-((LN(D5)-B$2)^2)/(2*B$1^2))
B5=A5/F$3
C5=SUM(B4:B$5)*(ROW()>5)
D5=D$1+(ROW()-5)*(D$2-D$1)/(D$3-1)
E5=COUNTIF(F$5:INDEX(F:F,B$3+4),D5)
F5=VLOOKUP(RAND(),C$5:INDEX(D:D,D$3+4),2)
Cell Formulas
RangeFormula
F1=STDEVP(INDIRECT("F5:F"&D2+4))
F2=AVERAGE(INDIRECT("F5:F"&D2+4))
F3=SUM(A5:INDEX(A:A,D3+4))


X axis is Column D below.
Y axis is Column B (PDF normalized):


Y axis is Column E (randomly generated data):


This shows that the random data more closely matchs the PDF for a large number of samples (5000)
Y axis is Column E (randomly generated data):


A better estimation of where the 0.25 and 0.75 break points should go can be found by setting, in example below, # of Bins to 5000.

0.25 is bracketed here:
Excel Workbook
ABCDE
3290.3493838840.0011176310.2499530112.2314462890
3300.3495539760.0011181750.2510706422.2352470490
...

0.75 is bracketed here:
Excel Workbook
ABCDE
8830.1750798970.0005600560.7499023164.3370674130
8840.1747167210.0005588950.7504623724.3408681740
...

The middle fifty is bracketed between about 2.23 to 4.34.

I am not familiar with the term "3IQR".
It seems possible to apply the IQR to most any PDF.
It is difficult to say whether or how the IQR should be applied to your situation.
There is not enough information given.

This is similar to generating random samples based on the normal distribution:
Random Number based on Normal Distribution
http://www.mrexcel.com/forum/showthread.php?t=465705
Also, this has info on:
Named Formulas for chart ranges
Conditional Formatting Formulas
 
Upvote 0
Jackbean - Thank you for your help on this.
I really doubt that company that I am pulling this report from is applying this complex, multistep calculation, just to remove outliers.

Also, thank you for your response for not being familar with 3IQR. I have looked for information about 3IQR and have found very little.
 
Upvote 0

Forum statistics

Threads
1,223,879
Messages
6,175,145
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top