I have a set of variables I would like to generate distributions for and draw random numbers from for a Monte Carlo Simulation. For each variable, I have the mean, Upper 95% CI, and lower 95% CI. The upper and lower CI's are almost never equal.
The problem is that the two Confidence Intervals are not equal, suggesting that the distribution is skewed. I'm no statistician either, so I may be getting my terminology wrong [....] The question is how to back-engineer the skewness based on the inputs I have mean and 2 CIs...I think. Here's a sub-sample of the data.
[TABLE="width: 245"]
<tbody>[TR]
[TD="align: right"]Mean
[/TD]
[TD="align: right"]Low 95% CI[/TD]
[TD="align: right"]High 95% CI[/TD]
[TD="align: right"]N[/TD]
[/TR]
[TR]
[TD="align: right"]115.65[/TD]
[TD="align: right"]0[/TD]
[TD="align: right"]252.02[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]208.31[/TD]
[TD="align: right"]87.09[/TD]
[TD="align: right"]348.64[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]384.91[/TD]
[TD="align: right"]262.09[/TD]
[TD="align: right"]603.61[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]224.61[/TD]
[TD="align: right"]78.11[/TD]
[TD="align: right"]392.8[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]72.09[/TD]
[TD="align: right"]0[/TD]
[TD="align: right"]202.77[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]215.68[/TD]
[TD="align: right"]77.25[/TD]
[TD="align: right"]363.99[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]145.65[/TD]
[TD="align: right"]14.62[/TD]
[TD="align: right"]277.76[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]192.74[/TD]
[TD="align: right"]85.3[/TD]
[TD="align: right"]347.23[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]188.14[/TD]
[TD="align: right"]59.29[/TD]
[TD="align: right"]315.74[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]107.7[/TD]
[TD="align: right"]0[/TD]
[TD="align: right"]228.55[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]110.78[/TD]
[TD="align: right"]0[/TD]
[TD="align: right"]242.27[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]183.73[/TD]
[TD="align: right"]70.87[/TD]
[TD="align: right"]336.32[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]127.85[/TD]
[TD="align: right"]1.55[/TD]
[TD="align: right"]286.09[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]127.54[/TD]
[TD="align: right"]24.44[/TD]
[TD="align: right"]265.96[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]56.3[/TD]
[TD="align: right"]0[/TD]
[TD="align: right"]169.58[/TD]
[TD="align: right"]100[/TD]
[/TR]
[TR]
[TD="align: right"]70.82[/TD]
[TD="align: right"]0[/TD]
[TD="align: right"]198.1[/TD]
[TD="align: right"]100[/TD]
[/TR]
</tbody>[/TABLE]
To summarize the following, the Monte Carlo simulation might generate random data "with 95% confidence" using NORMINV(R1,M1,S1), where:
R1: =RAND()
M1: =AVERAGE(A1:A16)
S1: =TINV(1-95%,COUNT(A1:A16)-1)*STDEV(A1:A16)*SQRT(100)
assuming the means of the samples (left-most column above) are in A1:A16.
Alternatively, you might create a range for each data point "with 95% confidence" using a low of NORMINV(R1,M2,S1) and a high of NORMINV(R1,M3,S2), where:
M2: =AVERAGE(A1:A16)-TINV(1-95%,COUNT(A1:A16)-1)*STDEV(A1:A16)
M3: =AVERAGE(A1:A16)+TINV(1-95%,COUNT(A1:A16)-1)*STDEV(A1:A16)
-----
I believe you are misusing terminology and misinterpreting the data that you have, and based on that, there seems to be a lot of misdirection in this thread.
First, a "confidence interval" is a measure of the estimate of a population
parameter or sample
statistic. It is not a measure of
the distribution of the population or sample. For example, for a normal distribution, we might talk about the confidence interval of the sample or population
mean.
Second, the confidence interval of
the mean is always symmetrical. Specifically, it is u +/- se, where "u" is the population or sample mean, and "se" is the "standard error" of the mean. This is derived from the Central Limit Theorem, which states in effect that
the means of equal-sized random samples from a normally-distributed population
are normally distributed. The "standard error" of the mean is the standard deviation of the distribution of
the means.
Note that
the means of the samples are normally distributed. That
does not imply the population is normally distributed. But for non-normally-distributed poplulations, the means of the samples
tend to be normally distributed as the size of the samples increases. Thus, the average of
the means of the samples is an estimator of the mean of the population.
So my interpretation of your data is: we have 16 samples of size 100 from a population of unknown distribution.
I suspect the so-called low and high "95% confidence intervals" [sic; read: confidence
limits) probably mean that
95% of each sample data are between those limits. These are not confidence limits. Instead, they are the
central 95%ile range of the sample data. As such, they are not particularly interesting or useful.
If the means of the samples are in A1:A16, AVERAGE(A1:A16), about 158.28, is the average of
the means of the samples. And STDEV(A1:A16), about 81.00, is the sample standard error of
the means of the samples.
If we want to state the average of
the means with 95% confidence, the 95% confidence limits would be +/-NORMSINV(1-(1-95%)/2), about 1.96, times the standard error if we had a "large" number of samples. For a small number of samples, the 95% confidence limits are +/-TINV(1-95%,COUNT(A1:A16)-1), about 2.13, times the standard error.
So we are 95% confident that the average of
the means is about 158.28 +/- 172.66. (172.66 =~ TINV(1-95%,COUNT(A1:A16)-1)*STDEV(A1:A16).)
Note that the relative standard error, about 81.00, is about 51.18% of the average of the means. Typically, that is too large an interval to be useful. For a tighter interval, we need the means of more samples -- probably many more. We can estimate the number of required samples.
Based on the CLT, we use AVERAGE(A1:A16) as the estimated mean of the population data from which we drew 16 samples.
If we assume the population data is normally distributed (note: we have
no reason to make that assumption), we are 95% confident that the estimated standard deviation of the population is the 95% confidence limit of the means times SQRT(100), where 100 is the sample size.
That is because the CLT tells us: se = sd / Sqrt, where "sd" is the sample standard deviation of a sample of size n.
Thus, the estimated standard deviation of the population data is TINV(1-95%,COUNT(A1:A16)-1)*STDEV(A1:A16)*SQRT(100), about 1726.57.
So your Monte Carlo simulation might generate random data using NORMINV(RAND(),158.28,1726.57), substituting the actual formulas for those constants.
Needless to say, that is likely to be a very poor simulation of the population data for several reasons:
1. As noted, the relative standard error is unreasonably high.
2. We have no reason to assume the population data is normally distributed. That is, you have not given us a reason.
3. Some people (not I) believe NORMINV(RAND(),...) results in a very poor random sample, largely because of claims that the Excel RAND has poor random properties, especially before Excel 2010 and even the new implementation starting in Excel 2010.