JenniferMurphy
Well-known Member
- Joined
- Jul 23, 2011
- Messages
- 2,687
- Office Version
- 365
- Platform
- Windows
Some time ago, I was introduced to Z Scores by ExcelGzh in this thread:
I might have studied them in school, but that was a looooong time ago. I have found them very helpful in comparing different distributions, such as the ratings discussed in the thread above. And as discussed in that tread, I looked at how to handle missing values. Unlike the more complicated table and VB code in that thread, this 5-row table has just 4 columns. Col A has values in all 5 rows. Col B is missing 2 values. Col C is missing all but 1. Col D is missing all 5 values.
Above the table, I calculated the Counts, Sums, Means, and Standard Deviations for each of the 4 columns. For the means and standard deviations, I show the standard results, which have errors in columns C & D and the corrected results. The "correction" is to set the mean to 0 if there are 0 values and to set the standard deviation to 1 if there are less than 2 values.
For each of the 4 columns, I created two Z Score columns, Z1 & Z2. The Z1 columns use the standard Z Score formula, (X-Mean)/Stddev. They get errors for any columns with missing values. The Z2 columns apply a "corrected" Z Score formula that I think handles them correctly. Under the table are the means and standard deviations for the Z Score columns. The means should all be 0 and the standard deviations 1. I did not apply any corrections.
The missing values are highlighted in grey. The incorrect results based on those missing values are highlighted in red. And the corrected values using my formulas are highlighted in green.
I would appreciate any comments or suggestions.
Please critique this little UDF to average ratings
I am in the process of researching password managers. I found at least 8 websites with reviews and "ratings". The ratings are all on different scales, so it's difficult to make an apples-to-apples comparison. So I wrote this little UDF. It takes each rating and rescales it to [0,1]. The...
www.mrexcel.com
I might have studied them in school, but that was a looooong time ago. I have found them very helpful in comparing different distributions, such as the ratings discussed in the thread above. And as discussed in that tread, I looked at how to handle missing values. Unlike the more complicated table and VB code in that thread, this 5-row table has just 4 columns. Col A has values in all 5 rows. Col B is missing 2 values. Col C is missing all but 1. Col D is missing all 5 values.
Above the table, I calculated the Counts, Sums, Means, and Standard Deviations for each of the 4 columns. For the means and standard deviations, I show the standard results, which have errors in columns C & D and the corrected results. The "correction" is to set the mean to 0 if there are 0 values and to set the standard deviation to 1 if there are less than 2 values.
For each of the 4 columns, I created two Z Score columns, Z1 & Z2. The Z1 columns use the standard Z Score formula, (X-Mean)/Stddev. They get errors for any columns with missing values. The Z2 columns apply a "corrected" Z Score formula that I think handles them correctly. Under the table are the means and standard deviations for the Z Score columns. The means should all be 0 and the standard deviations 1. I did not apply any corrections.
The missing values are highlighted in grey. The incorrect results based on those missing values are highlighted in red. And the corrected values using my formulas are highlighted in green.
I would appreciate any comments or suggestions.
Z-Scores.xlsx | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
B | C | D | E | F | G | H | I | J | K | L | M | N | |||
3 | Counts | 5 | 3 | 1 | 0 | ||||||||||
4 | Sums | 15 | 11 | 4 | 0 | ||||||||||
5 | Means (standard) | 3.00 | 3.67 | 4.00 | #### | ||||||||||
6 | Means (corrected) | 3.00 | 3.67 | 4.00 | 0.00 | ||||||||||
7 | Std Devs (standard) | 1.58 | 1.53 | #### | #### | ||||||||||
8 | Std Devs (corrected) | 1.58 | 1.53 | 1.00 | 1.00 | ||||||||||
9 | |||||||||||||||
10 | A | B | C | D | A-Z1 | A-Z2 | B-Z1 | B-Z2 | C-Z1 | C-Z2 | D-Z1 | D-Z2 | |||
11 | 1 | -1.26 | -1.26 | -1.75 | #DIV/0! | #DIV/0! | |||||||||
12 | 2 | 2 | -0.63 | -0.63 | -1.09 | -1.09 | #DIV/0! | #DIV/0! | |||||||
13 | 3 | =0.00 | =0.00 | -0.44 | #DIV/0! | #DIV/0! | |||||||||
14 | 4 | 4 | 4 | +0.63 | +0.63 | +0.22 | +0.22 | #DIV/0! | =0.00 | #DIV/0! | |||||
15 | 5 | 5 | +1.26 | +1.26 | +0.87 | +0.87 | #DIV/0! | #DIV/0! | |||||||
16 | |||||||||||||||
17 | Means | 0.00 | 0.00 | -0.44 | 0.00 | #DIV/0! | 0.00 | #DIV/0! | #DIV/0! | ||||||
18 | Std Devs | 1.00 | 1.00 | 1.04 | 1.00 | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! | ||||||
Corrected Z Scores |
Cell Formulas | ||
---|---|---|
Range | Formula | |
C3 | C3 | =COUNT(Table4[A]) |
D3 | D3 | =COUNT(Table4[B]) |
E3 | E3 | =COUNT(Table4[C]) |
F3 | F3 | =COUNT(Table4[D]) |
C4 | C4 | =SUM(Table4[A]) |
D4 | D4 | =SUM(Table4[B]) |
E4 | E4 | =SUM(Table4[C]) |
F4 | F4 | =SUM(Table4[D]) |
C5 | C5 | =AVERAGE(Table4[A]) |
D5 | D5 | =AVERAGE(Table4[B]) |
E5 | E5 | =AVERAGE(Table4[C]) |
F5 | F5 | =AVERAGE(Table4[D]) |
C6 | C6 | =IF(COUNT(Table4[A])>0,AVERAGE(Table4[A]),0) |
D6 | D6 | =IF(COUNT(Table4[B])>0,AVERAGE(Table4[B]),0) |
E6 | E6 | =IF(COUNT(Table4[C])>0,AVERAGE(Table4[C]),0) |
F6 | F6 | =IF(COUNT(Table4[D])>0,AVERAGE(Table4[D]),0) |
C7 | C7 | =STDEV.S(Table4[A]) |
D7 | D7 | =STDEV.S(Table4[B]) |
E7 | E7 | =STDEV.S(Table4[C]) |
F7 | F7 | =STDEV.S(Table4[D]) |
C8 | C8 | =IF(COUNT(Table4[A])>1,STDEV.S(Table4[A]),1) |
D8 | D8 | =IF(COUNT(Table4[B])>1,STDEV.S(Table4[B]),1) |
E8 | E8 | =IF(COUNT(Table4[C])>1,STDEV.S(Table4[C]),1) |
F8 | F8 | =IF(COUNT(Table4[D])>1,STDEV.S(Table4[D]),1) |
G11:G15 | G11 | =([@A]-MeanA1)/StdDevA1 |
H11:H15 | H11 | =IF(ISNUMBER([@A]),([@A]-MeanA2)/StdDevA2,"") |
I11:I15 | I11 | =([@A]-MeanB1)/StdDevB1 |
J11:J15 | J11 | =IF(ISNUMBER([@B]),([@B]-MeanB2)/StdDevB2,"") |
K11:K15 | K11 | =([@C]-MeanC1)/StdDevC1 |
L11:L15 | L11 | =IF(ISNUMBER([@C]),([@C]-MeanC2)/StdDevC2,"") |
M11:M15 | M11 | =([@D]-MeanD1)/StdDevD1 |
N11:N15 | N11 | =IF(ISNUMBER([@D]),([@D]-MeanD2)/StdDevD2,"") |
G17 | G17 | =AVERAGE(Table4[A-Z1]) |
H17 | H17 | =AVERAGE(Table4[A-Z2]) |
I17 | I17 | =AVERAGE(Table4[B-Z1]) |
J17 | J17 | =AVERAGE(Table4[B-Z2]) |
K17,M17 | K17 | =AVERAGE(Table4[C-Z1]) |
L17 | L17 | =AVERAGE(Table4[C-Z2]) |
N17 | N17 | =AVERAGE(Table4[D-Z2]) |
G18 | G18 | =STDEV.S(Table4[A-Z1]) |
H18 | H18 | =STDEV.S(Table4[A-Z2]) |
I18 | I18 | =STDEV.S(Table4[B-Z1]) |
J18 | J18 | =STDEV.S(Table4[B-Z2]) |
K18 | K18 | =STDEV.S(Table4[C-Z1]) |
L18 | L18 | =STDEV.S(Table4[C-Z2]) |
M18 | M18 | =STDEV.S(Table4[D-Z1]) |
N18 | N18 | =STDEV.S(Table4[D-Z2]) |
Named Ranges | ||
---|---|---|
Name | Refers To | Cells |
'Corrected Z Scores'!MeanA1 | ='Corrected Z Scores'!$C$5 | G11:G15 |
'Corrected Z Scores'!MeanA2 | ='Corrected Z Scores'!$C$6 | H11:H15 |
'Corrected Z Scores'!MeanB1 | ='Corrected Z Scores'!$D$5 | I11:I15 |
'Corrected Z Scores'!MeanB2 | ='Corrected Z Scores'!$D$6 | J11:J15 |
'Corrected Z Scores'!MeanC1 | ='Corrected Z Scores'!$E$5 | K11:K15 |
'Corrected Z Scores'!MeanC2 | ='Corrected Z Scores'!$E$6 | L11:L15 |
'Corrected Z Scores'!MeanD1 | ='Corrected Z Scores'!$F$5 | M11:M15 |
'Corrected Z Scores'!MeanD2 | ='Corrected Z Scores'!$F$6 | N11:N15 |
'Corrected Z Scores'!StdDevA1 | ='Corrected Z Scores'!$C$7 | G11:G15 |
'Corrected Z Scores'!StdDevA2 | ='Corrected Z Scores'!$C$8 | H11:H15 |
'Corrected Z Scores'!StdDevB1 | ='Corrected Z Scores'!$D$7 | I11:I15 |
'Corrected Z Scores'!StdDevB2 | ='Corrected Z Scores'!$D$8 | J11:J15 |
'Corrected Z Scores'!StdDevC1 | ='Corrected Z Scores'!$E$7 | K11:K15 |
'Corrected Z Scores'!StdDevC2 | ='Corrected Z Scores'!$E$8 | L11:L15 |
'Corrected Z Scores'!StdDevD1 | ='Corrected Z Scores'!$F$7 | M11:M15 |
'Corrected Z Scores'!StdDevD2 | ='Corrected Z Scores'!$F$8 | N11:N15 |