How to adjust (correct) Z Scores for missing values?

JenniferMurphy

Well-known Member
Joined
Jul 23, 2011
Messages
2,687
Office Version
  1. 365
Platform
  1. Windows
Some time ago, I was introduced to Z Scores by ExcelGzh in this thread:


I might have studied them in school, but that was a looooong time ago. I have found them very helpful in comparing different distributions, such as the ratings discussed in the thread above. And as discussed in that tread, I looked at how to handle missing values. Unlike the more complicated table and VB code in that thread, this 5-row table has just 4 columns. Col A has values in all 5 rows. Col B is missing 2 values. Col C is missing all but 1. Col D is missing all 5 values.

Above the table, I calculated the Counts, Sums, Means, and Standard Deviations for each of the 4 columns. For the means and standard deviations, I show the standard results, which have errors in columns C & D and the corrected results. The "correction" is to set the mean to 0 if there are 0 values and to set the standard deviation to 1 if there are less than 2 values.

For each of the 4 columns, I created two Z Score columns, Z1 & Z2. The Z1 columns use the standard Z Score formula, (X-Mean)/Stddev. They get errors for any columns with missing values. The Z2 columns apply a "corrected" Z Score formula that I think handles them correctly. Under the table are the means and standard deviations for the Z Score columns. The means should all be 0 and the standard deviations 1. I did not apply any corrections.

The missing values are highlighted in grey. The incorrect results based on those missing values are highlighted in red. And the corrected values using my formulas are highlighted in green.

I would appreciate any comments or suggestions.

Cell Formulas
RangeFormula
C3C3=COUNT(Table4[A])
D3D3=COUNT(Table4[B])
E3E3=COUNT(Table4[C])
F3F3=COUNT(Table4[D])
C4C4=SUM(Table4[A])
D4D4=SUM(Table4[B])
E4E4=SUM(Table4[C])
F4F4=SUM(Table4[D])
C5C5=AVERAGE(Table4[A])
D5D5=AVERAGE(Table4[B])
E5E5=AVERAGE(Table4[C])
F5F5=AVERAGE(Table4[D])
C6C6=IF(COUNT(Table4[A])>0,AVERAGE(Table4[A]),0)
D6D6=IF(COUNT(Table4[B])>0,AVERAGE(Table4[B]),0)
E6E6=IF(COUNT(Table4[C])>0,AVERAGE(Table4[C]),0)
F6F6=IF(COUNT(Table4[D])>0,AVERAGE(Table4[D]),0)
C7C7=STDEV.S(Table4[A])
D7D7=STDEV.S(Table4[B])
E7E7=STDEV.S(Table4[C])
F7F7=STDEV.S(Table4[D])
C8C8=IF(COUNT(Table4[A])>1,STDEV.S(Table4[A]),1)
D8D8=IF(COUNT(Table4[B])>1,STDEV.S(Table4[B]),1)
E8E8=IF(COUNT(Table4[C])>1,STDEV.S(Table4[C]),1)
F8F8=IF(COUNT(Table4[D])>1,STDEV.S(Table4[D]),1)
G11:G15G11=([@A]-MeanA1)/StdDevA1
H11:H15H11=IF(ISNUMBER([@A]),([@A]-MeanA2)/StdDevA2,"")
I11:I15I11=([@A]-MeanB1)/StdDevB1
J11:J15J11=IF(ISNUMBER([@B]),([@B]-MeanB2)/StdDevB2,"")
K11:K15K11=([@C]-MeanC1)/StdDevC1
L11:L15L11=IF(ISNUMBER([@C]),([@C]-MeanC2)/StdDevC2,"")
M11:M15M11=([@D]-MeanD1)/StdDevD1
N11:N15N11=IF(ISNUMBER([@D]),([@D]-MeanD2)/StdDevD2,"")
G17G17=AVERAGE(Table4[A-Z1])
H17H17=AVERAGE(Table4[A-Z2])
I17I17=AVERAGE(Table4[B-Z1])
J17J17=AVERAGE(Table4[B-Z2])
K17,M17K17=AVERAGE(Table4[C-Z1])
L17L17=AVERAGE(Table4[C-Z2])
N17N17=AVERAGE(Table4[D-Z2])
G18G18=STDEV.S(Table4[A-Z1])
H18H18=STDEV.S(Table4[A-Z2])
I18I18=STDEV.S(Table4[B-Z1])
J18J18=STDEV.S(Table4[B-Z2])
K18K18=STDEV.S(Table4[C-Z1])
L18L18=STDEV.S(Table4[C-Z2])
M18M18=STDEV.S(Table4[D-Z1])
N18N18=STDEV.S(Table4[D-Z2])
Named Ranges
NameRefers ToCells
'Corrected Z Scores'!MeanA1='Corrected Z Scores'!$C$5G11:G15
'Corrected Z Scores'!MeanA2='Corrected Z Scores'!$C$6H11:H15
'Corrected Z Scores'!MeanB1='Corrected Z Scores'!$D$5I11:I15
'Corrected Z Scores'!MeanB2='Corrected Z Scores'!$D$6J11:J15
'Corrected Z Scores'!MeanC1='Corrected Z Scores'!$E$5K11:K15
'Corrected Z Scores'!MeanC2='Corrected Z Scores'!$E$6L11:L15
'Corrected Z Scores'!MeanD1='Corrected Z Scores'!$F$5M11:M15
'Corrected Z Scores'!MeanD2='Corrected Z Scores'!$F$6N11:N15
'Corrected Z Scores'!StdDevA1='Corrected Z Scores'!$C$7G11:G15
'Corrected Z Scores'!StdDevA2='Corrected Z Scores'!$C$8H11:H15
'Corrected Z Scores'!StdDevB1='Corrected Z Scores'!$D$7I11:I15
'Corrected Z Scores'!StdDevB2='Corrected Z Scores'!$D$8J11:J15
'Corrected Z Scores'!StdDevC1='Corrected Z Scores'!$E$7K11:K15
'Corrected Z Scores'!StdDevC2='Corrected Z Scores'!$E$8L11:L15
'Corrected Z Scores'!StdDevD1='Corrected Z Scores'!$F$7M11:M15
'Corrected Z Scores'!StdDevD2='Corrected Z Scores'!$F$8N11:N15
 

Excel Facts

Round to nearest half hour?
Use =MROUND(A2,"0:30") to round to nearest half hour. Use =CEILING(A2,"0:30") to round to next half hour.
What do you mean correct it? The standard deviation of a dataset with only one data point is undefined. You need at least 2 points to calculate the average distance away from the mean.
 
Upvote 0
What do you mean correct it? The standard deviation of a dataset with only one data point is undefined. You need at least 2 points to calculate the average distance away from the mean.
When I said "correct it", I was mainly referring to the Z Scores. The "corrected" means and standard deviations are just there so that the Z Score formulas will work. Perhaps I should have called them "adjusted". I did that so the Z Score formulas would be less complicated. Otherwise, I would have had to make 2-3 true/false tests in every formula.

The main question is whether the corrected (adjusted?) Z Scores are valid. I want valid Z Scores regardless of how many values are missing. This is so I can calculate a valid Z Sum (sum of all of the Z Scores), which I then convert to a 0-100 scale.

How about this table? I got rid of all of the "before" columns. I added a Z Sum column and a "100-0" column. Do you see any problems with it? Is there a better way?

Z-Scores.xlsx
BCDEFGHIJKL
3Counts5310
4Sums23.127.840Z Sum Max+1.770
5Means (adjusted)4.629.274.000.00Z Sum Min-1.834
6Std Devs (adjusted)0.280.211.001.00Z Sum Wid3.603
7
8Rtg1Rtg2Rtg3Rtg4Z Rtg1Z Rtg2Z Rtg3Z Rtg4Z Sum100-0
94.9+1.01   +1.0179
104.59.1-0.43-0.80  -1.2317
114.7+0.29   +0.2959
124.29.24-1.51-0.32=0.00 -1.830
134.89.5+0.65+1.12  +1.77100
14
15Z Score Means0.000.000.00#####
16Z Score Std Devs1.001.00##########
Corrected Z Scores
Cell Formulas
RangeFormula
C3C3=COUNT(Table4[Rtg1])
D3D3=COUNT(Table4[Rtg2])
E3E3=COUNT(Table4[Rtg3])
F3F3=COUNT(Table4[Rtg4])
C4C4=SUM(Table4[Rtg1])
D4D4=SUM(Table4[Rtg2])
E4E4=SUM(Table4[Rtg3])
F4F4=SUM(Table4[Rtg4])
C5C5=IF(COUNT(Table4[Rtg1])>0,AVERAGE(Table4[Rtg1]),0)
D5D5=IF(COUNT(Table4[Rtg2])>0,AVERAGE(Table4[Rtg2]),0)
E5E5=IF(COUNT(Table4[Rtg3])>0,AVERAGE(Table4[Rtg3]),0)
F5F5=IF(COUNT(Table4[Rtg4])>0,AVERAGE(Table4[Rtg4]),0)
C6C6=IF(COUNT(Table4[Rtg1])>1,STDEV.S(Table4[Rtg1]),1)
D6D6=IF(COUNT(Table4[Rtg2])>1,STDEV.S(Table4[Rtg2]),1)
E6E6=IF(COUNT(Table4[Rtg3])>1,STDEV.S(Table4[Rtg3]),1)
F6F6=IF(COUNT(Table4[Rtg4])>1,STDEV.S(Table4[Rtg4]),1)
K4K4=MAX(Table4[Z Sum])
K5K5=MIN(Table4[Z Sum])
K6K6=ZSumMax-ZSumMin
G9:G13G9=IF(ISNUMBER([@Rtg1]),([@Rtg1]-MeanA)/StdDevA,"")
H9:H13H9=IF(ISNUMBER([@Rtg2]),([@Rtg2]-MeanB)/StdDevB,"")
I9:I13I9=IF(ISNUMBER([@Rtg3]),([@Rtg3]-MeanC)/StdDevC,"")
J9:J13J9=IF(ISNUMBER([@Rtg4]),([@Rtg4]-MeanD)/StdDevD,"")
K9:K13K9=SUM(Table4[@[Z Rtg1]:[Z Rtg4]])
L9:L13L9=([@[Z Sum]]-ZSumMin)/ZSumWid*100
G15G15=AVERAGE(Table4[Z Rtg1])
H15H15=AVERAGE(Table4[Z Rtg2])
I15I15=AVERAGE(Table4[Z Rtg3])
J15J15=AVERAGE(Table4[Z Rtg4])
G16G16=STDEV.S(Table4[Z Rtg1])
H16H16=STDEV.S(Table4[Z Rtg2])
I16I16=STDEV.S(Table4[Z Rtg3])
J16J16=STDEV.S(Table4[Z Rtg4])
Named Ranges
NameRefers ToCells
'Corrected Z Scores'!MeanA='Corrected Z Scores'!$C$5G9:G13
'Corrected Z Scores'!MeanB='Corrected Z Scores'!$D$5H9:H13
'Corrected Z Scores'!MeanC='Corrected Z Scores'!$E$5I9:I13
'Corrected Z Scores'!MeanD='Corrected Z Scores'!$F$5J9:J13
'Corrected Z Scores'!StdDevA='Corrected Z Scores'!$C$6G9:G13
'Corrected Z Scores'!StdDevB='Corrected Z Scores'!$D$6H9:H13
'Corrected Z Scores'!StdDevC='Corrected Z Scores'!$E$6I9:I13
'Corrected Z Scores'!StdDevD='Corrected Z Scores'!$F$6J9:J13
'Corrected Z Scores'!ZSumMax='Corrected Z Scores'!$K$4K6
'Corrected Z Scores'!ZSumMin='Corrected Z Scores'!$K$5K6, L9:L13
'Corrected Z Scores'!ZSumWid='Corrected Z Scores'!$K$6L9:L13
 
Upvote 0
I would use =IFERROR(...,""). A tad shorter.

I was able to use IFERROR for the means and standard deviations, but not for the zscores. The zscore formula, (X-mean)/stddev, does not generate an error if X is an empty cell. Excel takes it to be zero. That's why I used IF(ISANUMBER(...),""). A revised table is below. I added a little table in N8:Q13 showing that the raw formulas do not generate an error and they do generate invalid results.

This is all helpful, but it is beside the point. My main questions is, whatever formulas I use to calculate those zscores, are they valid?

Z-Scores.xlsx
BCDEFGHIJKLMNOPQ
3Counts5310
4Sums23.127.840Z Sum Max+1.770
5Means (adjusted)4.629.274.000.00Z Sum Min-1.834
6Std Devs (adjusted)0.280.211.001.00Z Sum Wid3.603
7
8Rtg-ARtg-BRtg-CRtg-DZ Rtg-AZ Rtg-BZ Rtg-CZ Rtg-DZ Sum100-0Z Rtg-AZ Rtg-BZ Rtg-CZ Rtg-D
94.9+1.01   +1.0179+1.01-44.52-4.00=0.00
104.59.1-0.43-0.80  -1.2317-0.43-0.80-4.00=0.00
114.7+0.29   +0.2959+0.29-44.52-4.00=0.00
124.29.24-1.51-0.32=0.00 -1.830-1.51-0.32=0.00=0.00
134.89.5+0.65+1.12  +1.77100+0.65+1.12-4.00=0.00
14
15Z Score Means0.000.000.000.00
16Z Score Std Devs1.001.001.001.00
Corrected Z Scores
Cell Formulas
RangeFormula
C3C3=COUNT(Table4[Rtg-A])
D3D3=COUNT(Table4[Rtg-B])
E3E3=COUNT(Table4[Rtg-C])
F3F3=COUNT(Table4[Rtg-D])
C4C4=SUM(Table4[Rtg-A])
D4D4=SUM(Table4[Rtg-B])
E4E4=SUM(Table4[Rtg-C])
F4F4=SUM(Table4[Rtg-D])
C5C5=IFERROR(AVERAGE(Table4[Rtg-A]),0)
D5D5=IFERROR(AVERAGE(Table4[Rtg-B]),0)
E5E5=IFERROR(AVERAGE(Table4[Rtg-C]),0)
F5F5=IFERROR(AVERAGE(Table4[Rtg-D]),0)
C6C6=IFERROR(STDEV.S(Table4[Rtg-A]),1)
D6D6=IFERROR(STDEV.S(Table4[Rtg-B]),1)
E6E6=IFERROR(STDEV.S(Table4[Rtg-C]),1)
F6F6=IFERROR(STDEV.S(Table4[Rtg-D]),1)
K4K4=MAX(Table4[Z Sum])
K5K5=MIN(Table4[Z Sum])
K6K6=ZSumMax-ZSumMin
G9:G13G9=IF(ISNUMBER([@[Rtg-A]]),([@[Rtg-A]]-MeanA)/StdDevA,"")
H9:H13H9=IF(ISNUMBER([@[Rtg-B]]),([@[Rtg-B]]-MeanB)/StdDevB,"")
I9:I13I9=IF(ISNUMBER([@[Rtg-C]]),([@[Rtg-C]]-MeanC)/StdDevC,"")
J9:J13J9=IF(ISNUMBER([@[Rtg-D]]),([@[Rtg-D]]-MeanD)/StdDevD,"")
K9:K13K9=SUM(Table4[@[Z Rtg-A]:[Z Rtg-D]])
L9:L13L9=([@[Z Sum]]-ZSumMin)/ZSumWid*100
N9:N13N9=(Table4[@[Rtg-A]]-MeanA)/StdDevA
O9:O13O9=(Table4[@[Rtg-B]]-MeanB)/StdDevB
P9:P13P9=(Table4[@[Rtg-C]]-MeanC)/StdDevC
Q9:Q13Q9=(Table4[@[Rtg-D]]-MeanD)/StdDevD
G15:J15G15=IFERROR(AVERAGE(Table4[Z Rtg-D]),0)
G16G16=IFERROR(STDEV.S(Table4[Z Rtg-A]),1)
H16H16=IFERROR(STDEV.S(Table4[Z Rtg-B]),1)
I16I16=IFERROR(STDEV.S(Table4[Z Rtg-C]),1)
J16J16=IFERROR(STDEV.S(Table4[Z Rtg-D]),1)
Named Ranges
NameRefers ToCells
'Corrected Z Scores'!MeanA='Corrected Z Scores'!$C$5G9:G13, N9:N13
'Corrected Z Scores'!MeanB='Corrected Z Scores'!$D$5H9:H13, O9:O13
'Corrected Z Scores'!MeanC='Corrected Z Scores'!$E$5I9:I13, P9:P13
'Corrected Z Scores'!MeanD='Corrected Z Scores'!$F$5J9:J13, Q9:Q13
'Corrected Z Scores'!StdDevA='Corrected Z Scores'!$C$6G9:G13, N9:N13
'Corrected Z Scores'!StdDevB='Corrected Z Scores'!$D$6O9:O13, H9:H13
'Corrected Z Scores'!StdDevC='Corrected Z Scores'!$E$6I9:I13, P9:P13
'Corrected Z Scores'!StdDevD='Corrected Z Scores'!$F$6J9:J13, Q9:Q13
'Corrected Z Scores'!ZSumMax='Corrected Z Scores'!$K$4K6
'Corrected Z Scores'!ZSumMin='Corrected Z Scores'!$K$5K6, L9:L13
'Corrected Z Scores'!ZSumWid='Corrected Z Scores'!$K$6L9:L13
 
Upvote 0
It's valid when you have at least 2 points of data. Otherwise, you're just masking the error, which is the correct output as mentioned above.
 
Upvote 0

Forum statistics

Threads
1,223,880
Messages
6,175,157
Members
452,615
Latest member
bogeys2birdies

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top