Comparing errors

JenniferMurphy

Well-known Member
Joined
Jul 23, 2011
Messages
2,691
Office Version
  1. 365
Platform
  1. Windows
This is really a statistics question. If this is inappropriate, let me know. I've tried getting help in a couple of statistics forums, but I always get better answers here and I know Excel is often used for statistical calculations.

I am interested in comparing the "tally errors" is different "win streaks" in a random event, like a game, with a given probability of a win. The table shows results for win streaks of various lengths (Row 5). Columns D-G contain data for these 4 streak lengths over 4,000 games with an overall winning % (Wins/Games) of 60%.

Col H shows the formulas used for the calculations.

[TABLE="class: grid, width: 650"]
<tbody>[TR]
[TD="align: center"]R/C[/TD]
[TD="align: center"]C[/TD]
[TD="align: center"]D[/TD]
[TD="align: center"]E[/TD]
[TD="align: center"]F[/TD]
[TD="align: center"]G[/TD]
[TD="align: center"]H[/TD]
[/TR]
[TR]
[TD="align: center"]3[/TD]
[TD]#Games[/TD]
[TD="align: center"]4000[/TD]
[TD="align: center"]4000[/TD]
[TD="align: center"]4000[/TD]
[TD="align: center"]4000[/TD]
[TD]D3: 4000[/TD]
[/TR]
[TR]
[TD="align: center"]4[/TD]
[TD]Win %[/TD]
[TD="align: center"]60.00%[/TD]
[TD="align: center"]60.00%[/TD]
[TD="align: center"]60.00%[/TD]
[TD="align: center"]60.00%[/TD]
[TD]D4: 0.6[/TD]
[/TR]
[TR]
[TD="align: center"]5[/TD]
[TD]Streak Length[/TD]
[TD="align: center"]1[/TD]
[TD="align: center"]2[/TD]
[TD="align: center"]3[/TD]
[TD="align: center"]4[/TD]
[TD]D5: 1[/TD]
[/TR]
[TR]
[TD="align: center"]6[/TD]
[TD]Exp Streak %[/TD]
[TD="align: center"]24.00%[/TD]
[TD="align: center"]14.40%[/TD]
[TD="align: center"]8.64%[/TD]
[TD="align: center"]5.18%[/TD]
[TD]D6: =(D4^D5)*(1-D4)[/TD]
[/TR]
[TR]
[TD="align: center"]7[/TD]
[TD]Expected Tally[/TD]
[TD="align: center"]384.00[/TD]
[TD="align: center"]230.40[/TD]
[TD="align: center"]138.24[/TD]
[TD="align: center"]82.94[/TD]
[TD]D7: =D6*D3*(1-D4)[/TD]
[/TR]
[TR]
[TD="align: center"]8[/TD]
[TD]Actual Tally[/TD]
[TD="align: center"]390[/TD]
[TD="align: center"]227[/TD]
[TD="align: center"]141[/TD]
[TD="align: center"]81[/TD]
[TD]D8: 390[/TD]
[/TR]
[TR]
[TD="align: center"]9[/TD]
[TD]Error (Act-Exp)[/TD]
[TD="align: center"]6.00[/TD]
[TD="align: center"]-3.40[/TD]
[TD="align: center"]2.76[/TD]
[TD="align: center"]-1.94[/TD]
[TD]D9: =D8-D7[/TD]
[/TR]
[TR]
[TD="align: center"]10[/TD]
[TD]Error %[/TD]
[TD="align: center"]1.54%[/TD]
[TD="align: center"]-1.50%[/TD]
[TD="align: center"]1.96%[/TD]
[TD="align: center"]-2.40%[/TD]
[TD]D10: =D9/D8[/TD]
[/TR]
</tbody>[/TABLE]

Row 6 shows the expected probability for each streak length. Row 7 shows the expected tally for each streak length. Row 8 shows the actual tallies. Row 9 shows the difference in these percentages and Row 10 shows the percentage error.

My question is this: Are the values in Row 10 the best measures of how closely the actual tallies (Row 8) are to the expected tallies (Row 7)? If there is a better measure, I would appreciate some help understanding what it is and how to calculate it. I am looking for a measure that will allow me to rank the various tallies according to the error from the expected.

I hope I explained that adequately.
 

Excel Facts

Which came first: VisiCalc or Lotus 1-2-3?
Dan Bricklin and Bob Frankston debuted VisiCalc in 1979 as a Visible Calculator. Lotus 1-2-3 debuted in the early 1980's, from Mitch Kapor.
Jennifer,
I'm not sure if this is appropriate, ie, does my logic hold true?

You have 4 different events (by category, 1, 2, 3, 4).
Each event differs from zero by the Error %.

I think that using a Standard Deviation might be the better way.
[TABLE="width: 320"]
<tbody>[TR]
[TD="width: 64, align: right"]1.54[/TD]
[TD="width: 64, align: right"]-1.5[/TD]
[TD="width: 64, align: right"]1.9[/TD]
[TD="width: 64, align: right"]-2.4[/TD]
[TD="width: 64"][/TD]
[/TR]
[TR]
[TD="align: right"]2.3716[/TD]
[TD="align: right"]2.25[/TD]
[TD="align: right"]3.61[/TD]
[TD="align: right"]5.76[/TD]
[TD="align: right"]13.9916[/TD]
[/TR]
[TR]
[TD="align: right"]0.330267[/TD]
[TD="align: right"]0.370267[/TD]
[TD="align: right"]0.029733[/TD]
[TD="align: right"]0.529733[/TD]
[TD="align: right"]1.870267[/TD]
[/TR]
</tbody>[/TABLE]

Row 1 is your error %. Row 2 is the Square of those values with the sum of the 4 at the end.
Beneath that sum is the STD DEV = Sqrt(SUM/4)
The other 4 values in row 3 are how far the absolute values sit from 1 Standard deviation.
My formula in the first cell of row 3 is: =ABS(ABS(A1)-$E3)

Be careful. I have a rather limited concept of statistics; as I said, my logic may NOT be exactly right.
 
Last edited:
Upvote 0
Jennifer,
I'm not sure if this is appropriate, ie, does my logic hold true?

You have 4 different events (by category, 1, 2, 3, 4).
Each event differs from zero by the Error %.

I think that using a Standard Deviation might be the better way.

I wanted to use the std dev, but couldn't figure out how to calculate it.

[TABLE="width: 320"]
<tbody>[TR]
[TD="width: 64, align: right"]1.54[/TD]
[TD="width: 64, align: right"]-1.5[/TD]
[TD="width: 64, align: right"]1.9[/TD]
[TD="width: 64, align: right"]-2.4[/TD]
[TD="width: 64"][/TD]
[/TR]
[TR]
[TD="align: right"]2.3716[/TD]
[TD="align: right"]2.25[/TD]
[TD="align: right"]3.61[/TD]
[TD="align: right"]5.76[/TD]
[TD="align: right"]13.9916[/TD]
[/TR]
[TR]
[TD="align: right"]0.330267[/TD]
[TD="align: right"]0.370267[/TD]
[TD="align: right"]0.029733[/TD]
[TD="align: right"]0.529733[/TD]
[TD="align: right"]1.870267[/TD]
[/TR]
</tbody>[/TABLE]

Row 1 is your error %. Row 2 is the Square of those values with the sum of the 4 at the end.
Beneath that sum is the STD DEV = Sqrt(SUM/4)

The other 4 values in row 3 are how far the absolute values sit from 1 Standard deviation.
My formula in the first cell of row 3 is: =ABS(ABS(A1)-$E3)

Be careful. I have a rather limited concept of statistics; as I said, my logic may NOT be exactly right.

Hmmm... Is that really the std dev? And do we want to find out how far a value is from one std dev or how many std devs it is from the mean (z-score)?

How about this? I've added the mean and std dev to rows 9 & 11. I'm not sure if these are valid measures because I'm not sure if they are from the same population.

Then in rows 10 & 12, I calculate the Z score based on those values.

[TABLE="class: grid, width: 870"]
<tbody>[TR]
[TD="align: center"]R/C[/TD]
[TD="align: center"]C[/TD]
[TD="align: center"]D[/TD]
[TD="align: center"]E[/TD]
[TD="align: center"]F[/TD]
[TD="align: center"]G[/TD]
[TD="align: center"]I[/TD]
[TD="align: center"]J[/TD]
[TD="align: center"]K[/TD]
[TD="align: center"]L[/TD]
[/TR]
[TR]
[TD="align: center"]3[/TD]
[TD]Streak Length[/TD]
[TD="align: center"]1[/TD]
[TD="align: center"]2[/TD]
[TD="align: center"]3[/TD]
[TD="align: center"]4[/TD]
[TD="align: center"]Mean[/TD]
[TD="align: center"]Formula[/TD]
[TD="align: center"]Std Dev[/TD]
[TD="align: center"]Formula[/TD]
[/TR]
[TR]
[TD="align: center"]4[/TD]
[TD]#Games[/TD]
[TD="align: right"]4000[/TD]
[TD="align: right"]4000[/TD]
[TD="align: right"]4000[/TD]
[TD="align: right"]4000[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="align: center"]5[/TD]
[TD]Win %[/TD]
[TD="align: right"]60.00%[/TD]
[TD="align: right"]60.00%[/TD]
[TD="align: right"]60.00%[/TD]
[TD="align: right"]60.00%[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="align: center"]6[/TD]
[TD]Exp Streak %[/TD]
[TD="align: right"]24.00%[/TD]
[TD="align: right"]14.40%[/TD]
[TD="align: right"]8.64%[/TD]
[TD="align: right"]5.18%[/TD]
[TD][/TD]
[TD]D6: =(D5^D3)*(1-D5)[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="align: center"]7[/TD]
[TD]Expected Tally[/TD]
[TD="align: right"]384.00[/TD]
[TD="align: right"]230.40[/TD]
[TD="align: right"]138.24[/TD]
[TD="align: right"]82.94[/TD]
[TD][/TD]
[TD]D7: =D6*D4*(1-D5)[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="align: center"]8[/TD]
[TD]Actual Tally[/TD]
[TD="align: right"]390[/TD]
[TD="align: right"]227[/TD]
[TD="align: right"]141[/TD]
[TD="align: right"]81[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="align: center"]9[/TD]
[TD]Error (Act-Exp)[/TD]
[TD="align: right"]6.00[/TD]
[TD="align: right"]-3.40[/TD]
[TD="align: right"]2.76[/TD]
[TD="align: right"]-1.94[/TD]
[TD="align: right"]0.8540[/TD]
[TD]I9: =AVERAGE(D9:G9)[/TD]
[TD="align: right"]4.32[/TD]
[TD]K9: =STDEV(D9:G9)[/TD]
[/TR]
[TR]
[TD="align: center"]10[/TD]
[TD]Z-Score[/TD]
[TD="align: right"]1.19[/TD]
[TD="align: right"]-0.98[/TD]
[TD="align: right"]0.44[/TD]
[TD="align: right"]-0.65[/TD]
[TD][/TD]
[TD]D10: =(D9-$I9)/$K9[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD="align: center"]11[/TD]
[TD]Error %[/TD]
[TD="align: right"]1.54%[/TD]
[TD="align: right"]-1.50%[/TD]
[TD="align: right"]1.96%[/TD]
[TD="align: right"]-2.40%[/TD]
[TD="align: right"]-0.1005%[/TD]
[TD]I11: =AVERAGE(D11:G11)[/TD]
[TD="align: right"]2.17%[/TD]
[TD]K11: =STDEV(D11:G11)[/TD]
[/TR]
[TR]
[TD="align: center"]12[/TD]
[TD]Z-Score[/TD]
[TD="align: right"]0.75[/TD]
[TD="align: right"]-0.64[/TD]
[TD="align: right"]0.95[/TD]
[TD="align: right"]-1.06[/TD]
[TD][/TD]
[TD]D12: =(D11-$I11)/$K11[/TD]
[TD="align: center"][/TD]
[TD="align: center"][/TD]
[/TR]
</tbody>[/TABLE]

Is any of this meaningful?
 
Upvote 0
Jennifer,
I recognise two things in which I was probably was in error. I made the assumption that the "mean" would be zero, ie, no error. Secondly my divisor should have been 1 less that the values, ie, 3.

What are you doubting about "same population"? Are not the 4 categories of errors coming from the same 4000 game population?
That said, and you'd be more conversant than I, the most accurate means to report deviations would be based upon STDs and Z-scores are a more meaningful way to interpret them.

Yes, your table is meaningful to me and would seem to be the way to go.
 
Upvote 0
Jennifer,
I recognise two things in which I was probably was in error. I made the assumption that the "mean" would be zero, ie, no error. Secondly my divisor should have been 1 less that the values, ie, 3.
I think the mean should be zero, if this is a true random variable, but since it's about human behavior, it's almost certainly not perfectly random. Does a human play the same after a loss as after 10 wins in a row? I think not.

What are you doubting about "same population"? Are not the 4 categories of errors coming from the same 4000 game population?
Yes, all of the results come from the same 4000 games. But it seems to me that by tallying the number the different streaks separately, I may have created different (sub)populations. This is over my skill level.

That said, and you'd be more conversant than I, the most accurate means to report deviations would be based upon STDs and Z-scores are a more meaningful way to interpret them.

Yes, your table is meaningful to me and would seem to be the way to go.
I think I'll try this on one of the stats forums.

Thanks
 
Upvote 0
Ah well, if we didn't get to "the prize" then hopefully our exchange has cleared some thoughts or at least offered some new directions.
All the best.
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,182
Members
453,020
Latest member
Mohamed Magdi Tawfiq Emam

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top