JenniferMurphy
Well-known Member
- Joined
- Jul 23, 2011
- Messages
- 2,687
- Office Version
- 365
- Platform
- Windows
I have been looking into ways to "adjust" ratings like the 5-star ratings on Amazon and elsewhere based on the number of reviews. I think anyone would put more confidence in a product with a 4.9 rating based on 10,000 ratings than one with a 5.0 rating based on just 1 rating. But what about one with a 4.8 rating based on 100 ratings vs one with a 4.7 rating based on 300 ratings?
I initially thought about using confidence intervals, but then I found several websites that mention Bayesian Averages. The problem is that the formulas don't seem to all agree and the derivations are beyond my math skills.
This is what I think the various parameters mean:
This equation seemed to agree with several websites.
This equation was here Using the Bayesian average in custom ranking | Algolia. But it doesn't seem to even work with their own data.
Anyway, I will include two minisheets. This first one has some sample product ratings. The Average1 column uses the first equation above. The Average2 column uses the second equation above, which gets crazy results.
This minisheet has a table I created using data from the Algolia website. The Claimed column is the adjusted ratings it says it will get. The Average1 column is the ratings using their equation. They do not match. Tne Average2 is the results using the first equation above. I think I must have done something wrong, because those results are crazy. But I cannot find the error.
I would love it if someone can put me out of my misery. Any statisticians out there?
I initially thought about using confidence intervals, but then I found several websites that mention Bayesian Averages. The problem is that the formulas don't seem to all agree and the derivations are beyond my math skills.
This is what I think the various parameters mean:
ProdRtg | The actual product rating. |
productRatingsAvg | I think this is the same as ProdRtg. |
NumRtgs | The number of ratings or reviews. |
productRatingsCount | I think this is the same as NumRtgs. |
AvgRtg | The average of the ratings of all of the products. I am not sure if this is just the products in the table or all products on sale or what. |
C | I think this is the same as AvgRtg. |
m | This is some kind of a confidence interval. Different websites had different definitions. |
This equation seemed to agree with several websites.
This equation was here Using the Bayesian average in custom ranking | Algolia. But it doesn't seem to even work with their own data.
Anyway, I will include two minisheets. This first one has some sample product ratings. The Average1 column uses the first equation above. The Average2 column uses the second equation above, which gets crazy results.
BayesRtg.xlsx | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
C | D | E | F | G | H | I | J | |||
2 | 278 | m | ||||||||
3 | ||||||||||
4 | Product | Rank | Rating | # Ratings | Bayesian Average1 | Rank 1 | Bayesian Average2 | Rank 2 | ||
5 | A | 1 | 5.0 | 1,000 | 4.967 | 1 | 6.318 | 13 | ||
6 | B | 2 | 5.0 | 100 | 4.890 | 2 | 17.628 | 9 | ||
7 | C | 3 | 5.0 | 10 | 4.855 | 5 | 94.162 | 5 | ||
8 | D | 4 | 5.0 | 1 | 4.851 | 7 | 231.333 | 1 | ||
9 | E | 5 | 4.9 | 1,000 | 4.889 | 3 | 6.218 | 14 | ||
10 | F | 6 | 4.9 | 100 | 4.863 | 4 | 17.533 | 10 | ||
11 | G | 7 | 4.9 | 10 | 4.852 | 6 | 94.094 | 6 | ||
12 | H | 8 | 4.9 | 1 | 4.850 | 8 | 231.316 | 2 | ||
13 | I | 9 | 4.8 | 1,000 | 4.811 | 14 | 6.119 | 15 | ||
14 | J | 10 | 4.8 | 100 | 4.837 | 13 | 17.437 | 11 | ||
15 | K | 11 | 4.8 | 10 | 4.848 | 11 | 94.027 | 7 | ||
16 | L | 12 | 4.8 | 1 | 4.850 | 9 | 231.299 | 3 | ||
17 | M | 13 | 4.7 | 1,000 | 4.733 | 16 | 6.019 | 16 | ||
18 | N | 14 | 4.7 | 100 | 4.810 | 15 | 17.342 | 12 | ||
19 | O | 15 | 4.7 | 10 | 4.845 | 12 | 93.960 | 8 | ||
20 | P | 16 | 4.7 | 1 | 4.849 | 10 | 231.282 | 4 | ||
21 | Averages | 4.85 | 278 | |||||||
Simple |
Cell Formulas | ||
---|---|---|
Range | Formula | |
G5:G20 | G5 | = (([@Rating] * [@['# Ratings]]) + (Table16[[#Totals],[Rating]] * m)) / ([@['# Ratings]] + m) |
H5:H20 | H5 | =RANK.EQ([@[Bayesian Average1]],[Bayesian Average1]) |
I5:I20 | I5 | = ([@Rating]*[@['# Ratings]] + Table16[[#Totals],[Rating]]*m) / ([@['# Ratings]]+Table16[[#Totals],[Rating]]) |
J5:J20 | J5 | =RANK.EQ([@[Bayesian Average2]],[Bayesian Average2]) |
D5:D20 | D5 | =MATCH([@Product],SORT([Product],1,1),0) |
E21 | E21 | =SUBTOTAL(101,[Rating]) |
F21 | F21 | =SUBTOTAL(101,['# Ratings]) |
Named Ranges | ||
---|---|---|
Name | Refers To | Cells |
m | =Simple!$E$2 | G5:G20, I5:I20 |
This minisheet has a table I created using data from the Algolia website. The Claimed column is the adjusted ratings it says it will get. The Average1 column is the ratings using their equation. They do not match. Tne Average2 is the results using the first equation above. I think I must have done something wrong, because those results are crazy. But I cannot find the error.
BayesRtg.xlsx | ||||||||
---|---|---|---|---|---|---|---|---|
B | C | D | E | F | G | |||
2 | m | 3.5 | ||||||
3 | c | 100 | ||||||
4 | Item | Rating | # Ratings | Claimed Bayesian Average | Bayesian Average1 | Bayesian Average2 | ||
5 | A | 5.0 | 10 | 4.3 | 3.636 | 11.429 | ||
6 | B | 4.8 | 100 | 4.8 | 4.150 | 2.371 | ||
7 | C | 4.6 | 1,000 | 4.6 | 4.500 | 1.414 | ||
8 | Average | 4.8 | ||||||
Algolia |
Cell Formulas | ||
---|---|---|
Range | Formula | |
F5:F7 | F5 | = ([@Rating]*[@['# Ratings]] + c_num*m) / ([@['# Ratings]]+c_num) |
G5:G7 | G5 | =(([@Rating]*[@['# Ratings]]) + (c_num*m)) / ([@['# Ratings]]*m) |
C8 | C8 | =SUBTOTAL(101,[Rating]) |
Named Ranges | ||
---|---|---|
Name | Refers To | Cells |
c_num | =Algolia!$D$3 | F5:G7 |
m | =Algolia!$D$2 | F5:G7 |
I would love it if someone can put me out of my misery. Any statisticians out there?