Prior to 2015, the Internet Movie Database (IMDb) publicly listed the formula used to rank their Top 250 movies list. To quote:
The formula for calculating the Top Rated 250 Titles gives a true Bayesian estimate:
weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
- R = average for the movie (mean)
- v = number of votes for the movie
- m = minimum votes required to be listed in the Top 250 (currently 25000)
- C = the mean vote across the whole report (currently 7.0)
For the Top 250, only votes from regular voters are considered.
It's not so hard to understand. The formula is:
rating = (v / (v + m)) * R +
(m / (v + m)) * C;
Which can be mathematically simplified to:
rating = (R * v + C * m) / (v + m);
The variables are:
- R – The item's own rating. R is the average of the item's votes. (For example, if an item has no votes, its R is 0. If someone gives it 5 stars, R becomes 5. If someone else gives it 1 star, R becomes 3, the average of
[1, 5]
. And so on.)
- C – The average item's rating. Find the R of every single item in the database, including the current one, and take the average of them; that is C. (Suppose there are 4 items in the database, and their ratings are
[2, 3, 5, 5]
. C is 3.75, the average of those numbers.)
- v – The number of votes for an item. (To given another example, if 5 people have cast votes on an item, v is 5.)
- m – The tuneable parameter. The amount of "smoothing" applied to the rating is based on the number of votes (v) in relation to m. Adjust m until the results satisfy you. And don't misinterpret IMDb's description of m as "minimum votes required to be listed" – this system is perfectly capable of ranking items with less votes than m.
All the formula does is: add m imaginary votes, each with a value of C, before calculating the average. In the beginning, when there isn't enough data (i.e. the number of votes is dramatically less than m), this causes the blanks to be filled in with average data. However, as votes accumulates, eventually the imaginary votes will be drowned out by real ones.
In this system, votes don't cause the rating to fluctuate wildly. Instead, they merely perturb it a bit in some direction.
When there are zero votes, only imaginary votes exist, and all of them are C. Thus, each item begins with a rating of C.
See also:
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…