For example, if you look up a movie on IMDB and find that it has an average rating of 5.0, what does that mean? Intuition suggests that because IMDB rates on a 10-point scale, the movie should be near the middle of the pack – not the greatest movie in the world, but not an outright stinker, either.
Intuition, however, would be wrong. In reality, the movie is a stinker. It is, in fact, in the worst one-fourth of movies ever made.
How did our intuition lead us so far astray? The problem is that IMDB movie ratings don't reliably indicate a movie's "goodness" with respect to other movies. A 5.0 doesn't really have any particular meaning – other than being about halfway between awful and excellent, the two extremes on IMDB's rating scale. Yes, we know that a 5.0-rated movie is probably "better" than a 4.8-rated movie, but how much better? 0.2 better? What on earth does that mean?
If we want to ascribe a more useful meaning to that 5.0, we'll need to turn to descriptive statistics. One of the most useful things to look at first when analyzing a data set is its distribution. So I downloaded IMDB's database and generated this histogram, which shows the distribution of ratings across the entire IMDB movie database:
From the histogram we can see that almost all movies are rated between 4 and 8. If a movie is rated lower than 4, it's one of the worst movies ever made; avoid it. If a movie is rated higher than 8, it's one of the best ever made – almost certainly worth viewing. Of that much, we can be fairly confident just by looking at the histogram.
But what about the ratings in between, the ratings in that big lump in the middle? How does our hypothetical 5.0-rated movie really stack up? To answer those questions, we must turn to the cumulative distribution function for the ratings:
Pinpoint a movie's rating on the "Rating" axis, and then trace a line straight up from that point until it intersects the stair-step CDF curve in the middle of the graph. From there, go straight left until you hit the "Proportion of movies ..." axis. Where you land on that axis gives you the magic number that tells you how your movie stacks up against all other movies.
For example, for a 6.0-rated movie, we trace up from the 6 on the Rating axis to the CDF curve and then straight left until we hit about 0.4 on the Proportion axis. That means that the movie is better than about 40% of all other movies, or to look at it another way, 60% of movies are better than our 6.0-rated movie. Repeating the process for our hypothetical 5.0-rated movie shows that it's at the 20% mark – pretty bad.
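If you would rather compute that proportion than trace it on a graph, here is a minimal Python sketch of the same idea. The ratings in `sample` are made up for illustration (they are not IMDB data), and `fraction_beaten` is a hypothetical helper name. It counts the movies rated strictly lower than the given rating; a textbook CDF would also count ties.

```python
from bisect import bisect_left


def fraction_beaten(ratings, rating):
    """Proportion of movies in `ratings` rated strictly below `rating`."""
    ordered = sorted(ratings)
    # bisect_left returns how many values in `ordered` are < rating.
    return bisect_left(ordered, rating) / len(ordered)


# Hypothetical ratings, invented for this example only (not IMDB data).
sample = [3.2, 4.8, 5.0, 5.5, 6.0, 6.1, 6.4, 6.9, 7.3, 8.1]

share = fraction_beaten(sample, 6.0)
print(f"A 6.0 beats {share:.0%} of the sample")
```

Swapping `bisect_left` for `bisect_right` would count movies with the same rating as "beaten" too, which is closer to how a CDF is usually defined.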
Since it's a pain in the neck to read the graph, I have made a small decoder ring that is more useful:
IMDB MOVIE RATING
DECODER RING

Movie's    % of movies
rating     it beats
-------    ------------
4.00-        9
5.00        21
5.25        24
5.50        30
5.75        35
6.00        42
6.25        48
6.50        57
6.75        63
7.00        72
7.25        78
7.50        87
7.75        91
8.00        95
8.25        97
8.50        98
8.75        99
9.00+      100

With the decoder ring, we can turn a movie's nearly meaningless IMDB rating into genuinely useful information – a single percentage that tells us where that movie stands within the world of movies.
All you do is look up your movie's IMDB rating in the left-hand column and take the corresponding percentile rank from the right-hand column. For example, Spider-Man 2 currently has a rating of 8.0, which corresponds to 95% on the decoder ring. That's how I knew earlier that it's in the top 5% of movies ever made.
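The same lookup is easy to automate. The sketch below copies the rows of the decoder ring into a Python list and finds the closest row at or below a given rating. Two things here are my assumptions rather than part of the table: ratings that fall between rows are rounded down to the row below, and `percent_beaten` is just a hypothetical name for the helper.

```python
from bisect import bisect_right

# Rows copied from the decoder ring: (rating, % of movies it beats).
# The "4.00-" and "9.00+" rows are stored as 4.00 and 9.00.
DECODER_RING = [
    (4.00, 9), (5.00, 21), (5.25, 24), (5.50, 30), (5.75, 35),
    (6.00, 42), (6.25, 48), (6.50, 57), (6.75, 63), (7.00, 72),
    (7.25, 78), (7.50, 87), (7.75, 91), (8.00, 95), (8.25, 97),
    (8.50, 98), (8.75, 99), (9.00, 100),
]
RATINGS = [rating for rating, _ in DECODER_RING]


def percent_beaten(rating):
    """Return the decoder-ring percentage for the row at or below `rating`."""
    # Assumption: in-between ratings round down to the row below, and
    # anything under 4.00 falls into the "4.00-" row.
    index = max(bisect_right(RATINGS, rating) - 1, 0)
    return DECODER_RING[index][1]


print(percent_beaten(8.0))  # Spider-Man 2's rating in the example above
```

Rounding down keeps the answer conservative: a movie rated 6.1 is reported as beating at least 42% of movies, never more than the table supports.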
I use the decoder ring all the time, and it has made it much easier to select movies that truly are worth watching. It's a great tool. I hope that you find it as useful as I have.
Notes
I obtained the data for my analysis via IMDB's "Alternate Interfaces" page.
I removed from consideration any movie with fewer than 100 ratings.
When I use the phrase "top 5% of movies ever made," I am really saying "top 5% of movies ever to receive at least 100 ratings on IMDB."
I performed the analysis using R from The R Project for Statistical Computing.
[Yes, this was shamelessly reposted from an earlier post I made to my community projects site.]
