Magic decoder ring for IMDB movie ratings
Diary
By tmoertel (Fri Jul 23, 2004 at 02:53:43 PM EST)
Which piece of information is more useful?
  • Spider-Man 2 has an average rating of 8.0 on IMDB.
  • Spider-Man 2 is in the top 5 percent of movies ever made.
Keep reading, and I'll show you how to make a magic decoder ring that turns the first into the second.


The Internet Movie Database is my favorite source of movie information, but it has a failing: The ratings aren't particularly meaningful, and that makes finding good movies harder than it ought to be.

For example, if you look up a movie on IMDB and find that it has an average rating of 5.0, what does that mean? Intuition suggests that because IMDB rates on a 10-scale, the movie should be near the middle of the pack – not the greatest movie in the world, but not an outright stinker, either.

Intuition, however, would be wrong. In reality, the movie is a stinker. It is, in fact, in the worst one-fourth of movies ever made.

How did our intuition lead us so far astray? The problem is that IMDB movie ratings don't reliably indicate a movie's "goodness" with respect to other movies. A 5.0 doesn't really have any particular meaning – other than being about halfway between awful and excellent, the two extremes on IMDB's rating scale. Yes, we know that a 5.0-rated movie is probably "better" than a 4.8-rated movie, but how much better? 0.2 better? What on earth does that mean?

If we want to ascribe a more useful meaning to that 5.0, we'll need to turn to descriptive statistics. One of the most useful things to look at first when analyzing a data set is its distribution. So I downloaded IMDB's database and generated this histogram, which shows the distribution of ratings across the entire IMDB movie database:

[Figure: histogram of ratings across the IMDB movie database]

 

 

From the histogram we can see that almost all movies are rated between 4 and 8. If a movie is rated lower than 4, it's one of the worst movies ever made; avoid it. If a movie is rated higher than 8, it's one of the best ever made – almost certainly worth viewing. Of that much, we can be fairly confident just by looking at the histogram.
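
As a rough illustration of how such a histogram can be tallied, here is a minimal Python sketch. It is not the original analysis, which was done in R. The function names, the bin width, and the sample data are all made up for the example; only the 100-vote threshold comes from the Notes at the end of this article.

```python
import math
from collections import Counter

MIN_VOTES = 100  # threshold mentioned in the Notes; everything else here is hypothetical


def eligible_ratings(movies, min_votes=MIN_VOTES):
    """Return the rating of every (rating, votes) pair that has enough votes."""
    return [rating for rating, votes in movies if votes >= min_votes]


def rating_histogram(ratings, bin_width=1.0):
    """Count ratings per bin; each key is the lower edge of its bin."""
    counts = Counter(math.floor(r / bin_width) * bin_width for r in ratings)
    return dict(sorted(counts.items()))


def share_between(ratings, low, high):
    """Fraction of ratings r with low <= r <= high."""
    return sum(1 for r in ratings if low <= r <= high) / len(ratings)


# Made-up (rating, votes) pairs for illustration only; not real IMDB data.
movies = [(3.2, 450), (4.5, 120), (5.1, 98), (5.8, 2300), (6.0, 150),
          (6.3, 5000), (6.9, 310), (7.1, 101), (7.6, 40), (8.4, 12000)]

ratings = eligible_ratings(movies)
print(rating_histogram(ratings))
print(share_between(ratings, 4.0, 8.0))
```

With real data, `share_between(ratings, 4.0, 8.0)` is the number that would back up the "between 4 and 8" reading of the histogram.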

But what about the ratings in between, the ratings in that big lump in the middle? How does our hypothetical 5.0-rated movie really stack up? To answer those questions, we must turn to the cumulative distribution function for the ratings:

[Figure: cumulative distribution function of IMDB ratings; horizontal axis "Rating", vertical axis "Proportion of movies ..."]

 

 

Pinpoint a movie's rating on the "Rating" axis, and then trace a line straight up from that point until it intersects the stair-step CDF curve in the middle of the graph. From there, go straight left until you hit the "Proportion of movies ..." axis. Where you land on that axis gives you the magic number that tells you how your movie stacks up against all other movies.

For example, for a 6.0-rated movie, we trace up from the 6 on the Rating axis to the CDF curve and then straight left until we hit about 0.4 on the Proportion axis. That means that the movie is better than about 40% of all other movies, or to look at it another way, 60% of movies are better than our 6.0-rated movie. Repeating the process for our hypothetical 5.0-rated movie shows that it's at the 20% mark – pretty bad.
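
Reading the graph amounts to evaluating an empirical CDF. The following Python sketch shows the idea under stated assumptions: the function name and the sample ratings are hypothetical, and counting ties as "at or below" is one possible choice, not necessarily the one used to draw the curve.

```python
from bisect import bisect_right


def percentile_rank(all_ratings, rating):
    """Percent of movies rated at or below `rating` (the empirical CDF at that point)."""
    ordered = sorted(all_ratings)
    # bisect_right returns how many entries in `ordered` are <= rating.
    return 100.0 * bisect_right(ordered, rating) / len(ordered)


# Made-up ratings for illustration only; not real IMDB data.
sample = [4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 8.5, 9.0, 3.0, 6.5]

print(percentile_rank(sample, 6.0))
```

Swapping `bisect_right` for `bisect_left` would count only strictly lower-rated movies, which matters when many movies share the same rating.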

Since it's a pain in the neck to read the graph, I have made a small decoder ring that is more useful:

IMDB MOVIE RATING
DECODER RING

Movie's   % of movies
rating    it beats
-------  ------------

 4.00-         9 

 5.00         21
 5.25         24
 5.50         30
 5.75         35

 6.00         42
 6.25         48
 6.50         57
 6.75         63

 7.00         72
 7.25         78
 7.50         87
 7.75         91

 8.00         95
 8.25         97
 8.50         98
 8.75         99

 9.00+       100

With the decoder ring, we can turn a movie's nearly meaningless IMDB rating into genuinely useful information – a single percentage that tells us where that movie stands within the world of movies.

All you do is look up your movie's IMDB rating in the left-hand column and take the corresponding percentile rank from the right-hand column. For example, Spider-Man 2 currently has a rating of 8.0, which corresponds to 95% on the decoder ring. That's how I knew earlier it's in the top 5% of movies ever made.
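
For anyone who would rather not squint at the table, the lookup can be sketched in a few lines of Python. The numbers are copied from the decoder ring above; the function name and the rule for in-between ratings (fall back to the nearest listed rating at or below the movie's rating, and use the first row for anything under 4.00) are assumptions made for this example.

```python
from bisect import bisect_right

# (rating, percent of movies it beats), copied from the decoder ring table.
DECODER_RING = [
    (4.00, 9),
    (5.00, 21), (5.25, 24), (5.50, 30), (5.75, 35),
    (6.00, 42), (6.25, 48), (6.50, 57), (6.75, 63),
    (7.00, 72), (7.25, 78), (7.50, 87), (7.75, 91),
    (8.00, 95), (8.25, 97), (8.50, 98), (8.75, 99),
    (9.00, 100),
]
_RATINGS = [rating for rating, _ in DECODER_RING]


def decode(rating):
    """Return the table's 'percent of movies it beats' for an IMDB rating."""
    # Index of the last listed rating that is <= the given rating.
    i = bisect_right(_RATINGS, rating) - 1
    # Ratings below the first row are clamped to the "4.00-" row.
    return DECODER_RING[max(i, 0)][1]


print(decode(8.0))  # Spider-Man 2's rating at the time of writing
```

Rounding down keeps the answer conservative: a 6.1-rated movie is reported as beating 42% of movies, the figure for 6.00, rather than being credited with the 6.25 row.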

I use the decoder ring all the time, and it has made it much easier to select movies that truly are worth watching. It's a great tool. I hope that you find it as useful as I have.

Notes

  • I obtained the data for my analysis via IMDB's "Alternate Interfaces" page.
  • I removed from consideration any movie with fewer than 100 ratings.
  • When I use the phrase "top 5% of movies ever made," I am really saying "top 5% of movies ever to receive at least 100 ratings on IMDB."
  • I performed the analysis using R from The R Project for Statistical Computing.
[Yes, this was shamelessly reposted from an earlier post I made to my community projects site.]

Magic decoder ring for IMDB movie ratings | 19 comments (19 topical, 0 hidden) | Trackback
Very cool by jacob (6.00 / 4) #1 Fri Jul 23, 2004 at 03:37:09 PM EST
There's one important limitation of this analysis, though it's probably more significant in theory than in practice: in some sense comparing any two movies' ratings is problematic because they weren't rated by the exact same group of people. You can say that as long as there's no reason to suspect that the raters of movie A and the raters of movie B are substantially different groups, the comparisons are valid, but in the case of movies that makes no sense most of the time. For instance, it's hard to believe that the same people who thought The Little Mermaid merited a 7.3 also thought that Elephant just edged it out with a 7.4.

I think it'd be difficult for that reason to conclude Elephant is objectively better than The Little Mermaid even though the numbers indicate that it is. The moral of the story: if it sounds like the sort of movie you wouldn't like, a high rating may not mean you'll like it anyway; and a movie that sounds like just your kind of thing might be, even though it's got a low rating.

--



In practice, however, . . . by tmoertel (6.00 / 2) #2 Fri Jul 23, 2004 at 04:50:47 PM EST
The system seems to work well because the ratings for each movie generally reflect the views of the subset of the population who thought the movie would be worth seeing in the first place. (People outside of that subset probably wouldn't have seen the movie and thus wouldn't have rated it.) What this means is that the IMDB ratings tend to be an accurate reflection of the movie's merit, as judged by people whom you would reasonably expect to be the typical audience for that genre of movie.

Thus, if we consider each genre independently, the IMDB ratings within will tend to be consistent. The Little Mermaid and Finding Nemo will tend to be measured against the same standard. So, if we rank movies within each genre against each other, the percentile rankings will tend to be meaningful.

Now, the question you raise is, do we muddy the waters by combining all the genres and then ranking the movies? While I haven't performed tests to verify it, I would suspect that the muddying is minimal. I have little reason to suspect that Disney-esque cartoons as a genre rate higher or lower than coming-of-age drama/thrillers. I'd bet that the distributions for each genre are similar enough to one another that you can combine genres without disturbing the rankings much.

--
Write Perl code? Check out LectroTest. Write markup-dense XML? Check out PXSL.

[ Parent ]

VS2FP by TheophileEscargot (3.00 / 0) #3 Fri Jul 23, 2004 at 07:41:29 PM EST
Brilliant.
--
Butch and Petey are harsh and unforgiving in their estimation of female beauty.


Genuinely useful information? by DesiredUsername (3.00 / 0) #4 Sat Jul 24, 2004 at 01:14:20 AM EST
I think you have missed the forest for the trees. Where did this data come from? Ratings from people. Which people? Was it a random cross-section of the world or America? Would it be "genuinely useful" to you if it had been?

The kind of game you've played here is the same kind that Hollywood studios use to create a SUMMER BLOCKBUSTER!!!111 It elevates the number above the actual movie.

That's not to say the numbers are meaningless. I generally give credence to very low or very high ratings, especially on movies (or books) that have only had a few dozen ratings. But I also read the plot summary and review text, which gives me a much better idea if I'll like it.

---
Now accepting suggestions for a new sigline


Yup. by tmoertel (6.00 / 1) #5 Sat Jul 24, 2004 at 03:03:53 AM EST
I didn't include much advice for interpreting the percentile scores in the article because, to be blunt, it never occurred to me that anybody would think that the scores are more than just a more-useful form of rating – just one small piece of information that goes into the process of selecting movies to view. I don't think that I said to turn off your brain and just trust my new rating instead, but if that's what you're reading – or if it's even possible to read it that way – I screwed up my explanation.

To answer your question, I say that the scores are genuinely useful because, in practice, they have been. I did this analysis over a year ago (see The Hulk and the true meaning of IMDB movie ratings), and it's been a useful tool for me since.

Where the percentile score has shown itself to be especially handy is in sudden-death situations where there is no time to do "proper" research. You know what I'm talking about: It's Friday night and the wife or some friends say – at 8:30 PM, naturally – hey, let's go see a movie tonight. Great! Now we have about thirty minutes to pick a movie and drive to the theater. In those situations, the decoder ring has proved its worth.

--
Write Perl code? Check out LectroTest. Write markup-dense XML? Check out PXSL.

[ Parent ]

Amazon reviews by em (6.00 / 2) #6 Sat Jul 24, 2004 at 07:04:15 AM EST
This reminds me of my experiences with Amazon reviews.  The problem with those is most obvious with pop record reviews: if Linkin Park (WARNING: I don't think I've ever heard a Linkin Park song) is the current hit group, you get hundreds upon hundreds of kids posting their "reviews" of them, giving them 5/5, saying they're the best ever; and complementarily to this, a few kids who think they utterly suck give it a 1/5 and say it's the worst thing ever.

Same thing happens with book reviews, though often more subtle: the most clueless readers all tend to give fives.  (Or even worse, politically-minded people gang up to dump on some book of an opposite persuasion).

Usually the most informative reviews of a book or album are somewhere around the middle.  The most insightful reviews of an item that has an average of 5 stars tend to be very low, in my experience (e.g. the only people who really understand Steven Pinker's immensely popular books think they suck).

The reviewers are a very, very self-selected group, and that's a problem.

--em


+6, Encourage by infinitera (6.00 / 1) #8 Sat Jul 24, 2004 at 09:55:58 AM EST
Bashes Pinker.

On a related note, I heard of more than a few intro linguistics classes using Words & Rules as their text. :( Travesty, I say.

____
How many successful trolls does it take to earn the title of "pundit"? Also, if 10 trolls work together, is that a "think tank"? — ENOENT
[ Parent ]

Well... by ucblockhead (3.00 / 0) #12 Sun Jul 25, 2004 at 04:04:29 PM EST
I presume that Pinker understands his own book.
---
[ucblockhead is] useless and subhuman
[ Parent ]

if he did, he'd be able to debate its merits by infinitera (6.00 / 1) #13 Mon Jul 26, 2004 at 01:53:11 AM EST
But he's a complete dogmatist.

____
How many successful trolls does it take to earn the title of "pundit"? Also, if 10 trolls work together, is that a "think tank"? — ENOENT
[ Parent ]

Respectfully, balls by Rogerborg (6.00 / 1) #7 Sat Jul 24, 2004 at 07:33:31 AM EST
Ratings are immaterial.  Even the demographic breakdowns (and even assuming accurate information) aren't useful.  What you need to know are the ratings given by people like you.

I find the most useful guide is to read the user comments.  For example, I discard comments with speling errurs or that enthuse about how totily hot kristen durnst or famkey jansun are, and make my judgement based on the two or three that are left.  It's working out rather well so far.

-
Metus amatores matrum compescit, non clementia.


Great article by Big Sexxxy Joe (6.00 / 2) #9 Sat Jul 24, 2004 at 11:08:08 AM EST
A very interesting page to look at is the IMDB top 250.  Number one is The Godfather with a rating of 9.0.  Number 250 is Fantasia with a rating of 7.7.

Also at the bottom of the page it gives the formula that they use to calculate the rank.

I'm like Jesus, only better.


i voted this up by 256 (6.00 / 1) #10 Sat Jul 24, 2004 at 11:27:15 AM EST
and i'm glad to see that others did as well.

if only because it helps to prevent the front page of HuSi, a predominantly UKian site, from needing the subtitle: "Tales from Southern Ontario."
---
I don't think anyone's ever really died from smoking. --ni


I never cared for IMDB too much... by Xray Shoesize (3.00 / 0) #11 Sun Jul 25, 2004 at 07:24:23 AM EST
I primarily use allmovie.com....

It seems to have a lot less crap on it.



As long as.. by Psychopath (3.00 / 0) #19 Tue Aug 31, 2004 at 11:19:51 PM EST
..allmovie.com doesn't change to such a fucked up interface as allmusic.com I might agree. :-)
There's a party in my mind, and I can never leave.
[ Parent ]

Interesting! by fae (3.00 / 0) #14 Mon Jul 26, 2004 at 09:15:47 PM EST
I wonder if there's a name for that kind of statistical distribution. Perhaps it can be approximated as a gaussian times an envelope forcing it to zero.



Pretty much a Poisson distribution I reckon by gazbo (3.00 / 0) #15 Tue Jul 27, 2004 at 02:54:20 AM EST
IIRC a Poisson distribution is basically a normal distribution that is bounded at 0 (and hence skewed).  The only disparity is that a Poisson distribution doesn't have an upper limit, but I imagine that for most purposes it would model it fine.

"Engarde!" cried the larvae, huskily. - Scrymarch

[ Parent ]

statistics by dr k (5.00 / 1) #16 Tue Jul 27, 2004 at 08:29:05 AM EST
The unfortunate thing about the IMDB ratings is that they contain a lot of information that is not really used to calculate the "weighted average". And the sample sizes are absurdly high.

Part of the problem is that it is difficult to work with a 10-point scale; there is no easy formula for the mean deviation.

Using a binomial method, let's take a look at three of the top 250 movies, the Lord of the Rings trilogy. Part 3 is currently ranked as #3, part 2 is #8, part 1 is #9.

I need to split the ratings into two groups, so I'll compare "10" votes against everything else. Here are the numbers (votes of "10" / total votes / percent):

(3) RotK: 45261 / 69408 / 65%
(8) TT: 45774 / 90211 / 51%
(9) FotR: 71988 / 132356 / 54%

With p = .5, mean deviations [sqrt(n p (1 - p)), which is sqrt(n) / 2 here] are

RotK: 131.7
TT: 150.2
FotR: 181.9

And the final numbers [(actual votes - expected votes) / deviation]

RotK: 80.2 sigma
TT: 4.5 sigma
FotR: 31.9 sigma

By this measurement, the Two Towers should be ranked much lower.

:| :| :| :| :|



Please elaborate so I can follow your logic. by tmoertel (3.00 / 0) #17 Tue Jul 27, 2004 at 10:02:19 AM EST
In particular, what do you mean by "binomial method," and why do you think it's a good method to use for this data? What do you think application of this method will show? What do you think the results mean?

From what I can gather, you are treating the act of rating as an independent trial with the probability of a 10-rating being approximately 0.5. Thus you are modeling the rating process with a binomial distribution having p=0.5 and n in {69408, 90211, 132356} depending on the movie. The expected number of 10-ratings for each movie is given by the mean n p and the expected deviation by sqrt(n p (1 – p)).

After comparing the actual number of 10-ratings with the expected and normalizing by the expected deviation, you find that TT is only 4.5 deviations from the expected mean, whereas FotR and RotK are both considerably farther away. Thus you argue that TT wasn't truly as good as the other two.
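
To make sure I have the arithmetic right, here is how I would sketch that calculation in Python. The vote counts are the ones quoted in the parent comment; the function name is made up, and p = 0.5 is the parent comment's modeling assumption rather than anything measured.

```python
import math


def binomial_z(tens, total, p=0.5):
    """Deviations between the observed count of 10-ratings and the binomial mean."""
    expected = total * p                        # mean of Binomial(n, p) is n p
    deviation = math.sqrt(total * p * (1 - p))  # standard deviation is sqrt(n p (1 - p))
    return (tens - expected) / deviation


# (number of "10" votes, total votes) as quoted in the parent comment.
votes = {
    "RotK": (45261, 69408),
    "TT": (45774, 90211),
    "FotR": (71988, 132356),
}

for title, (tens, total) in votes.items():
    print(f"{title}: {binomial_z(tens, total):.1f} sigma")
```

Note that with p = 0.5 the deviation reduces to sqrt(n) / 2, so it grows only with the square root of the number of votes while the gap between actual and expected votes grows in proportion to it.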

Am I correct in interpreting your argument to be that an unusually large proportion of 10-ratings suggests that the movie probably would have been rated higher, had the scale allowed for it?

--
Write Perl code? Check out LectroTest. Write markup-dense XML? Check out PXSL.

[ Parent ]

yes by dr k (6.00 / 1) #18 Tue Jul 27, 2004 at 10:36:20 AM EST
You've got the right idea. I left out the background details because I get tired of repeating them.

The key advantage to treating the ratings as a binomial distribution is that it allows you to meaningfully compare differently sized samples. This is more helpful when sample sizes are relatively low (less than, say, 1000). Working with deviations also allows for an open-ended scale.

:| :| :| :| :|

[ Parent ]
