02 Jan

Why most rating systems suck

Most websites let users rate articles, files, videos and other content. It’s a good way to keep popular content visible. But if the ratings are to be taken seriously, we need accurate data, and most rating systems don’t necessarily provide it.

The biggest reason rating content doesn’t work as well as it could is that humans use the system, and human input is very noisy. While the law of large numbers eventually ensures a consensus, choosing a good rating system can make the ratings stabilize much faster.

Pick a number

Probably the most common way to rate content is to give it a score, usually between one and five. Some sites use a larger scale, notably IMDB, which scores movies from one to ten. Now, here’s the big revelation: this is not a very good way to do it.

The more options you have, the harder it is to choose accurately. IMDB probably has a user base of hundreds of thousands (most movies have tens of thousands of voters), which means they could simply have two ratings: “I liked this movie” and “I didn’t like this movie”. Maybe they could throw in “I didn’t like it but I didn’t hate it either”, too. Given the huge number of votes, they would still get an average score accurate enough for their charts and whatnot.
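As a rough sketch of that point (not anything IMDB actually does), here is a small Python simulation, assuming a made-up “true” approval probability, that shows how a plain like/dislike average settles down once the votes pile up:

    import random

    def simulate_binary_average(true_approval=0.72, votes=50_000, seed=42):
        """Simulate like/dislike votes and track the running average."""
        rng = random.Random(seed)
        likes = 0
        checkpoints = {}
        for n in range(1, votes + 1):
            if rng.random() < true_approval:
                likes += 1
            if n in (100, 1_000, 10_000, votes):
                checkpoints[n] = likes / n
        return checkpoints

    for n, avg in simulate_binary_average().items():
        print(f"{n:>6} votes: average approval {avg:.3f}")

Each voter contributes a single bit, yet after tens of thousands of votes the average sits close to the underlying preference, which is all a chart really needs.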

Another downside of such a system is that people give biased scores. The reasons vary; my personal excuse is that school grades in Finland traditionally go from four to ten, roughly representing scores of 40 % to 100 % respectively. This means I rarely vote outside that familiar range, unless a movie has to be punished and given the lowest rating possible, which is yet another example of the noisy input people provide.

Pros: Easy to get accurate data from a small user base
Cons: Very noisy, hard to decide between similar options

Thumbs

A slightly more modern way to rate things is, ironically, a very old method: thumbing things up or down, much like a Roman emperor. As stated above, the limited scale still provides accurate ratings thanks to the law of large numbers. Giving the user only two options removes most of the statistical noise (excluding accidental votes): it is easier to extract the information by simply asking “Did you like this or not?”.

However, giving the user fewer options limits his or her ability to rank items on personal lists and favorites, if that is needed. This could be solved with additional questions such as “Did you like X more than Y?”, or by simply letting the user order items from best to worst in a list. An ordered-list approach would also give the user more perspective when he or she wants to provide accurate input, since it forces him or her to keep asking whether an item really is better than the one below it.
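As a minimal sketch of the pairwise idea, assuming the answers to “Did you like X more than Y?” are stored as simple (preferred, other) pairs, items could be ranked by the share of comparisons they win (only one of many ways to turn comparisons into an ordering):

    from collections import defaultdict

    def rank_by_pairwise_wins(comparisons):
        """Rank items by the fraction of pairwise comparisons they won.

        `comparisons` is a list of (preferred_item, other_item) tuples,
        i.e. answers to "Did you like X more than Y?".
        """
        wins = defaultdict(int)
        appearances = defaultdict(int)
        for winner, loser in comparisons:
            wins[winner] += 1
            appearances[winner] += 1
            appearances[loser] += 1
        return sorted(appearances,
                      key=lambda item: wins[item] / appearances[item],
                      reverse=True)

    votes = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "C"), ("A", "B")]
    print(rank_by_pairwise_wins(votes))  # ['A', 'B', 'C']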

Pros: Easy to vote, less noise
Cons: Needs a bigger data set to provide accurate data
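That last point can be softened by ranking items not on the raw like ratio but on a confidence-adjusted one. One common option (my own addition here, not something thumbs-based sites necessarily do) is the lower bound of the Wilson score interval, which keeps sparsely voted items from shooting straight to the top:

    from math import sqrt

    def wilson_lower_bound(likes, total, z=1.96):
        """Lower bound of the Wilson score interval for a like ratio.

        With few votes the bound stays conservative; as votes
        accumulate it approaches the raw ratio.
        """
        if total == 0:
            return 0.0
        p = likes / total
        denom = 1 + z * z / total
        centre = p + z * z / (2 * total)
        margin = z * sqrt((p * (1 - p) + z * z / (4 * total)) / total)
        return (centre - margin) / denom

    print(wilson_lower_bound(9, 10))      # ~0.60: great ratio, few votes
    print(wilson_lower_bound(900, 1000))  # ~0.88: same ratio, many votes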

Obviously, all of the above is to be taken seriously only if you really want to get better input or to avoid a bit of work: a fancy system doesn’t necessarily gain you that much and might only confuse the users. My personal choice would be the minimalistic approach, that is, thumbs up or down. Any comments?