Stars and Bars
We are often asked why we use a histogram visualization, which we call a "snip," for ratings instead of the traditional stars or numeric scores used on most other sites. The answer is a bit more complex than "it looks cool," so it seemed like a great topic for a post. Star and numeric single-value summaries of a product often hide valuable information. This hurts users, because the controversial books are often the great ones, and the negative review is often the most insightful. So today I am going to give a quick review of the history of attitude encoding, examine the issues with central tendency summaries (a single numeric or star value), and demonstrate how a histogram helps users better understand the review-o-sphere.
Reviews, surveys and questionnaires allow people to express their attitudes and opinions on various topics. While free-form verbal or textual responses give people the greatest freedom in what they can talk about, they are not well suited to comparison or summary statistics. How do you compare two movie reviews if all you have is text? In 1932 Rensis Likert was developing methods for measuring attitudes and came up with a way to encode a person's agreement with a statement as a point on a numbered scale. His system is now called the Likert scale.
(Likert, Rensis (1932), "A Technique for the Measurement of Attitudes", Archives of Psychology 140: pp. 1-55).
While many people are not familiar with his name, we have all encountered his technique in questionnaires, reviews or ratings whenever we are asked to rate something on a scale of 1 to X, or to say how much we agree or disagree with a statement. Because attitudes are encoded as numbers, there is no longer a question of how to compare text: the numeric value (the number of stars) is the rating, and summaries of that data are easy to calculate.
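To make the encoding step concrete, here is a minimal Python sketch assuming a five-point agree/disagree scale. The response labels and the helper names `encode` and `counts` are illustrative only, not taken from any particular survey tool. Once the verbal answers are mapped to integers, tallying them per scale point is trivial.

```python
# A five-point Likert scale: each verbal response maps to an integer.
# These labels are one common five-point wording, used here as an example.
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def encode(responses):
    """Convert verbal Likert responses into their numeric codes."""
    return [LIKERT_5[r.strip().lower()] for r in responses]


def counts(scores, points=5):
    """Count how many responses fall on each point of a 1..points scale."""
    tally = [0] * points
    for s in scores:
        tally[s - 1] += 1
    return tally


responses = ["agree", "Strongly agree", "disagree", "agree", "neither agree nor disagree"]
scores = encode(responses)
print(scores)          # [4, 5, 2, 4, 3]
print(counts(scores))  # [0, 1, 1, 2, 1]
```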
Rating things on a 1-to-X point scale has been common for years, and online review sites have adopted the idea too; examples include IMDb, Epinions.com, Yahoo! Movies, Amazon.com, and many more. Once users express their ratings numerically, the natural reaction is to summarize the data set with a central tendency statistic. This is where the common use of the mean (average) becomes a problem.
A central tendency statistic is a single value quoted as representative of an entire data set. Most commonly, the average (mean) is the statistic used to represent a set of reviews. The problem with using the average for reviews is simple: people often disagree! In other words, reviews often follow a bi-modal distribution, not a normal (Laplace-Gaussian) distribution.
Take the following example bi-modal distribution. If we were to take the average from this data, it would give us a value between the two humps. That value would not show us that there are two camps of views on a topic. Other central tendency statistics like mode or median have the same problem.
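A quick way to see the problem is to compute the usual central tendency statistics on a made-up bi-modal set of star ratings. This sketch uses Python's standard `statistics` module and assumes Python 3.8 or later (for `fmean`, and for `mode` returning the first of several tied modes); the ratings themselves are invented for illustration.

```python
from statistics import fmean, median, mode

# Hypothetical star ratings for one product: a large "love it" camp,
# a large "hate it" camp, and very few people in the middle.
ratings = [1] * 40 + [2] * 8 + [3] * 4 + [4] * 8 + [5] * 40

avg = fmean(ratings)
print(avg)              # 3.0, right between the two humps
print(median(ratings))  # 3.0, the same problem
print(mode(ratings))    # 1, the first of the tied modes; the 5-star camp is hidden

# How many reviewers actually gave the "average" rating?
share_at_avg = ratings.count(round(avg)) / len(ratings)
print(share_at_avg)     # 0.04
```

The mean and median both land on 3 stars, a rating that only 4% of these hypothetical reviewers gave, while the mode reports only one of the two camps.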
Examining our data, we don't see ratings following a normal distribution. As in the examples below, reviews often represent several opinion groups, so a single value can never tell the full story.
Looking only at the average number of stars also hides another inherent bias in reviews: not only do people tend to write polarized reviews, they also tend to write positive ones. If you look at our home page, we provide a visualization of all the reviews we have indexed. In it you can see that green (positive, or 5-star) reviews far outnumber the other classes. You might expect there to be about as many one-star books, movies and gadgets as five-star ones, but in fact the majority have a high average rating. That just makes it harder to make a decision!
We were not the first to see this pattern and conclude that the mean (average) is problematic for online reviews. Hu, Nan, et al. have a great paper examining review distributions.
Also, the NY Times ran a piece a few years ago looking at Amazon ratings and found that some of the best sellers out there have a bi-modal ("horseshoe-shaped," in their words) distribution of ratings. The reason is simple: some of the best books are the most controversial. Summize has a little feature that helps you find these gems, called "disagree on." To try it, search for "non-fiction books," click on "disagree on," and you will find books like:
- The Fair Tax Book: Saying Goodbye to the Income Tax and ...
- Treason: Liberal Treachery from the Cold War to the War ...
- Unfit for Command - Jerome R. Corsi, Ph.D.
- Let Freedom Ring : Winning the War of Liberty over ...
- The Truth About Hillary: What She Knew, When She Knew ...
All of them are controversial in some way, and all are best sellers.
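We have not described how "disagree on" ranks results, so what follows is only a sketch of one plausible approach, not Summize's actual method: score each item by the spread (population standard deviation) of its star ratings and list the highest spread first. The titles, counts and the `spread` helper are all hypothetical.

```python
from math import sqrt


def spread(hist):
    """Population standard deviation of star ratings, given counts for 1..5 stars."""
    n = sum(hist)
    mu = sum(star * c for star, c in enumerate(hist, start=1)) / n
    var = sum(c * (star - mu) ** 2 for star, c in enumerate(hist, start=1)) / n
    return sqrt(var)


# Hypothetical histograms: number of 1, 2, 3, 4 and 5 star reviews.
books = {
    "consensus favorite": [2, 3, 10, 40, 45],
    "polarizing best seller": [45, 5, 0, 5, 45],
    "mildly liked": [5, 10, 30, 35, 20],
}

# Most disagreement first.
ranked = sorted(books, key=lambda title: spread(books[title]), reverse=True)
print(ranked[0])  # polarizing best seller
```

The polarizing title has a mean of exactly 3 stars, the same average a genuinely mediocre product might get; the spread is what separates them. Standard deviation is just one choice here; a measure that specifically rewards mass at both ends of the scale would also work.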
When Greg Pass (one of the co-founders of Summize) was looking for the best way to visualize our sentiment classification of reviews, he wanted to express more information in the same amount of space as current star representations. He has a post coming up with the story of that visualization, but here is a brief spoiler: his visualization is a histogram that shows the full data set, which avoids all the issues with central tendency statistics like the mean, median, mode or fancier encodings. I think his visualization does that quite effectively, but of course I am a bit biased on that point. This kind of visualization is sometimes called a stacked bar histogram or spinogram.
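The core of a stacked bar histogram is dividing a fixed-width bar into segments proportional to each rating class. Below is a minimal text-mode sketch of that idea. It is not Greg's implementation: the function names are hypothetical, and the digits 1 to 5 stand in for the colors a real graphic would use. Largest-remainder rounding is one reasonable way to make the integer segment widths always add up to the full bar width.

```python
def segment_widths(counts, width):
    """Split a bar of `width` units into segments proportional to `counts`.

    Uses largest-remainder rounding so the segments always sum to `width`.
    """
    total = sum(counts)
    if total == 0:
        return [0] * len(counts)
    exact = [c * width / total for c in counts]
    widths = [int(x) for x in exact]  # floor, since all values are non-negative
    leftover = width - sum(widths)
    # Give the remaining units to the segments with the largest fractional parts.
    order = sorted(range(len(counts)), key=lambda i: exact[i] - widths[i], reverse=True)
    for i in order[:leftover]:
        widths[i] += 1
    return widths


def render(counts, symbols, width=20):
    """Draw a one-line text version of a stacked bar histogram."""
    return "".join(s * w for s, w in zip(symbols, segment_widths(counts, width)))


# Hypothetical review counts for 1..5 stars, one symbol per class.
counts = [30, 5, 5, 10, 50]
print(segment_widths(counts, 20))  # [6, 1, 1, 2, 10]
print(render(counts, "12345"))     # 11111123445555555555
```

Because every class keeps its own segment, a horseshoe-shaped distribution stays visible as two wide segments at the ends of the bar instead of collapsing into one number.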
Summize is not the only company trying to help people fully understand reviews. A few months after Summize released its visualization, Amazon came out with one of its own. Below are two visualizations of the review distribution data for "The Fair Tax.." book, Amazon's vs. Summize's. Both offer interesting views into the data; I will let you pick which one you like better.
One good theory for why ratings do not follow a normal distribution is "sampling bias": people tend not to write reviews unless they have a strong opinion on a topic. Hu, Nan, et al. looked at this sampling bias in reviews and proposed that if all users were forced to review a product, most products would show a normal distribution of ratings. In reality, most people do not write reviews of the products they consume. The people who do write reviews tend to hold polarized opinions, and that self-selection is what produces the sampling bias.
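The sampling-bias argument can be illustrated with a toy model: give a population a bell-shaped spread of opinions, then let the chance of writing a review grow with how strongly someone feels. All numbers here are invented, and this is a sketch of the general idea, not the model used by Hu, Nan, et al.

```python
# Toy model of self-selection ("sampling bias"): every consumer has an
# opinion on a 1..5 scale, but only some of them bother to write a review.
# All numbers below are made up for illustration.

# How many consumers hold each opinion (a bell shape centered on 3 stars).
population = [500, 2000, 5000, 2000, 500]

# Percent chance that a consumer with that opinion writes a review:
# people with strong feelings are far more likely to speak up.
review_rate_percent = [50, 5, 1, 5, 50]


def expected_reviews(population, rate_percent):
    """Expected number of written reviews at each star level."""
    return [n * p // 100 for n, p in zip(population, rate_percent)]


def shares(counts):
    """Fraction of the total at each star level."""
    total = sum(counts)
    return [c / total for c in counts]


reviews = expected_reviews(population, review_rate_percent)
print(reviews)                                    # [250, 100, 50, 100, 250]
print([round(s, 2) for s in shares(population)])  # [0.05, 0.2, 0.5, 0.2, 0.05]
print([round(s, 2) for s in shares(reviews)])     # [0.33, 0.13, 0.07, 0.13, 0.33]
```

Even though half of this hypothetical population sits at 3 stars, only about 7% of the written reviews do, and the review distribution comes out U-shaped, like the "horseshoe" the NY Times described.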
Our goal in summarizing reviews is to expose the many facets they cover about a product. A single value hides all that great insight people have. While I am all for simplification, I think that showing the full set of data in a quick and easy visualization makes for a more informative, more interesting experience.
Abdur
abdur@summize.com



