by Stefano Baccianella, Andrea Esuli and Fabrizio Sebastiani
Researchers from ISTI-CNR, Pisa, are working on an automatic rating system for online product reviews based on an analysis of their textual content.
Online product reviews are now available on a variety of Web sites, and consumers increasingly rely on them when choosing between competing products. For example, according to a study of users of the online booking system of TripAdvisor (one of the most popular online review sites for tourism-related activities), 97.7% are influenced by other travellers' reviews, and of those, 77.9% use the reviews as an aid in choosing the best place to stay.
There is therefore a growing market for software tools that can organize product reviews and make them easily accessible to prospective customers. Among the issues that the designers of these tools need to address are: (a) content aggregation, such as pulling together reviews from sources as disparate as newsgroups, blogs and community Web sites; (b) content validation, such as filtering out fake reviews written by people with vested interests; and (c) content organization, such as automatically ranking competing products according to the satisfaction of consumers who have already purchased them.
We address a problem related to issue (c), namely rating, which involves attributing a numerical satisfaction score to a consumer review based on its textual content. The problem arises because, while some online product reviews consist of both a textual evaluation of the product and a score expressed on an ordered scale of values, many others contain only the text. Such reviews are difficult for an automated system to manage, especially when they must be compared in order to determine whether product x is better than product y, or to identify the best product of the lot. Tools capable of interpreting a text-only product review and scoring it according to how positive it is are thus of the utmost importance.
Our work looks at the problem of rating a review when the value to be attached must be drawn from an ordinal (ie discrete) scale. This scale may take the form of either an ordered set of numerical values (eg one to five stars) or an ordered set of non-numerical labels (such as 'poor', 'good', 'very good', 'excellent'). We also focus on multi-faceted rating of product reviews, where the review of a product (eg a hotel) must be rated several times, according to several orthogonal aspects (eg cleanliness, location etc).
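To make the notion of an ordinal, multi-faceted rating concrete, here is a minimal Python sketch. The OrdinalScale type, the STARS and VERBAL scales and the aspect names are hypothetical illustrations, not part of the system described here: the sketch only shows that both kinds of scale reduce to ranks, and that a multi-faceted rating is one ordinal value per aspect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrdinalScale:
    """An ordered set of values: only their order is meaningful, not their distances."""
    labels: tuple

    def rank(self, label):
        # Position of a label on the scale (0 = lowest).
        return self.labels.index(label)

    def label(self, rank):
        # Inverse of rank(): the label found at a given position.
        return self.labels[rank]


# The two kinds of scale mentioned above, both reduced to ranks.
STARS = OrdinalScale((1, 2, 3, 4, 5))
VERBAL = OrdinalScale(("poor", "good", "very good", "excellent"))

# A multi-faceted rating of one hotel review: one ordinal value per aspect.
# The aspect names are illustrative, not a fixed list.
review_ratings = {
    "cleanliness": VERBAL.rank("excellent"),
    "location": VERBAL.rank("poor"),
}

print(review_ratings)  # {'cleanliness': 3, 'location': 0}
print(VERBAL.rank("very good") > VERBAL.rank("good"))  # True
```

Only the order of the ranks carries information; treating them as evenly spaced numbers would be an additional assumption.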
We focus on generating the vectorial representations of the reviews that are given as input to the learning device used to generate a review rater, rather than on the learning device itself (for which we use an off-the-shelf package). These representations cannot simply consist of the usual 'bag of words' used when classifying texts by topic, since classifying texts by opinion (the key content of reviews) requires a much subtler approach. The expressions "A great hotel in a horrible town!" and "A horrible hotel in a great town!", for instance, would receive identical 'bag of words' representations despite expressing opposite opinions.
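The collision between the two example sentences is easy to check. The following minimal Python sketch builds a plain bag of words (lower-cased word counts, with word order discarded; the simple regular-expression tokenizer is an assumption made for brevity) and shows that both sentences map to the same representation.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


a = bag_of_words("A great hotel in a horrible town!")
b = bag_of_words("A horrible hotel in a great town!")

print(a == b)  # True: identical representations, opposite opinions
```

Any representation that is to separate the two sentences must therefore retain some information about which words go together.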
We have focused on three aspects of generating meaningful representations of product reviews: (i) the extraction of complex features based on part-of-speech patterns; (ii) making the extracted features more robust through the use of a lexicon of opinion-laden words; and (iii) the selection of discriminative features through techniques explicitly devised for ordinal regression (an issue that until now has received practically no attention in the literature). To test the techniques we have developed, we crawled the Web to create a dataset of hotel reviews, which is now available to the research community for experimentation. The experiments we have run on it confirm that combining these three techniques gives the best performance on this particular type of data.
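The three steps can be illustrated with a toy Python sketch. It is not the authors' implementation, and several assumptions should be kept in mind: the input is assumed to be already tagged with parts of speech (tagged by hand here, whereas a real system would run a tagger); the single ADJ NOUN pattern and the four-word LEXICON are hypothetical; and the variance-based criterion in select_by_variance is just one simple way of scoring features against ordinal ratings, not necessarily the selection technique used in this work.

```python
from statistics import pvariance

# Hypothetical toy lexicon mapping opinion-laden words to coarse classes.
LEXICON = {"great": "POS", "excellent": "POS", "horrible": "NEG", "dirty": "NEG"}

# Tag sequences to extract; an illustrative choice, not a complete pattern set.
PATTERNS = [("ADJ", "NOUN")]


def pattern_features(tagged):
    """(i) Return the word n-grams whose tag sequence matches one of PATTERNS."""
    feats = []
    for pat in PATTERNS:
        n = len(pat)
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if tuple(tag for _, tag in window) == pat:
                feats.append(" ".join(word for word, _ in window))
    return feats


def generalize(feature):
    """(ii) Replace opinion words by their lexicon class: 'great hotel' -> 'POS hotel'."""
    return " ".join(LEXICON.get(w, w) for w in feature.split())


def select_by_variance(docs, ratings, k, min_df=2):
    """(iii) Keep the k features whose reviews have the lowest rating variance.

    A feature that occurs only in reviews with similar ratings is taken to be
    discriminative. Features found in fewer than min_df reviews are skipped,
    since the variance of a single rating is trivially zero.
    """
    seen = {}
    for feats, rating in zip(docs, ratings):
        for f in set(feats):
            seen.setdefault(f, []).append(rating)
    scored = [(pvariance(rs), f) for f, rs in seen.items() if len(rs) >= min_df]
    return [f for _, f in sorted(scored)[:k]]


# The two example sentences, tagged by hand.
s1 = [("a", "DET"), ("great", "ADJ"), ("hotel", "NOUN"), ("in", "ADP"),
      ("a", "DET"), ("horrible", "ADJ"), ("town", "NOUN")]
s2 = [("a", "DET"), ("horrible", "ADJ"), ("hotel", "NOUN"), ("in", "ADP"),
      ("a", "DET"), ("great", "ADJ"), ("town", "NOUN")]

print(pattern_features(s1))  # ['great hotel', 'horrible town']
print(pattern_features(s2))  # ['horrible hotel', 'great town']
print([generalize(f) for f in pattern_features(s1)])  # ['POS hotel', 'NEG town']

# Made-up generalized features of four reviews, with their star ratings.
docs = [["POS hotel", "POS town"], ["NEG hotel", "POS town"], ["POS hotel"], ["NEG hotel"]]
ratings = [5, 1, 4, 2]
print(select_by_variance(docs, ratings, k=2))  # ['NEG hotel', 'POS hotel']
```

In the toy data, 'POS town' is dropped because it occurs in both a five-star and a one-star review, while the two hotel features each occur only in reviews with similar ratings. The lexicon step is what lets 'great hotel' and 'excellent hotel' share a single feature, which is one way of making sparse pattern features more robust.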
The system we have developed could work as a building block for larger systems that implement more complex functionality. For instance, a Web site whose users only seldom attach a score to their reviews could use our system to learn from the already rated reviews how to rate the others; another Web site containing only unrated product reviews could learn to rate its own reviews from the rated reviews of some other site.
Tel: +39 050 3152 892