Discriminating Between the Wheat and the Chaff in Online Recommendation Systems

by Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi and Maurizio Tesconi

MyChoice is an ambitious research project that aims to model, detect, and isolate outliers (aka fakes) in online recommendation systems, as well as in online social networks. The final outcome of the project will be a prototype of an automated engine able to recognize fake information, such as reviews, and fake friends/followers, and to filter out malicious material in order to return reliable and genuine content to the user.

MyChoice aims to provide novel models and tools to search for genuine and unbiased content on Web platforms, while filtering out partial and fake information. The expected outcome of the project is twofold. Firstly, focusing on real online recommendation systems, MyChoice intends to tackle the (malicious) bias that may influence a high percentage of users. Secondly, the project pays attention to fake accounts on social networks and provides automatic fake detection techniques. On Twitter, for example, “fake followers” are accounts created to inflate the number of followers of a target account, making it appear more trustworthy and influential so that it stands out from the crowd and attracts other genuine followers.

Fake reviews are currently so widespread that the phenomenon has captured the attention of academia and the mass media. Fake reviews can influence the opinions of users, either promoting or damaging a particular target; thus, a strong incentive exists for opinion spamming. Markets are strongly influenced by review scores: a recent survey by Cornell University on Internet travel booking revealed that online advice had a strong impact on bookings, occupancy rates, and revenue of commercial accommodation establishments. Defining efficient methodologies and tools to mitigate the proliferation of fake reviews has become a compelling issue that MyChoice is addressing. One main outcome of the project will be a prototype for an unbiased ranking system, which identifies and evicts malicious reviews and provides a more appropriate choice of services and products, based on the fusion of their objective characteristics with the subjective tastes and interests of the individual user.

Figure 1: Screenshot of the Twitter account @TheFakeProject, used to launch the campaign for recruiting the real humans in a training dataset of Twitter accounts

The project started its activity in 2012, monitoring some of the most popular websites providing online advice for hotels, such as TripAdvisor, and online services for e-booking, such as Booking. A crawler was used to collect several million reviews relating to thousands of different hotels all around the world. Starting from the state-of-the-art in the field, the researchers involved in the project quantified the robustness of the rating aggregators used in such systems, against the malicious injection of fake reviews. The current experimental outcomes, for example, enrich past results attesting that a simple arithmetic mean of the ratings by the hotel guests (which is the usual way to provide aggregated information to users) is not the most robust aggregator, since it can be severely affected by even a small number of outliers. Experiments have been carried out considering different kinds of attack, such as batch injections, hotel-chain injections, and local competitor injections. To improve the robustness of the ranking, the project is defining new aggregators to more effectively tackle the activity of malicious reviewers.
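The sensitivity of the arithmetic mean to a small batch injection can be illustrated with a minimal sketch. The ratings below are invented for illustration, and the median is used only as a familiar example of a more robust aggregator; it is not claimed to be one of the aggregators defined by the project.

```python
from statistics import mean, median

def inject_batch(ratings, n_fake, fake_score):
    """Return a copy of the ratings with n_fake identical fake scores appended."""
    return list(ratings) + [fake_score] * n_fake

# Hypothetical genuine ratings on a 1-5 scale (not real project data).
genuine = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]

# A small batch injection: three fake reviews, all with the lowest score.
attacked = inject_batch(genuine, n_fake=3, fake_score=1)

# How far each aggregator moves under the attack.
mean_shift = abs(mean(attacked) - mean(genuine))
median_shift = abs(median(attacked) - median(genuine))

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

In this toy case three fake reviews out of thirteen move the mean by roughly 0.7 points, while the median does not move at all.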

To enhance the comprehension of the fake phenomenon, the project is also looking at other instances of the concept of “fake”. In particular, a research effort is focusing on the proliferation of fake Twitter followers, which has also aroused a great deal of interest in the mainstream media, such as the New York Times and the Financial Times. We have created a “gold standard”, namely a collection of both truly genuine (human) and truly fake accounts. In December 2012, MyChoice launched a Twitter campaign called “The Fake Project”, with the creation of the Twitter account @TheFakeProject, whose profile claims “Follow me only if you are NOT a fake”. To obtain the status of “certified human”, each account that adhered to the initiative was subjected to further checks to attest its credibility. The “certified fake” set was collected by purchasing fake accounts, which are easily accessible to the general public. Based on experiments over the gold standard, we are determining to what extent the problem of detecting fakes can be considered similar to the problem of detecting spam. To this end, we are leveraging machine-learning classifiers trained on the gold standard to evaluate how the state-of-the-art proposals for spammer Twitter account detection [1-2] perform on fake account detection [3]. This has been achieved by crawling 600,000 tweets and around one million Twitter accounts. Classifiers instrumented with the best features are able to obtain highly accurate fake detections (higher than 95%); hence, they can be used to estimate the number of fake followers of any Twitter account.
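As a rough sketch of that last step, a per-account classifier can be applied to a sample of followers and the flagged fraction scaled up to the whole follower count. Everything named here is hypothetical: the feature names, the `toy_rule` stand-in and its thresholds are invented for illustration and are not the features or classifiers evaluated in [3].

```python
from typing import Callable, Dict, Iterable

Account = Dict[str, float]  # hypothetical per-account features

def estimate_fake_followers(sample: Iterable[Account],
                            total_followers: int,
                            is_fake: Callable[[Account], bool]) -> int:
    """Scale the fraction of sampled followers flagged as fake to the full count."""
    accounts = list(sample)
    if not accounts:
        raise ValueError("sample must contain at least one account")
    flagged = sum(1 for account in accounts if is_fake(account))
    return round(total_followers * flagged / len(accounts))

def toy_rule(account: Account) -> bool:
    """Placeholder for a trained classifier; the thresholds are arbitrary."""
    return account["tweets"] == 0 and account["friends"] > 50 * max(account["followers"], 1)

# Invented sample of four followers of an account with 1,000 followers in total.
sample = [
    {"tweets": 0, "followers": 2, "friends": 900},
    {"tweets": 120, "followers": 80, "friends": 150},
    {"tweets": 15, "followers": 30, "friends": 40},
    {"tweets": 300, "followers": 500, "friends": 450},
]

estimate = estimate_fake_followers(sample, total_followers=1000, is_fake=toy_rule)
print(estimate)
```

The estimate is only as good as the sample and the classifier: a sample that is not representative of the follower list, or a classifier with a noticeable error rate, biases the result accordingly.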

The classifiers highlight a behavioural difference between humans, spammers and fake accounts. In our opinion, this is the basis for a more thorough comprehension of the fake phenomenon, which can lead to formal modelling that can help discriminate between an anomalous (possibly fake) account and a standard (possibly legitimate) one. Using this formalization as a reference model, the definition of fakes could be exported into different contexts, even to online reviews and reviewers.

In conclusion, combining robust metrics with the formalization of “fakeness” leads to the final goal of MyChoice: to develop a prototype for a ranking system able to separate genuine and unbiased information from malicious content, providing the user with reliable search results.

MyChoice is a regional project funded by the Tuscany region under the “Programma operativo regionale Competitività e Occupazione (Por Cro)” Program, within the European Union Social Fund framework 2007-2013, the Institute for Informatics and Telematics of the Italian National Research Council (IIT-CNR), and the start-up company Bay31. It is a two-year project, which started in November 2012.

Links:
http://twitter.com/TheFakeProject
http://wafi.iit.cnr.it/TheFakeProject/

References:
[1] C. Yang et al.: “Die free or live hard? Empirical evaluation and new design for fighting evolving Twitter spammers”, in proc. of RAID 2011, Springer, http://dx.doi.org/10.1007/978-3-642-23644-0_17

[2] G. Stringhini et al.: “Detecting spammers on social networks”, in proc. of ACM ACSAC '10, http://dl.acm.org/citation.cfm?id=1920263

[3] S. Cresci et al.: “Fake accounts detection on Twitter”, Tech Rep IIT-CNR nr. 15-2013, http://www.iit.cnr.it/node/22730

Please contact:
Marinella Petrocchi
IIT-CNR, Italy
Tel: +390503153432
