Collecting the location data of an individual is one of the greatest offences against privacy. The main objective of this paper is to raise awareness about the collection and use of such data by illustrating which types of personal information can be inferred from it.
The advent of ubiquitous devices and the growing development of location-based services have led to the large-scale collection of individuals' mobility data. For instance, the location of an individual can be:
- deduced from the IP address of his or her computer,
- collected by smartphone applications that provide information tailored to the current location or that support collaborative tasks such as traffic monitoring (for example Waze, www.waze.com),
- revealed in the form of a geotag, for example one added to a picture he or she has taken, possibly without noticing, or
- disclosed explicitly by checking in to a geo-social network such as Foursquare (www.foursquare.com).
Among all the Personally Identifiable Information (PII), the location of an individual is one of the most sensitive, and learning it is one of the greatest threats to privacy. In particular, an inference attack can use mobility data (together with some auxiliary information) to deduce the points of interest characterizing an individual's mobility, to predict his or her past, current and future locations, or even to identify his or her social network.
A mobility trace of an individual simply comprises a location and a time stamp, along with the identifier of the entity behind the trace. From a trail of traces of an individual (i.e., a chronological sequence of mobility traces), an inference attack extracts the points of interest (POIs), which correspond to locations frequented by that individual. A POI could be, for instance, a “home”, a “workplace”, a restaurant visited on a regular basis, a sports centre, a place of worship, the headquarters of a political party or a specialist medical clinic. The semantics of a POI can therefore leak highly personal information about this individual.
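As a minimal sketch of these two notions, a mobility trace can be modelled as a small immutable record and a trail as the time-sorted list of one individual's traces. The class name `MobilityTrace`, the helper `trail` and the use of planar coordinates in metres are illustrative assumptions, not a format prescribed by this paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MobilityTrace:
    user_id: str   # identifier of the entity behind the trace
    x: float       # location, assumed already projected to metres
    y: float
    t: float       # time stamp, in seconds

def trail(traces, user_id):
    """Chronological sequence of the mobility traces of one individual."""
    return sorted((tr for tr in traces if tr.user_id == user_id), key=lambda tr: tr.t)

traces = [
    MobilityTrace("alice", 120.0, 40.0, 3600.0),
    MobilityTrace("bob", 0.0, 0.0, 100.0),
    MobilityTrace("alice", 0.0, 0.0, 60.0),
]
print([tr.t for tr in trail(traces, "alice")])  # [60.0, 3600.0]
```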
Applying a heuristic as simple as taking the location of the last mobility trace before midnight is likely to reveal a person’s “home”, and the location where the individual spends most of his or her time during the day is likely to be the workplace. A more systematic approach to identifying all of a person’s POIs first removes all the traces in which the individual is moving, then squashes each run of subsequent redundant traces in which the individual remains immobile for at least 20 minutes into a single point, and finally runs a clustering algorithm on the remaining traces. The output of the clustering algorithm is a set of clusters, each composed of traces that are close to each other, such that the median of each cluster can be considered a meaningful POI.
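The following Python sketch illustrates both the midnight heuristic and the systematic pipeline on a single trail of `(x, y, t)` tuples. It assumes coordinates already projected to metres and time stamps counted in seconds from a local midnight; the radii `STAY_RADIUS_M` and `CLUSTER_RADIUS_M` are hypothetical values, and the greedy radius-based clustering is only a simple stand-in for whichever clustering algorithm (e.g., k-means or DBSCAN) an attacker would actually use.

```python
import math
from statistics import median

STAY_MIN_S = 20 * 60      # immobile for at least 20 minutes
STAY_RADIUS_M = 50.0      # hypothetical: drift still counted as "immobile"
CLUSTER_RADIUS_M = 200.0  # hypothetical: max distance to a cluster's seed

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def guess_home(trail):
    """Heuristic: median position of each day's last trace before midnight."""
    last_of_day = {}
    for x, y, t in trail:  # trail is sorted by time
        last_of_day[int(t // 86400)] = (x, y)
    pts = list(last_of_day.values())
    return (median(p[0] for p in pts), median(p[1] for p in pts))

def extract_stays(trail):
    """Drop traces where the individual is moving and squash each run of traces
    staying within STAY_RADIUS_M of its first trace for >= STAY_MIN_S into one point."""
    stays, i = [], 0
    while i < len(trail):
        j = i
        while j + 1 < len(trail) and dist(trail[i], trail[j + 1]) <= STAY_RADIUS_M:
            j += 1
        if trail[j][2] - trail[i][2] >= STAY_MIN_S:
            run = trail[i:j + 1]
            stays.append((median(p[0] for p in run), median(p[1] for p in run)))
            i = j + 1
        else:
            i += 1  # trace i belongs to a movement: discard it
    return stays

def cluster_pois(stays):
    """Greedy radius clustering: each stay joins the first cluster whose seed
    lies within CLUSTER_RADIUS_M; each cluster's median is reported as a POI."""
    clusters = []
    for s in stays:
        for c in clusters:
            if dist(c[0], s) <= CLUSTER_RADIUS_M:
                c.append(s)
                break
        else:
            clusters.append([s])
    return [(median(p[0] for p in c), median(p[1] for p in c)) for c in clusters]

# Toy trail of (x, y, t) tuples covering a single day.
toy_trail = (
    [(0.0, 0.0, t) for t in range(0, 3601, 600)]           # at home for 1 h
    + [(800.0, 0.0, 4000), (1500.0, 0.0, 4300)]             # commuting
    + [(2000.0, 10.0, t) for t in range(5000, 9000, 600)]   # at work
    + [(10.0, 0.0, t) for t in range(80000, 86000, 600)]    # back home at night
)
print(guess_home(toy_trail))                   # (10.0, 0.0)
print(cluster_pois(extract_stays(toy_trail)))  # [(5.0, 0.0), (2000.0, 10.0)]
```

On this toy trail the two commuting traces are discarded, and the morning and night stays at home fall into one cluster whose median, (5.0, 0.0), is reported as the home POI.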
Once the POIs characterizing an individual have been discovered, a mobility model can be constructed. For instance, a mobility Markov chain (MMC), built from the trail of a person’s mobility traces, is a probabilistic automaton in which each state corresponds to a POI and each edge indicates a probabilistic transition between two states [1]. The MMC, which represents the mobility behaviour of an individual in a compact and accurate way, can easily be used to predict the next location visited by following the most probable transition leaving the current state. Predictions derived from this naïve approach are 70% to 90% accurate, but more sophisticated models can be obtained by remembering the n last visited locations (for n=2 or n=3) instead of only the current one, or by building an MMC for different periods of time (e.g., by differentiating between workdays and weekends or by splitting a day into different time slices).
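A mobility Markov chain can be estimated by counting transitions in the chronological sequence of visited POIs and normalising each state's outgoing counts. The sketch below assumes POIs have already been extracted and labelled (the labels "home", "work" and "gym" are hypothetical) and takes a state to be the tuple of the n last visited POIs, so n=1 gives the plain MMC and n=2 or n=3 the variants with memory described above.

```python
from collections import Counter, defaultdict

def build_mmc(visits, n=1):
    """Estimate transition probabilities from a chronological list of visited
    POIs. A state is the tuple of the n last visited POIs (n=1: plain MMC)."""
    counts = defaultdict(Counter)
    for i in range(len(visits) - n):
        state = tuple(visits[i:i + n])
        counts[state][visits[i + n]] += 1
    mmc = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        mmc[state] = {poi: c / total for poi, c in nxt.items()}
    return mmc

def predict_next(mmc, recent, n=1):
    """Naive prediction: follow the most probable transition leaving the
    current state; returns None if that state was never observed."""
    state = tuple(recent[-n:])
    if state not in mmc:
        return None
    return max(mmc[state], key=mmc[state].get)

visits = ["home", "work", "home", "work", "gym", "home", "work", "home"]
mmc1 = build_mmc(visits)
print(predict_next(mmc1, ["home"]))                # work
mmc2 = build_mmc(visits, n=2)
print(predict_next(mmc2, ["home", "work"], n=2))   # home
```

A separate chain per period (e.g., one for workdays and one for weekends) can be obtained by calling `build_mmc` on each period's visits separately.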
Once a mobility model has been created, it can also be used to perform a de-anonymization attack that identifies the individual behind a set of mobility traces. Suppose, for example, that we have observed Alice’s movements over a period of time (e.g., several days or weeks) during a training phase and that an MMC was derived from her traces. Later, if another geolocated dataset containing mobility traces of Alice is publicly released, this new dataset can be de-anonymized by linking its traces to the corresponding individual (Alice) in the training dataset. Simply replacing the names of individuals with pseudonyms before releasing a dataset is rarely sufficient to preserve anonymity, because the mobility traces themselves contain information that can be uniquely linked to an individual [2].
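Linking can be sketched by summarising each individual by the POIs of his or her model, weighted by how often they are visited, and assigning a pseudonymised trail to the training individual with the closest summary. The visit-share profile (a rough stand-in for the chain's stationary distribution), the symmetrised nearest-POI distance and the names `visit_profile`, `profile_distance` and `link` are hypothetical choices made for illustration; they are not claimed to be the distance used in the cited work.

```python
import math
from collections import Counter

def visit_profile(visits):
    """Map each visited POI (an (x, y) tuple in metres) to its share of visits."""
    counts = Counter(visits)
    return {poi: c / len(visits) for poi, c in counts.items()}

def profile_distance(p, q):
    """Hypothetical metric: weighted mean distance from each POI of one profile
    to the nearest POI of the other, averaged over both directions."""
    def one_way(a, b):
        return sum(w * min(math.dist(poi, other) for other in b) for poi, w in a.items())
    return (one_way(p, q) + one_way(q, p)) / 2

def link(anonymous_visits, training):
    """Return the known identity whose profile is closest to the anonymous trail."""
    anon = visit_profile(anonymous_visits)
    return min(training, key=lambda name: profile_distance(anon, training[name]))

training = {  # profiles learnt during the training phase
    "alice": visit_profile([(0, 0), (2000, 0), (0, 0), (2000, 0), (500, 500)]),
    "bob": visit_profile([(3000, 3000), (6000, 1000), (3000, 3000)]),
}
released = [(10, 5), (1990, 0), (10, 5), (2010, 10)]  # pseudonymised trail
print(link(released, training))  # alice
```

The pseudonym attached to `released` plays no role: the POIs themselves are enough to point back to Alice.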
Finally, another type of inference attack partially reconstructs the social graph by assuming that two persons who are regularly in the same neighbourhood at the same time have a high chance of sharing a social link. In the future, more and more personal information is likely to be mined from mobility data as the collection of such data increases. For instance, some companies, such as Sense Networks (http://www.sensenetworks.com/macrosense.php), aim to use mobility data to build detailed profiles of individuals (e.g., predicting their income, social habits or age) by applying advanced machine learning techniques. The main remaining open question, therefore, is which personal information cannot be inferred from mobility data. In summary, inference attacks highlight the privacy risks raised by the collection of mobility data and show the need to further investigate and design privacy-preserving variants of location-based services that strike a fair balance between privacy and the utility of the resulting service [3].
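The co-location assumption behind social graph reconstruction can be sketched by discretising space into square cells and time into fixed slots, then counting, for each pair of individuals, the number of (cell, slot) bins in which both appear. The cell size `CELL_M`, slot length `SLOT_S` and threshold `MIN_MEETINGS` are hypothetical parameters; a real attack would tune them and might use a finer notion of neighbourhood.

```python
from collections import Counter
from itertools import combinations

CELL_M = 100.0     # hypothetical neighbourhood size (grid cell side, metres)
SLOT_S = 15 * 60   # hypothetical time slot length (seconds)
MIN_MEETINGS = 3   # hypothetical number of co-locations implying a link

def infer_social_links(traces):
    """traces: iterable of (user_id, x, y, t). Two users are linked when they
    share a grid cell during the same time slot at least MIN_MEETINGS times."""
    present = {}  # (cell_x, cell_y, slot) -> set of user ids seen there
    for user, x, y, t in traces:
        key = (int(x // CELL_M), int(y // CELL_M), int(t // SLOT_S))
        present.setdefault(key, set()).add(user)
    meetings = Counter()
    for users in present.values():
        for pair in combinations(sorted(users), 2):
            meetings[pair] += 1
    return {pair for pair, n in meetings.items() if n >= MIN_MEETINGS}

traces = [
    ("alice", 10, 10, 0), ("bob", 50, 20, 100),        # slot 0, same cell
    ("alice", 510, 10, 1000), ("bob", 520, 40, 1100),  # slot 1, same cell
    ("alice", 10, 10, 2000), ("bob", 90, 90, 2100),    # slot 2, same cell
    ("carol", 20, 20, 2050),                           # meets them only once
]
print(infer_social_links(traces))  # {('alice', 'bob')}
```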
[1] S. Gambs, M.-O. Killijian and M. del Prado Cortez, “Show me how you move and I will tell you who you are”, Transactions on Data Privacy 4(2), pp. 103-126, 2011.
[2] P. Golle and K. Partridge, “On the anonymity of home/work location pairs”, Pervasive 2009, pp. 390-397.
[3] M. Damiani, “Third party geolocation services in LBS: privacy requirements and research issues”, Transactions on Data Privacy 4(2), pp. 55-72, 2011.
Inria – Université de Rennes 1, France
Tel: +33 2 99 84 22 70