by Nicolas Anciaux, Benjamin Nguyen and Michalis Vazirgiannis
When requesting bank loans, social care, tax reduction, and many other services, individuals are required to fill in application forms with hundreds of data items. It is possible, however, to drastically reduce the set of completed fields without impacting the final decision. The Minimum Exposure Project investigates this issue. It aims at proposing an analysis, framework and implementation of an important privacy principle, called Limited Data Collection.
Personal data collection is a prerequisite to well-tailored services, which are in the interest of both service provider and applicant. A classical way to collect such data is to issue application forms. When considering privacy from the applicant's point of view, it is unquestionable that the personal information harvested in these forms must be reduced to the minimum necessary to make the correct decision.
Minimizing the data collected has also become a financial issue for service providers. Collected records are threatened by data breaches, which are not a marginal problem. In 2011, various sources, such as the Open Security Foundation, reported tens of millions of personal records subject to such breaches. The average cost per exposed record has been estimated at $194 by security organizations such as the Ponemon Institute. Moreover, recent laws enacted worldwide (in 46 US states and the EU) now compel companies to publicly report incidents and assist victims in minimizing their effect.
While better securing servers is a hot research topic, it is still not possible to provide a 100% secure environment. Our project proposes an orthogonal approach. With regard to legislation, our work assumes a strict understanding of the privacy principle termed “Limited Data Collection”, which states that requested sets of personal data must be limited to the minimum necessary to achieve the purpose the user consents to.
The precise contribution of the Minimum Exposure project is to provide guidelines, algorithms, and a framework to implement Limited Data Collection, for which no implementation currently exists.
Current practices fail to comply with this principle for the following reason: at the time the application form is filled in, it is impossible to distinguish a priori which data will be useful (or not) to make the decision. The data needed depends not only on the purpose of the collection but also on the values the applicant holds. Consider the (simple) collection rules based on the following tax rate reduction example: revealing an income under $30,000 and an age below 25 may be enough, or simply an income below $10,000, regardless of age. Alternatively, revealing simply a sufficient number of dependants (eg two) could suffice. For a user with values u1=[income=$25,000, age=21, nb_dependants=1] the minimum data set would be [income, age]. For a user with u2=[income=$40,000, age=35, nb_dependants=2] it would be [nb_dependants]. Hence, the organization issuing the form cannot specify a priori a minimum set of attributes needed to make its decision, since that set can only be determined by looking at the values of all available attributes.
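The tax rate reduction example can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the names `RULES`, `decide` and `minimum_exposure` are hypothetical, "a sufficient number of dependants" is assumed to mean at least two, and the search simply tries attribute subsets by increasing size.

```python
from itertools import combinations

# Collection rules as a disjunction of conjunctions (thresholds from the example;
# "nb_dependants >= 2" is an assumed reading of "a sufficient number").
RULES = [
    {"income": lambda v: v < 30_000, "age": lambda v: v < 25},
    {"income": lambda v: v < 10_000},
    {"nb_dependants": lambda v: v >= 2},
]

def decide(revealed):
    """True if at least one conjunction is fully satisfied by the revealed values."""
    return any(
        all(attr in revealed and pred(revealed[attr]) for attr, pred in rule.items())
        for rule in RULES
    )

def minimum_exposure(user):
    """Smallest attribute subset that still yields a positive decision, or None."""
    if not decide(user):
        return None  # the benefit cannot be obtained even with the full form
    attrs = sorted(user)
    # Exhaustive search by increasing subset size: exponential in the worst case.
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            if decide({a: user[a] for a in subset}):
                return list(subset)

u1 = {"income": 25_000, "age": 21, "nb_dependants": 1}
u2 = {"income": 40_000, "age": 35, "nb_dependants": 2}
print(minimum_exposure(u1))  # ['age', 'income']
print(minimum_exposure(u2))  # ['nb_dependants']
```

The two users need different attributes to reach the same decision, which is why the minimum set has to be computed on the applicant's side from her own values.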
To circumvent this issue, our framework proposes to bind collection rules with application forms. The framework is depicted below, and illustrated by the following scenario: when a user wants to apply to a service, she ① downloads the application form provided by the service provider and fills it in with the information she owns, ② runs a Minimum Exposure process that uses the collection rules to compute the minimum set of data items she must provide in the application form to obtain the benefits she wants, ③ validates and signs the resulting application form and sends it to the service provider. The service provider can ④ run its decision processes using the contents of the form and calibrate its offer.
Figure 1: Scenario for limiting data collection
The Minimum Exposure project faces challenging problems at the intersection of data mining, secure data computation, and operational research. First, the collection rules attached to application forms must cover any decision making system, ranging from simple disjunctions of conjunctions of predicates enacted by law (eg e-administration scenario) to highly complex systems based on data mining techniques (eg neural networks used in credit scoring for bank loan applications). Second, the decision making process is often private to the service provider, and the related collection rules must not be revealed, leading to the adoption of secure tokens (like smart cards) in the architecture. Third, identifying the minimum set of information to be sent while still preserving the final decision is an NP-hard problem. Application forms can be very large in practice (eg hundreds of fields in loan application forms or tax declarations), leading to the introduction of approximation algorithms based on heuristics adapted to the topology of these rules.
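To give a feel for what a heuristic might look like, here is a minimal greedy sketch in Python. It is not the project's algorithm. It assumes that several benefits must be preserved at once, that each benefit is a disjunction of conjunctions, and that the conjunctions the user does not satisfy have already been filtered out. The function name `greedy_exposure` and the benefit and attribute names are hypothetical.

```python
def greedy_exposure(satisfied):
    """Greedy heuristic: pick, at each step, the conjunction adding fewest new attributes.

    `satisfied` maps each benefit to the attribute sets of the conjunctions
    that the user satisfies (at least one per benefit).
    """
    if any(not conjs for conjs in satisfied.values()):
        raise ValueError("every benefit needs at least one satisfied conjunction")
    exposed = set()
    pending = dict(satisfied)
    while pending:
        # Cheapest conjunction over all pending benefits; ties broken by name order.
        _, conj = min(
            ((b, c) for b, conjs in pending.items() for c in conjs),
            key=lambda bc: (len(bc[1] - exposed), sorted(bc[1])),
        )
        exposed |= conj
        # Drop every benefit already provable from the exposed attributes.
        pending = {
            b: conjs for b, conjs in pending.items()
            if not any(c <= exposed for c in conjs)
        }
    return exposed

# Hypothetical input: two benefits and the conjunctions the user satisfies.
satisfied = {
    "tax_reduction": [frozenset({"income", "age"}), frozenset({"nb_dependants"})],
    "housing_aid": [frozenset({"income", "age"}), frozenset({"income", "rent"})],
}
print(sorted(greedy_exposure(satisfied)))  # ['age', 'income', 'nb_dependants']
```

On this input the greedy choice exposes three attributes, whereas revealing only income and age would prove both benefits. Such a heuristic trades optimality for speed, which is the compromise an approximation algorithm accepts on large forms.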
The Minimum Exposure project is a collaboration started in 2011 between INRIA, University of Versailles and Ecole Polytechnique. It is partially funded by the DIGITEO Letevone Grant and by the INRIA CAPPRIS initiative.
 N. Anciaux, B. Nguyen, M. Vazirgiannis, “Limiting Data Collection in Application Forms: A real-case application of a Founding Privacy Principle”, in IEEE 10th annual conference on Privacy, Security and Trust (PST), 2012, to appear.
University of Versailles and INRIA, France