Mirko Manea and Marinella Petrocchi
Sharing data among groups of organizations and/or individuals is essential in a modern web-based society, being at the very core of scientific and business transactions. Data sharing, however, poses several problems including trust, privacy, data misuse and/or abuse, and uncontrolled propagation of data. We describe an approach to preserving privacy while sharing data, based on scientific Data Sharing Agreements (DSA).
The EU FP7-funded project Coco Cloud (Confidential and Compliant Clouds) is a three-year collaborative project, which started in November 2013. The project aims to facilitate data sharing in the Cloud by providing end-to-end, data-centric security from the client to the cloud, based on the automated definition and enforcement of Data Sharing Agreements (DSA).
Coco Cloud focuses, in part, on a case study provided by the Quiron Spanish hospital group. A common practice at Quiron is to sign agreements with external doctors, seen as ‘small hospitals’ that generate their own sensitive data whilst at the same time accessing patients’ data stored on the Quiron Cloud. The main purposes of this data sharing are: i) to provide health information to the physicians treating the hospital’s patients and ii) to refine diagnoses and therapies by including additional opinions on patients’ medical data from specialists and healthcare researchers.
Traditionally, hospitals use legal documents to regulate the terms and conditions under which they agree to share data. A key problem in the digital world is that the constraints expressed in such contracts remain inaccessible to the software infrastructure supporting the data sharing and management processes. Coco Cloud approaches this issue by adopting electronic DSA.
An electronic DSA is a human-readable, yet machine-processable contract, regulating how organizations and/or individuals share data. A DSA consists of:
- Predefined legal background information, like traditional contracts. A subject matter expert (e.g., a lawyer) usually provides this information. Such data is unstructured, i.e., not organized in a predefined manner.
- Structured user-defined information, including the validity period, the involved parties, the data and, most importantly, the policy rules that constrain how data can be shared among the parties. Domain experts and end users compile these fields.
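The two parts of a DSA listed above can be pictured as a simple record. The following is a minimal sketch only: the class and field names (`DataSharingAgreement`, `PolicyRule`, `legal_background`, and so on) are hypothetical and are not taken from the Coco Cloud implementation, and the example values are purely illustrative.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class PolicyRule:
    """One structured policy rule (hypothetical shape, for illustration)."""
    effect: str    # "permit" or "deny"
    subject: str   # who the rule applies to
    action: str    # what they may or may not do
    resource: str  # which class of data is concerned


@dataclass
class DataSharingAgreement:
    # Unstructured part: free text supplied by a subject matter expert.
    legal_background: str
    # Structured, user-defined part.
    valid_from: date
    valid_until: date
    parties: list[str]
    data: list[str]
    rules: list[PolicyRule] = field(default_factory=list)

    def is_valid_on(self, day: date) -> bool:
        """True if `day` falls inside the validity period (inclusive)."""
        return self.valid_from <= day <= self.valid_until


# Illustrative values only; not taken from a real agreement.
dsa = DataSharingAgreement(
    legal_background="Free-text legal clauses drafted by a lawyer.",
    valid_from=date(2014, 1, 1),
    valid_until=date(2014, 12, 31),
    parties=["Hospital", "External doctor"],
    data=["patient_record"],
    rules=[PolicyRule("permit", "External doctor", "read", "patient_record")],
)
print(dsa.is_valid_on(date(2014, 6, 15)))  # True
```

Keeping the legal background as plain text while the remaining fields are typed mirrors the split between unstructured and structured information described above.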
One of the key areas for Coco Cloud is the design and implementation of the DSA infrastructure, from the definition of DSA templates (predefined drafted agreements encoding, for instance, rules of law) to DSA disposal. A DSA has a complex lifecycle, consisting of several stages, as depicted in Figure 1.
Figure 1: DSA Lifecycle.
The template definition stage is a preliminary phase in which a pool of available DSA templates is created, according to the purpose of the data sharing and the data classification to be regulated by the DSA.
The authoring stage is an editing tool-assisted phase during which the stakeholders prepare the DSA itself. The result of the authoring stage is an electronic, human-readable DSA document.
In the analysis stage, the human-readable DSA is translated into a machine-readable document whose rules can be verified and formally checked. The authoring and analysis stages are iterative: they are repeated until the DSA rules satisfy all the required properties being checked (e.g., the absence of conflicting rules). The DSA rules are then translated into a set of enforceable security policies during the mapping stage. The enforcement stage is the phase in which the DSA is enacted on the specific data being shared. A DSA enters the final disposal stage when the contracting parties agree that this DSA is no longer useful.
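The lifecycle described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the project's implementation: the transition table is read from the prose (Figure 1 is not reproduced here), the names `Stage`, `ALLOWED`, `advance`, `find_conflicts` and `after_analysis` are hypothetical, and a "conflict" is assumed to mean that the same subject, action and resource are both permitted and denied. The formal checks performed in the actual analysis stage are not shown.

```python
from enum import Enum


class Stage(Enum):
    TEMPLATE_DEFINITION = "template definition"
    AUTHORING = "authoring"
    ANALYSIS = "analysis"
    MAPPING = "mapping"
    ENFORCEMENT = "enforcement"
    DISPOSAL = "disposal"


# Assumed transitions: analysis may loop back to authoring or move on.
ALLOWED = {
    Stage.TEMPLATE_DEFINITION: {Stage.AUTHORING},
    Stage.AUTHORING: {Stage.ANALYSIS},
    Stage.ANALYSIS: {Stage.AUTHORING, Stage.MAPPING},
    Stage.MAPPING: {Stage.ENFORCEMENT},
    Stage.ENFORCEMENT: {Stage.DISPOSAL},
    Stage.DISPOSAL: set(),
}


def advance(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def find_conflicts(rules):
    """Rules are (effect, subject, action, resource) tuples.

    Returns the sorted (subject, action, resource) keys that carry more
    than one distinct effect, e.g. both "permit" and "deny".
    """
    effects = {}
    for effect, subject, action, resource in rules:
        effects.setdefault((subject, action, resource), set()).add(effect)
    return sorted(key for key, seen in effects.items() if len(seen) > 1)


def after_analysis(rules):
    """Go back to authoring while conflicts remain, else on to mapping."""
    target = Stage.AUTHORING if find_conflicts(rules) else Stage.MAPPING
    return advance(Stage.ANALYSIS, target)


draft = [
    ("permit", "external_doctor", "read", "patient_record"),
    ("deny", "external_doctor", "read", "patient_record"),
]
print(after_analysis(draft).value)      # authoring
print(after_analysis(draft[:1]).value)  # mapping
```

The loop between authoring and analysis is the only cycle in this sketch; every other stage has a single successor, and disposal has none.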
During the first year of the project, activities concentrated on: 1) designing a user-friendly authoring tool that guides users through DSA definition; 2) formalizing the agreement writing by programmatically encoding the typical sections that lawyers currently embed in paper contracts; 3) studying the applicable Terms of Law (both national and international), to define the legal constraints that must hold when scientific or medical data are to be shared and used within a community.
In particular, we have proposed a ‘three-step authoring phase’. In step 1, legal experts populate a DSA template, encoding the applicable legal policies (e.g., EU Directive 95/46/EC on personal data protection). In step 2, domain experts define a context-specific policy (e.g., healthcare policy professionals define the organization-specific constraints for medical data subject to scientific investigations). Finally, in step 3, the end-users optionally fill some input forms to set their privacy preferences (e.g., consenting to the processing of their data).
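The three authoring steps can be sketched as successive contributions to one rule set, each tagged with the step and role that produced it. This is a hedged illustration: `AuthoredRule`, `author_dsa` and the rule texts are hypothetical placeholders, the rule texts do not quote any legal source, and the way the actual tool combines the three contributions is not described here.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class AuthoredRule:
    step: int    # 1, 2 or 3
    author: str  # role that contributed the rule
    text: str    # rule content (placeholder text in this sketch)


def author_dsa(
    legal_rules: Sequence[str],
    domain_rules: Sequence[str],
    user_preferences: Optional[Mapping[str, bool]] = None,
) -> list[AuthoredRule]:
    """Collect the contributions of the three steps, in order.

    Step 3 is optional: if no preferences are given, it adds nothing.
    """
    authored = [AuthoredRule(1, "legal expert", t) for t in legal_rules]
    authored += [AuthoredRule(2, "domain expert", t) for t in domain_rules]
    if user_preferences:
        authored += [
            AuthoredRule(3, "end user", f"{name}: {'yes' if value else 'no'}")
            for name, value in sorted(user_preferences.items())
        ]
    return authored


# Placeholder rule texts, for illustration only.
rules = author_dsa(
    legal_rules=["Placeholder legal rule for the template"],
    domain_rules=["Placeholder hospital-specific constraint"],
    user_preferences={"consent_to_processing": True},
)
for rule in rules:
    print(rule.step, rule.author, "-", rule.text)
```

Recording the step and author alongside each rule keeps the provenance of a constraint visible, which may help when the analysis stage sends a DSA back for revision.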
The Coco Cloud project partners are HP Italy (coordinator), the Italian National Research Council, SAP, Imperial College London, Bird & Bird, ATOS, University of Oslo, AGID, and Grupo Hospitalario Quiron.
Figure 2: DSA Authoring Tool.
Future work will investigate ways to easily define policies for a specific domain or context (e.g., healthcare or government). We are planning to use standard Web ontologies (e.g., SNOMED CT for healthcare) to define domain vocabularies and leverage them to implement an authoring tool that is easy to use but, with the help of guiding wizards, able to express semantically sound policy rules. A first mockup of the DSA authoring tool is shown in Figure 2.
M. Casassa-Mont et al.: “Towards safer information sharing in the Cloud”, Intl. Journal of Information Security, Springer, August 2014.
R. Conti et al.: “Preserving Data Privacy in e-Health”, Engineering Secure Future Internet Services and Systems, Springer, 2014, 366-392.
International Health Terminology Standards Development Organisation (IHTSDO), http://ihtsdo.org/snomed-ct/, 2014.
Hewlett-Packard Italiana, Italy