by Pierangela Samarati (Università degli Studi di Milano)
MOSAICrOWN (Multi-Owner data Sharing for Analytics and Integration respecting Confidentiality and OWNer control) is a Horizon 2020 project that aims to enable data sharing and collaborative analytics in multi-owner scenarios in a privacy-preserving way, ensuring proper protection of private, sensitive, and confidential information. MOSAICrOWN will provide effective and deployable solutions that allow data owners to maintain control over the data sharing process, enabling selective and sanitized disclosure and supporting efficient and scalable privacy-aware collaborative computations.
The application of data analysis techniques to large data collections brings great benefits in the personal, business, research, and social domains. The availability of large data collections recording the actions and choices of individuals and organizations can greatly improve our understanding of how the world operates. The continuous evolution of ICT is making this vision real at a fast pace, supporting architectures for collaborative data sharing and analytics. Security and privacy concerns, however, are clear obstacles to realizing this potential. Indeed, the loss of control over data and the potential compromise of their confidentiality can severely hinder the realization of an open framework for sharing data from multiple independent data owners.
The goal of providing effective data protection in multi-owner scenarios entails several challenges. MOSAICrOWN tackles them with a gradual approach: it first addresses policy specification and data governance, and then develops enabling technologies, namely data wrapping and data sanitization techniques, for enforcing data protection.
Policy specifications – Data governance framework
MOSAICrOWN provides a data governance framework for managing data and specifying policies in multi-owner collaborative scenarios. MOSAICrOWN first identifies all relevant requirements and protection needs. Moving from requirements to specifications that data owners can understand requires capturing the different concepts to be expressed and providing a metadata model for referencing data. As data owners need to regulate the use, sharing, and processing of their data, MOSAICrOWN is designing a formal model and a declarative policy language that non-specialists can also use to specify different protection regulations. The model rests on solid foundations, so that the effect of policy specifications can be understood and the actual protection guarantees can be reasoned about. The language supports restrictions over the whole data processing life cycle and is compatible with existing technology, so that it can be deployed in real systems. As data collections of different owners may also need to be combined or processed together for analysis, MOSAICrOWN is also investigating solutions for managing the policies that apply to such combined data.
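To make the idea of a declarative policy concrete, the following Python sketch represents policies as data rather than code. It is a minimal illustration under stated assumptions, not the MOSAICrOWN policy language. Each hypothetical `Rule` states which subject may perform which action for which purpose, and a `Policy` uses default-deny semantics. The hypothetical `combined_permits` function assumes that data of several owners may be processed together only if every owner's policy allows the request.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One permission: who may perform which action for which purpose."""
    subject: str  # requester role; "*" matches any role
    action: str   # e.g. "read", "aggregate", "share"
    purpose: str  # declared purpose of use; "*" matches any purpose

    def matches(self, subject: str, action: str, purpose: str) -> bool:
        return (self.subject in ("*", subject)
                and self.action == action
                and self.purpose in ("*", purpose))


@dataclass
class Policy:
    owner: str
    rules: list[Rule] = field(default_factory=list)

    def permits(self, subject: str, action: str, purpose: str) -> bool:
        # Default-deny: a request is allowed only if some rule matches it.
        return any(r.matches(subject, action, purpose) for r in self.rules)


def combined_permits(policies: list, subject: str, action: str, purpose: str) -> bool:
    # Assumption of this sketch: combined processing needs every owner's consent.
    return all(p.permits(subject, action, purpose) for p in policies)


# Hypothetical owners and rules, for illustration only.
hospital = Policy("hospital", [Rule("researcher", "aggregate", "medical-research")])
insurer = Policy("insurer", [Rule("*", "aggregate", "*"),
                             Rule("analyst", "read", "pricing")])

print(combined_permits([hospital, insurer], "researcher", "aggregate", "medical-research"))  # True
print(combined_permits([hospital, insurer], "analyst", "read", "pricing"))  # False
```

Keeping rules as plain data is what makes them declarative: owners state what is allowed, and one evaluation function decides each request. The "all owners must agree" rule for combined data is only one possible assumption; a full policy model would likely also need to handle conditions, obligations, and conflicting rules.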
Data wrapping
MOSAICrOWN is defining techniques to wrap data with a protection layer that guarantees access functionality while preserving protection. Data wrapping techniques need to support different kinds of functionality, as they are used in all phases of the data life cycle: in data ingestion by data owners, to move self-protected data to the market while enabling fine-grained data retrieval; in data storage by the data market provider, before releasing data to external third parties, to enable their processing while satisfying the protection policies; and in data analytics by the data market provider, to combine different data sources and produce a result that satisfies the policies of all the data owners. The design of data wrapping techniques is complicated by the need to ensure efficient and scalable computations over wrapped data. MOSAICrOWN also considers the economic incentives that can be given to data owners for the use of their data, and the economic benefits that can derive from the use of less expensive cloud infrastructures.
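The sketch below illustrates how a protection layer can still allow fine-grained retrieval. It assumes a simple scheme that is not necessarily the one MOSAICrOWN adopts. The owner encrypts each record and attaches a keyed HMAC token of one searchable attribute, so the provider can answer equality lookups by matching tokens without seeing values or keys. The helper names (`wrap`, `unwrap`, `index_token`, `lookup`) are hypothetical. The HMAC-based counter-mode keystream is a standard-library-only stand-in for a vetted authenticated cipher such as AES-GCM: it gives no integrity protection and is not meant for production.

```python
import hashlib
import hmac
import json
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy counter-mode keystream using HMAC-SHA256 as a PRF (illustration only).
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def index_token(idx_key: bytes, value) -> str:
    # Keyed, deterministic token: equal values yield equal tokens.
    return hmac.new(idx_key, str(value).encode(), hashlib.sha256).hexdigest()


def wrap(record: dict, searchable: str, enc_key: bytes, idx_key: bytes) -> dict:
    """Owner side: hide the record, keep a token for one searchable attribute."""
    nonce = os.urandom(16)  # fresh nonce for every record
    plaintext = json.dumps(record, sort_keys=True).encode()
    body = _xor(plaintext, _keystream(enc_key, nonce, len(plaintext)))
    return {"token": index_token(idx_key, record[searchable]), "nonce": nonce, "body": body}


def unwrap(wrapped: dict, enc_key: bytes) -> dict:
    """Authorized party holding enc_key: recover the original record."""
    stream = _keystream(enc_key, wrapped["nonce"], len(wrapped["body"]))
    return json.loads(_xor(wrapped["body"], stream))


def lookup(store: list, token: str) -> list:
    """Provider side: match tokens without seeing plaintext values or keys."""
    return [w for w in store if w["token"] == token]


enc_key, idx_key = os.urandom(32), os.urandom(32)
records = [
    {"id": 1, "city": "Milan", "spend": 120},
    {"id": 2, "city": "Bergamo", "spend": 80},
    {"id": 3, "city": "Milan", "spend": 45},
]
store = [wrap(r, "city", enc_key, idx_key) for r in records]
hits = lookup(store, index_token(idx_key, "Milan"))
print(sorted(unwrap(w, enc_key)["id"] for w in hits))  # [1, 3]
```

Deterministic tokens reveal which wrapped records share a value. This is exactly the kind of leakage that more elaborate wrapping techniques aim to limit, usually at some cost in efficiency.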
Data sanitization
MOSAICrOWN is designing efficient and scalable sanitization techniques that operate on whole data collections to produce an obfuscated or aggregated version of the data that is robust against re-identification, linkage, and correlation attacks. The distributed and multi-owner nature of the scenario makes designing such techniques difficult, as it raises several challenges. First, sanitization techniques must protect data while preserving their utility for the expected computations. Second, sanitization must comply with the policy associated with each data collection, which regulates the required levels of privacy and utility. Third, analyses and computations can involve data collections controlled by different data owners and possibly subject to usage and sharing restrictions. To address these challenges, MOSAICrOWN is designing sanitization techniques that operate selectively at different granularity levels, as well as techniques for supporting computations over sanitized data.
Figure 1: The MOSAICrOWN protected data and data market scenario.
The result of MOSAICrOWN will be a set of modular tools that enable an enriched data market scenario and protect data across their whole life cycle.
MOSAICrOWN is coordinated by Università degli Studi di Milano. ERCIM EEIG is a partner of the project. The consortium also includes Dell EMC Information Systems International, Mastercard, SAP SE, and Università degli Studi di Bergamo.
Università degli Studi di Milano, Italy