by Terence Delsate, Xavier Lessage, Mohamed Boukhebouze and Christophe Ponsard (CETIC)
The INAH (Institute of Analytics for Health) platform was created to enable the ethical and secure use of medical data in statistical and medical research. The platform aims to benefit society by improving both patients' quality of life and public health while protecting the privacy of medical data.
Like all domains, healthcare is being transformed by access to, and analysis of, (big) data to create social and economic value. Health players such as universities, life science companies and public authorities want to access and analyze medical data in order to accelerate medical research and move towards personalized, predictive and preventive medicine. This in turn improves patients' quality of life and supports the design of appropriate health policies. However, using medical data raises several technical, legal and ethical challenges: patient privacy must be preserved, and the quality and security of the data must be ensured.
To address these challenges, the Walloon government has launched the development of the INAH platform, which provides secure and ethical access to medical data. The platform, led by our research center and the FRATEM (Regional Federation of Medical Telematics Associations [L1]), enables multicentric analysis of medical data as part of medical and statistical research projects, while respecting the following five key principles:
- Sovereignty of data providers, who remain free to decide whether to participate in submitted projects and who retain control and governance over their data.
- Trust, based on ethical and secure access, forbidding both direct and indirect identification of patients.
- Partnership through collaboration between data users, data providers and practitioners around research projects.
- Velocity, ensuring fast and efficient access to fresh data.
- Conformity to medical standards (e.g., SNOMED) and legislation (e.g., GDPR).
Figure 1 depicts the technical architecture of the INAH platform. To uphold the data sovereignty principle, medical data never leave the data providers' infrastructure. Consequently, the INAH platform relies on distributed data warehouses (a virtual data lake) hosted on the data providers' premises. We refer to each of these data warehouses, together with its data access control component, as an INAH Remote. All remote data warehouses follow the same data model and contain an extraction of the key medical concepts. To preserve privacy, no clear patient identifiers are stored in a remote data warehouse. For each ingested record, the patient identifier is pseudonymized with a secret key, and each remote instance has its own key. These keys are hosted by a Trusted Third Party (TTP), which exposes a pseudonymization service. As a result, the data of two remote instances cannot be linked directly, since the same patient is represented by a different pseudonym in each instance. The remaining components of the remote instance manage the communication between the data sources and the central platform, implement the necessary security checks (for example, verifying that a specific request has been authorized by the data source manager), and perform the actual analytical tasks.
Figure 1: Architecture of the INAH platform.
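The per-instance pseudonymization described above can be illustrated with a minimal sketch. The article does not specify the cryptographic scheme; the keyed hash (HMAC-SHA256) used here, as well as the `TrustedThirdParty` class, its method names and the site identifiers, are assumptions made purely for illustration.

```python
import hashlib
import hmac
import secrets


class TrustedThirdParty:
    """Hypothetical TTP holding one secret key per remote instance."""

    def __init__(self):
        self._site_keys = {}

    def register_site(self, site_id: str) -> None:
        # Each remote instance gets its own randomly generated secret key.
        self._site_keys[site_id] = secrets.token_bytes(32)

    def pseudonymize(self, site_id: str, patient_id: str) -> str:
        # Assumption: a keyed hash (HMAC-SHA256) stands in for the real scheme.
        key = self._site_keys[site_id]
        return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


ttp = TrustedThirdParty()
ttp.register_site("hospital_a")
ttp.register_site("hospital_b")

# The same patient receives a different pseudonym in each remote instance.
p_a = ttp.pseudonymize("hospital_a", "patient-123")
p_b = ttp.pseudonymize("hospital_b", "patient-123")
print(p_a != p_b)
```

Because the keys never leave the TTP in this sketch, a remote instance that only stores pseudonyms cannot recompute or compare the pseudonyms used by another instance.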
In the INAH platform, access to medical data is requested by submitting a project. The project is first evaluated by an approval committee, which decides whether to pre-approve it from an ethical and a scientific point of view. Once this first validation is passed, the project is sent to each data provider (ethical committee), which can decide whether to participate and, if so, which part of its data may be used for the project. After a project is approved, an access authorization token is generated. Using this token, data users can express queries on the INAH platform to define a statistical population. These queries are sent to the data providers, where the patients forming the population are identified locally. A multisite synchronization procedure can then be applied to merge the results. In this procedure, a project-specific pseudonymization is applied using a secret key held by the TTP. This second pseudonymization helps to prevent a data user from cross-referencing data between projects. The INAH platform allows the defined statistical population to be exploited in different ways: the population can be monitored in real time (for instance, for epidemic surveillance), a specific dataset can be requested (such as the comorbidities of a newly vaccinated population), or specific statistical quantities can be extracted. These are only a few examples of possible use cases.
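The multisite synchronization step can be sketched as follows. This is a simplified illustration under stated assumptions, not the platform's actual procedure: the article does not explain how the TTP links a site-level pseudonym to the project-level one, so the lookup table `_reverse`, the HMAC-SHA256 scheme, and all class, function and identifier names below are hypothetical.

```python
import hashlib
import hmac
import secrets


def keyed_pseudonym(key: bytes, value: str) -> str:
    # Assumption: HMAC-SHA256 stands in for the unspecified pseudonymization scheme.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class TrustedThirdParty:
    """Hypothetical TTP holding per-site and per-project secret keys."""

    def __init__(self):
        self._site_keys = {}
        self._project_keys = {}
        # Assumption: the TTP can map a site pseudonym back to a stable identifier.
        self._reverse = {}

    def site_pseudonym(self, site_id: str, patient_id: str) -> str:
        key = self._site_keys.setdefault(site_id, secrets.token_bytes(32))
        pseudonym = keyed_pseudonym(key, patient_id)
        self._reverse[(site_id, pseudonym)] = patient_id
        return pseudonym

    def project_pseudonym(self, project_id: str, site_id: str, pseudonym: str) -> str:
        patient_id = self._reverse[(site_id, pseudonym)]
        key = self._project_keys.setdefault(project_id, secrets.token_bytes(32))
        return keyed_pseudonym(key, patient_id)


def merge_population(ttp, project_id, site_results):
    """Merge per-site results into one set of project-specific pseudonyms."""
    merged = set()
    for site_id, pseudonyms in site_results.items():
        for pseudonym in pseudonyms:
            merged.add(ttp.project_pseudonym(project_id, site_id, pseudonym))
    return merged


ttp = TrustedThirdParty()
site_results = {
    "hospital_a": [ttp.site_pseudonym("hospital_a", p) for p in ["p1", "p2"]],
    "hospital_b": [ttp.site_pseudonym("hospital_b", p) for p in ["p2", "p3"]],
}
population_x = merge_population(ttp, "project_x", site_results)
population_y = merge_population(ttp, "project_y", site_results)
print(len(population_x))  # patient "p2" is seen at both sites but counted once
```

In this sketch, the same patient seen at two sites collapses to a single project-level pseudonym, while the pseudonyms produced for `project_x` and `project_y` share no values, which mirrors the stated goal of preventing inter-project data crossing.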
The INAH platform is currently deployed as a pilot project in three major Walloon hospitals and also involves local life science companies. It is attracting interest from health actors such as public authorities, universities and pharmaceutical companies. The next step is to launch the exploitation of the platform in collaboration with the hospital federations.
We thank Dr. A. Vandenberghe for his fruitful contribution. We also thank all the INAH project partners, as well as our internal technical team, R. Michel, O. Dridi and A. Nuttinck, for their excellent work. This work is supported by AVIQ and SPW-EER and funded by the Walloon government.
 D. Darquennes: “Privacy models in healthcare data processing”, CETIC Lecture, November 2019.
 T. Delsate and M. Boukhebouze: “Anonymous multi-source counting statistics New methodology applied to the health sector”, ICTS4HEALTH workshop, Barcelona, 2019.