Eliot Salant (IBM Israel Research Lab)
We have created a framework to enable secure, policy-driven data exchange, applying sophisticated, fine-grained access control to data stores. Incorporated into the EU Horizon2020 HEIR project, we have demonstrated the flexibility and power of our Privacy-aware Framework (PAF) in a number of use cases, including automatic anonymisation of data on export to a third party, and redaction of data based on user role and organisation affiliation.
The ability to exchange digital data between data owners and data requesters is critical to today’s society. However, as the amounts and types of data proliferate, so do the requirements for data governance to protect the rights of the data subject – the person who can be identified by the data. Whereas in the past access to data was typically controlled by Access Control Lists (ACL) or Role Based Access Control (RBAC), the rise of new legislation such as GDPR (General Data Protection Regulation) requires a policy-driven model, such as Attribute Based Access Control (ABAC), to enforce broader conditions such as geographic or purpose of use constraints.
HEIR [L1] is a three-year EU H2020 project with fifteen partner organisations, which started on 1 September 2020. HEIR focuses on healthcare and aims to provide both cyber protection for hospitals and medical centres and a framework for securely sharing medical data.
There were many requirements behind the design of the PAF in HEIR. We needed a framework that could work with data stores such as FHIR servers or healthcare registries. Fine-grained redaction at the attribute level within healthcare FHIR records was required, because the FHIR standard only dictates an access classification at the resource level. Passwords for accessing the data stores had to be stored securely and not distributed to application developers. Potentially large amounts of data had to be transferred efficiently, and the data path had to be locked down to prevent data leakage.
To help meet these goals, the HEIR PAF is built on top of Fybrik [L2], an open-source framework being developed by IBM Research. The Fybrik framework receives declarative, human-readable files that configure a secured path between the data requester and the data source, known as a Data Plane. These files would typically be created by different actors in the healthcare centre environment, as shown in Figure 1.
Figure 1: Use of Fybrik to create a PAF.
The Data Plane constitutes a workflow: it extracts data from the source, performs any policy-mandated redaction actions, and potentially transforms the data into the form required by the data requester.
How does this work?
Before Fybrik can bring up the data path, a data access policy must be defined, the data source(s) catalogued, and the PII fields in the schema tagged. In our use cases, the data access policy is based on user information. To this end, a data requester needs to log in to an Identity and Access Management system to obtain a JSON Web Token (JWT), which authenticates the user and encodes information such as the user role and organisation affiliation. This token must accompany every request for data from the PAF.
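As a rough illustration of how user attributes travel inside such a token, the following Python sketch builds a demo token and reads its payload. The claim names `role` and `organization`, and the helper names, are assumptions made for this example and are not taken from the HEIR implementation. The sketch deliberately skips signature verification, which a real deployment must perform before trusting any claim.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as base64url without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def read_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying the signature.

    Illustration only: production code must validate the signature,
    issuer and expiry before using any claim.
    """
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Hypothetical claims; the real claim names depend on the IAM configuration.
demo_payload = {"sub": "user-123", "role": "researcher", "organization": "hospital-a"}

# Assemble header.payload.signature with a placeholder signature.
demo_token = ".".join([
    b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps(demo_payload).encode()),
    "demo-signature",
])

claims = read_claims(demo_token)
print(claims["role"], claims["organization"])
```

The decoded claims are the attributes that a policy engine could later match against its rules.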
The PAF securely stores the access credentials to the FHIR server, keeping them hidden from application programs. This ensures that all access requests for FHIR data must pass through the PAF and are therefore subject to governance rules.
Data policy definition and evaluation are handled by the Open Policy Agent (OPA) [L3] in the Fybrik Policy Manager. OPA policies are written in a language called Rego and can evaluate parameters supplied at runtime, such as the role of the requester. Policy rules whose conditions match at evaluation time return an action, such as “RedactColumn <column list>” or “DeleteColumn <column>”, which is enforced before the data is returned to the requester. Based on the policy evaluation, the Fybrik Module first decides whether to let an incoming request pass to the FHIR server backend, and then decides whether the returned data needs to be transformed to meet redaction or anonymisation requirements.
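The rule-matching idea can be sketched in a few lines of Python. This is not Rego and not the Fybrik Policy Manager API; the `Rule` structure, the condition keys and the column names are hypothetical, and only the action names are taken from the text above. The sketch also assumes that an empty action list means the data is returned unchanged, which a real policy would need to state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A hypothetical policy rule: request conditions plus the action returned."""
    conditions: dict
    action: str
    columns: list = field(default_factory=list)


def evaluate(rules, request):
    """Return the actions of every rule whose conditions all match the request."""
    actions = []
    for rule in rules:
        if all(request.get(key) == value for key, value in rule.conditions.items()):
            actions.append({"action": rule.action, "columns": rule.columns})
    return actions


# Illustrative rules only; column names are made up for the example.
RULES = [
    Rule({"role": "researcher", "resource": "Observation"},
         "RedactColumn", ["subject.reference", "subject.display"]),
    Rule({"role": "researcher", "resource": "Patient"},
         "DeleteColumn", ["address"]),
]

researcher_actions = evaluate(RULES, {"role": "researcher", "resource": "Observation"})
patient_actions = evaluate(RULES, {"role": "patient", "resource": "Observation"})

print(researcher_actions)
print(patient_actions)
```

Here the researcher's request yields one redaction action, while the patient's request matches no rule and so yields none.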
In our healthcare use cases, FHIR resources are our data sources. While Fybrik can support commercial data catalogues such as IBM’s Watson Knowledge Catalog, our HEIR implementation uses an internal catalogue that holds metadata assigning labels, such as PII (Personally Identifiable Information), to FHIR resource attributes. Consequently, we are able to define policy rules that are specific to the attributes of the requester (e.g. role, affiliation) and that redact at the FHIR attribute level. For example, one rule can anonymise all PII in Observation records being returned to a researcher, whereas another rule allows patients to view all their information without redaction.
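A minimal Python sketch of attribute-level redaction driven by catalogue tags follows. The catalogue layout, the dotted attribute paths, the mask value and the sample record are all assumptions for illustration and do not reflect the actual HEIR catalogue format. The sketch handles only nested objects, whereas real FHIR resources also contain arrays.

```python
import copy

# Hypothetical internal catalogue: attribute paths and their labels per resource type.
CATALOGUE_TAGS = {
    "Observation": {
        "subject.reference": ["PII"],
        "subject.display": ["PII"],
        "valueQuantity.value": [],
    },
}


def tagged_paths(resource_type, tag):
    """List the attribute paths of a resource type that carry the given label."""
    entries = CATALOGUE_TAGS.get(resource_type, {})
    return [path for path, tags in entries.items() if tag in tags]


def redact(resource, paths, mask="XXXXX"):
    """Return a copy of the resource with each listed attribute replaced by a mask."""
    redacted = copy.deepcopy(resource)
    for path in paths:
        *parents, leaf = path.split(".")
        node = redacted
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        # Only overwrite attributes that actually exist in the record.
        if isinstance(node, dict) and leaf in node:
            node[leaf] = mask
    return redacted


# Simplified, made-up Observation record.
observation = {
    "resourceType": "Observation",
    "id": "obs-1",
    "subject": {"reference": "Patient/123", "display": "Jane Doe"},
    "valueQuantity": {"value": 6.3, "unit": "mmol/l"},
}

for_researcher = redact(observation, tagged_paths("Observation", "PII"))
for_patient = redact(observation, [])  # no redaction actions apply

print(for_researcher["subject"])
print(for_patient["subject"])
```

Working on a deep copy keeps the source record intact, so the same record can be served with different redactions to different requesters.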
The PAF allows much finer-grained access to FHIR data than envisioned by the FHIR standard, significantly improving the ability to share data while conforming to policy. Additionally, policies can be evaluated dynamically and can factor in a wide variety of parameters beyond those used in typical access control models, such as geographic and time period constraints.
While extremely powerful in the world of healthcare, this framework can be used to protect data in virtually any other sector.
 HL7 FHIR. (n.d.). HL7 International. Retrieved from https://hl7.org/fhir/.
Eliot Salant, IBM Research, Israel