Developing a Distributed Electronic Health-Record Store for India

There has been much recent interest in information services that offer to manage an individual's healthcare records in electronic form, with systems such as Microsoft HealthVault and Google Health receiving widespread media attention. These systems are, however, proprietary, and concerns have been raised about how the information stored in them will be used. In parallel with these developments, countries with nationalized healthcare systems are also investigating the construction of healthcare information systems that store Electronic Health Records (EHRs) for their citizens.

Electronic Health Records for the more than one billion citizens of India. Photo: Staffan Truvé

The DIGHT (Distributed Information store for Global Healthcare Technology) project is addressing the challenges of building a scalable and highly reliable information store for the EHRs of India's citizens. The project partners are SICS and the Indian Centre for Development of Advanced Computing (C-DAC): SICS is responsible for the distributed storage aspects of the project, while C-DAC will work towards developing an EHR standard for India. The project will embrace both open-source technology and open standards to ensure that information is managed and secured in an accountable and transparent manner. We are not aware of any existing government-run information system that manages as many users as an Indian EHR store would have to hold.

In many Western countries, the main obstacle to building a one-stop shop for patients' EHRs is the cost of integrating disparate existing healthcare systems. These systems typically do not interoperate easily because they use different relational databases and different media storage software, which makes transferring data between them inconvenient or impossible. A further challenge is that storage must be distributed, since a centralized system of this scale would have limited scalability and poor availability. In India, by contrast, there are relatively few existing healthcare systems and EHRs, so we can build the system from the ground up and place less emphasis on the integration problem.

The requirements for our EHR storage system include:

high data availability even in the presence of faults in the network or computer hardware (eg due to power outages, environmental disasters and regional strife)
high performance to ensure the system can function even under the high loads that may arise in emergency situations (such as a pandemic, large-scale accident or war)
security to protect patient data from misuse, unauthorized access or attacks.

While relational database technology has matured to the point where systems can store terabytes of data in a database cluster, centralized information storage architectures still impede scalability and high availability. DIGHT will instead use clusters of lower-cost computers to provide higher availability and better performance at lower hardware cost.

As part of this distributed approach, the project will develop data replication algorithms to ensure that the security, performance and data availability requirements are met. The EHR store will be very large and the network environment challenging, with frequent network partitions. Our replication algorithms must take into account Brewer's Conjecture (also known as the CAP theorem): it is impossible for a data store in an asynchronous network to simultaneously provide (i) partition tolerance, (ii) availability and (iii) consistency. Systems can typically be built to provide any two of these three properties at once, and it is generally assumed that designers of new systems should pick the two that matter most for their requirements. In DIGHT we will investigate the design of partially synchronous networks that could enable us to overcome this limitation.
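To make the trade-off concrete, the following minimal Python sketch uses quorum replication, a standard textbook technique rather than DIGHT's own algorithm. The `Replica` and `QuorumStore` classes and the parameters `r` and `w` are hypothetical illustrations. With n copies, requiring r + w > n (and 2w > n) keeps reads and writes consistent, but when a partition leaves fewer than w copies reachable the store must refuse writes: it chooses consistency over availability.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """One copy of a record held by a cluster (hypothetical model)."""
    name: str
    reachable: bool = True  # False simulates a network partition
    version: int = 0
    value: Optional[str] = None


class QuorumStore:
    """Quorum replication over n copies: a write must reach w copies and a
    read must hear from r copies. r + w > n makes every read quorum overlap
    the latest write quorum; 2w > n makes any two write quorums overlap."""

    def __init__(self, replicas: List[Replica], r: int, w: int):
        n = len(replicas)
        if r + w <= n or 2 * w <= n:
            raise ValueError("need r + w > n and 2w > n for overlapping quorums")
        self.replicas, self.r, self.w = replicas, r, w

    def _live(self) -> List[Replica]:
        return [rep for rep in self.replicas if rep.reachable]

    def write(self, value: str) -> bool:
        live = self._live()
        if len(live) < self.w:
            # Partition: refuse the write, keeping consistency but losing availability.
            return False
        # Overlapping write quorums guarantee some live copy holds the latest version.
        new_version = max(rep.version for rep in live) + 1
        for rep in live:
            rep.version, rep.value = new_version, value
        return True

    def read(self) -> Optional[str]:
        live = self._live()
        if len(live) < self.r:
            return None  # too few copies to guarantee the latest value
        return max(live, key=lambda rep: rep.version).value
```

Lowering `w` would keep writes available during partitions, at the cost of readers possibly seeing stale data; this is precisely the choice that Brewer's Conjecture forces.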

State-of-the-art open-source replication solutions for wide area networks (WANs), such as MySQL Cluster, support only asynchronous replication: they provide no consistency guarantees for data replicated between clusters in different geographical locations. Consequently, if a cluster crashes, data may be lost, which is not acceptable in the healthcare domain. While strict consistency of all replicated copies of an EHR is intuitively the most desirable consistency model, it may degrade performance unnecessarily because of high latencies over WANs. However, since an EHR contains a large and varied body of information, not all of it requires strict consistency for the information system to function correctly. There is therefore scope to identify and implement weaker consistency models that allow the system to function both efficiently and correctly. The idea is to define a set of consistency models of varying strength and to associate each with a different subset of the information in an EHR, according to the system's consistency needs.
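One way to associate consistency models with parts of an EHR is a per-section policy table. The sketch below is purely illustrative: the section names, the three levels and their mapping are assumptions made for this example, not a DIGHT design. It computes how many copies must acknowledge an update before the update counts as committed.

```python
from enum import Enum
from typing import Dict


class Consistency(Enum):
    STRICT = "strict"      # every copy must acknowledge (synchronous replication)
    QUORUM = "quorum"      # a majority of copies must acknowledge
    EVENTUAL = "eventual"  # local copy only; remote copies updated asynchronously


# Hypothetical assignment of EHR sections to consistency models.
SECTION_POLICY: Dict[str, Consistency] = {
    "allergies": Consistency.STRICT,
    "medications": Consistency.STRICT,
    "lab_results": Consistency.QUORUM,
    "visit_notes": Consistency.EVENTUAL,
    "scanned_images": Consistency.EVENTUAL,
}


def required_acks(section: str, n_copies: int) -> int:
    """Number of copies that must confirm an update to `section` before it is
    reported as committed, when the section is replicated n_copies times."""
    level = SECTION_POLICY.get(section, Consistency.STRICT)  # unknown: safest level
    if level is Consistency.STRICT:
        return n_copies
    if level is Consistency.QUORUM:
        return n_copies // 2 + 1
    return 1
```

Defaulting unknown sections to the strictest level errs on the side of safety. The eventual level is the one that reintroduces the risk of data loss on a cluster crash, so it would be reserved for information that can tolerate that risk.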

Because the system will operate at extreme scale, with many clusters located throughout India, client software that accesses the information store will need routing and lookup functionality so that updates and queries are sent to the cluster storing the replicated EHR of interest. In particular, we are investigating the use of Distributed Hash Table (DHT) technology to build a scalable solution for discovering and retrieving EHRs from clusters. Finally, we will incorporate suitable security policies and mechanisms to prevent corruption, misuse or theft of EHR data in distributed environments.
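Consistent hashing underlies many Distributed Hash Tables (Chord, for example) and gives a flavour of how a client could locate the clusters holding a patient's EHR. The following minimal sketch assumes the client knows the full list of clusters, so it performs a one-hop lookup rather than the multi-hop routing of a full DHT; the class `HashRing`, the `vnodes` parameter and the cluster names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # Position on a 2**64 ring: first 8 bytes of the key's SHA-1 digest.
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring. Each cluster owns `vnodes` virtual points; a key
    belongs to the first cluster clockwise from its hash, and further replicas
    go to the next distinct clusters around the ring."""

    def __init__(self, clusters: List[str], vnodes: int = 64):
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{c}#{i}"), c) for c in clusters for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def lookup(self, patient_id: str, replicas: int = 1) -> List[str]:
        """Return up to `replicas` distinct clusters responsible for an EHR."""
        replicas = min(replicas, len({c for _, c in self._ring}))
        i = bisect.bisect_right(self._points, _hash(patient_id))
        owners: List[str] = []
        while len(owners) < replicas:
            cluster = self._ring[i % len(self._ring)][1]  # wrap around the ring
            if cluster not in owners:
                owners.append(cluster)
            i += 1
        return owners
```

Because each cluster owns many virtual points, adding or removing a cluster moves only the EHRs whose ring positions that cluster covered, rather than reshuffling the whole store.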

Link:
http://dight.sics.se/

Please contact:
Jim Dowling
SICS, Sweden
Tel: +46 8 633 1694
E-mail: jdowling@sics.se