by Haridimos Kondylakis, Lefteris Koumakis, Manolis Tsiknakis, Kostas Marias and Stephan Kiefer
The iManageCancer project is developing a data management infrastructure for a cancer-specific self-management platform designed according to patients’ needs.
A recent report by the eHealth Task Force entitled “Redesigning health in Europe for 2020” focuses on how to achieve a vision of affordable, less intrusive and more personalized care, ultimately increasing quality of life and lowering mortality. Such a vision depends on the application of ICT and the use of data, and requires a radical redesign of e-health to meet these challenges. Two levers for change identified by the report are “liberate the data” and “connect up everything”. Fully exposing, integrating, linking and exploring health data will have a tremendous impact on improving the integrated diagnosis, treatment and prevention of disease in individuals. In addition, it will allow for the secondary use of care data for research, transforming the way in which care is provided.
The iManageCancer H2020 EU project, started in February 2015, aims to provide a cancer-specific self-management platform designed according to the needs of patient groups. In parallel, it focuses on the wellbeing of the cancer patient, with special emphasis on the avoidance, early detection and management of adverse events of cancer therapy and, importantly, on psycho-emotional evaluation and self-motivated goals. The platform will be centered on a Personal Health Record, which will regularly monitor the psycho-emotional status of the patient and periodically record the patient’s everyday life experiences in terms of side effects of therapy. Different groups of patients and their families will share information through diaries, and clinicians will be provided with clinical information.
The data collected in this context are complex, with hundreds of attributes per patient record that will continually evolve as new types of calculations and analysis/assessment results are added to the record over time (volume). In addition, data exist in many different formats, from textual documents and web tables to well-defined relational data and APIs (variety). Furthermore, they carry ambiguous semantics and inconsistent quality standards resulting from different collection processes across sites (veracity). The vast amount of data generated and collected comes in many different streams and forms, from physician notes, personal health records, images from patient scans and health conversations in social media (variability) to continuous streaming information collected from wearables and other monitoring devices (velocity). These data, if used to their full potential, may have a tremendous impact on healthcare, delivering better outcomes at a lower cost (value). As such, key questions to address include: How can we develop optimal frameworks for large-scale data-sharing? How can we exploit and curate data from various electronic and patient health records, assembling them into ontological descriptions relevant to the practice of systems medicine? And how can we manage the problems associated with large-scale medical data?
Figure 1: The data management architecture of the iManageCancer platform.
The high-level data management architecture of the iManageCancer platform is shown in Figure 1. Within iManageCancer, Apache Cassandra is used as an instantiation of the “data lake” concept to store this vast amount of raw data in its native format, including structured, semi-structured and unstructured data. Cassandra is an open-source, peer-to-peer, key-value-based store in which data are stored in keyspaces. It has built-in support for the Hadoop implementation of MapReduce and advanced replication functions, and is currently considered state of the art for real-time big data analysis. On top of Cassandra, the Semantic Integration Layer pushes selected data to the Semantic Data Warehouse. Using this architecture we can select which of the available data should be semantically linked and integrated by establishing the appropriate mappings to a modular ontology. These data are then queried, transformed into triples and loaded into the Semantic Warehouse, where they are available for further reasoning and querying. A benefit of the approach is that we can recreate the resulting triples from scratch at any time. However, for reasons of efficiency, the data integration engine periodically transforms only the newly inserted information by checking the data timestamps.
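The timestamp-based incremental transformation described above can be illustrated with a minimal Python sketch. It is not the project's integration engine: the `RawRecord` type, the `MAPPINGS` table, the `ex:` prefix and the function names are all hypothetical, and an in-memory list stands in for the Cassandra store. Only records newer than the last run are mapped to triples, and attributes without a mapping stay in the raw store only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class RawRecord:
    """Hypothetical stand-in for one raw row in the data lake."""
    patient_id: str
    attributes: Dict[str, str]
    timestamp: int  # insertion time, e.g. seconds since the epoch


# Assumed mappings from raw attribute names to ontology predicates.
MAPPINGS: Dict[str, str] = {
    "heart_rate": "ex:hasHeartRate",
    "mood_score": "ex:hasMoodScore",
}


def to_triples(record: RawRecord, mappings: Dict[str, str]) -> List[Triple]:
    """Turn the mapped attributes of one record into triples."""
    subject = f"ex:patient/{record.patient_id}"
    return [
        (subject, mappings[name], value)
        for name, value in sorted(record.attributes.items())
        if name in mappings  # unmapped attributes are not integrated
    ]


def transform_new(
    records: Iterable[RawRecord], mappings: Dict[str, str], last_run: int
) -> Tuple[List[Triple], int]:
    """Transform only records inserted after last_run.

    Returns the new triples and the timestamp to use for the next run.
    Passing last_run=0 corresponds to recreating everything from scratch
    (assuming all timestamps are positive).
    """
    triples: List[Triple] = []
    newest = last_run
    for record in records:
        if record.timestamp <= last_run:
            continue  # already transformed in an earlier run
        triples.extend(to_triples(record, mappings))
        newest = max(newest, record.timestamp)
    return triples, newest
```

Calling `transform_new(records, MAPPINGS, 0)` rebuilds all triples, while passing the timestamp returned by the previous call processes only what was inserted since then.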
In addition to off-line transformation, on-line transformation is also possible by issuing transformation events through an event bus. As such, our architecture adopts a variation of the command-query responsibility segregation principle, in which information is updated through a different model from the one used to read it. We chose to store the original data using NoSQL technologies owing to their ability to handle enormous data sets and their “schema-less” nature. However, the limited flexibility of their query mechanisms is a real barrier for any application whose access patterns are not determined in advance. The Semantic Warehouse component in the iManageCancer platform fills this gap by effectively providing a semantically enriched and search-optimized index to the unstructured contents of the Cassandra repository. Our approach therefore tries to offer the best of both worlds: efficient persistence and availability of heterogeneous data, and semantic integration and searching of the “essence” of the ingested information.
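The separation between the write model and the read model can be sketched as follows. This is an illustrative toy, not the platform's code: `EventBus`, `RawStore`, `SemanticIndex`, the `"transform"` topic and the `ex:` predicates are all assumed names, and in-memory structures stand in for Cassandra and the Semantic Warehouse. Writes go only to the raw store, which issues a transformation event; the semantic index subscribes to those events and is the only component that answers queries.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Handler = Callable[[dict], None]


class EventBus:
    """Minimal synchronous publish/subscribe bus (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers.get(topic, []):
            handler(event)


class RawStore:
    """Write side: keeps every row as-is and announces each insert."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._rows: List[dict] = []

    def insert(self, row: dict) -> None:
        self._rows.append(row)
        self._bus.publish("transform", row)  # on-line transformation event

    def count(self) -> int:
        return len(self._rows)


class SemanticIndex:
    """Read side: holds triples for mapped attributes only."""

    def __init__(self, bus: EventBus, mappings: Dict[str, str]) -> None:
        self._mappings = mappings
        self._triples: Set[Triple] = set()
        bus.subscribe("transform", self._on_transform)

    def _on_transform(self, row: dict) -> None:
        subject = f"ex:patient/{row['patient_id']}"
        for name, value in row.items():
            if name in self._mappings:
                self._triples.add((subject, self._mappings[name], str(value)))

    def query(self, predicate: str) -> List[Triple]:
        """Return all triples with the given predicate, sorted."""
        return sorted(t for t in self._triples if t[1] == predicate)
```

In this sketch the raw store keeps every row, including attributes with no mapping, while the index contains only the semantically integrated subset, which mirrors the division of labor described above.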
A key next step is to develop the eHealth services on top of this data management infrastructure and to test the whole platform in a real-world context in two clinical pilots, one for children starting in 2016 and one for adults starting in 2017. Big Data management is undoubtedly an important area for healthcare, and it will only become more critical as healthcare delivery continues to grapple with current challenges.
Redesigning health in Europe for 2020: http://ec.europa.eu/digital-agenda/en/news/eu-task-force-ehealth-redesigning-health-europe-2020
 H. Kondylakis et al.: “Digital Patient: Personalized and Translational Data Management through the MyHealthAvatar EU Project”, EMBC, 2015.
 H. Kondylakis and D. Plexousakis: “Exelixis: Evolving Ontology-Based Data Integration System”, SIGMOD/PODS, 2011.
 H. Kondylakis et al.: “Agents, Models and Semantic Integration in support of Personal eHealth Knowledge Spaces”, WISE 2014.