by Fabrício A. B. da Silva, Mário J. Silva and Francisco M. Couto
The Epidemic Marketplace is a distributed data management platform where epidemiological data can be stored, managed and made available to the scientific community. The Epidemic Marketplace is part of a computational framework for organising and distributing data for epidemic modelling and forecasting, dubbed Epiwork. The platform will assist epidemiologists and public health scientists in sharing and exchanging data.
The Epidemic Marketplace is an e-Science platform for collecting, storing, managing and providing semantically annotated collections of epidemic data. In recent years, the availability of a huge volume of quantitative social, demographic and behavioural data has spurred interest in the potential of innovative technologies to improve disease surveillance systems by providing faster and better geo-referenced outbreak detection capabilities. These capabilities depend on the availability of finely tuned models, which require accurate and comprehensive data. However, the increasing amount of data introduces the problem of data integration and management. New solutions are needed to ensure that data are correctly stored, managed and made available to the scientific community.
The Epidemic Marketplace is part of a European research effort, the Epiwork project, a four-year project that started in 2009. Epiwork supports multidisciplinary research aimed at developing the appropriate framework of tools and knowledge needed for the design of epidemic forecast infrastructures, to be used by epidemiologists and public health scientists. The project is a truly interdisciplinary effort, anchored to the research questions and needs of epidemiology research by the participation of epidemiologists, public health specialists, mathematical biologists and computer scientists. The Epidemic Marketplace is the Epiwork data integration platform, where epidemiological data can be stored, managed and made available to investigators, thus fostering collaboration. The objectives of Epiwork in which the Epidemic Marketplace will play a direct role are: (1) the development of large-scale, data-driven computational models endowed with a high level of realism and aimed at epidemic scenario forecast; (2) the design and implementation of original data-collection schemes motivated by identified modelling needs, such as the collection of real-time disease incidence through innovative Internet and ICT applications; (3) the set-up of a computational platform for epidemic research and data sharing that will generate important synergies between research communities and states.
The architectural requirements of the Epidemic Marketplace are directly related to the objectives of the Epiwork project and have been defined according to the feedback from its partners. The main functional requirements of the Epidemic Marketplace are:
- Support the sharing and management of epidemiological data sets. Registered users should be able to upload annotated data sets, and a data set quality assessment mechanism should be available.
- Support the seamless integration of multiple heterogeneous data sources. Users should be able to have a unified view of related data sources. Data should be available from streaming, static and dynamic sources.
- Support the creation of a virtual community for epidemic research. The platform will serve as a forum for discussion that will facilitate the sharing of data between providers and modellers.
- Distributed Architecture. The Epidemic Marketplace should implement a geographically distributed architecture deployed in several sites for improved data access performance, availability and fault-tolerance.
- Support secure access to data. Access to data should be controlled. The marketplace should provide single sign-on, distributed federated authorization and multiple access policies that users can customize.
- Support data analysis and simulation in grid environments. The Epidemic Marketplace will provide data analysis and simulation services in a grid environment.
- Workflow. The platform should provide workflow support for data processing and external service interaction.
The main non-functional requirements that have been identified for the Epidemic Marketplace are:
- Interoperability: The Epidemic Marketplace must interoperate with other software. Its design must take into account the future possibility that systems developed by other researchers worldwide may need to query the Epidemic Marketplace catalogue for access to its datasets.
- Open-source: All software packages and new modules required for the implementation and deployment of the Epidemic Marketplace should be open source.
- Standards-based: To guarantee software interoperability and the seamless integration of all geographically dispersed sites of the Epidemic Marketplace, the system will be built according to standards defining web services, authentication and metadata.
The Epidemic Marketplace can be defined as a distributed virtual repository, a platform supporting transparent, seamless access to distributed, heterogeneous and redundant resources. It is a virtual repository because data can be stored in systems that are external to the Epidemic Marketplace, and it provides transparent access because several heterogeneities are hidden from its users. The Epidemic Marketplace is composed of a set of geographically distributed, interconnected data management nodes that share common canonical data models, authorization infrastructure and access interfaces.
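The idea of a virtual repository can be illustrated with a minimal sketch. It is not the Epidemic Marketplace implementation: the class names, fields, identifiers and sample data below are all hypothetical, and the two "backends" are plain in-memory dictionaries standing in for an internal store and an external system. The point is only that a client asks for a data set by identifier and never needs to know where it is held.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CatalogueEntry:
    """Metadata for one data set; every field here is a hypothetical example."""
    dataset_id: str
    title: str
    backend: str   # name of the storage system that holds the data
    locator: str   # backend-specific address of the data


class VirtualRepository:
    """Resolves data set identifiers to content, hiding where it is stored."""

    def __init__(self) -> None:
        self._catalogue: Dict[str, CatalogueEntry] = {}
        self._backends: Dict[str, Callable[[str], bytes]] = {}

    def register_backend(self, name: str, fetch: Callable[[str], bytes]) -> None:
        # A backend is any function that turns a locator into bytes.
        self._backends[name] = fetch

    def add_entry(self, entry: CatalogueEntry) -> None:
        if entry.backend not in self._backends:
            raise ValueError(f"unknown backend: {entry.backend}")
        self._catalogue[entry.dataset_id] = entry

    def fetch(self, dataset_id: str) -> bytes:
        # The caller supplies only the identifier; the catalogue decides
        # which storage system is contacted.
        entry = self._catalogue[dataset_id]
        return self._backends[entry.backend](entry.locator)


# Two stand-in backends with made-up placeholder content.
local_store = {"flu-2009.csv": b"week,cases\n1,12\n"}
external_store = {"ext/42": b"week,cases\n1,7\n"}

repo = VirtualRepository()
repo.register_backend("local", lambda loc: local_store[loc])
repo.register_backend("external", lambda loc: external_store[loc])
repo.add_entry(CatalogueEntry("em:1", "Example local data set", "local", "flu-2009.csv"))
repo.add_entry(CatalogueEntry("em:2", "Example external data set", "external", "ext/42"))
```

Calling `repo.fetch("em:1")` and `repo.fetch("em:2")` uses the same interface even though the data come from different systems, which is the kind of heterogeneity the platform aims to hide.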
Figure 1: An envisioned deployment of the Epidemic Marketplace distributed among several locations. Currently, only the Lisbon node has been deployed. Each Epidemic Marketplace node is composed of four modules: repository, MEDcollector, forum and mediator. The mediator will be the contact point for other applications, such as Internet Monitoring System nodes (e.g. Gripenet), and for clients that display data graphically and interactively using geographical maps and trend graphs.
As shown in Figure 1, each Epidemic Marketplace node has the following modules:
- Repository: stores epidemic data sets and an epidemic ontology to characterize the semantic information of the data sets.
- Mediator: a collection of web services that will provide access to internal data and external sources, based on a catalogue describing existing epidemic databases through their metadata, using state-of-the-art semantic-web/grid technologies.
- MEDcollector: retrieves information relating to real-time disease incidences from publicly available data sources, such as social networks. After retrieval, the collector groups the incidences by subject and creates data sets to store in the repository.
- Forum: allows users to post comments on integrated data from other modules, fostering collaboration among modellers.
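The grouping step described for the MEDcollector can be sketched as follows. This is an illustration under assumptions, not the module's actual code: the `Incidence` record, its fields and the normalisation of subject names are hypothetical, and retrieval from real data sources is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Incidence:
    """One retrieved report; the fields are hypothetical."""
    subject: str   # for example, a disease name
    source: str    # where the report was found
    text: str      # the raw report


def group_by_subject(incidences: Iterable[Incidence]) -> Dict[str, List[Incidence]]:
    """Group reports by subject so each group can be stored as one data set."""
    groups: Dict[str, List[Incidence]] = defaultdict(list)
    for incidence in incidences:
        # Normalise case and whitespace so "Flu" and " flu" share a group.
        key = incidence.subject.strip().lower()
        groups[key].append(incidence)
    return dict(groups)


# Made-up sample reports, for illustration only.
reports = [
    Incidence("Flu", "source-a", "report 1"),
    Incidence(" flu", "source-b", "report 2"),
    Incidence("Measles", "source-a", "report 3"),
]
datasets = group_by_subject(reports)
```

Each key of `datasets` would then correspond to one data set to be annotated and stored in the repository.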
A first prototype version of the Epidemic Marketplace is already in use internally. This prototype implements several of the main features of the outlined architecture, such as data management and sharing support and secure access to data, and is currently being populated with epidemic data collections. Several open-source tools and open standards are being used in the Epidemic Marketplace implementation and deployment process, such as Fedora Commons for the implementation of the main features of the repository. Access control in the platform uses the XACML, LDAP and Shibboleth standards. The front-end of this first prototype is based on Muradora, but the next version will also include a front-end based on the Drupal content management system.
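To give a flavour of attribute-based access policies of the kind XACML expresses, the sketch below evaluates a small set of rules with a deny-overrides combination. It is a simplified illustration in plain Python, not XACML syntax and not the prototype's policy set: the attribute names, the rules and the `evaluate` function are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Request = Dict[str, str]  # attribute name -> value (hypothetical attributes)


@dataclass(frozen=True)
class Rule:
    effect: str                         # "Permit" or "Deny"
    applies: Callable[[Request], bool]  # does this rule match the request?


def evaluate(rules: List[Rule], request: Request) -> str:
    """Deny-overrides: any matching Deny wins, otherwise any matching Permit."""
    decision = "NotApplicable"
    for rule in rules:
        if rule.applies(request):
            if rule.effect == "Deny":
                return "Deny"
            decision = "Permit"
    return decision


# Hypothetical policy: owners may do anything with their own data sets,
# registered users may read, and embargoed data sets are closed to non-owners.
rules = [
    Rule("Permit", lambda r: r["user"] == r["owner"]),
    Rule("Permit", lambda r: r["role"] == "registered" and r["action"] == "read"),
    Rule("Deny", lambda r: r["embargoed"] == "yes" and r["user"] != r["owner"]),
]

request = {
    "user": "alice", "owner": "bob", "role": "registered",
    "action": "read", "embargoed": "yes",
}
decision = evaluate(rules, request)
```

Here the request matches both a Permit rule and the Deny rule, so the combined decision is a denial; a caller would typically treat anything other than "Permit" as a refusal.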
The Epiwork project is funded by the European Commission under the Seventh Framework Programme. The authors thank the Epiwork project partners, the students who have been working on the initial implementation of the Epidemic Marketplace (Luis Filipe Lopes, Patricia Sousa, Hugo Ferreira and João Zamite), the CMU-Portugal partnership and FCT (the Portuguese research funding agency) for their support.
Epiwork Project: http://www.epiwork.eu
Epidemic Marketplace: http://epiwork.di.fc.ul.pt/
Fedora Commons: http://www.fedora-commons.org/
Mário J. Silva
Universidade de Lisboa, Portugal