The Planets Interoperability Framework

by Ross King

The Planets Interoperability Framework is a software infrastructure for the preservation of digital documents, and was developed as part of the European Integrated Project Planets. It provides the technical environment that governs the integration of the Planets end-user applications with preservation services and data repositories. Since most of the institutions that will be interested in the Planets applications already have some kind of archiving system in place, archival storage is not part of what Planets delivers. Instead, our approach is to provide a framework and services that can be integrated with existing systems. The design of the framework was driven by the requirements of logical preservation in libraries and archives, including demands for a robust and extensible infrastructure for the characterization and migration of digital documents.

The Planets Interoperability Framework (henceforth referred to as the IF) provides a service-based infrastructure that leverages a number of standards and open source tools. The core of the IF implementation is based on the Java Enterprise Edition (Java EE 5) standard, which among other things provides a framework for the efficient implementation of Web services and Web applications. The IF installation package includes a pre-configured JBoss application server that provides common services like single sign-on, user management, authorization and authentication. This application server provides the container for web-based preservation applications (the Planets Testbed application and the Planets Preservation Planning Tool, PLATO). In addition, a number of commonly required software components and their associated APIs are bundled with the IF: for example, a component for user management, a component for service registration and discovery, and a component for executing preservation workflows. Preservation action tools are deployed as Web services and hosted in a distributed network.
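The registration and discovery component mentioned above can be pictured as a lookup table from action types to deployed service endpoints. The Java sketch below is a minimal in-memory illustration under that assumption; the class `ServiceRegistry`, the record `ServiceDescription`, and the localhost endpoints are hypothetical and do not reproduce the actual Planets IF registry API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical in-memory sketch of service registration and discovery.
 * Class, record and method names are illustrative, not the Planets IF registry API.
 */
public class ServiceRegistry {

    /** Describes one deployed preservation Web service. */
    public record ServiceDescription(String name, String actionType, String endpoint) {}

    // Services indexed by action type (e.g. "Migrate"), so discovery is a single map lookup.
    private final Map<String, List<ServiceDescription>> byType = new ConcurrentHashMap<>();

    /** Registers a service so that workflows can later discover it by action type. */
    public void register(ServiceDescription service) {
        byType.computeIfAbsent(service.actionType(), k -> new CopyOnWriteArrayList<>()).add(service);
    }

    /** Returns an immutable snapshot of all services offering the given action type (possibly empty). */
    public List<ServiceDescription> discover(String actionType) {
        return List.copyOf(byType.getOrDefault(actionType, List.of()));
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        // Placeholder endpoints, not real Planets service URLs.
        registry.register(new ServiceDescription("ExampleIdentifier", "Identify",
                "http://localhost:8080/example/identify"));
        registry.register(new ServiceDescription("ExampleMigrator", "Migrate",
                "http://localhost:8080/example/migrate"));
        for (ServiceDescription s : registry.discover("Migrate")) {
            System.out.println(s.name() + " -> " + s.endpoint());
        }
    }
}
```

A production registry would also need persistence and remote access, which this sketch deliberately omits.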

The Interoperability Framework enforces a set of standard Web service profiles for preservation services and a common model for the digital objects on which preservation actions are carried out. The interfaces define atomic preservation actions such as Identify, Characterize, Compare, Modify, Migrate, and View. Preservation tools that implement these interfaces can easily be registered with a Planets IF instance and immediately used within Planets workflows. A preservation workflow typically consists of a sequence of parameterized preservation actions, carried out in a specific order, in which the output parameters of one action are validly mapped to the input parameters of the following action. An example of a preservation workflow could be: for a given file, first identify its file format, then validate the file against that format, then determine a number of its significant characteristics, then migrate it to a new format, then characterize the new file, and finally compare it with the original. These steps involve several different services within the Planets architecture, so the services must be orchestrated. In general, the IF allows one to formally and technically describe preservation processes through a workflow template system. The aim of this approach is to shield the user from the complexity of the underlying architecture and implementation issues, allowing non-experts (such as librarians and archivists) to create and execute preservation workflows.
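To make the chaining idea concrete, the following Java sketch models four of the atomic actions as interfaces and runs a shortened version of the example workflow (the validation step is omitted). The interface signatures, the simplified `DigitalObject` record, and the toy service implementations are assumptions for illustration only; real Planets services are remote Web services with richer parameters and a fuller digital object model.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch: atomic preservation actions as Java interfaces, chained so that
 * each action's output becomes the next action's input. Names and signatures are
 * illustrative assumptions, not the Planets IF service interfaces.
 */
public class PreservationWorkflow {

    /** Simplified stand-in for the common digital object model: content plus a format label. */
    public record DigitalObject(byte[] content, String format) {}

    public interface Identify { String identify(DigitalObject o); }
    public interface Characterize { Map<String, Object> characterize(DigitalObject o); }
    public interface Migrate { DigitalObject migrate(DigitalObject o, String targetFormat); }
    public interface Compare { boolean equivalent(Map<String, Object> a, Map<String, Object> b); }

    private final Identify identify;
    private final Characterize characterize;
    private final Migrate migrate;
    private final Compare compare;

    public PreservationWorkflow(Identify identify, Characterize characterize,
                                Migrate migrate, Compare compare) {
        this.identify = identify;
        this.characterize = characterize;
        this.migrate = migrate;
        this.compare = compare;
    }

    /** Identify, characterize, migrate, re-characterize, then compare (validation omitted). */
    public boolean run(DigitalObject input, String targetFormat) {
        DigitalObject typed = new DigitalObject(input.content(), identify.identify(input));
        Map<String, Object> before = characterize.characterize(typed);
        DigitalObject migrated = migrate.migrate(typed, targetFormat);
        Map<String, Object> after = characterize.characterize(migrated);
        return compare.equivalent(before, after);
    }

    /** Runs the chain with toy services; returns true if the compared characteristics match. */
    public static boolean runDemo() {
        // Toy identifier: recognizes PDF by its "%PDF" magic bytes.
        Identify id = o -> new String(o.content(), StandardCharsets.US_ASCII).startsWith("%PDF")
                ? "pdf" : "unknown";
        // Toy characterizer: records only the byte size as a "significant property".
        Characterize ch = o -> Map.of("size", o.content().length);
        // Toy migrator: relabels the format without changing content (placeholder only).
        Migrate mig = (o, fmt) -> new DigitalObject(o.content(), fmt);
        // Toy comparer: characteristics are equivalent if the maps are equal.
        Compare cmp = Objects::equals;
        byte[] bytes = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);
        return new PreservationWorkflow(id, ch, mig, cmp).run(new DigitalObject(bytes, null), "pdf/a");
    }

    public static void main(String[] args) {
        System.out.println("Characteristics preserved: " + runDemo());
    }
}
```

Because `run` depends only on the interfaces, any tool implementing them could be swapped in without changing the workflow code, which is the property that lets registered services be used directly in workflows.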

Thus the Workflow Execution Engine (WEE) is an essential component of the IF service environment. We surveyed service orchestration approaches and experimented with WS-BPEL (Web Services Business Process Execution Language), an XML-based workflow description language for SOAP-based Web services. Within the IF, experimental preservation workflows were implemented using WS-BPEL v2.0 definitions, with jBPM (the JBoss Business Process Management module) as the Workflow Execution Engine. The Eclipse BPEL Visual Designer served as a graphical interface for designing and visualizing the process flow. However, this work was hindered by two difficulties: first, BPEL is a powerful but low-level, and hence complex, language; and second, at the time we conducted the experiments, BPEL-related tools were not yet mature. Together, these made it very difficult for anyone who was not a BPEL expert to implement preservation workflows.

Consequently, we chose to implement a much simpler custom workflow description language and a corresponding execution engine. The Planets WEE is based on a high-level application programming interface (API) and a corresponding template mechanism. This allows workflow developers to build abstract workflow definitions from Java components and serialize them into XML documents. The Java components may act upon a preservation service or provide utility functions such as metadata manipulation. We implemented a template repository service that allows users to choose from various abstract workflow scenarios (templates). Using the WEE, selected workflow templates can be dynamically configured and executed based on simple XML descriptions, which can also be generated by a visual workflow design tool.
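The idea of building an abstract workflow from components and serializing it to XML can be sketched in a few lines of Java. Everything here is hypothetical: the `WorkflowTemplate` class, the `workflowTemplate`, `step` and `param` element names, and the `pdf/a` parameter value are invented for illustration, and the real WEE template schema is not reproduced. Parsing the XML back and executing the steps are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of an abstract workflow template built from components and
 * serialized to XML. Element and attribute names are invented for illustration;
 * this is not the Planets WEE schema.
 */
public class WorkflowTemplate {

    /** One component of the template: a service slot (e.g. "Migrate") plus its parameters. */
    public record Step(String actionType, Map<String, String> parameters) {}

    private final String name;
    private final List<Step> steps = new ArrayList<>();

    public WorkflowTemplate(String name) { this.name = name; }

    /** Appends a step; parameters are copied into insertion order so the XML output is stable. */
    public WorkflowTemplate addStep(String actionType, Map<String, String> parameters) {
        steps.add(new Step(actionType, new LinkedHashMap<>(parameters)));
        return this;
    }

    /** Serializes the template to a small XML document. */
    public String toXml() {
        StringBuilder xml = new StringBuilder();
        xml.append("<workflowTemplate name=\"").append(escape(name)).append("\">\n");
        for (Step step : steps) {
            xml.append("  <step action=\"").append(escape(step.actionType())).append("\">\n");
            for (Map.Entry<String, String> p : step.parameters().entrySet()) {
                xml.append("    <param name=\"").append(escape(p.getKey()))
                   .append("\" value=\"").append(escape(p.getValue())).append("\"/>\n");
            }
            xml.append("  </step>\n");
        }
        return xml.append("</workflowTemplate>\n").toString();
    }

    /** Minimal escaping for XML attribute values ("&" must be replaced first). */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String xml = new WorkflowTemplate("migrate-and-compare")
                .addStep("Identify", Map.of())
                .addStep("Migrate", Map.of("targetFormat", "pdf/a"))
                .addStep("Compare", Map.of())
                .toXml();
        System.out.print(xml);
    }
}
```

Keeping the template as plain data, separate from execution, is what would allow the same description to be produced by hand, by a template repository, or by a visual design tool.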

In the fourth and final year of the project, we have explored different options for improving the scalability of preservation workflows. First, we have improved the IF architecture to allow clustering of the application server and database layer. Second, we have demonstrated the use of Planets services with open source workflow engines such as Taverna and Triana. Finally, we have performed experiments in which the IF runs data-intensive computations on the Amazon Web Services (AWS) utility cloud infrastructure.

To summarize, the Planets Interoperability Framework provides the glue that holds together the Planets user applications and preservation services. It enforces a technical contract (the service interfaces) and semantic interoperability (the digital object model) between the various services of a preservation workflow, and provides a number of commonly required software components. By basing preservation workflows on existing best practices, or by re-using existing preservation patterns, an organization that uses the Planets IF, its workflow templates, and the preservation tool suite can save effort, time, and money.

Links:
http://www.planets-project.eu

Farquhar, A., Hockx-Yu, H. “Planets: Integrated Services for Digital Preservation.” International Journal of Digital Curation, Vol. 2, No. 2 (2007):
http://www.ijdc.net/index.php/ijdc/article/view/45/31

Schmidt, R., Sadilek, C., King, R. “A Workflow System for Data Processing on Virtual Resources.” International Journal on Advances in Software, IARIA, ISSN 1942-2628, Vol. 2, No. 2&3 (2009):
http://www.iariajournals.org/software/tocv2n23.html

Please contact:
Ross King
AIT Austrian Institute of Technology GmbH/AARIT, Austria
Tel: +43 50550 4271