The digital objects that are so fundamental to 21st-century life may have a precarious future due to the rapid pace of technological change. Digital preservation specialists have proposed an almost overwhelming variety of preservation actions and tools that may help to mitigate this risk, but there is a lack of empirical evidence to help librarians, archivists and non-specialists to make informed decisions about the most applicable and effective preservation tools. The Planets project (Preservation and Long-term Access through Networked Services) has developed a digital preservation Testbed that aims to provide such an evidence base.
The Planets Testbed is a freely available and easy-to-use controlled environment in which users can experience and compare different preservation tools and approaches through their Web browser. The Planets approach of a service-oriented architecture makes preservation tools available on heterogeneous platforms and in well-defined and controlled surroundings. The tools are given a Web service wrapper which exposes certain aspects of their functionality through a standardized vocabulary and therefore allows users to access them through the Testbed’s Web-based interface, as shown in Figure 1.
Figure 1: The Testbed interface, listing services.
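The idea of a service wrapper can be illustrated with a minimal Python sketch. It is not the Planets implementation: the class names, the description fields and the toy Latin-1 to UTF-8 "tool" are all hypothetical, chosen only to show how a uniform interface lets a client list and call tools without knowing how each one works internally.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical standardized description of a wrapped tool."""
    name: str
    service_type: str   # e.g. "migrate", "characterize"
    input_format: str
    output_format: str


class MigrationService:
    """Hypothetical wrapper exposing a tool through a uniform interface."""

    def __init__(self, description: ServiceDescription,
                 tool: Callable[[bytes], bytes]) -> None:
        self.description = description
        self._tool = tool  # the underlying tool stays hidden from clients

    def describe(self) -> Dict[str, str]:
        # Clients see only the standardized vocabulary, not the tool itself.
        return asdict(self.description)

    def migrate(self, data: bytes) -> bytes:
        return self._tool(data)


def list_services(services: List[MigrationService],
                  service_type: str) -> List[str]:
    """Return the names of all registered services of a given type."""
    return [s.description.name for s in services
            if s.description.service_type == service_type]


# A toy "tool" (assumption for illustration): re-encode Latin-1 text as UTF-8.
latin1_to_utf8 = MigrationService(
    ServiceDescription("latin1-to-utf8", "migrate", "text/latin-1", "text/utf-8"),
    lambda data: data.decode("latin-1").encode("utf-8"),
)

registry = [latin1_to_utf8]
migrated = latin1_to_utf8.migrate(b"caf\xe9")
```

Because every wrapped tool answers the same `describe` and `migrate` calls, an interface such as the one in Figure 1 could list services of any origin side by side.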
Through the Testbed, users can design and execute a variety of experiments, such as migration, emulation and executable preservation plan experiments. The focus of a migration experiment may be to analyse the performance and trustworthiness of tools that transform digital objects from one format (such as obsolete word processor files) into more up-to-date or preservable formats (such as PDF/A). The focus of an emulation experiment may be to investigate how effectively and accurately an obsolete digital object is perceived within an emulated hardware and software environment.
To perform such experiments, users can either upload their own data to a dedicated FTP area or choose content from several large corpora of test files provided by the Testbed. These corpora cover a variety of popular and important file formats, and include edge cases such as malformed PDFs and GIF files that have experienced bit rot. As well as providing access to this sample data, the Testbed allows users to explore corpus annotations documented in the Extensible Characterization Definition Language (XCDL) developed by Planets. These annotations make the corpus files ideal control files for experimentation.
One of the principal aims of the Testbed is to create a shared knowledge base of digital preservation tool performance, both at the level of aggregated comparative measurements and at the level of individual experiments. For this reason, experiment details, input files and outcomes are made available to all Testbed users. The Testbed also facilitates the reproducibility of experiments: users can rerun any experiment to verify its results, or adapt an existing experiment to their own requirements.
Testbed experiments follow a simple and flexible six-step process, as shown in Figure 2. In Step 1 the basic properties of the experiment are defined, including its overall aims and objectives, contact details and references. In Step 2 the user designs the experiment: selecting an experiment type and workflow, choosing the required preservation services, fine-tuning the parameters of each tool and selecting the input files. In Step 3 the experiment workflow is executed: the Testbed gathers statistics on service execution and stores all results as output data, interim results and tool-specific log information.
Figure 2: The 6-Step Experiment Process.
In Step 4 the results of the experiment execution are displayed. Input and output files are listed (together with file properties such as name, size and, if appropriate, a thumbnail) and can be opened or downloaded if required. This page also displays a record of every operation performed on each digital object, including operations that failed.
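The per-object records described above can be sketched in a few lines of Python. This is an illustrative assumption about how such logging might be structured, not the Testbed's own code: `OperationRecord`, `run_service` and the toy `strict_upper` service are hypothetical names. The point is that a failure on one object produces a record instead of aborting the whole run.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class OperationRecord:
    """Hypothetical record of one service call on one digital object."""
    object_name: str
    service_name: str
    succeeded: bool
    duration_seconds: float
    error: Optional[str] = None


def run_service(service_name: str,
                service: Callable[[bytes], bytes],
                objects: Dict[str, bytes]
                ) -> Tuple[Dict[str, bytes], List[OperationRecord]]:
    """Apply a service to every object, recording successes and failures."""
    outputs: Dict[str, bytes] = {}
    records: List[OperationRecord] = []
    for name, data in objects.items():
        start = time.perf_counter()
        try:
            outputs[name] = service(data)
            records.append(OperationRecord(
                name, service_name, True, time.perf_counter() - start))
        except Exception as exc:
            # A failed operation is still recorded, with its error message.
            records.append(OperationRecord(
                name, service_name, False,
                time.perf_counter() - start, str(exc)))
    return outputs, records


def strict_upper(data: bytes) -> bytes:
    """Toy service (assumption): rejects empty input, upper-cases the rest."""
    if not data:
        raise ValueError("empty input file")
    return data.upper()


outputs, records = run_service(
    "strict-upper", strict_upper,
    {"letter.txt": b"dear sir", "broken.txt": b""},
)
```

Here `records` holds two entries, one successful and one failed, while `outputs` contains only the file that was processed successfully.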
In Step 5 the results can be analysed. Assessing the effectiveness of a preservation tool requires investigating how digital objects that have undergone a preservation action differ from their original form. A variety of characterization and identification services that automatically extract digital object properties are offered at this stage, together with options for recording properties manually. By comparing the significant characteristics of the original objects with those of the objects resulting from the preservation action (i.e. migrated files, or objects as perceived within an emulated environment), it is possible to gain an understanding of the effectiveness of a tool. Finally, in Step 6 the user can provide an overall evaluation of the experiment, stating how well the requirements were met along with any other factors the user wishes to document.
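The comparison of characteristics in Step 5 can be sketched as a simple property-by-property check. The following Python fragment is a minimal illustration under stated assumptions: the function name, the four status labels and the sample property names are hypothetical, and a real characterization service would extract far richer properties than this flat dictionary.

```python
from typing import Any, Dict


def compare_properties(original: Dict[str, Any],
                       migrated: Dict[str, Any]) -> Dict[str, str]:
    """Classify each property as preserved, changed, lost or added."""
    report: Dict[str, str] = {}
    for key in sorted(set(original) | set(migrated)):
        if key not in migrated:
            report[key] = "lost"
        elif key not in original:
            report[key] = "added"
        elif original[key] == migrated[key]:
            report[key] = "preserved"
        else:
            report[key] = "changed"
    return report


# Hypothetical properties of a word processor file and its PDF/A migration.
original_props = {"page_count": 12, "title": "Annual report",
                  "embedded_fonts": False, "author": "J. Smith"}
migrated_props = {"page_count": 12, "title": "Annual report",
                  "embedded_fonts": True, "pdfa_conformance": "1b"}

report = compare_properties(original_props, migrated_props)
```

In this example the report marks `page_count` and `title` as preserved, `embedded_fonts` as changed, `author` as lost and `pdfa_conformance` as added. Whether a given change counts as acceptable depends on which characteristics the experimenter considers significant.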
Planets is partially supported by the European Community under the Information Society Technologies (IST) Programme of the 6th FP for RTD - Project IST-033789. Development of the Testbed will continue until May 2010.
Humanities Advanced Technology and Information Institute (HATII)
University of Glasgow
Tel: +44 141 330 3392
Future Networks and Services
AIT Austrian Institute of Technology GmbH
Tel: +43 50550 4272