Digital objects require software environments in order to be handled, viewed or executed. Most of these environments were designed with human interaction in mind, which poses a major challenge for organizations that wish to use these now obsolete environments to process huge numbers of objects non-interactively, whether for migration or under emulation.
Archiving and preservation organizations already hold a large quantity of digital objects of various types, created with a wide range of GUI-oriented tools. At some point, new computer environments become unable to open or execute these objects in their original format. As a consequence, these organizations must either convert the objects to a current, sustainable file format or set up and emulate the original environment. Given the scale of typical collections, automated procedures are the only financially and organizationally feasible way to achieve either goal.
A major challenge for deploying automated processes is the availability of suitable tools. In most cases, a digital object is best viewed using the application with which it was created or in its original environment. Most of these applications were programmed to be handled in an interactive manner, and little effort was put into automation for tasks such as batch handling of large numbers of files. A particular issue from the viewpoint of a digital archive manager is that spreadsheet, product design, audio/video or word processing programs cannot execute basic tasks such as the opening and saving of a file in another format as an unattended and fully automated task.
Retrofitting such functions to an application whose lifespan has already ended is in many cases simply impossible, since the source code and the required knowledge are no longer available. It is also becoming increasingly difficult to find staff able to operate the rising number of obsolete user environments.
Traditionally, so-called macro recorders have been employed to help users automate interactive tasks to a certain degree. These are specialized tools, or functions of an application or of an operating system's user interface, that capture the sequence of actions carried out: for example, create a new file, open the address database and select an address, copy some text, and save or print the file for serial letters. However, this functionality is not standardized, varies in usability and features, and may not be present at all in some older environments.
Given these problems, the Planets Working Group (Preservation and Long-term Access through Networked Services) at Albert Ludwig University has suggested a different approach. The authors, together with a group of students, are exploring the option of handling typical repetitive tasks within specially wrapped hardware emulators. We hope to arrive at an abstraction general enough to handle quite different tasks across applications reliably, with defined inputs and outputs. The proposed method uses an interactive workflow, independent of operating system and application, for the migration or execution of digital objects in an emulated environment.
Figure 1: The Java VNC interface for the archivist to record workflows running interactively within the open-source processor emulator QEMU.
The approach is to record a particular workflow interactively once, for example installing a specific printer driver for PDF output, loading an old WordPerfect document in its original environment and converting it by printing to a PDF file. Such a recording can serve as the basis for deeper analysis and for generating a machine script that repeats the workflow fully automatically. An interactive workflow is defined as an ordered list of actions that are passed to the emulated environment through a defined interface such as the well-known Virtual Network Computing (VNC). These events may be mouse movements or keystrokes, and each is linked with a precondition and an expected outcome that can be observed as a state of the emulated environment. The next event cannot occur until this outcome has been observed. Linking events to preconditions and outcomes is necessary because the timing of a workflow depends on the performance of the emulation environment: programs take different amounts of time to run depending on the load on the host machine, the size of the object being handled or the number of blocks already cached in memory. In the interactive case, the user can verify each state visually. For an automated run, a definition of the expected states and a reliable means of verifying them are indispensable. We hope to produce time-independent action files that serve as a machine-readable abstraction of written installation guidelines.
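The idea of a time-independent workflow can be illustrated with a minimal Python sketch. It is not the Planets implementation: the names `Step`, `wait_for`, `replay`, `FakeEnvironment` and `build_demo` are hypothetical, and the fake environment merely stands in for an emulated machine reached through an interface such as VNC. Each step carries a precondition, an event and an expected outcome, and the replay loop sends the next event only once the expected state has been observed, instead of relying on fixed delays.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One recorded action framed by observable states (hypothetical structure)."""
    name: str
    precondition: Callable[[], bool]  # state that must hold before the event is sent
    send_event: Callable[[], None]    # e.g. a keystroke or mouse click
    expected: Callable[[], bool]      # state that confirms the event took effect


def wait_for(check: Callable[[], bool], timeout: float, poll: float = 0.01) -> bool:
    """Poll a state check until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll)
    return check()


def replay(steps: List[Step], timeout: float = 5.0) -> List[str]:
    """Run the steps in order; each event waits for its precondition and outcome."""
    completed = []
    for step in steps:
        if not wait_for(step.precondition, timeout):
            raise RuntimeError(f"precondition not met: {step.name}")
        step.send_event()
        if not wait_for(step.expected, timeout):
            raise RuntimeError(f"expected state not reached: {step.name}")
        completed.append(step.name)
    return completed


class FakeEnvironment:
    """Assumed stand-in for an emulated machine that reacts after a few observations."""

    def __init__(self, delay_polls: int = 3):
        self.delay_polls = delay_polls
        self.screen = "desktop"
        self.log: List[str] = []
        self._pending = None
        self._countdown = 0

    def send(self, event: str, next_screen: str) -> None:
        self.log.append(event)
        self._pending = next_screen
        self._countdown = self.delay_polls

    def shows(self, screen: str) -> bool:
        # Every observation advances the simulated machine by one tick.
        if self._pending is not None:
            self._countdown -= 1
            if self._countdown <= 0:
                self.screen, self._pending = self._pending, None
        return self.screen == screen


def build_demo(env: FakeEnvironment) -> List[Step]:
    """Two illustrative steps: open a document, then print it to PDF."""
    return [
        Step("open document",
             lambda: env.shows("desktop"),
             lambda: env.send("key:ctrl+o", "document open"),
             lambda: env.shows("document open")),
        Step("print to PDF",
             lambda: env.shows("document open"),
             lambda: env.send("key:ctrl+p", "pdf written"),
             lambda: env.shows("pdf written")),
    ]


if __name__ == "__main__":
    env = FakeEnvironment(delay_polls=3)
    print(replay(build_demo(env)))
```

Because the loop waits on observed states rather than on wall-clock delays, the same step list completes whether the simulated machine responds after three observations or thirty, and a step whose outcome never appears fails with an explicit error instead of silently continuing.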
During the recording of a given workflow, the archivist is supported by an interface to a software archive for storing all additional necessary components like applications, operating systems, codecs, font sets and hardware drivers for the emulated machine. This additional service would extend the Planets framework, offering the possibility of interactively ingesting the software into the archive and enriching it with sufficient metadata. Such a supporting service would help to resolve the software dependencies originating with the object.
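How such a service might resolve the dependencies of an object can be sketched as follows. The archive contents, the dependency relations between components and the function name `resolve` are all assumptions made for illustration; the article does not specify the archive's metadata model. The sketch walks the recorded dependencies depth-first and returns the components in an order in which each one appears after everything it needs.

```python
from typing import Dict, List

# Hypothetical archive metadata: each component lists the components it requires.
ARCHIVE: Dict[str, List[str]] = {
    "WordPerfect document": ["WordPerfect"],
    "WordPerfect": ["operating system", "font set"],
    "PDF printer driver": ["operating system"],
    "operating system": ["hardware drivers"],
    "font set": [],
    "hardware drivers": [],
}


def resolve(component: str, archive: Dict[str, List[str]]) -> List[str]:
    """Return the component and its dependencies, dependencies first."""
    order: List[str] = []
    visiting = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError(f"circular dependency at: {name}")
        if name not in archive:
            raise KeyError(f"not in archive: {name}")
        visiting.add(name)
        for dependency in archive[name]:
            visit(dependency)
        visiting.discard(name)
        order.append(name)

    visit(component)
    return order


if __name__ == "__main__":
    print(resolve("WordPerfect document", ARCHIVE))
```

A component that is referenced but missing from the archive raises an error, which is the point at which an archivist would be asked to ingest the missing software together with its metadata.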
Dirk von Suchodoletz
University Freiburg, Germany