The SHAMAN project (Sustaining Heritage Access through Multivalent Archiving) will develop a next-generation digital preservation framework, together with the corresponding preservation tools for analysing, ingesting, managing, accessing and reusing information objects and data across libraries, archives and any other deployment scenario in which the SHAMAN ‘Theory of Preservation’ proves relevant.
The SHAMAN Theory of Preservation makes assertions about the ability to maintain the context, arrangement and management of information objects and the preservation environment itself, while taking into consideration:
- authenticity (the provenance of objects)
- 'respect du fonds' (the arrangement of objects)
- integrity (the management of objects)
- chain of custody (the ownership of objects)
- context of production (preservation, access and reuse).
These assertions also require that the functions performed by preservation processes remain consistent over time. We note that for these principles to apply, a theory of preservation must also make assertions about the information context managed by the preservation environment. Special attention must be given to the reuse of information objects across distributed repositories, as well as to securing the authenticity and integrity of the objects through time. These requirements led us to the present project.
Figure 1: The informal SHAMAN context, which supports the work in progress towards the final definition of the SHAMAN Reference Architecture.
Three prototypes will support the testing and validation of the results. These Integration & Demonstrator Subprojects (ISPs) cover real cases in memory institutions (ISP1), industrial design and engineering (ISP2) and e-Science scenarios in scientific application domains (ISP3).
To achieve these goals, SHAMAN is focusing its research on: integrating data grid, digital library and persistent archive technology; developing support for context representation and annotation, with deep linguistic analysis and corresponding semantics; and modelling preservation processes. Ultimately, SHAMAN is also expected to deliver a reference architecture for the design and development of digital preservation solutions in distributed scenarios.
Figure 2: The informal and generic conceptual view for the SHAMAN architectural framework.
So far, the project has carried out state-of-the-art analyses (such as a review of OAIS), worked to better understand the real usage scenarios, defined solution architectures and developed the first set of demonstrators. In the first half of the project, the focus was on the analysis and development of the ISP1 scenarios and the analysis of the ISP2 scenarios. These results will be presented at the review to be held in early 2010, and will be publicly disseminated after that.
The third year will focus on the revision of the ISP1 scenarios, the implementation of the ISP2 scenarios and the analysis of the ISP3 scenarios. The final year will focus on the implementation of the ISP3 scenarios, the revision of the other two and the consolidation of the results. The final definition of the SHAMAN reference architecture is an especially significant result expected in that period.
The consortium comprises a well-balanced group from academia, research labs, industry and intended final users. The full list of partners is available on the project Web site. SHAMAN also has strong connections to the United States, including collaboration with the Data Intensive Cyber Environments (DICE) research group, which leads the development of the open-source iRODS (Integrated Rule-Oriented Data System). The group is based at the DICE Center at the University of North Carolina at Chapel Hill, and the Institute for Neural Computation at the University of California, San Diego. We expect iRODS to play a fundamental role in SHAMAN, with its openness and flexibility supporting our vision and proposed strategies.
SHAMAN is a Large Integrated Project co-financed by the European Union within the 7th Framework Programme. It will run from 2008 to 2011.