Long-term preservation of Earth Science data at the European Space Agency has been studied using a framework built from components developed in the EU CASPAR project.
Earth Observation (EO) Space Missions provide global coverage of the Earth across both space and time, generating on a routine and continuous basis huge amounts of data (from a variety of sensors) that must be acquired, processed, elaborated, appraised and archived by dedicated systems. ESA-ESRIN, the European Space Agency Centre for Earth Observation, is the largest European EO data provider and operates as the reference European centre for EO payload data exploitation. The long-term preservation of both EO data and the ability to discover, access and process them is clearly a fundamental issue and a major challenge at all levels (programmatic, technological and operational). The need to address this challenge is one of the reasons why ESA participates in several EU-funded projects in addition to conducting an internal research program.
CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval), an Integrated Project co-financed by the EU within the Sixth Framework Programme (Priority IST-2005-2.5.10, ‘Access to and preservation of cultural and scientific resources’), has built a framework to support the end-to-end preservation life cycle for digital information, based on the OAIS reference model, with a strong focus on the preservation of the knowledge associated with the data. Three testbeds have been established to validate CASPAR solutions in the domains of Cultural Heritage, Contemporary Performing Arts and Earth Science. ESA plays the role of both user and data/infrastructure provider in the Earth Science domain and has built a number of dedicated scenarios for testing purposes.
The main objective of the ESA scientific testbed is to preserve the ability to process data across different levels, ie the ability to generate higher-level products (using auxiliary data and suitable processors) starting from raw satellite-acquired data. ESA’s first demonstrator focused on data from GOME (Global Ozone Monitoring Experiment, a sensor on board ESA’s ERS-2 satellite), specifically on the ability to produce Level 1C data (fully calibrated) from Level 1B data (raw signals plus calibration parameters). The ESA testbed demonstrates that this GOME processing chain can be preserved when changes to the operating system, compilers, libraries or drivers affect the ability to run the GOME Data Processor.
Figure 1: Opening of the ozone hole during austral spring. This image was produced from GOME-derived data. Image: ESA
The preservation scenario is the following: a complete and OAIS-compliant GOME L1 processing dataset has been ingested into the ESA CASPAR System (a framework developed at ESA-ESRIN by Advanced Computer Systems ACS SpA, technical partner for the testbed implementation, using only CASPAR components). At a certain point an external event affects the ability to run the processor (eg a library or the operating system changes) and a new L1B->L1C processor has to be developed and ingested to preserve the ability to process data from L1B to L1C. These changes must be catered for by ensuring correct information flow between the ESA CASPAR System, the system administrators and the users.
The ESA testbed is divided into four phases:
1. ESA CASPAR System setup: a basic EO ontology has been developed based on a specialization of the ISO 21127:2006 CIDOC-CRM ontology, representing relationships and dependencies of GOME data and OAIS representation information stored on the CASPAR system. (Representation information maps a data object onto more meaningful concepts, eg the ASCII definition that describes how a sequence of bits – the data object – is mapped onto a symbol.)
2. Ingestion of data and related representation information: the ingestion process allows the data producer to ingest an OAIS compliant dataset composed of GOME L1B data, the L1B->L1C processor and the representation information including all knowledge related to the GOME data and processor.
3. Data access, browsing, searching and retrieval: CASPAR is able to provide a user asking for L1C data not only with the related L1B data plus the processor needed to generate them, but also with all the information needed to perform this process, depending on the user’s needs and knowledge (ie different Designated Communities will retrieve different representation information during the same search-and-retrieve session to fill their specific knowledge gap).
4. Software processing preservation (upgrade): the preservation phase can be summarized as follows.
- an external event affects the processor (eg a library or the operating system changes) and an alert sent through CASPAR by informed users is forwarded to the system administrator
- the system administrator uses the ontology to identify which modules need to be updated
- the system administrator is able to retrieve, download and work on the source code of the processor to deliver a new version of the processor
- the new processor, with appropriate (updated) OAIS preservation description information and representation information, is reingested into the CASPAR system
- an alert mechanism notifies the users that a new version of the processor is available
- the new processor can be directly used to generate Level 1C products, meaning the scientific capabilities of users are maintained.
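The step in which the administrator uses the ontology to identify the modules to be updated can be illustrated with a minimal sketch. It is not the CASPAR implementation: the dependency table, the item names and the function affected_by are hypothetical, and the real system works on a CIDOC-CRM-based ontology rather than a Python dictionary. The sketch only shows the underlying idea of following 'depends on' relationships backwards from the changed component.

```python
from collections import deque

# Hypothetical "X depends on Y" relationships. The names are illustrative
# only and are not taken from the actual ESA CASPAR ontology.
DEPENDS_ON = {
    "L1C product": ["L1B data", "L1B->L1C processor"],
    "L1B->L1C processor": ["processor source code", "compiler", "numerical library"],
    "processor source code": ["numerical library"],
    "compiler": ["operating system"],
    "numerical library": ["operating system"],
}


def affected_by(changed, depends_on):
    """Return every item that directly or transitively depends on `changed`."""
    # Invert the edges so that each item maps to the items relying on it.
    dependants = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, set()).add(item)

    # Breadth-first walk from the changed component through its dependants.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for item in dependants.get(current, ()):
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return seen


if __name__ == "__main__":
    # Assumed external event: the numerical library changes.
    for item in sorted(affected_by("numerical library", DEPENDS_ON)):
        print(item)
```

In this toy table a change to the numerical library flags the processor, its source code and the L1C product, while the archived L1B data is untouched, which mirrors the scenario above: the data stays valid and only the processing capability has to be restored.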
The above scenario has been implemented in ESA-ESRIN through a Web-based interface and has demonstrated the effectiveness of the CASPAR preservation framework in the Earth Science domain. The ESA CASPAR System is available (for further enhancement and testing) to users and data owners/providers interested in a practical approach to preservation using CASPAR solutions.
The need to preserve and link Earth Science tools and data has become more evident recently and the ESA-ACS team is confident that the CASPAR solutions will be increasingly adopted in the years to come.
ACS c/o ESA-ESRIN, Italy
Tel: +39 06 94180561