Digital Preservation Research: An Evolving Landscape
Keynote by Pat Manson
Digital preservation research tackles the problems of keeping (that is, preserving) digital content, particularly content that is born digital and therefore, by definition, does not exist in any other format. As early as the mid-1990s the European Commission recognised that this was an emerging and important issue and started funding pioneering research projects in digital preservation. At that time the challenge of managing digital content so that it could be accessed and used reliably in the future was being confronted mainly by national libraries and archives, the key institutions with the mandate to keep publications and records for the future. They were at the sharp end of the problems posed by the shifts towards electronic journals and electronic records.
Today the picture has changed and continues to change rapidly. In 2007, the International Data Corporation (IDC) estimated that the current size of the digital universe was 161 billion gigabytes, or 161 exabytes, and that this would increase sixfold by 2010. By 2008 it calculated that the digital universe had already expanded to 281 exabytes and revised its four-year estimate upwards from sixfold to tenfold. Once a problem faced mainly by archives and libraries, digital preservation is now an issue affecting all domains that rely on digital data, be it administrations, industry or research. Even as individuals we record all aspects of our lives and create and store our memories in digital form, through photos, blogs, e-mails, etc.
One irony of the information age is that keeping information has become more complex than it was in the past. We not only have to save physical media and electronic files, we also need to make sure that they remain compatible with the hardware and software of the future.
So what does this mean for research? Of course, research is continuing to explore and develop solutions that will support libraries and archives, including audio-visual archives, in more efficient and cost-effective preservation, through automating workflows and decision making. However, there are new challenges for research arising out of: the increasing dependency on digital resources; the increasing volumes and complexity of digital resources; and the risks of losing digital resources or of having information that is no longer usable or understandable. At the same time, research needs to address the needs of organisations that are only now beginning to face the problems of keeping their digital content so that its authenticity and integrity can be maintained while ensuring that it can be transformed and used by new systems in the future.
European research is at the forefront of anticipating these challenges. Through FP6 and FP7 the objectives for the research have moved from a library/archive-centric view to one that is increasingly focused on understanding the challenges posed by the nature of the digital content itself. This is leading our research projects to tackle new methods for web archiving that ensure the authenticity and integrity of archived content, which is characteristically distributed, dynamic and liable to disappear if not captured at the right point in time. The average life of a web page is less than that of the house fly. Scientific data (e.g. earth observation data) requires preservation systems that can handle very large volumes, document the original context, and curate the data so that it remains usable and combinable in the future. Often it is not an option to go back and re-capture these data; our models of climate change, for example, depend heavily on being able to understand and use data collected in the past, often for different purposes. Digital objects are increasingly complex, combining text, image and embedded software. And that describes only the objects of today, without taking account of emerging software that may affect their use in the future.
As the volumes of information, the diversity of formats and types of digital object increase, digital preservation becomes a more pervasive issue and one which cannot be handled by the current approaches which rely heavily on human intervention. Research is needed on making the systems more intelligent. We need to accelerate the move from human monitoring and decision making to embedding reasoning and intelligence in the systems themselves.
For the research community, the challenge is also to build new cross-disciplinary teams that integrate computer science with library and archival science (and even with social and historical sciences). We need to ensure that future technological solutions for preservation are well founded and grounded in understanding what knowledge from the past and from today we need to keep for the future.
Head of Unit
Unit E3 "Cultural Heritage & Technology Enhanced Learning"
Information Society and Media Directorate-General
The views expressed in the article are the sole responsibility of the author and in no way represent the view of the European Commission and its services.