by Filip Kruse and Jesper Boserup Thestrup
Enter the matrix: Trials, tribulations – and successes – of doing an inter-institutional data management project in a matrix organization
“Neo, sooner or later you’re going to realize just as I did that there’s a difference between knowing the path and walking the path.” (Morpheus, The Matrix, 1999)
The aim of the project "Data Management in Practice" was to establish a Danish infrastructure setup with services covering all aspects of the research data lifecycle: from application and initial planning, through discovering and selecting data and finally to the dissemination and sharing of results and data. Further, the setup should include facilities for training and education. Researchers’ needs and demands from active projects – hence the “in Practice” – should form the basis of the services. Finally, the project should explore the role of research libraries regarding research data management.
The project can be described as a hybrid between a purely case-based project with individual institutions each working on their own sub-projects, and a thematic project with institutions working within one or more themes. Six themes were active: Data Management Planning; Data capture, storage and documentation; Data identification, citation and discovery; Select and deposit for long-term preservation; Training and marketing toolkits; and Sustainability. Each of the participating institutions worked on specific cases, such as ongoing research projects, well-defined data collections etc. The cases covered the main academic fields of Humanities, Social Sciences, Health, and Science and Technology.
The Humanities cases spanned audiovisual data collections, data on Danish web materials, and Søren Kierkegaard’s writings. The Social Science (SAM) cases consisted of survey data from local elections, and qualitative linguistic data, while the Health case (SUN) covered data on liver diseases (cirrhosis). The Science and Technology cases dealt with data from the Kepler mission, on wind energy, and on the registration and preservation of Arctic flora and fauna.
If we take the Humanities case of LARM (The Royal Danish Library’s Sound Archive for Radio Media) as an example, on the one hand the result was an operational Danish version of the DCC’s DMPonline [L1], freely available via DeiC [L2] to Danish researchers. On the other hand, it turned up new challenges. Regarding data identification, citation and discovery, the sharing of data ran into the problem that some of the data are sensitive or protected by copyright. This had two implications. Firstly, it called for an additional facility for depositing data with restricted access; this repository is currently awaiting a decision on activation. Secondly, it required a legal framework for handling data, which led to a model agreement on data management.
As the projects within the cases used different infrastructures already available at their respective home institutions, the work on the second theme, “Data capture, storage and documentation”, produced no common results but a wide array of local experiences. This unintended consequence demonstrated that an all-encompassing infrastructure able to cover the needs of research projects from all scientific areas is an impossibility, at least for now.
It was a requirement of the third theme, “Data identification, citation and discovery”, that the different cases should deposit data in institutional repositories. These, however, were not readily available at the project institutions. Instead, the work led to an outline of case-based recommendations to support the theme’s objective: datasets should provide metadata based on the DataCite format and have a DOI, and researchers should have an ORCID.
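As an illustration only, the three recommendations can be expressed as a small pre-deposit check. The sketch below is not part of the project's deliverables. The record layout and field names are hypothetical and only loosely mirror the mandatory DataCite properties; the DOI pattern is a simplified syntax check; and the ORCID test uses the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers.

```python
import re

# Simplified DOI syntax check (assumption: prefix "10.", 4-9 digit registrant code).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical field names, loosely mirroring the mandatory DataCite properties.
REQUIRED_FIELDS = ("doi", "creators", "title", "publisher",
                   "publication_year", "resource_type")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the ORCID iD is well formed and its checksum matches."""
    compact = orcid.replace("-", "")
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


def check_dataset_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing field: {field}")
    doi = record.get("doi", "")
    if doi and not DOI_PATTERN.match(doi):
        problems.append(f"malformed DOI: {doi}")
    for creator in record.get("creators", []):
        name = creator.get("name", "<unnamed>")
        orcid = creator.get("orcid")
        if not orcid:
            problems.append(f"creator without ORCID: {name}")
        elif not is_valid_orcid(orcid):
            problems.append(f"invalid ORCID for {name}: {orcid}")
    return problems


# Made-up example record; the DOI is a placeholder, not a registered identifier.
example = {
    "doi": "10.1234/example-dataset",
    "creators": [{"name": "Example, Researcher", "orcid": "0000-0002-1825-0097"}],
    "title": "Example survey data",
    "publisher": "Example Repository",
    "publication_year": 2017,
    "resource_type": "Dataset",
}
print(check_dataset_record(example))
```

A repository could run such a check at deposit time and return the list of problems to the researcher, rather than rejecting the submission outright.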
The fourth theme, “Select and deposit for long-term preservation”, led to the establishment of an open access data repository: Library Open Access Repository (LOAR) by The Royal Library, Aarhus. The work included an assessment of PURE as a possible institutional repository, concluding that PURE has many, but not all, of the features necessary for an institutional research repository.
The fifth theme “Training and marketing toolkits” developed the freely accessible DataflowToolkit [L3] in order to assist researchers in doing data management. This tool thus synthesises experiences gathered from the activities in the different cases.
The sixth and final theme of the project, “Sustainability”, addressed how (and if) infrastructure services developed as part of the work on the specific cases could continue after the termination of the project.
The matrix organization of the project ensured both a high degree of adaptability to new conditions and an adherence to the project objectives. One might say that it overcame the difference between knowing and walking the path.
Figure 1: The Matrix Organization of the Project Data Management in Practice.
RUC – Roskilde University
KB – The Royal Library (merged in 2017 with the State and University Library as The Royal Danish Library)
DDA – Danish Data Archive
DTIC – DTU Library, Technical Information Center of Denmark
SB – State and University Library, now The Royal Danish Library
AUB – Aalborg University Library
SUB – University Library of Southern Denmark
The project was funded evenly by DEFF, Denmark’s Electronic Research Library [L4], and the participating institutions. The project ran from March 2015 to June 2017, with the final report following in January 2018.
Links:
[L1] https://dmponline.dcc.ac.uk/
[L2] https://www.deic.dk/en
[L3] https://dataflowtoolkit.dk/
[L4] https://www.deff.dk/english/
Reference:
[1] Data Management in Practice, Results and Evaluation http://ebooks.au.dk/index.php/aul/catalog/book/243
Please contact:
Filip Kruse, Jesper Boserup Thestrup, Royal Danish Library, Denmark,