by Keith G. Jeffery and Rebecca Koskela
The Research Data Alliance (RDA) exists to help researchers use data (including scholarly publications and grey literature used as data). This encompasses data collection, data validation, data management (including preservation/curation), data analysis, data simulation/modelling, data mining, data visualisation and interoperation of data. Metadata are the key to all of these activities because they present to persons, organisations, computer systems and research equipment a representation of the dataset so that the dataset can be acted upon.
Metadata are defined by some as ‘data about data’. In fact, there is no difference between metadata and data except for the purpose for which they are used. An electronic library catalogue card system provides metadata for the researcher finding an article or book, but data for the librarian counting the articles on biochemistry. Metadata are used both by humans and by computers; however, for scalability and virtualisation (hiding unnecessary complexity from the end-user), it is necessary to ensure that metadata can be both read and ‘understood’ by computer systems. This leads to the mantra ‘formal syntax and defined semantics’: humans can overcome inconsistencies and vagueness but computers cannot.
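The catalogue example can be made concrete with a minimal sketch. The records and field names below are hypothetical and are not drawn from any real catalogue system; the point is only that the same records serve as metadata for one user and as data for another.

```python
from collections import Counter

# Hypothetical catalogue records; the field names are illustrative only.
catalogue = [
    {"title": "Enzyme Kinetics", "subject": "biochemistry", "type": "book"},
    {"title": "Protein Folding Pathways", "subject": "biochemistry", "type": "article"},
    {"title": "Plate Tectonics", "subject": "geology", "type": "book"},
]


def find_by_subject(records, subject):
    """Researcher's view: the records are metadata that lead to an item."""
    return [r["title"] for r in records if r["subject"] == subject]


def count_by_subject(records):
    """Librarian's view: the same records are the data being analysed."""
    return Counter(r["subject"] for r in records)


titles = find_by_subject(catalogue, "biochemistry")
counts = count_by_subject(catalogue)
print(titles)
print(counts["biochemistry"])
```

Both functions can only work because every record uses the same keys with the same meaning, which is a small-scale instance of ‘formal syntax and defined semantics’.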
Figure 1: Purposes of Metadata and their Relationships
Metadata Purposes
Metadata are commonly used for (a) discovery, (b) contextualisation and (c) detailed processing [1]. In the case of discovery, the metadata must be sufficient for the human or computer system to find the datasets or data objects of interest for the purpose. The higher the quality of the discovery metadata, the greater the precision (accuracy) and recall (completeness) of the discovery. Typical ‘standards’ in the discovery area are DC (Dublin Core) and CKAN (Comprehensive Knowledge Archive Network).
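Precision and recall can be computed for a single discovery query as follows. This is a sketch using the usual information-retrieval definitions; the dataset identifiers are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one discovery query.

    precision = share of retrieved items that are relevant
    recall    = share of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers: four datasets were returned, three are truly relevant.
retrieved = ["ds1", "ds2", "ds3", "ds4"]
relevant = ["ds2", "ds3", "ds5"]

p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Here two of the four returned datasets are relevant, and two of the three relevant datasets were found. Richer discovery metadata would be expected to raise both figures.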
In the area of contextual metadata, perhaps the most widely used ‘standard’ is CERIF (Common European Research Information Format), an EU Recommendation to Member States that is used in 43 countries and has been adopted by Elsevier and Thomson Reuters. CERIF covers persons, organisations, projects, products (including datasets), publications, patents, facilities, equipment, funding and, most importantly, the relationships between them, expressed in a form of first order logic with both role and temporal attributes [2].
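The idea of a relationship carrying both a role and a time interval can be sketched as below. This is not the CERIF schema: the class name, field names, identifiers and roles are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Link:
    """Hypothetical relationship between two entities (not the CERIF model)."""
    source: str                 # e.g. a person identifier
    target: str                 # e.g. a project or dataset identifier
    role: str                   # the role the source plays towards the target
    start: date                 # first day on which the relationship holds
    end: Optional[date] = None  # None means the relationship is open-ended

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


links = [
    Link("person:p1", "project:x", "principal investigator",
         date(2012, 1, 1), date(2014, 12, 31)),
    Link("person:p1", "dataset:d1", "creator", date(2013, 6, 1)),
    Link("person:p2", "project:x", "researcher",
         date(2013, 1, 1), date(2013, 12, 31)),
]


def roles_on(all_links, source, day):
    """List (target, role) pairs that hold for `source` on the given day."""
    return sorted((l.target, l.role) for l in all_links
                  if l.source == source and l.active_on(day))


result = roles_on(links, "person:p1", date(2013, 7, 1))
print(result)
```

Keeping role and validity dates on the relationship rather than on the entities allows the same person to hold different roles at different times without duplicating the person record.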
The detailed processing metadata are typically specific to a research domain or even an individual experiment or observation method. They include schema information to connect software to data and also parameters necessary for correct data processing such as precision, accuracy or calibration information.
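As a minimal sketch of how processing metadata can connect software to data, the example below uses a hypothetical metadata record to parse a raw row and apply a linear calibration. The column names, calibration constants and unit are invented for illustration.

```python
# Hypothetical processing metadata for one instrument's output file.
processing_metadata = {
    "columns": [
        {"name": "timestamp", "type": str},
        {"name": "raw_counts", "type": int},
    ],
    # Assumed linear calibration: value = raw_counts * gain + offset
    "calibration": {"gain": 0.5, "offset": -1.0, "unit": "degC"},
    "precision_decimals": 1,
}


def parse_row(row, meta):
    """Use the schema part of the metadata to name and type each field."""
    cols = meta["columns"]
    if len(row) != len(cols):
        raise ValueError("row does not match the declared schema")
    return {c["name"]: c["type"](v) for c, v in zip(cols, row)}


def calibrate(record, meta):
    """Apply the calibration and round to the declared precision."""
    cal = meta["calibration"]
    value = record["raw_counts"] * cal["gain"] + cal["offset"]
    return round(value, meta["precision_decimals"]), cal["unit"]


record = parse_row(["2014-10-01T12:00:00", "43"], processing_metadata)
value, unit = calibrate(record, processing_metadata)
print(value, unit)
```

Without the metadata record, the software would have no way of knowing which column holds the measurement, how to convert it, or how many decimal places are meaningful.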
Metadata in RDA
As indicated above, metadata are used extensively in all aspects of RDA activity. However, four groups specialise in metadata:
MIG (Metadata Interest Group): the overarching long-term group that discusses metadata and works with Working Groups (WGs), which run for 18 months and carry out specific tasks;
MSDWG (Metadata Standards Directory WG): developing a directory of metadata standards so that a user can look up appropriate standards for their purpose and/or research domain;
DICIG (Data in Context IG): developing, through use cases, the requirements within and across research domains for contextual metadata;
RDPIG (Research Data Provenance IG): concentrating on providing provenance information for datasets.
These groups arose spontaneously ‘bottom-up’ but are now coordinating among themselves to form a strong metadata presence in RDA.
Moving Forward
The metadata groups have agreed on a joint forward plan. It consists of the following steps:
1. Collect use cases: a form has been prepared and is available on the website, together with an example use case presented both as written text and on the form;
2. Collect metadata ‘standards’ into the MSDWG directory;
3. Analyse content of (1) and (2) to produce a superset list of all elements required and a subset list of common elements by purpose, the so-called ‘packages’ of metadata elements;
4. Test those ‘packages’ with research domain groups in RDA (we already have volunteers!) and adjust based on feedback;
5. Present the ‘packages’ to the TAB (Technical Advisory Board) of RDA for authorising as recommendations from RDA to the community.
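The analysis in step 3 can be sketched with simple set operations. The standard names and element lists below are hypothetical, and the sketch assumes that element names have already been mapped to a shared vocabulary, which in practice is a substantial task in its own right.

```python
# Hypothetical element lists for three metadata standards.
standards = {
    "standard_a": {"title", "creator", "date", "identifier", "subject"},
    "standard_b": {"title", "creator", "identifier", "licence"},
    "standard_c": {"title", "identifier", "date", "instrument"},
}

# Superset: every element that appears in at least one standard.
superset = set().union(*standards.values())

# Common subset: elements present in every standard.
common = set.intersection(*standards.values())

print(sorted(superset))
print(sorted(common))
```

Repeating the intersection over only those standards that serve a given purpose (discovery, context or detailed processing) would give one candidate ‘package’ per purpose.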
The metadata groups plan to meet jointly, and jointly with chairs of other groups, at RDA Plenary 5 in San Diego. You are welcome to join RDA, register for P5 and be involved.
Acknowledgements
The authors acknowledge the contributions of colleagues in the metadata groups of RDA and particularly: Jane Greenberg, Alex Ball, Bridget Almas, Sayeed Choudhury, David Durbin.
Links:
http://dublincore.org/metadata-basics/
http://dublincore.org/
http://ckan.org/
http://www.eurocris.org/Index.php?page=CERIFreleases&t=1
https://rd-alliance.org/groups/metadata-ig.html
https://rd-alliance.org/groups/metadata-standards-directory-working-group.html
https://rd-alliance.org/groups/data-context-ig.html
https://rd-alliance.org/groups/research-data-provenance.html
References:
[1] K.G. Jeffery, A. Asserson, N. Houssos, B. Jörg: “A 3-layer model for Metadata”, in CAMP-4-DATA Workshop, proc. International Conference on Dublin Core and Metadata Applications, Lisbon, September 2013. http://dcevents.dublincore.org/IntConf/dc-2013/schedConf/presentations?searchField=&searchMatch=&search=&track=32
[2] K.G. Jeffery, N. Houssos, B. Jörg, A. Asserson: “Research Information Management: The CERIF Approach”, Int. J. Metadata, Semantics and Ontologies, Vol. 9, No. 1, pp. 5-14, 2014.
Please contact:
Keith G. Jeffery
Keith G. Jeffery Consultants, UK
E-mail: