by Pierre-Antoine Champin (ERCIM/W3C)
RDF-star, an extension to the Resource Description Framework (RDF), is the next big thing in the field of linked data and knowledge graphs. The European H2020 project MOSAICrOWN was instrumental in its development towards a W3C standard.
Linked data [L1] is a set of W3C standards for exchanging raw data on the web, in a syntactically and semantically interoperable way. While this notion emerged several years before the current trend of knowledge graphs, linked data can be viewed (and is often presented) as a foundation for a web-scale distributed knowledge graph. However, the power of linked data can be leveraged in other contexts beyond the open world-wide web.
In the MOSAICrOWN project (Multi-Owner Data Sharing for Analytics and Integration Respecting Confidentiality and Owner Control), datasets uploaded to a data market are described by a wealth of metadata. Among this metadata are the policies, which describe how, by whom, and for what purposes the data owner authorises the dataset to be used. There are advantages to using linked data to describe the policies and the rest of the metadata. First, the flexible structure and explicit semantics of linked data make it possible to integrate heterogeneous metadata from multiple providers efficiently, and provide a natural way to link that metadata to the data it describes. Second, policies are expressed using an existing linked data format, also recommended by W3C: the Open Digital Rights Language (ODRL) [L2]. Finally, with linked data being rooted in standard technologies, a number of robust open-source implementations are available, which we were able to deploy and adapt for the needs of the MOSAICrOWN use cases.
Recently, the linked data ecosystem has been challenged by the emergence of property graphs, a family of graph databases. Property graphs share with linked data the graph structure that makes them flexible and expressive. Property graphs, however, are not a standard technology, since each system vendor has its own “flavour” of property graph. This causes interoperability problems and vendor lock-in, and it also hampers the emergence of a consolidated stack of tools for data querying, data validation, etc.
The strength of property graphs, however, lies elsewhere. Their graph data model is rich and intuitive, and has gained much popularity among software developers. It is also perceived by many as easier to use than linked data. Furthermore, many design patterns that are frequently used in property graphs do not translate straightforwardly to linked data. This is unexpected, as both are based on a graph model, and it raises questions about the ability of linked data to continue serving as an interoperability layer in the age of property graphs.
Clearly, linked data needs to evolve. This has been discussed within the linked data community since at least 2012, at the Dagstuhl seminar on Semantic Data Management [L3]. But the idea really gained traction during the 2019 W3C workshop on Web Standardisation for Graph Data [L4], where Olaf Hartig presented his and Bryan Thompson’s extension to linked data, called RDF* (read “RDF star”). In October 2020, eleven commercial and open-source products were known to implement RDF*. However, these implementations were based on different versions and different interpretations of Hartig and Thompson’s work and were not fully interoperable.
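To make the gap concrete, here is a minimal sketch of one property-graph pattern that is awkward in plain linked data: attaching a property to an edge. It is an illustration only, not taken from the MOSAICrOWN project or from any RDF* implementation. Triples are modelled as plain Python tuples, and the names `ex:alice`, `ex:worksFor`, `ex:acme`, `ex:since` and both helper functions are hypothetical. The first helper uses the standard RDF reification vocabulary; the second uses a triple as the subject of another triple, which is the idea behind RDF*.

```python
# Illustrative data only: every ex: name below is hypothetical.

# Property-graph style: one edge carrying its own key/value properties.
edge = {
    "from": "ex:alice",
    "label": "ex:worksFor",
    "to": "ex:acme",
    "properties": {"ex:since": "2019"},
}


def to_reified_rdf(edge, stmt_id="_:s1"):
    """Plain RDF: describe the edge through an extra rdf:Statement node."""
    triples = [
        (edge["from"], edge["label"], edge["to"]),
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", edge["from"]),
        (stmt_id, "rdf:predicate", edge["label"]),
        (stmt_id, "rdf:object", edge["to"]),
    ]
    # The edge properties hang off the statement node, not off the edge itself.
    triples += [(stmt_id, key, value) for key, value in edge["properties"].items()]
    return triples


def to_rdf_star(edge):
    """RDF*-style: the triple itself is the subject of each annotation."""
    base = (edge["from"], edge["label"], edge["to"])
    return [base] + [(base, key, value) for key, value in edge["properties"].items()]


reified = to_reified_rdf(edge)
starred = to_rdf_star(edge)
print(len(reified), "triples with reification")
print(len(starred), "triples with a nested triple")
```

In the textual syntax, the nested triple of the second form is written between double angle brackets, for example `<< ex:alice ex:worksFor ex:acme >> ex:since 2019 .`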
The partners in the MOSAICrOWN project were no strangers to the limitations of linked data that RDF* was aiming to solve. It was quite clear that MOSAICrOWN use cases could benefit from the additional expressiveness. The partners therefore decided to dedicate time to building consensus around RDF*, with the long-term goal of integrating it into the linked data ecosystem as a proper W3C standard.
Under the umbrella of the RDF-DEV W3C Community Group [L5], a group of RDF* implementers and enthusiasts gathered in October 2020 to produce a common specification for RDF* (and its query language SPARQL*). In the process, the effort was renamed RDF-star, in part to avoid confusion with previous versions. As of December 2021, the group considers the resulting specification [L6] nearly finished, and is now focusing on moving this work to the W3C standard track.
In the meantime, interest in RDF-star continued to grow, with invited talks given at Lotico [L7] (~200 attendees) and the Knowledge Graph conference [L8]. MOSAICrOWN partners also organised a workshop [L9] in conjunction with the SEMANTiCS conference. The workshop received nine submissions and attracted around 40 participants.
RDF-star is attractive to linked data users who want to benefit from the added expressiveness inspired by the property graphs world. It is attractive to property graph users, as it bridges the gap between a popular data model and the standard and interoperable tools that linked data provides. Through cross-fertilisation, we expect that a future RDF-star W3C Recommendation will make linked data an even more powerful set of standards.
Pierre-Antoine Champin, ERCIM/W3C, France