by Irini Fundulaki and Sören Auer
The Linked Data paradigm has emerged as a powerful enabler for publishing, enriching and sharing data, information and knowledge on the Web. It offers a set of best practices that promote the publication of data on the Web using semantic web technologies such as URIs and RDF, make exchanging structured data as easy as sharing documents, allow the creation of typed links between Web resources and offer a single, standardized access mechanism. In particular, Linked Data is based on (1) using Uniform Resource Identifiers (URIs) to identify all kinds of “things”, (2) making these URIs accessible via the HTTP protocol, (3) providing a description of these things in the Resource Description Framework (RDF) and (4) including URI links to related information (see Tim Berners-Lee’s Linked Data design principles http://www.w3.org/DesignIssues/LinkedData.html).
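Principles (2) and (3) mean that a client can simply look up a URI over HTTP and ask for an RDF description of the thing it names. The following is a minimal sketch using only Python's standard library. It assumes that the publisher supports HTTP content negotiation on the `Accept` header (common practice, but not guaranteed by every server); the helper names `build_lookup_request` and `dereference` and the chosen media-type preference string are hypothetical illustrations, not part of any Linked Data specification.

```python
# Minimal sketch: dereference a Linked Data URI over HTTP, preferring an RDF
# serialization (Turtle, then RDF/XML) via the Accept header.
# Assumption: the server honours content negotiation; some publishers do not.
import urllib.request

# Hypothetical preference order: Turtle first, RDF/XML as a fallback.
RDF_MEDIA_TYPES = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> urllib.request.Request:
    """Prepare an HTTP GET request for `uri` that asks for RDF rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_MEDIA_TYPES})


def dereference(uri: str, timeout: float = 10.0) -> tuple[str, str]:
    """Fetch `uri` (urlopen follows redirects) and return (final URL, response body)."""
    request = build_lookup_request(uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.geturl(), body
```

With network access, a call such as `dereference("http://dbpedia.org/resource/Berlin")` would return the RDF description served for that DBpedia resource; if the server redirects the request, the returned final URL differs from the one requested.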
RDF is a relatively simple but powerful formalism for representing information in triple statements, each consisting of a subject, a predicate and an object, a structure very close to that of simple natural-language sentences. Each of the three components can be a URI; the object can alternatively be an atomic value. As a result, the World Wide Web of documents (and intranets) is complemented by a Web of Linked Data, where anyone can publish, interlink and enrich data. This Linked Data Web has some interesting characteristics:
- URIs serve two purposes: identifying “things” and serving as locators and access paths for information about these “things”.
- Since anyone can coin their own URIs simply by using the address of some web space under their control, the Web of Linked Data is as distributed and democratic as the Web itself.
- Identifiers defined by different people or organizations can be mixed and meshed.
- Linked Data published in various locations can be integrated easily by merging the sets of RDF triple statements; combined with special-purpose links such as those defined in semantic web languages like OWL, this dramatically simplifies data integration.
- The same triple statement formalism is used for defining both structure and data, thus overcoming the strict separation between schema and data found in relational or XML databases.
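The triple model and merge-based integration described above can be illustrated with a minimal sketch using only Python's standard library. It assumes that each triple is a plain `(subject, predicate, object)` tuple of strings and that a dataset is a Python set of such tuples; all `http://example.org/...` URIs, the `country` property and the helpers `same_as_closure` and `describe` are hypothetical. Under these assumptions, merging two datasets is a set union (a real RDF graph merge must also keep blank nodes from different sources apart, which this sketch ignores), and an `owl:sameAs` link lets facts published under two different URIs be combined. The `rdfs:subClassOf` statement (schema) sits in the same set as the instance data.

```python
# Minimal sketch: RDF triples as (subject, predicate, object) tuples of strings.
# Datasets are Python sets, so merging two datasets is a set union.
# All http://example.org/ URIs below are hypothetical examples.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Dataset published by source A: schema and data use the same triple form.
dataset_a = {
    ("http://example.org/a/City", RDFS_SUBCLASS_OF, "http://example.org/a/Place"),
    ("http://example.org/a/Berlin", RDF_TYPE, "http://example.org/a/City"),
    ("http://example.org/a/Berlin", RDFS_LABEL, "Berlin"),  # object is a literal
}

# Dataset published independently by source B, using its own URI for the city
# plus an owl:sameAs link to source A's URI.
dataset_b = {
    ("http://example.org/b/berlin", "http://example.org/b/country", "http://example.org/b/Germany"),
    ("http://example.org/b/berlin", OWL_SAME_AS, "http://example.org/a/Berlin"),
}

# Integration step: merge the two sets of triples.
merged = dataset_a | dataset_b


def same_as_closure(triples, uri):
    """Return `uri` plus every URI connected to it by owl:sameAs links (either direction)."""
    aliases = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == OWL_SAME_AS and current in (s, o):
                other = o if s == current else s
                if other not in aliases:
                    aliases.add(other)
                    frontier.append(other)
    return aliases


def describe(triples, uri):
    """Collect (predicate, object) pairs stated about `uri` or any of its sameAs aliases."""
    aliases = same_as_closure(triples, uri)
    return {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}


if __name__ == "__main__":
    for predicate, obj in sorted(describe(merged, "http://example.org/a/Berlin")):
        print(predicate, obj)
```

Running the script prints three facts about the city: two stated by source A and one stated by source B, although neither source alone contains all three. A production system would use an RDF library and a triple store rather than Python sets; sets are used here only to make the merge semantics visible.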
Linked Data management goes beyond classic data management approaches, which assume complete control over schema and data. On the open, distributed Web, users and applications have no such control over the data. For this reason, the Linked Data lifecycle was introduced. It involves (a) extracting data from unstructured or semi-structured sources and representing it in the RDF data model, (b) storing and querying the data, (c) manual revision and authoring, (d) interlinking and fusing related data to enable integration across highly heterogeneous sources, (e) classification and enrichment with additional upper-level structures such as ontologies and rich vocabularies, (f) quality analysis, (g) evolution and repair and (h) search, browsing and exploration [1]. The LOD2 project (http://lod2.eu) has created a comprehensive stack of tools supporting these different aspects of Linked Data management [2].
Linked Data deployments benefit society, research and enterprises and support a great variety of application areas. The Linked Open Data (LOD) movement reflects a growing trend among organizations, in particular governmental ones, to make their data accessible in machine-readable form. The result of this effort is the Linked Open Data Cloud, which at its last inventory in 2011 already consisted of 31 billion RDF triples from 295 datasets. These datasets contain user-generated content as well as content covering a variety of domains, such as media, geography, government, bibliographic data and life sciences. A number of datasets have sprung from this effort, the most prominent being DBpedia, a community effort to extract structured information from Wikipedia that is widely used in data integration. DBpedia data are published according to a consistent ontology and are accessible through multiple SPARQL endpoints.
The European Commission is one of the main advocates of Linked Data practices for opening up government data to Europe’s people and institutions. Through its ISA Programme, the Commission provides “good practices and helpful examples to help public administration apply Linked Data technologies to eGovernment” (https://joinup.ec.europa.eu/sites/default/files/D4.3.2_Case_Study_Linked_Data_eGov.pdf). Moreover, the European Commission has funded a number of Linked Data projects under the 7th Framework Programme (FP7).
This special theme on Linked Data comprises 22 articles presenting some of the diversity of Linked Data research, technology and applications in Europe. The topics include:
- foundational issues such as benchmarks of Linked Data infrastructure (Angles et al.: “Benchmarking Linked Open Data Management Systems”),
- experimental evaluation (Ferro & Silvello: “Making it easier to Discover, Re-Use and Understand Search Engine Experimental Evaluation Data”),
- Linked Data metadata catalogs (Vandenbussche & Vatant: “Linked Open Vocabularies”, Stegmaier et al.: “Lost in Semantics? Ballooning the Web of Data”) and
- registries (Vandenbussche et al.: “SPARQL: A Gateway to Open Data on the Web?”) as well as
- archiving and evolution (Papastefanatos & Stavrakas: “Diachronic Linked Data: Capturing the Evolution of Structured Interrelated Information on the Web”).
Two articles tackle the management of spatial (Athanasiou et al.: “GeoKnow: Making the Web an Exploratory for Geospatial Knowledge”) and statistical (Petrou & Papastefanatos: “Publishing Greek Census Data as Linked Open Data”) Linked Data.
The heterogeneity of Linked Data calls for:
- approaches for querying, browsing and visualization (Pham & Boncz: “MonetDB/RDF: Discovering and Exploiting the Emergent Schema of RDF Data”, Hoefler & Mutlu: “CODE Query Wizard and Vis Wizard: Supporting Exploration and Analysis of Linked Data”, Micsik et al.: “Browsing and Traversing Linked Data with LODmilla”, Sack & Plank: “AV-Portal - The German National Library of Science and Technology’s Semantic Video Portal”) as well as
- analytics (Roatiș: “Analyzing RDF Data: A Realm of New Possibilities”, Hall et al.: “The Web Science Observatory - The Challenges of Analytics over Distributed Linked Data Infrastructures”).
Important application areas of Linked Data include:
- research infrastructures, for example for marine research (Fugazza et al.: “RITMARE: Linked Open Data for Italian Marine Research”, Tzitzikas et al.: “Ontology-based Integration of Heterogeneous and Distributed Information of the Marine Domain”),
- virtual earth observatories (Kyzirakos et al.: “Building Virtual Earth Observatories Using Scientific Database and Semantic Web Technologies”),
- history (Marx: “Linking Historical Entities to the Linked Open Data Cloud”) and
- education (d’Aquin & Dietze: “Open Education: A Growing, High Impact Area for Linked Open Data”, Mikroyannidis et al.: “Raising the Stakes in Linked Data Education”).
Further application areas represented by articles in this special theme are:
- real-time data (Martínez-Prieto et al.: “A SOLID Architecture to Weather the Storm of Real-Time Linked Data”) and
- publishing (Dirschl et al.: “Supporting the Data Lifecycle at a Global Publisher using the Linked Data Stack”).
References:
[1] S. Auer et al.: “Introduction to Linked Data and its Lifecycle on the Web”, in Reasoning Web, Semantic Technologies for Intelligent Data Access - 9th International Summer School 2013, Mannheim, Germany, Springer LNCS. ISBN 978-3-642-39783-7, http://dx.doi.org/10.1007%2F978-3-642-39784-4_1
[2] S. Auer et al.: “Managing the life-cycle of Linked Data with the LOD2 Stack”, in Proc. of ISWC 2012, http://iswc2012.semanticweb.org/sites/default/files/76500001.pdf
Please contact:
Irini Fundulaki
ICS-FORTH, Greece
E-mail:
Sören Auer
University of Bonn and Fraunhofer IAIS, Germany
E-mail: