Special Theme

Image ERCIM News 96 cover page

ERCIM News 96
January 2014
Special theme: Linked Open Data
Guest editors: Irini Fundulaki (Institute of Computer Science, FORTH) and Sören Auer (University of Bonn and Fraunhofer IAIS)

This issue in pdf (52 pages)

ERCIM news 135

ERCIM news 135

ERCIM news 134

ERCIM news 134

ERCIM news 133

ERCIM news 133

ERCIM news 132

ERCIM news 132

ERCIM news 131

ERCIM news 131

ERCIM news 130

ERCIM news 130

Back Issues Online

Back Issues Online

Contents

Linked Open Data - Introduction to the Special Theme

by Irini Fundulaki and Sören Auer

The Linked Data paradigm has emerged as a powerful enabler for publishing, enriching and sharing data, information and knowledge in the Web. It offers a set of best practices that promote the publication of data on the Web using semantic web technologies such as URIs and RDF, support the exchange of structured data to be done as easily as the sharing of documents, allow the creation of typed links between Web resources and offer a single, standardized access mechanism. In particular, the Linked Data shift is based on (1) using Universal Resource Identifiers (URIs) for identifying all kinds of “things”, (2) making these URIs accessible via the HTTP protocol and (3) providing a description of these things in the Resource Description Format (RDF) along with (4) URI links to related information (see Tim Berners-Lee’s Linked Data design principles http://www.w3.org/DesignIssues/LinkedData.html).

Building Virtual Earth Observatories Using Scientific Database and Semantic Web Technologies

by Kostis Kyzirakos, Stefan Manegold, Charalampos Nikolaou and Manolis Koubarakis

TELEIOS is a recent European project that addresses the need for scalable access to petabytes of Earth Observation (EO) data and the identification of hidden knowledge that can be used in applications. To achieve this, TELEIOS builds on scientific databases, linked geospatial data and ontologies. TELEIOS was the first project internationally that introduced the Linked Data paradigm to the EO domain, and developed prototype services such as the real-time fire monitoring service that has been used for the last two years by decision makers and emergency response managers in Greece.

GeoKnow: Making the Web an Exploratory Place for Geospatial Knowledge

by Spiros Athanasiou, Daniel Hladky, Giorgos Giannopoulos, Alejandra Garcia Rojas and Jens Lehmann

The GeoKnow project aims to make geospatial data accessible on the Web of Data, transforming the Web into a place where geospatial data can be published, queried, reasoned, and interlinked, according to Linked Data principles.

Open Education: A Growing, High Impact Area for Linked Open Data

by Mathieu d'Aquin and Stefan Dietze

Education is now entering a revolution in the form of open education, where Linked Open Data has the potential to play a vital role. The Web of Linked Educational Data is growing with information about courses and resources, and emerging as a collective information backbone for open education.

Education has often been a keen adopter of new information and communication technologies. This is not surprising given that education is all about informing and communicating, and it is currently entering a revolution in the form of open education. This requires the use of state-of-the-art technologies for sharing, publishing and connecting information globally, free from technological barriers and cultural frontiers: namely, Linked Data [1].

Raising the Stakes in Linked Data Education

by Alexander Mikroyannidis, John Domingue and Elena Simperl

There is currently a revolution going on in education generally, but nowhere more so than in the ICT field, owing to the availability of high quality online learning resources and MOOCs (Massive Open Online Courses). The EUCLID project is at the forefront of this initiative by developing a comprehensive educational curriculum, supported by multimodal learning materials and highly visible eLearning distribution channels, tailored to the real needs of data practitioners.MOOCs (Massive Open Online Courses) offer large numbers of students the opportunity to study high quality courses with prestigious universities. These initiatives have led to widespread publicity as well as strategic dialogue in the higher education sector. The consensus within higher education is that after the Internet-induced revolutions in communication, business, entertainment and the media, it is now the turn of universities. Exactly where this revolution will lead is not yet known but some radical predictions have been made, including the end of the need for university campuses (http://www.theguardian.com/education/2012/nov/11/online-free-learning-end-of-university).

RITMARE: Linked Open Data for Italian Marine Research

by Cristiano Fugazza, Alessandro Oggioni and Paola Carrara

The RITMARE (la Ricerca ITaliana per il MARE – Italian Research for the sea) Flagship Project is one of the National Research Programmes funded by the Italian Ministry of University and Research. Its goal is the interdisciplinary integration of national marine research. In order to design a flexible Spatial Data Infrastructure (SDI) that adapts to the audience's specificities, the necessary context information is drawn from existing RDF-based schemata and sources. This enables semantics-aware profiling of end-users and resources, thus allowing their provision as Linked Open Data.

Lost in Semantics? Ballooning the Web of Data

by Florian Stegmaier, Kai Schlegel and Michael Granitzer

Although Linked Open Data has increased enormously in volume over recent years, there is still no single point of access for querying the over 200 SPARQL repositories. The Balloon project aims to create a Meta Web of Data focusing on structural information by crawling co-reference relationships in all registered and reachable Linked Data SPARQL endpoints. The current Linked Open Data cloud, although huge in size, offers poor service quality and is inadequately maintained, thus complicating access via SPARQL endpoints. This issue needs to be resolved before the Linked Open Data cloud can achieve its full potential.

Publishing Greek Census Data as Linked Open Data

by Irene Petrou and George Papastefanatos

Linked Open Data technology is an emerging way of making structured data available on the Web. This project aims to develop a generic methodology for publishing statistical datasets, mainly stored in tabular formats (e.g., csv and excel files) and relational databases, as LOD. We build statistical vocabularies and LOD storage technologies on top of existing publishing tools to ease the process of publishing these data. Our efforts focus on census data collected during Greece’s 2011 Census Survey and provided by the Hellenic Statistical Authority. We develop a platform through which the Greek Census Data are converted, interlinked and published.

Linked Open Vocabularies

by Pierre-Yves Vandenbussche and Bernard Vatant

The “Web of Data” has recently undergone rapid growth with the publication of large datasets – often as Linked Data - by public institutions around the world. One of the major barriers to the deployment of Linked Data is the difficulty data publishers have in determining which vocabularies to use to describe the semantics of data. The Linked Open Vocabularies (LOV) initiative stands as an innovative observatory for the re-usable linked vocabularies ecosystem. The initiative goes beyond collecting and highlighting vocabulary metadata. It now plays a major social role in promoting good practice and improving overall ecosystem quality.

Linking Historical Entities to the Linked Open Data Cloud

by Maarten Marx

We investigate the coverage of Wikipedia for historical public figures. Unsurprisingly, the probability of a figure having a Wikipedia entry declines with time since the person was active. Nevertheless, two thirds of the Dutch members of parliament that have been active in the last 140 years have a Wikipedia page. The need to link historical figures to existing knowledge bases like Wikipedia/DBpedia comes from current large scale efforts to digitize primary data sources, including proceedings of parliament and historical newspapers. Linking entries to knowledge bases can provide values of key background variables, such as gender, age, and (party) affiliation.

Benchmarking Linked Open Data Management Systems

by Renzo Angles, Minh-Duc Pham and Peter Boncz

With inherent support for storing and analysing highly interconnected data, graph and RDF databases appear as natural solutions for developing Linked Open Data applications. However, current benchmarks for these database technologies do not fully attain the desirable characteristics in industrial-strength benchmarks [1] (e.g. relevance, verifiability, etc.) and typically do not model scenarios characterized by complex queries over skewed and highly correlated data [2]. The Linked Data Benchmark Council (LDBC) is an EU FP7 ICT project that brings together a community of academic researchers and industry, whose main objective is the development of industrial-strength benchmarks for graph and RDF databases.

Making it Easier to Discover, Re-Use and Understand Search Engine Experimental Evaluation Data

by Nicola Ferro and Gianmaria Silvello

Experimental evaluation of search engines produces scientific data that are highly valuable from both a research and financial point of view. They need to be interpreted and exploited over a large time-frame, and a crucial goal is to ensure their curation and enrichment via inter-linking with relevant resources in order to harness their full potential. To this end, we exploit the LOD paradigm for increasing experimental data discoverability, understandability and re-usability.

Analysing RDF Data: A Realm of New Possibilities

by Alexandra Roatiș

The WaRG framework brings flexibility and semantics to data warehousing. The development of Semantic Web data represented within W3C’s Resource Description Framework [1] (RDF), and the associated standardization of the SPARQL query language now at v1.1 has lead to the emergence of many systems storing, querying, and updating RDF. However, as more and more RDF datasets (graphs) are made available, in particular Linked Open Data, application requirements also evolve.

The Web Science Observatory - The Challenges of Analytics over Distributed Linked Data Infrastructures

by Wendy Hall, Thanassis Tiropanis, Ramine Tinati, Xin Wang, Markus Luczak-Rösch and Elena Simperl

Linked data technologies provide advantages in terms of interoperability and integration, which, in certain cases, come at the cost of performance. The Web Observatory, a global Web Science research project, is providing a benchmark infrastructure to understand and address the challenges of analytics on distributed Linked Data infrastructures.

SPARQL: A Gateway to Open Data on the Web?

by Pierre-Yves Vandenbussche, Aidan Hogan, Jürgen Umbrich and Carlos Buil Aranda

Hundreds of datasets on the Web can now be queried through public, freely-available SPARQL services. These datasets contain billions of facts spanning a plethora of diverse topics hosted by a variety of publishers, including some household names, such as the UK and US governments, the BBC and the Nobel Prize Committee. A Web client using SPARQL could, for example, query about the winners of Nobel Prizes from Iceland, or about national electric power consumption per capita in Taiwan, or about homologs found in eukaryotic genomes, or about Pokémon that are particularly susceptible to water attacks. But are these novel SPARQL services ready for use in mainstream Web applications? We investigate further.

CODE Query Wizard and Vis Wizard: Supporting Exploration and Analysis of Linked Data

by Patrick Hoefler and Belgin Mutlu

Although the concept of Linked Data has been increasing in popularity, easy-to-use interfaces to access and make sense of the actual data are still few and far between. The CODE project's Query Wizard and Vis Wizard aim to fill this gap.

AV-Portal - The German National Library of Science and Technology's Semantic Video Portal

by Harald Sack and Margret Plank

In addition to specialized literature, 3D objects and research data, the German National Library of Science and Technology also collects scientific video clips, which will now be opened up via semantic media analysis, and published on the web with the help of Linked Open Data.

Browsing and Traversing Linked Data with LODmilla

by András Micsik, Sándor Turbucz and Zoltán Tóth

There are a range of problems associated with current Linked Data visualization tools, including lack of genericity and reliance on non-standard dataset endpoint features. These problems hinder the emergence of generic Linked Data browsers and can thus complicate the process of accessing Linked Data. With LODmilla we aim to overcome common problems of Linked Open data (LOD) browsing and to establish an extensible base platform for further evolution of Linked Data browsers.

Diachronic Linked Data: Capturing the Evolution of Structured Interrelated Information on the Web

by George Papastefanatos and Yannis Stavrakas

The recent development of Linked Open Data technologies has enabled large scale exploitation of previously isolated, public, scientific or enterprise data silos. Given its wide availability and value, a fundamental issue arises regarding the long-term accessibility of these knowledge bases; how do we record their evolution and how do we preserve them for future use? Until now, traditional preservation techniques keep information in fixed data sets, “pickled” and “locked away” for future use. Given the complexity, the interlinking and the dynamic nature of current data, especially Linked Open Data, radically new methods are needed.

Supporting the Data Lifecycle at a Global Publisher using the Linked Data Stack

by Christian Dirschl, Katja Eck and Jens Lehmann

The Linked Data Stack is an integrated distribution of aligned tools that support the whole lifecycle of Linked Data from extraction, authoring/creation via enrichment, interlinking and fusing through to maintenance. A global publishing company represents an ideal recent real-world usage scenario, illustrating the Linked Data Stack and the underlying lifecycle of Linked Data (including data-flows and usage scenarios).

MonetDB/RDF: Discovering and Exploiting the Emergent Schema of RDF Data

by Minh-Duc Pham and Peter Boncz

The Resource Description Framework (RDF) has been used as the main data model for the semantic web and Linked Open Data, providing great flexibility for users to represent and evolve data without need for a prior schema. This flexibility, however, poses challenges in implementing efficient RDF stores. It i) leads to query plan with many self-joins in triple tables, ii) blocks the use of advanced relational physical storage optimization such as clustered indexes and data partitioning, and iii) the lack of a schema sometimes makes it problematic for users to comprehend the data and formulate queries [1]. In the Database Architecture group at CWI, Amsterdam, we tackle these RDF data management problems by automatically recovering the structure present in RDF data, leveraging this structure both internally inside the database systems (in storage, optimization, and execution), and externally as an emergent schema towards the users who pose queries.

A SOLID Architecture to Weather the Storm of Real-Time Linked Data

by Miguel A. Martínez-Prieto, Carlos E. Cuesta, Javier D. Fernández and Mario Arias

Linked Open Data has increased the availability of semantic data, including huge flows of real-time information from many sources. Processing systems must be able to cope with such incoming data, while simultaneously providing efficient access to a live data store including both this growing information and pre-existing data. The SOLID architecture has been designed to handle such workflows, managing big semantic data in real-time.

Ontology-based Integration of Heterogeneous and Distributed Information of the Marine Domain

by Yannis Tzitzikas, Carlo Allocca, Chryssoula Bekiari, Yannis Marketakis, Pavlos Fafalios and Nikos Minadakis

The European iMarine project has defined a core ontology for publishing marine data, which is suitable for setting up warehouses that can answer complex queries.