Workshop on “Global Scientific Data Infrastructures: The Findability Challenge”

by Costantino Thanos

It is well known that the scientific world is creating an unimaginably vast and rapidly increasing amount of digital data. Among the members of the academic research community, there is growing consensus that e-science practices should be congruent with open science and that scientific data should be freely accessible. However, in a networked open-science world, a big challenge faced by researchers is findability. Findability means the ease with which data, information, and knowledge, as well as tools and services for specific purposes, can be discovered. It takes into account relevant aspects of the attributes, context, and provenance of the data, the functionality and deployability of the tools and services, and the profiles and goals of the searcher. By contrast, the current Internet search paradigm is characterized by a lack of context, with search being conducted independently of data provenance, professional profiles, and work goals. Enabling findability is thus of paramount importance for the next generation of global research data infrastructures.

A Workshop, held in May 2012 in Taormina, Italy, and organized by ISTI-CNR, aimed to investigate the findability challenge. The Workshop was organized around 14 invited talks given by internationally recognized scientists working in the areas of databases, information retrieval, knowledge representation, and data infrastructures.

It was first suggested that the findability challenge could be considered as understanding how we can bring together information about a single subject (such as a topic or an entity) scattered across different sources (e.g., web sites, social sites, news feeds) in order to gain more complete and possibly more accurate knowledge about that subject.

A number of topics were then examined in depth. It was felt that one of the main emerging needs with big data applications is data exploration, i.e., examining big data sets in search of interesting patterns without a precise idea of what is being looked for. In this respect, adaptive query processing (e.g., adaptive indexing and adaptive data loading) was considered to be an appropriate technology that can help data management systems and scientists to avoid expensive actions until they are certain that these actions are going to pay off. The notion of mega-modeling, i.e., modeling a certain aspect or system and then creating new services or models by combining existing ones in a principled way, was also considered important. Some of the talks addressed the relevance of concepts such as semantics, modeling, and ontologies for findability. It was recognized that the design of efficient semantic query answering services remains a real challenge. Another findability challenge that was given considerable attention was scalability. When it comes to scalability, there are two sides of the coin: (a) scalability with respect to data, i.e., keeping up with the amount of online data, and (b) scalability with respect to knowledge, i.e., the richer, more complete, and more accurate the knowledge we seek, the more difficult it is to acquire. There was general agreement that it is important to be able to discover not just data but also data tools and services, i.e., to enable the automated location of data tools and services that adequately fulfill a given research need.

Finally, the inadequacy of current database technology to address the requirements of science was emphasized, and some research directions for more effective scientific data management in the context of data-intensive research were illustrated.

The scientific program of the Workshop was coordinated by Costantino Thanos (ISTI-CNR) and Yannis Ioannidis (Univ. of Athens). Further details and online versions of many of the invited talks can be found on the workshop website.

More information:
http://datachallenges.isti.cnr.it/
