Bridging the Clickable and Semantic Webs with RDFa

by Ben Adida

RDFa, a W3C proposal about to enter Last Call, will help bridge the clickable and semantic Webs. Publishers of current HTML Web pages can augment their output with interoperable, structured data. Web users can extract and process this data straight from its pleasant visual rendering within their browsers. RDFa will enable semantic Web data to progressively emerge from the existing, visual Web.

Semantic Web technologies, and in general the components of the 'data Web', are maturing at a rapid pace. A number of commercial products support the design of RDF (Resource Description Framework) schemas and the publication of RDF datasets. Key semantic-Web specifications, SPARQL (SPARQL Protocol and RDF Query Language) and GRDDL (Gleaning Resource Descriptions from Dialects of Languages), have recently reached Recommendation status with the W3C.

One of the important questions remaining is how to connect the Semantic Web to the existing 'clickable Web', the Web just about everyone knows and loves. RDFa, a W3C proposal on the verge of entering Last Call, is one promising answer, a potential bridge between the clickable and semantic Webs.

HTML already contains some constructs that provide the beginnings of data structure, for example the rel attribute. Publishers are anxious to achieve this kind of functionality: Google recommends using the rel attribute with the keyword 'nofollow' to indicate that search engines should 'not follow' a link in their search relevance algorithms, e.g., when a publisher attempts to increase their URL's standing by posting it in multiple blog comments.

With RDFa, an HTML author can augment presentational markup with RDF structure. This begins very simply, by providing extra information about clickable links using the existing HTML rel attribute. In the following example, we declare a copyright license for the current document:

This document is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/"> Creative Commons license</a>

Note how the clickable link's href is used both for rendering purposes and for machine readability. This idea, often called DRY (Don't Repeat Yourself), is a key principle of the RDFa design.
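To make the machine-readable side of this concrete, here is a minimal sketch of how a consumer might pull the license statement out of the markup above. It uses only Python's standard html.parser module and is not a conforming RDFa parser; the class name LicenseLinkFinder, the function find_license_triples, and the document URL http://example.org/doc are assumptions made for illustration, and the predicate is left as the raw keyword rather than expanded to a full URI.

```python
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collects (subject, predicate, object) tuples for rel="license" links."""

    def __init__(self, document_url):
        super().__init__()
        self.document_url = document_url  # the subject: the current document
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated keywords
        rel_keywords = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if tag == "a" and "license" in rel_keywords and href:
            # The same href the reader clicks on becomes the object.
            self.triples.append((self.document_url, "license", href))


def find_license_triples(html_text, document_url):
    finder = LicenseLinkFinder(document_url)
    finder.feed(html_text)
    finder.close()
    return finder.triples


snippet = (
    'This document is licensed under a '
    '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
    'Creative Commons license</a>'
)
# http://example.org/doc is a hypothetical URL for the page being parsed.
triples = find_license_triples(snippet, "http://example.org/doc")
print(triples)
```

The URL is written once in the page and read twice, by the browser and by the extractor, which is the DRY principle at work.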

A Web publisher may want to make statements about more than the current page, of course, and RDFa fully supports this. An image may be Creative-Commons licensed as follows:

<div about="photo1.jpg"> This photo is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/"> Creative Commons license </a> </div>
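The effect of the about attribute can be sketched as a stack of subjects: an element carrying about changes the subject for itself and everything it contains, and the previous subject is restored when the element closes. The sketch below is a simplification written with Python's standard library, not a full RDFa processor; the class name SubjectAwareLinkFinder and the base URL http://example.org/gallery/ are hypothetical, and predicates are kept as raw keywords.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they must not touch the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class SubjectAwareLinkFinder(HTMLParser):
    """Tracks the current subject set by about= while collecting rel/href links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.subjects = [base_url]  # stack; the page itself is the default subject
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = self.subjects[-1]
        if attributes.get("about") is not None:
            # A relative about value is resolved against the page URL.
            subject = urljoin(self.base_url, attributes["about"])
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            for keyword in rel.split():
                self.triples.append((subject, keyword, urljoin(self.base_url, href)))
        if tag not in VOID_TAGS:
            self.subjects.append(subject)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.subjects) > 1:
            self.subjects.pop()


snippet = (
    '<div about="photo1.jpg"> This photo is licensed under a '
    '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
    ' Creative Commons license </a> </div>'
)
parser = SubjectAwareLinkFinder("http://example.org/gallery/")  # hypothetical page URL
parser.feed(snippet)
parser.close()
print(parser.triples)
```

Here the license statement attaches to the photo rather than to the page, while a link placed outside the div would fall back to the page as its subject.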

With RDFa, an author is not limited to the existing reserved keywords in the baseline HTML vocabulary (next, prev, license, etc.). It is trivial to import the Dublin Core vocabulary, for instance, using the xmlns declaration mechanism. Once this is done, an author can use the vocabulary in her document. The following example does just that, this time using the RDFa property attribute to indicate a literal field rather than a URL:

This document was created by: <span property="dc:creator"> Ben Adida </span>.
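The prefix mechanism can be illustrated with a short sketch: an xmlns:dc declaration binds the prefix dc to the Dublin Core namespace, and dc:creator then expands to a full property URI whose value is the element's text. The code below is a simplified illustration using Python's standard library, not a conforming RDFa parser. The enclosing div carrying the xmlns:dc declaration is added here for the example, the names expand_curie and PropertyFinder are hypothetical, the sketch assumes the property element contains only text, and it trims surrounding whitespace from the literal purely for readability.

```python
from html.parser import HTMLParser


def expand_curie(curie, prefixes):
    """Expand prefix:local using declared prefixes; return unchanged if unknown."""
    prefix, separator, local = curie.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local
    return curie


class PropertyFinder(HTMLParser):
    """Collects (predicate URI, literal text) pairs from property= attributes."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}            # prefix -> namespace URI, from xmlns:* attributes
        self.current_property = None  # property whose text is being gathered
        self.text_parts = []
        self.literals = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value
        prop = dict(attrs).get("property")
        if prop:
            self.current_property = prop
            self.text_parts = []

    def handle_data(self, data):
        if self.current_property is not None:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        # Assumes the property element holds text only (no nested elements).
        if self.current_property is not None:
            predicate = expand_curie(self.current_property, self.prefixes)
            self.literals.append((predicate, "".join(self.text_parts).strip()))
            self.current_property = None


snippet = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    'This document was created by: '
    '<span property="dc:creator"> Ben Adida </span>.</div>'
)
finder = PropertyFinder()
finder.feed(snippet)
finder.close()
print(finder.literals)
```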

RDFa supports most of the power of RDF, including the ability to create more complex graphs of data. For example, if we want to provide more information about the creator of this page, we can introduce a blank node to which we attach a number of additional data fields. The rel attribute is used without a corresponding href, which indicates the creation of a blank node, and the contained RDFa statements are automatically applied to this blank node.

This document was created by:
<div rel="dc:creator"> <span property="foaf:name"> Ben Adida</span>, [<a rel="foaf:mbox" href="mailto:..."> email</a>] </div>.

In the above example, the creator of the current document is an entity named Ben Adida, whose email address is the target of the foaf:mbox link. Additional fields, including a phone number, a physical address, an organizational affiliation, etc. can be added just as easily, and further data depth, such as the physical address of the author's organizational affiliation, can be provided.
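The blank-node behaviour described here can be sketched in a few lines: a rel with no href mints a fresh anonymous node, links it to the current subject, and makes it the subject for everything nested inside. The following is a rough illustration with Python's standard library, not a conforming RDFa parser. The class name BlankNodeSketch, the document URL, and the address mailto:author@example.org are placeholders invented for the example; prefixes are left unexpanded, and the sketch assumes well-nested markup with no void elements and text-only property elements.

```python
from html.parser import HTMLParser
from itertools import count


class BlankNodeSketch(HTMLParser):
    """rel without href creates a blank node that nested statements attach to."""

    def __init__(self, document_url):
        super().__init__()
        self.subjects = [document_url]  # stack of current subjects
        self.triples = []
        self._ids = count()             # numbering for blank node labels
        self._open_property = None      # (subject, predicate) awaiting its text
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = self.subjects[-1]
        inherited = subject
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            self.triples.append((subject, rel, href))
        elif rel:
            blank = "_:b%d" % next(self._ids)
            self.triples.append((subject, rel, blank))
            inherited = blank           # children now describe the blank node
        if attributes.get("property"):
            self._open_property = (subject, attributes["property"])
            self._text = []
        self.subjects.append(inherited)

    def handle_data(self, data):
        if self._open_property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open_property is not None:
            subject, predicate = self._open_property
            self.triples.append((subject, predicate, "".join(self._text).strip()))
            self._open_property = None
        if len(self.subjects) > 1:
            self.subjects.pop()


snippet = (
    '<div rel="dc:creator"><span property="foaf:name">Ben Adida</span>, '
    '[<a rel="foaf:mbox" href="mailto:author@example.org">email</a>]</div>'
)  # the mailto address is a placeholder, not the author's real address
sketch = BlankNodeSketch("http://example.org/doc")  # hypothetical document URL
sketch.feed(snippet)
sketch.close()
for triple in sketch.triples:
    print(triple)
```

The three resulting statements form a small graph: the document points to an anonymous creator node, and that node carries the name and mailbox.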

A number of RDFa tools are emerging, with parsers in several programming languages including Python, Java, PHP, Ruby, and JavaScript [1]. Within the browser, the RDFa Buttons [2] can be used in any Web browser to highlight RDFa structured data and convert it to RDF/XML or RDF/N3. Developers can easily extend these code snippets to provide specific actions based on this data, e.g., preparing a blog post about a Creative-Commons resource using the appropriate attribution name and URL. Even more interesting is the Operator Firefox Add-On [3], which notifies users of the presence of RDFa as they browse, with content-specific action buttons automatically enabled based on the data found on the page. For example, Operator might allow one-click adding of an event to a calendar, of a phone number to an address book, or of a song to a playlist.

More information about RDFa markup can be found in the RDFa Primer [4], and the RDFa community and ecosystem are reachable on the RDFa Information Web site [5].

Links:
[1] http://rdfa.info/rdfa-implementations/
[2] http://www.w3.org/2006/07/SWD/RDFa/impl/js/
[3] https://addons.mozilla.org/firefox/addon/4106
[4] http://www.w3.org/TR/xhtml-rdfa-primer/
[5] http://rdfa.info/

Please contact:
Ben Adida
Creative Commons
E-mail: benadida.net