I am new to RDF, RDFa, and related technologies, and have been learning about them for a few days.
My question: given the following HTML + RDFa code, is it possible to extract the RDF part separately? If so, could you please demonstrate with a simple code snippet (PHP or Java)?
I have heard that Jena could be used, but I couldn't find a tutorial that explains this. If it is possible with Jena, could anyone post a code snippet, please?
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="XHTML+RDFa 1.0" xml:lang="en">
<head>
<title>John's Home Page</title>
<base href="http://example.org/john-d/" />
<meta property="dc:creator" content="Jonathan Doe" />
<link rel="foaf:primaryTopic" href="http://example.org/john-d/#me" />
</head>
<body about="http://example.org/john-d/#me">
<h1>John's Home Page</h1>
<p>My name is <span property="foaf:nick">John D</span> and I like
<a href="http://www.neubauten.org/" rel="foaf:interest"
xml:lang="de">Einstürzende Neubauten</a>.
</p>
<p>
My <span rel="foaf:interest" resource="urn:ISBN:0752820907">favorite
book is the inspiring <span about="urn:ISBN:0752820907"><cite
property="dc:title">Weaving the Web</cite> by
<span property="dc:creator">Tim Berners-Lee</span></span>
</span>
</p>
</body>
</html>
Yes, you can extract the RDF from pages containing RDFa markup. Once extracted, you can load it into a local RDF triplestore if you want to work with that data alone, or you can insert it into a global triplestore and query it alongside existing RDF data.
Here is a relevant discussion on Java RDFa parsers.
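To make "extracting the RDF" concrete, here is a minimal sketch that uses only the JDK's XML parser. It is not a conformant RDFa processor: the class name RdfaSketch and its methods are hypothetical, the base URI is passed in rather than read from the <base> element, and only a small subset of RDFa 1.0 is handled (@about, @rel, @property, @content, @resource, @href, with prefixes declared via xmlns). Language tags, datatypes, @typeof, @rev and chained (incomplete) triples are ignored, so use a real library for anything serious.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Jena or java-rdfa: covers a small RDFa 1.0 subset only.
public class RdfaSketch {

    // Returns N-Triples-style lines for the RDFa attributes found in the XHTML text.
    public static List<String> extract(String xhtml, String baseUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // lets us resolve prefixes such as foaf: and dc:
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)))
                .getDocumentElement();
        List<String> triples = new ArrayList<>();
        walk(root, baseUri, URI.create(baseUri), triples);
        return triples;
    }

    private static void walk(Element el, String parentSubject, URI base, List<String> out) {
        // Resource named by this element; @resource wins over @href.
        String target = null;
        if (el.hasAttribute("resource")) {
            target = base.resolve(el.getAttribute("resource")).toString();
        } else if (el.hasAttribute("href")) {
            target = base.resolve(el.getAttribute("href")).toString();
        }
        boolean hasRel = el.hasAttribute("rel");

        // Subject: @about, else (when there is no @rel) @resource/@href, else inherited.
        String subject = parentSubject;
        if (el.hasAttribute("about")) {
            subject = base.resolve(el.getAttribute("about")).toString();
        } else if (!hasRel && target != null) {
            subject = target;
        }

        // @rel links the subject to another resource.
        if (hasRel && target != null) {
            for (String curie : el.getAttribute("rel").trim().split("\\s+")) {
                String predicate = expand(el, curie);
                if (predicate != null) {
                    out.add("<" + subject + "> <" + predicate + "> <" + target + "> .");
                }
            }
        }
        // @property attaches a plain literal taken from @content or the element text.
        if (el.hasAttribute("property")) {
            String text = el.hasAttribute("content")
                    ? el.getAttribute("content") : el.getTextContent();
            for (String curie : el.getAttribute("property").trim().split("\\s+")) {
                String predicate = expand(el, curie);
                if (predicate != null) {
                    out.add("<" + subject + "> <" + predicate + "> \"" + escape(text) + "\" .");
                }
            }
        }
        // Children describe the linked resource if there was one, else the current subject.
        String childSubject = (hasRel && target != null) ? target : subject;
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                walk((Element) n, childSubject, base, out);
            }
        }
    }

    // Expands "prefix:local" using in-scope xmlns declarations; null if it cannot be resolved.
    private static String expand(Element el, String curie) {
        int colon = curie.indexOf(':');
        if (colon < 0) return null;
        String ns = el.lookupNamespaceURI(curie.substring(0, colon));
        return ns == null ? null : ns + curie.substring(colon + 1);
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\""
                + " xmlns:foaf=\"http://xmlns.com/foaf/0.1/\">"
                + "<body about=\"http://example.org/john-d/#me\">"
                + "<span property=\"foaf:nick\">John D</span>"
                + "<a rel=\"foaf:interest\" href=\"http://www.neubauten.org/\">EN</a>"
                + "</body></html>";
        for (String triple : extract(xhtml, "http://example.org/john-d/")) {
            System.out.println(triple);
        }
    }
}
```

Each printed line is one subject/predicate/object statement, which is the form a triplestore would load. Passing the question's page as the xhtml argument, with http://example.org/john-d/ as the base, should work the same way, since that markup stays within the subset handled here.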
You can't separate the RDF from the HTML as the RDF is providing additional information about things in the HTML.
It would be like taking the footnotes and bibliography out of a book and throwing the book away: what remains is mostly meaningless.
Have a look at Damian's java-rdfa. You can use it with Apache Jena; here is a snippet of code:
Class.forName("net.rootdev.javardfa.RDFaReader");
Model model = ...
model.read(url, "XHTML"); // xml parsing
model.read(other, "HTML"); // html parsing
Another option in Java is Apache Any23.
Parsing RDFa in PHP: https://github.com/njh/easyrdf/ (use version 0.8 or the master branch to get the RDFa parser)
Parsing RDFa in Java: http://semarglproject.org/