Wednesday, January 07, 2004

Followup on Relational RSS

RSS, old enough to be having relations? "Seb mentions several of the operators of Codd's relational algebra, and, it seems to me there are two general reasons why everyone isn't already operating on RSS as relational data: 1) it is distributed across many files, and 2) the hierarchic XML structure of RSS."

"The main issue I am dealing with now is what types of data structures and formats work best with the various combinations of uses between data interchange, data storage, and querying."

Okay, now I'm convinced that this really is replicating RDF, and I would encourage anyone considering this to pick up an open source RDF library (like Jena or Redland) and use it to perform these operations on RDF-based RSS.

The problems highlighted are the same ones that various implementations of RDF have had to solve. A problem with serializing a graph (relational data) in XML - that's RDF/XML and its use of striping. Data distributed across many files that still needs to be searchable - that's usually a problem for RDF data stores (like Kowari or other freely available ones).

For example, to get all the documents (blog entries, etc.) authored by Sam Ruby (this is from a previous post describing iTQL): "select $creator subquery( select $type from <rss_schemas> where $type <http://www.w3.org/2002/07/owl#sameAs> <http://purl.org/dc/elements/1.1/creator> ) from <rss_feeds> where $creator $type 'rubys@intertwingly.net';"

Where "<rss_feeds>" can be any number of URIs combined with logical operators.
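To make the idea of combining feeds a little more concrete, here is a rough sketch in plain Python rather than iTQL. It is not how Kowari evaluates a query; it simply treats each feed as a set of (subject, predicate, object) triples, so that set union and intersection stand in for the logical operators, and it resolves equivalent predicates through an owl:sameAs lookup much as the subquery does. The feed contents, the example.org URIs, the `ns#author` property and the helper names are all made up for illustration, and including dc:creator itself among the matching predicates is my own assumption.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical schema data: a made-up author property declared
# equivalent to dc:creator.
rss_schemas = {
    ("http://example.org/ns#author", OWL_SAME_AS, DC_CREATOR),
}

# Two hypothetical feeds, each a set of (subject, predicate, object) triples.
feed_a = {
    ("http://example.org/a/1", DC_CREATOR, "rubys@intertwingly.net"),
    ("http://example.org/a/2", DC_CREATOR, "someone@example.org"),
}
feed_b = {
    ("http://example.org/b/1", "http://example.org/ns#author",
     "rubys@intertwingly.net"),
}


def equivalent_predicates(schemas, target):
    """Predicates declared owl:sameAs the target, plus the target itself
    (including the target is an assumption of this sketch)."""
    same = {s for (s, p, o) in schemas if p == OWL_SAME_AS and o == target}
    return same | {target}


def documents_by(triples, predicates, value):
    """Sorted subjects that carry `value` under any of `predicates`."""
    return sorted(s for (s, p, o) in triples if p in predicates and o == value)


creator_like = equivalent_predicates(rss_schemas, DC_CREATOR)

# Union of the feeds plays the role of "<feed_a> or <feed_b>".
in_either = documents_by(feed_a | feed_b, creator_like,
                         "rubys@intertwingly.net")

# Intersection plays the role of "<feed_a> and <feed_b>".
in_both = documents_by(feed_a & feed_b, creator_like,
                       "rubys@intertwingly.net")
```

With a real RDF store the union and intersection happen inside the query engine across named models, but the shape of the operation is the same.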

Of course, you'll need to convert some feeds from XML to RDF. While I often link to RDFT, the more usual ways include XSLT and programmatically using an RDF API and an XML library. One of the quickest ways I've found is using a combination of Jena and Apache Jakarta Commons Digester.
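As a rough illustration of the programmatic route, here is a small sketch using only Python's standard library instead of Jena and Digester. It walks the items of a plain RSS 2.0 document and emits (subject, predicate, object) triples. The mapping from RSS elements to Dublin Core properties, the choice of the item link as the subject, the function name and the sample feed are all assumptions made for the example, not a standard conversion.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Assumed mapping from RSS 2.0 item elements to Dublin Core properties.
FIELD_TO_PREDICATE = {
    "title": DC + "title",
    "author": DC + "creator",
    "pubDate": DC + "date",
}


def rss_items_to_triples(xml_text):
    """Turn each <item> of an RSS 2.0 document into triples,
    using the item's <link> as the subject URI."""
    root = ET.fromstring(xml_text)
    triples = set()
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link or not link.strip():
            continue  # no usable subject URI, so skip the item
        subject = link.strip()
        for tag, predicate in FIELD_TO_PREDICATE.items():
            value = item.findtext(tag)
            if value and value.strip():
                triples.add((subject, predicate, value.strip()))
    return triples


# Hypothetical sample feed.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item>
  <title>First post</title>
  <link>http://example.org/1</link>
  <author>rubys@intertwingly.net</author>
</item>
<item><title>No link</title></item>
</channel></rss>"""

triples = rss_items_to_triples(SAMPLE)
```

The resulting triples could then be handed to whatever RDF API you use to build a model; items without a link are dropped here only because the sketch has nothing else to use as a subject.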

Also related, Base data: relational, RDF, XML.
