Saturday, August 19, 2006


JRDF GUI 0.2 is available. It squashes a number of bugs and now correctly answers queries from the DAWG test cases like dawg-opt-query-002.

The overall issue with JRDF's SPARQL implementation was that it mishandled the order-specific (left-to-right) nature of OPTIONAL. My original idea was to provide a query language that did not care about order, so I wasn't initially worried about parsing the grammar in a way that produced order-specific query trees. After talking to several people about it, it seems much more fruitful to provide a mapping to relational operations and to enable re-use of existing optimizations. Also, fixing order-specific queries is a bigger task than I originally thought.

Another oversight was that I originally considered OPTIONAL to be dyadic (accepting two relations to operate on) rather than n-adic (accepting one or more relations); dawg-opt-query-004 demonstrates this. I started off intending to make operations n-adic where possible, but I only got around to implementing this for join.
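To make the order sensitivity and the n-adic form concrete, here is a minimal sketch in Python. It is not JRDF's code or API: the function names (`compatible`, `left_outer_join`, `optional`) and the sample data are made up for illustration, relations are modelled as plain lists of variable bindings, and it assumes OPTIONAL can be treated as a left outer join folded left to right, ignoring filters.

```python
from functools import reduce

def compatible(a, b):
    """Two rows are compatible if every variable they share has the same value."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def left_outer_join(left, right):
    """Keep every left row, extended by each compatible right row when one exists."""
    result = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(matches if matches else [l])
    return result

def optional(*relations):
    """Hypothetical n-adic OPTIONAL: fold one or more relations left to right."""
    return reduce(left_outer_join, relations)

# Illustrative data only.
names = [{"x": "alice", "name": "Alice"}, {"x": "bob", "name": "Bob"}]
mboxes = [{"x": "alice", "mbox": "alice@example.org"}]
nicks = [{"x": "bob", "nick": "bobby"}, {"x": "carol", "nick": "caz"}]

in_order = optional(names, mboxes, nicks)
reordered = optional(nicks, mboxes, names)
```

With the same three relations, `in_order` starts from the names and never mentions carol, while `reordered` starts from the nicknames and keeps her row, so the order of the operands changes the answer.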

In JRDF's relational algebra there is the concept of node types such as subject, predicate, object, uri, bnode, and literal. There are also composite node types for positions, such as subject-object and subject-predicate-object. These composite nodes were only created by the project operation, which was unfortunate because the types are needed while performing other operations; if they are not available, the wrong result is produced. These incorrect results made me think that the whole idea might be wrong. When I reviewed what I'd done, I came across my initial idea of join compatibility, which would create these composite nodes during joins, but it was never implemented.

The current solution (a fairly inefficient hack) is to perform a project on every restrict, which then creates relations with these composite nodes. This points to a better solution than join compatibility, one that should be easy to add and more efficient. The general idea is to modify restrict to use composite nodes as headings based on how they are used in the query. But time is an issue, and it's not important as far as the results of my thesis are concerned.
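As a rough illustration of that general idea, the Python sketch below scans the query's triple patterns first, records every position each variable is used in, and then has restrict label its heading with the resulting composite type. Everything here is an assumption made for the example rather than JRDF's actual design: the names `positions_by_variable`, `heading_type`, `match` and `restrict`, the tuple representation of triples, and the `?`-prefixed strings for variables.

```python
POSITIONS = ("subject", "predicate", "object")

def positions_by_variable(patterns):
    """Collect every position each variable occupies anywhere in the query."""
    usage = {}
    for pattern in patterns:
        for position, term in zip(POSITIONS, pattern):
            if term.startswith("?"):
                usage.setdefault(term, set()).add(position)
    return usage

def heading_type(positions):
    """Build a composite type name such as 'subject-object'."""
    return "-".join(p for p in POSITIONS if p in positions)

def match(pattern, triple):
    """Return the bindings for one triple, or None if it does not match."""
    row = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if row.setdefault(term, value) != value:
                return None  # same variable bound to two different values
        elif term != value:
            return None  # constant in the pattern does not match
    return row

def restrict(triples, pattern, usage):
    """Restrict to one pattern; the heading is typed by query-wide usage."""
    heading = {t: heading_type(usage[t]) for t in pattern if t.startswith("?")}
    rows = [row for row in (match(pattern, t) for t in triples) if row is not None]
    return heading, rows

# Illustrative data only.
triples = [("alice", "knows", "bob"), ("bob", "name", "Bob")]
patterns = [("?x", "knows", "?y"), ("?y", "name", "?n")]
usage = positions_by_variable(patterns)
heading, rows = restrict(triples, patterns[0], usage)
```

Because `?y` is an object in the first pattern and a subject in the second, the first restrict already carries `subject-object` in its heading, without a separate project step.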