Sunday, December 03, 2006

SPARQL Favours the Brave

SPARQL RULES! "Now, as opposed to [17], we define three notions of compatibility between substitutions:
  • Two substitutions O1 and O2 are bravely compatible (b-compatible) when for all x <- dom(O1) Intersection dom(O2) either xO1 = null or xO2 = null or xO1 = xO2 holds. i.e., when O1 Union O2 is a substitution over dom(O1) Union dom(O2).
  • Two substitutions O1 and O2 are cautiously compatible (c-compatible) when they are b-compatible and for all x <- dom(O1) Intersection dom(O2) it holds that xO1 = xO2.
  • Two substitutions O1 and O2 are strictly compatible (s-compatible) when they are c-compatible and for all x in dom(O1)Intersection dom(O2) it holds that x(O1 Union O2) /= null.
"

They make the conclusion that only c-joins operate correctly with idempotency for join and, "Following the definitions from the SPARQL specification, in fact, the b-joining semantics is the only admissible definition, which is why [17] does not consider null values at all. There are still advantages for gradually defining alternatives towards traditional relational algebra treatment. On the one hand, as we have seen in the examples above, the brave view on joining unbound variables might have partly surprising results, on the other hand, as we will see, the c- and s-joining semantics allow for a more effective implementation in terms of Datalog rules."

I'm surprised at how badly I've understood the proposed SPARQL algebra, I didn't think that joining on nulls was being proposed (see Example 2.4 on page 9 of the PDF). Again, this conflicts with relational algebra and JRDF's implementation.

A join in relational algebra does match the suggested s-compatible one, you can't join on NULLs (because they are considered unbound values, they aren't "real", they don't exist and aren't equalable values). As shown in the paper, it wouldn't successfully join a null name and would return only the row containing "Alice".

In SQL's 3VL logic, "NULL literally means that the value is unknown or indeterminate. One side effect of the indeterminate nature of NULL value is it cannot be used in a calculation or a comparision." This means you can't join across NULLs in SQL or use them for aggregate functions except for COUNT. NULLs are also handled differently across SQL implementations. One example that I've come across before is the handling of strings and NULL values between DB2 and Oracle.

It reminds me how much I hate IEEE citations (I think that's the standard) where you use [17] for the reference to Perez in this paper and it's [22] in another. What's wrong with [Perez2006] or something?

Anyway, they also suggest: a set different or minus operator (which I've suggested SPARQL would be incomplete without even though you can do it with FILTER and iTQL has had it for a while before that), nested queries using ASK and using SPARQL as a rules language.

This will probably be my last post on SPARQL for a while - I'm a bit sick of continually, publicly displaying my ignorance (which I guess has occurred over the last year). I could change JRDF to fit any of the proposed implementations (and every other permutation) but I'm not sure it makes sense. I feel very far away from understanding what's going on and I doubt I will (or should) be commenting about it too much in the future.

Update: I seem to have confused s and c semantics. I've updated the paragraph related to relational algebra - it does match s-compatibility.

No comments: