More News: Different Strokes

Thursday, November 30, 2006

Different Strokes

I feel I should make a correction to comments I made in regards to the proposed SPARQL algebra.

In that post and in my thesis I've said that the definitions of OPTIONAL by Pérez et al (and used by Andy Seaborne) are compatible with the relational definitions I used. I'm now about as certain as I can be that that's not the case.

The definition in Pérez et al is for set difference is (in ASCII):
O1 \ O2 = {u <- O1 | for all u' <- O2, u and u' are not compatible}.

The key here is the definition of compatibility, I didn't find it clear in the original paper (my fault and I'll explain a bit more about this later) but it's explicit in "Semantics of SPARQL" where it says, "Two mappings with disjoint domains are always compatible, and the empty mapping u0 is compatible with any other mapping. Intuitively, u1 and u2 are compatibles if u1 can be extended with u2 to obtain a new mapping, and vice versa."

A relational join is defined as the set union of the headings and matching values. The heading consists of the attributes (X, Y, Z), where X is the attributes of the left hand side relation, Y contains the attributes that match the left and right hand side relations and Z are the attributes of the right hand side relation. The body consists of tuples with the values matching X, Y, and Z (Date writes it as { X x, Y y, Z z}). I think that this is compatible with the Pérez et al definition.

But it's different for difference. The difference operator in relational algebra requires the relations to be the same type to be removed (this is the definition I've used). I'll use "\" to represent SPARQL's set difference and "-" to represent the relational difference.

For example:
{{ (?x = 2, ?y = 3) }} \ {{ (?y = 3) }} is {} in SPARQL
{{ (?x = 2, ?y = 3) }} - {{ (?y = 3) }} is {{ (?x = 2, ?y = 3) }} in the modified Galindo-Legaria relational algebra that JRDF uses.

The reason it is unchanged in relational algebra is because the definition of equality requires them to be the same type (the same attributes). So a binding with one value can't be equal to a binding with two values. The Pérez et al definition is a much looser definition and many more matches are made.

As an aside, I was fairly sure that set difference was a requirement for OWL inferencing. I might be wrong here too. If it is the case, though, I'd think it'd be nice (an efficienct use of operations) if SPARQL operations could be reused for OWL ones. This would allow the difference operator by itself to be expressed in SPARQL too.

So I think I understand the issue now. At the moment I'm trying to work out how JRDF gets what seems to be the right results because it doesn't really look like it should. The other thing that was in my paper was a mapping to SQL and I'm not sure how a SQL based SPARQL implementation would work either (based on the definitions I used). Lots of questions anyway.

I am trying to work out whether this definition of compatibility is a good one. At the moment I don't like it or dislike it because I don't understand yet.

The trigger was the removal of the term antijoin from the SPARQL algebra - that is a good idea because I understood it to mean it was compatible with relational algebra (using the same terms I guess). Up until that point I'd assumed difference in SPARQL was compatible with difference in relational algebra.

Update: Paul mentions in the comments, that the SPARQL difference operator is the one needed for OWL. Another win for that definition then too.

It's apparent that the relational model really isn't appropriate to model SPARQL in. I don't think there's a single definition of an operation in relational algebra that hasn't been changed.

Update 2: I've since come to realize that joins are not compatible either. This is due to the suggestion that you can join on NULL values. This contradicts relational algebra (taking either the Date or Codd approach on NULL) and SQL joins (not that I think that's worth much but it's easy to demonstrate).

Update 3: Last update for this blog I hope. While relational difference is not equivalent, antijoin is equivalent. Which is what I was told when antijoin was renamed diff in first place. More information, "The Difference is Antijoin".

4 comments:

Anonymous said...: I'm a little confused at your point here (and don't have time to read your original sources). You talk about the definition for "set difference" in Pérez et al, but then go on to say that, "the Pérez et al definition is much looser and isn't compatible with set difference".

At a casual reading, I'd guess that by "set difference", the relevant example you propose is the non-SPARQL one:
{{ (?x = 2, ?y = 3) }} - {{ (?y = 3) }} is {{ (?x = 2, ?y = 3) }} in relational algebra.

Right?

Of course, the difference operation in SPARQL (the \) is the one that is needed for OWL. (This is what Mulgara implements in the MINUS operator).; 8:09 am
Andrew said...: So the Perez et al is not compatible with set difference. I've removed that because I found it confusing when I re-read it too.

You maybe right that the SPARQL operation matches the OWL one. That would be good.

Like I said, I've made a mistake but I'm not quite sure of the implications except the obvious one that JRDF and my thesis are complete bollocks.; 8:59 am
Andrew said...: I don't think it's substantial but there maybe something that prevents it being wrong and that is you can't actually get {{ (?x = 2, ?y = 3) }} - {{ (?y = 3) }} in JRDF from a SPARQL query. You'd get something like:
{ {(?x = 2, pred1 = 2, obj1 = 2, ?y = 3, pred2 = 3, obj2 = 3) }} - {{ (?y = 3, pred2 = 3, obj2 = 3) }}

But that still leads to the same result, AFAICT.; 9:04 am
Andrew said...: Paul, I'm also confused because both you and Andrae have written that the MINUS operator in Mulgara is the relational difference operator.

See: http://mulgara.org/pipermail/mulgara-dev/2006-October/000125.html

"* Relational-Difference - the MINUS keyword. Some of the rules in
RDFS are expressed in terms of excluding certain classes of term, and the alternatives we faced were either an O(n) filter - or a O(log n)difference."; 9:33 am