Wednesday, April 04, 2007

What no MOOSE?

I promised myself I wasn't going to do this again, but I have one more reason why even having DISTINCT/LOOSE/CHOOSE as an option in SPARQL is a bad idea. Part of the motivation is that once more I'm sitting next to people trying to make the Semantic Web work, and from my perspective SPARQL is letting them down.

It's not a new reason; it's one I wrote about in 2004, and it gives a pretty good explanation of why having this as an optional feature doesn't make sense for RDF:

"The other issue with the SPARQL is the lack of an implicit distinct. In my understanding of SQL, DISTINCT is optional because if your queries work on normalized data and joins are based on distinct keys then the returned results cannot be duplicated. If your query works on rows with repeated values on the same column then you apply DISTINCT.

In RDF's data model there isn't really this problem of duplicated data and normalization. SPARQL has the idea of matching statements in the graph. From my understanding, RDF's data model doesn't support the idea of multiple subject, predicates and/or objects with the same values.

In other words, it only seems valid that if a query matches one result in the graph it should return that one unique result not repeated multiple results."
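To make the quoted argument concrete, here is a minimal sketch that models an RDF graph as a plain Python set of (subject, predicate, object) tuples. The triples, the `ex:`/`foaf:` names and the `match` helper are all hypothetical and only illustrate the idea; this is not how any particular SPARQL engine works.

```python
# An RDF graph modelled as a set of triples (an assumption for illustration).
graph = set()
graph.add(("ex:alice", "foaf:knows", "ex:bob"))
graph.add(("ex:alice", "foaf:knows", "ex:bob"))   # same statement asserted again
graph.add(("ex:alice", "foaf:knows", "ex:carol"))


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# The repeated statement is stored once, so each match comes back once.
results = match(graph, s="ex:alice", p="foaf:knows")
for triple in results:
    print(triple)
```

Because the graph itself cannot hold the same statement twice, a pattern that binds whole statements has nothing to de-duplicate.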

This is on top of the other reasons I came up with in "Bagging SPARQL". This could actually be seen as further discussion of the initial response I got. Among other things, it was said that duplicates could arise by querying multiple graphs. I'd argue that forced distinct values provide the context to effectively count (or perform other aggregate operations) across these multiple graphs.
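As a rough sketch of the multiple-graph case, the snippet below compares counting statements across two graphs as a bag (simple concatenation) with counting them as a set (forced distinct). The graph contents and variable names are made up for illustration.

```python
# Two hypothetical graphs that share one statement.
graph_a = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
}
graph_b = {
    ("ex:alice", "foaf:knows", "ex:bob"),   # also present in graph_a
    ("ex:bob", "foaf:name", "Bob"),
}

# Bag semantics: the shared statement is counted once per graph.
bag_count = len(list(graph_a) + list(graph_b))

# Set semantics: the merged graph holds each statement once.
distinct_count = len(graph_a | graph_b)

print(bag_count, distinct_count)
```

With distinct results the count answers "how many statements are there?", while the bag count mixes in how many graphs happened to repeat each one.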

It's three years on and they couldn't even allow users to declaratively count the number of statements in this mystical, future web of data.

1 comment:

Andrew said...

Relational projection - no, there are no duplicates. SELECT - yes, there are.

As for UNION, the UNION of two sets is still a set, so again no duplicates. Again, SQL UNION is bunk.

SQL has all of these inconsistencies, like AVG vs SUM or UNION vs JOIN, which make no sense in a language based on the relational model, let alone one based on RDF.