Sunday, June 04, 2006

Dressing Up SPARQL

Pérez et al.: Semantics and Complexity of SPARQL points to a new paper "Semantics and Complexity of SPARQL". Which offers a formal definition of most of the operations available in SPARQL.

"...I feel that this paper is a great formal foundation for SPARQL. It presents clear and reasonable answers to all of the weird edge cases we had to pussyfoot around until now. It makes me confident that SPARQL can be formally analyzed and there are answers to the hard optimization problems I and other people face when implementing SPARQL (e.g. in D2RQ)...I knew there were edge cases where my approach didn’t match SPARQL, and I suspected that they wouldn’t occur much in real queries – but now I know for sure, and I know that there are no further edge cases lurking."

From the paper: "...under the depth-first approach some natural properties of widely used operators do not hold, which may confuse some users. For example, it is not always the case that Eval D ((P1 AND P2 )) = Eval D ((P2 AND P1 )), violating the commutativity of the conjunction and making the result to depend on the order of the query."

"A graph pattern P is well designed if for every occurrence of a sub-pattern P?= (P1 OPT P2) of P and for every variable ?X occurring in P, the following condition holds: if ?X occurs both in P2 and outside P?, then it also occurs in P1."

"...the assumption on predicates used for joining (outer joining) relations to be null-rejecting...[in SPARQL] those predicates are implicit in the variables that the graph patterns share and by the definition of compatible mappings they are never null-rejecting....queries are also enforced not to contain Cartesian products, situation that occurs often in SPARQL when joining graph patterns that do not share variables. Thus, specific techniques must be developed in the SPARQL context."

While Galindo-Legaria and Rosenthal do limit their reordering work to ones without Cartesian products and null-intolerant predicates, other such as "Using EELs, a Practical Approach to Outerjoin and Antijoin Reordering" have extended their work to include handling special cases of these.

From my first look, the whole thing does point to having to re-develop a lot of work. Reusing things like depth-first query evaluation seem pretty fundamental. I'll certainly be looking at this more.

No comments: