Saturday, December 24, 2005

Know When to Hold Them, Know When to Fold Them

So what is an RDF merge and when should you apply it in SPARQL?

Very succinctly, "The Semantics of SPARQL" says: "The RDF merge ⊎<G1...Gn> of a sequence of graphs <G1...Gn> (i.e., a dataset) is the ordered merge union of the graphs, where repeated bnodes are substituted with fresh ones, by keeping the names of the bnodes coming first in the sequence order."

"SPARQL Query Language for RDF" gives a simple example:

Graph 1:
_:a foaf:name "Bob" .
_:a foaf:mbox .

Graph 2:
_:a foaf:name "Alice" .
_:a foaf:mbox .

The result of the merge, upon which queries are made:
_:x foaf:name "Bob" .
_:x foaf:mbox .

_:y foaf:name "Alice" .
_:y foaf:mbox .
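To make the definition concrete, here is a minimal Python sketch of an ordered RDF merge. It assumes triples are plain tuples of strings and that blank nodes are written as Turtle-style "_:label" terms; the function name rdf_merge and the "_:bN" fresh-label scheme are hypothetical choices for illustration. The mailbox IRIs are also made up, because the original example's mbox values did not survive.

```python
from itertools import count

Triple = tuple[str, str, str]

def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are written as Turtle-style labels "_:name".
    return term.startswith("_:")

def rdf_merge(graphs: list[list[Triple]]) -> list[Triple]:
    """Ordered RDF merge: bnode labels from earlier graphs keep their names;
    a label that reappears in a later graph is replaced by a fresh one."""
    # Every label in any input graph, so fresh labels never collide with them.
    all_labels = {t for g in graphs for tr in g for t in tr if is_bnode(t)}
    used: set[str] = set()
    fresh_ids = count()
    merged: list[Triple] = []

    for graph in graphs:
        # A bnode label is scoped to its own graph, so renaming is per graph.
        renaming: dict[str, str] = {}
        labels = {t for tr in graph for t in tr if is_bnode(t)}
        for label in sorted(labels):
            if label in used:
                new = f"_:b{next(fresh_ids)}"
                while new in all_labels or new in used:
                    new = f"_:b{next(fresh_ids)}"
                renaming[label] = new
            else:
                renaming[label] = label  # first occurrence keeps its name
            used.add(renaming[label])
        for tr in graph:
            merged.append(tuple(renaming.get(t, t) for t in tr))
    return merged

# Hypothetical data mirroring the example above.
g1 = [("_:a", "foaf:name", '"Bob"'),
      ("_:a", "foaf:mbox", "<mailto:bob@example.org>")]
g2 = [("_:a", "foaf:name", '"Alice"'),
      ("_:a", "foaf:mbox", "<mailto:alice@example.org>")]

for s, p, o in rdf_merge([g1, g2]):
    print(s, p, o, ".")
# _:a foaf:name "Bob" .
# _:a foaf:mbox <mailto:bob@example.org> .
# _:b0 foaf:name "Alice" .
# _:b0 foaf:mbox <mailto:alice@example.org> .
```

Since blank node labels carry no meaning, _:a and _:b0 here play the same role as _:x and _:y in the specification's example: what matters is only that Bob's and Alice's nodes stay distinct.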

Section 9 of the same document covers querying multiple graphs, including building a new dataset whose default graph is the RDF merge of the graphs given in the FROM clauses.

In summary, when a SPARQL query spans several graphs, the blank nodes of each graph are kept apart (repeated labels are given fresh names), which prevents, for example, joining across graphs on those blank nodes.

What is generally required by RDF applications is something like smushing. For example, an "...RDF spider (often known as a "scutter") can gather up FOAF files and "smush" them together into a single model that unifies the individual pieces of information into a network." (from "A Semantic Web Shoebox - Annotating Photos with RSS and RDF").

To actually achieve smushing, Leo has an example algorithm; alternatively, it might be appropriate to adapt RDF graph isomorphism algorithms.
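One common way to smush FOAF data is on an inverse functional property such as foaf:mbox: two nodes with the same mailbox are taken to be the same person. The Python sketch below does this with a union-find structure. It is not necessarily the algorithm Leo describes, and the function name smush, the tuple-of-strings triple format, and the sample data (including the mailbox IRI) are all hypothetical.

```python
def smush(triples, ifps):
    """Merge subjects that share a value for any inverse functional property."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Deterministic choice: keep the lexicographically smaller term
            # ("<" sorts before "_", so an IRI wins over a blank node).
            keep, drop = sorted((ra, rb))
            parent[drop] = keep

    # First subject seen for each (property, value) pair.
    first_holder: dict[tuple[str, str], str] = {}
    for s, p, o in triples:
        if p in ifps:
            key = (p, o)
            if key in first_holder:
                union(first_holder[key], s)
            else:
                first_holder[key] = s

    # Rewrite every node to its representative and drop duplicate triples.
    rewritten = [(find(s), p, find(o)) for s, p, o in triples]
    return list(dict.fromkeys(rewritten))

# Hypothetical merged data: the same person appears under two blank nodes.
merged = [("_:a", "foaf:name", '"Bob"'),
          ("_:a", "foaf:mbox", "<mailto:bob@example.org>"),
          ("_:b0", "foaf:nick", '"bobby"'),
          ("_:b0", "foaf:mbox", "<mailto:bob@example.org>")]

for s, p, o in smush(merged, {"foaf:mbox"}):
    print(s, p, o, ".")
# _:a foaf:name "Bob" .
# _:a foaf:mbox <mailto:bob@example.org> .
# _:a foaf:nick "bobby" .
```

After smushing, the name and the nick hang off a single node, so a pattern such as "?x foaf:name ?n . ?x foaf:nick ?k" can now match, which it could not against the merged graph alone.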
