Wednesday, November 12, 2008

Indexing for Efficient SPARQL

Another interesting way of indexing triples: "A role-free approach to indexing large RDF data sets in secondary memory for efficient SPARQL evaluation". The authors write: "We propose a simple Three-way Triple Tree (TripleT) secondary-memory indexing technique to facilitate efficient SPARQL query evaluation on such data sets. The novelty of TripleT is that (1) the index is built over the atoms occurring in the data set, rather than at a coarser granularity, such as whole triples occurring in the data set; and (2) the atoms are indexed regardless of the roles (i.e., subjects, predicates, or objects) they play in the triples of the data set. We show through extensive empirical evaluation that TripleT exhibits multiple orders of magnitude improvement over the state of the art on RDF indexing, in terms of both storage and query processing costs."
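To make the "role-free" idea a bit more concrete, here is a toy in-memory sketch in Python. It is not the paper's implementation: TripleT is a secondary-memory structure, and the class name `RoleFreeIndex`, its `add` and `match` methods, and the per-atom bucket layout are all my own assumptions for illustration. The only point it tries to show is that there is one index entry per atom, no matter whether that atom shows up as a subject, predicate, or object.

```python
from collections import defaultdict


class RoleFreeIndex:
    """Toy index keyed by atom, independent of the atom's role (hypothetical)."""

    def __init__(self):
        # One entry per atom; each entry keeps the triples in which the atom
        # appears, split by the role it plays there (assumed layout).
        self._buckets = defaultdict(lambda: {"s": set(), "p": set(), "o": set()})

    def add(self, s, p, o):
        """Register the triple under each of its three atoms."""
        triple = (s, p, o)
        self._buckets[s]["s"].add(triple)
        self._buckets[p]["p"].add(triple)
        self._buckets[o]["o"].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return the triples matching a pattern; None acts as a wildcard."""
        candidates = None
        for role, atom in (("s", s), ("p", p), ("o", o)):
            if atom is None:
                continue
            # .get() avoids creating an empty entry for an unknown atom.
            bucket = self._buckets.get(atom)
            found = bucket[role] if bucket else set()
            candidates = found if candidates is None else candidates & found
        if candidates is None:
            # All wildcards: every triple sits in its subject's "s" bucket.
            candidates = set()
            for bucket in self._buckets.values():
                candidates |= bucket["s"]
        return set(candidates)


index = RoleFreeIndex()
index.add("alice", "knows", "bob")
index.add("bob", "knows", "carol")
index.add("knows", "type", "Property")

# "knows" is a predicate in two triples and a subject in one,
# yet it has a single entry in the index.
print(index.match(p="knows"))
print(index.match(s="knows"))
```

A triple pattern with several bound positions is answered here by intersecting the per-atom buckets, which is a simplification chosen for brevity rather than anything claimed by the paper.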

While looking around on arXiv, I did a quick search and found two more interesting papers that seem related to a previous discussion on how the Semantic Web needs its own programming language or, I would say, at least a way to process the web of data. Both are by Marko A. Rodriguez: "The RDF Virtual Machine" and "A Distributed Process Infrastructure for a Distributed Data Structure".

1 comment:

Anonymous said...

On the topic of a better language for the Semantic Web, I agree one is needed and am planning one of my own.

An intermediate step is Adenine, which was developed as part of Haystack.

Recently I extricated Adenine from Haystack/Hayloft and packaged it up for the JSR-223 Java Scripting Engine API. You can try it out quite easily using IFCX Wings. The Adenine Tutorial converted to IFCX Wings is available from SVN.