Thursday, November 15, 2007

Sesame Native Store

I'm very impressed at the moment with OpenRDF's native store as others have been in the past. One of the best things is how easy it was to work into the existing JRDF code.

As I've said before, I've been searching for an on-disk solution for loading and simple processing of RDF/XML. In the experiments I've been doing, OpenRDF's btree index is much faster than any other solution (again, not unexpected based on previous tests). The nodepool/string pool, or ValueStore, is a bit slower, though, than both Bdb and Db4o.

Loading 100,000 triples on my MacBook Pro 2GHz takes 37 secs with pure Sesame, 27 with the Sesame index and Db4o value store, and 35 with the Sesame index and Bdb value store; ehCache is still going (> 5 minutes). A million takes around 5 minutes with the Sesame index and Db4o nodepool (about 3,400 triples/second) and 3 minutes with the Sesame index and memory nodepool (about 5,500 triples/second).
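For anyone wanting to compare these runs on one scale, the timings convert to triples/second by simple division. The following is a minimal sketch, not part of JRDF or Sesame; the class and method names (ThroughputSketch, triplesPerSecond) are made up for illustration, and the inputs are just the rounded figures quoted above.

```java
// Hypothetical helper for illustration only; not part of JRDF or Sesame.
final class ThroughputSketch {

    /** Returns triples loaded per second for a run of the given size and duration. */
    static double triplesPerSecond(long triples, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return triples / seconds;
    }

    public static void main(String[] args) {
        // 100,000-triple runs, using the timings quoted in the post.
        System.out.printf("pure Sesame:          %.0f triples/s%n", triplesPerSecond(100_000, 37));
        System.out.printf("Sesame index + Db4o:  %.0f triples/s%n", triplesPerSecond(100_000, 27));
        System.out.printf("Sesame index + Bdb:   %.0f triples/s%n", triplesPerSecond(100_000, 35));

        // One-million-triple runs; the post's minute figures are rounded,
        // so these are approximations of the quoted rates.
        System.out.printf("1M, Db4o nodepool:    %.0f triples/s%n", triplesPerSecond(1_000_000, 5 * 60));
        System.out.printf("1M, memory nodepool:  %.0f triples/s%n", triplesPerSecond(1_000_000, 3 * 60));
    }
}
```

With the rounded times, the 100,000-triple runs come out at roughly 2,700, 3,700 and 2,860 triples/second, and the one-million runs at roughly 3,300 and 5,600. The small gap from the quoted 3,400 and 5,500 is presumably down to the minutes being rounded.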

There's lots of cleanup to go and there's no caching or anything clever going on at the moment, as I'm trying to hit deadlines. 0.5.2 is going to be a lot faster than 0.5.1 for this stuff.

Update: I've done some testing on some fairly low-end servers (PowerEdge SC440, Xeon 1.86GHz, 2GB RAM) and the results are quite impressive: loading 100,000 triples averages around 11,000 triples/second, and loading 10 million averages 9,451 triples/second.

Update 2: JRDF 0.5.2 is out. This is a fairly minor release for end-user functionality but meets the desired goal of creating, reading and writing lots of RDF/XML quickly. Just to give some more figures: Bdb/Sesame/db4o (SortedDiskJRDFFactory) is 30% faster for adds and 10% slower for writing out RDF/XML than Bdb/Sesame (SortedBdbJRDFFactory). Both have roughly the same performance for finds. I removed ehcache as it was too slow compared to the other approaches.

3 comments:

Kingsley Idehen said...

Andrew,

Would you be interested in testing the new Virtuoso Storage Provider for Sesame? Ideally, I would like you to run it through the shootout and publish the results.

What do you think? I would certainly like to know if there is a faster RDF store than Virtuoso out there, especially via independent verification :-)

I am easy to find:
Personal URI: http://kidehen.idehen.net/dataspace/person/kidehen#this
Blog URI: http://www.openlinksw.com/blog/~kidehen

Unknown said...

Thank you for your kind words. There's still quite a bit of room for improvement in the native store, which I hope to work on after the Sesame 2.0 release. Which Sesame version did you use, by the way?

Arjohn Kampman -- openrdf project lead

Andrew said...

We've been using OpenLink as well for some of the tests. It'd be great to test it out. It'd be good to try it with a Java API.

Arjohn, I used RC1 of Sesame.