Thursday, November 15, 2007

Sesame Native Store

I'm very impressed at the moment with OpenRDF's native store, as others have been in the past. One of the best things about it is how easy it was to integrate into the existing JRDF code.

As I've said before, I've been searching for an on-disk solution for loading and doing simple processing of RDF/XML. In the experiments I've been doing, OpenRDF's B-tree index is much faster than any other solution I've tried (again, not unexpected based on previous tests). The node pool (string pool, or ValueStore in Sesame terms), though, is a bit slower than both the Bdb and db4o alternatives.
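For readers new to the split between an index and a node pool: a store like this typically maps each distinct RDF term to a numeric ID and keeps the triples as sorted ID tuples, so a lookup by subject becomes a range scan. The Java below is a minimal in-memory sketch of that general idea, not Sesame's or JRDF's code. The class and method names (`NodePool`, `SpoIndex`, `localize`, `globalize`, `findBySubject`) are hypothetical, terms are plain strings, and a `TreeSet` stands in for the on-disk B-tree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

public class NodePoolSketch {

    // Lexicographic order on (subject, predicate, object) ID triples.
    static int compareIds(long[] a, long[] b) {
        for (int i = 0; i < 3; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // Hypothetical node pool: assigns each distinct term a long ID and maps IDs back to terms.
    static final class NodePool {
        private final Map<String, Long> ids = new HashMap<>();
        private final List<String> values = new ArrayList<>();

        // Returns the existing ID for a term, or allocates the next free one.
        long localize(String term) {
            Long id = ids.get(term);
            if (id == null) {
                id = (long) values.size();
                ids.put(term, id);
                values.add(term);
            }
            return id;
        }

        // Maps an ID back to the term it was assigned to.
        String globalize(long id) {
            return values.get((int) id);
        }
    }

    // Hypothetical SPO index: sorted ID triples; a TreeSet stands in for an on-disk B-tree.
    static final class SpoIndex {
        private final NavigableSet<long[]> triples = new TreeSet<>(NodePoolSketch::compareIds);

        // Returns false if the triple was already present (duplicates are stored once).
        boolean add(long s, long p, long o) {
            return triples.add(new long[] {s, p, o});
        }

        // All triples with subject s, found by a range scan over the sorted keys.
        NavigableSet<long[]> findBySubject(long s) {
            return triples.subSet(new long[] {s, Long.MIN_VALUE, Long.MIN_VALUE}, true,
                    new long[] {s, Long.MAX_VALUE, Long.MAX_VALUE}, true);
        }

        int size() {
            return triples.size();
        }
    }

    public static void main(String[] args) {
        NodePool pool = new NodePool();
        SpoIndex index = new SpoIndex();
        String[][] input = {
            {"ex:alice", "foaf:knows", "ex:bob"},
            {"ex:alice", "foaf:name", "\"Alice\""},
            {"ex:bob", "foaf:name", "\"Bob\""},
            {"ex:alice", "foaf:knows", "ex:bob"} // duplicate, stored once
        };
        for (String[] t : input) {
            index.add(pool.localize(t[0]), pool.localize(t[1]), pool.localize(t[2]));
        }
        System.out.println(index.size() + " distinct triples");
        for (long[] t : index.findBySubject(pool.localize("ex:alice"))) {
            System.out.println(pool.globalize(t[0]) + " " + pool.globalize(t[1]) + " " + pool.globalize(t[2]));
        }
    }
}
```

Running `main` reports three distinct triples and then lists Alice's two triples in index order. A real store usually keeps more than one index ordering (for example a predicate-first one) so that lookups by predicate or object are also range scans, and the node pool is where a slow backing store shows up on every add.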

Loading 100,000 triples on my 2GHz MacBook Pro takes 37 seconds with pure Sesame, 27 seconds with the Sesame index and a db4o value store, and 35 seconds with a Bdb value store; the ehcache version is still going after more than 5 minutes. A million triples takes around 5 minutes with the Sesame index and db4o nodepool (about 3,400 triples/second) and 3 minutes with the Sesame index and an in-memory nodepool (about 5,500 triples/second).
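The timings above mix elapsed times and rates; converting between them is just division. A small Java sketch (the class name `ThroughputCheck` and its `rate` helper are made up for illustration) turns the quoted times into triples/second, using the rounded "around 5 minutes" and "3 minutes" figures for the million-triple runs.

```java
public class ThroughputCheck {

    // Triples per second for a load of `triples` triples that took `seconds` seconds.
    static double rate(long triples, double seconds) {
        return triples / seconds;
    }

    public static void main(String[] args) {
        // 100,000-triple load times quoted in the post, in seconds.
        String[] setups = {"pure Sesame", "Sesame index + db4o value store", "Bdb value store"};
        double[] seconds = {37, 27, 35};
        for (int i = 0; i < setups.length; i++) {
            System.out.printf("%-36s %,6.0f triples/s%n", setups[i], rate(100_000, seconds[i]));
        }
        // One million triples, using the rounded minute figures from the post.
        System.out.printf("%-36s %,6.0f triples/s%n", "1M: Sesame index + db4o nodepool", rate(1_000_000, 5 * 60));
        System.out.printf("%-36s %,6.0f triples/s%n", "1M: Sesame index + memory nodepool", rate(1_000_000, 3 * 60));
    }
}
```

This works out to roughly 2,700, 3,700 and 2,900 triples/second for the three 100,000-triple runs, and about 3,300 and 5,600 triples/second for the million-triple runs, which is in line with the approximate rates quoted given that the times themselves are rounded.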

There's still lots of cleanup to do, and there's no caching or anything clever going on yet, as I'm trying to hit deadlines. Even so, JRDF 0.5.2 is going to be a lot faster than 0.5.1 for this kind of work.

Update: I've done some testing on some fairly low-end servers (PowerEdge SC440, Xeon 1.86GHz, 2GB RAM), and the results are quite impressive: loading 100,000 triples averaged around 11,000 triples/second, and loading 10 million averaged 9,451 triples/second.

Update 2: JRDF 0.5.2 is out. This is a fairly minor release in terms of end-user functionality, but it meets the goal of creating, reading and writing lots of RDF/XML quickly. Some more figures: Bdb/Sesame/db4o (SortedDiskJRDFFactory) is 30% faster for adds and 10% slower for writing out RDF/XML than Bdb/Sesame (SortedBdbJRDFFactory). Both have roughly the same performance for finds. I removed ehcache because it was too slow compared to the other approaches.