Thursday, July 29, 2004

Triple Store Bake-Off

Scalability Report on Triple Store Applications "Drawing conclusions about remotely accessible stores is more pertinent to our project requirements. In passing, it seems MySQL 3 performs the most quickly in general as a Jena store, and Kowari shows some great promise with its order of magnitude less time for configuration and its speed of loading data into the store.

Browsing and configuration times were the most pertinent figures to our future work. We don't believe the browsing times are really significant beyond the second granularity, so by that metric, it appears models with a performance between one and two seconds are potentially worth pursuing. All of our network models with caching appear to fall in that range, which is perhaps not a surprise since all of them implement caching in approximately the same fashion.

This leaves configuration time as the more interesting metric - how fast does a store return its results for creating the in-memory cache? For network models, the fastest were 3store and Sesame with files, though using files for the remote store is akin to using an in-memory model for our application, meaning it probably is not feasible for extremely large stores. So 3store and Sesame using MySQL 3 appear to be our best choices."

It would be nice to see the data and the queries being run. Some of the code is here.

There do seem to be some slight errors in the code, like creating a new ItqlInterpreterBean every time, which effectively sets up a new RMI session. There are also large differences in the testing: the local "Load Page" is slower than over the network by two orders of magnitude, which may have to do with using the Jena API on top of Kowari.
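To illustrate why creating a new session object per query matters, here is a minimal sketch in Python. It does not use the Kowari API: `FakeSession` is a hypothetical stand-in whose constructor only counts how often the (in reality expensive) session setup would run. It compares opening one session per query with sharing one session across a whole test run.

```python
class FakeSession:
    """Hypothetical stand-in for a remote query session (not the Kowari API)."""

    sessions_opened = 0  # class-wide count of simulated session setups

    def __init__(self):
        # In a real client, the expensive connection setup would happen here.
        FakeSession.sessions_opened += 1

    def execute(self, query):
        return f"result of {query}"


def run_with_new_session_each_time(queries):
    # One session per query: the setup cost is paid len(queries) times.
    return [FakeSession().execute(q) for q in queries]


def run_with_shared_session(queries):
    # One session for the whole run: the setup cost is paid once.
    session = FakeSession()
    return [session.execute(q) for q in queries]


def count_sessions(runner, queries):
    # Reset the counter, run the queries, and report how many sessions were opened.
    FakeSession.sessions_opened = 0
    runner(queries)
    return FakeSession.sessions_opened


queries = [f"query {i}" for i in range(100)]
per_query = count_sessions(run_with_new_session_each_time, queries)
shared = count_sessions(run_with_shared_session, queries)
print(per_query, shared)
```

If session setup dominates the cost of a single query, a benchmark written the first way mostly measures connection overhead rather than the store itself.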

The "configure" tests appear to be testing different things, because the results, across both the network and local tests, vary from 2ms to 200,000ms. The difference between Kowari local and Kowari over the network is 2166ms vs 80304ms. This shows the network version is slower, but the only difference should be RMI, and 78 seconds seems excessive even for RMI.
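The arithmetic behind those figures can be checked with a short Python sketch. The numbers are the ones quoted above; the variable names are only illustrative.

```python
import math

# "Configure" times quoted above, in milliseconds.
kowari_local_ms = 2166
kowari_network_ms = 80304
fastest_ms, slowest_ms = 2, 200_000

gap_ms = kowari_network_ms - kowari_local_ms       # extra time over the network
slowdown = kowari_network_ms / kowari_local_ms     # network time as a multiple of local
spread = math.log10(slowest_ms / fastest_ms)       # orders of magnitude across all tests

print(f"gap: {gap_ms / 1000:.1f} s")
print(f"network is {slowdown:.0f}x slower than local")
print(f"spread across all configure tests: {spread:.0f} orders of magnitude")
```

The gap comes to about 78 seconds, a slowdown of roughly 37 times, while the full range of "configure" results spans five orders of magnitude.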

And the use of "In-Memory" should probably be "In JVM".

Something not shown in the graphs is the time taken to load the triples:
* Jena w/ Postgres - 971784 ms
* Jena w/ MySQL 4 - 844257 ms
* Jena w/ MySQL 3 - 667138 ms
* Kowari - 139092 ms
* 3Store - 213088 ms
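To put the load times side by side, the following Python sketch expresses each one as a multiple of Kowari's. The figures are those listed above; using Kowari as the baseline is just a choice made for this comparison.

```python
# Load times from the list above, in milliseconds.
load_ms = {
    "Jena w/ Postgres": 971784,
    "Jena w/ MySQL 4": 844257,
    "Jena w/ MySQL 3": 667138,
    "Kowari": 139092,
    "3Store": 213088,
}

baseline = load_ms["Kowari"]
# Ratio of each store's load time to Kowari's.
ratios = {name: ms / baseline for name, ms in load_ms.items()}

# Print fastest first, with times converted to seconds.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {load_ms[name] / 1000:.1f} s ({ratio:.1f}x Kowari)")
```

On these numbers the Jena-backed stores take roughly 5 to 7 times as long as Kowari to load, and 3Store about 1.5 times as long.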

Overall, it's pretty much what's expected: Kowari can achieve an order of magnitude improvement over SQL databases, even on small datasets. Comparing Kowari against an SQL database with 5-10 million statements would show a greater margin of difference. Jena Fastpath and creating our own Model implementation should speed some of these results up.
