Thursday, March 29, 2007

MPTStore

Presentation Summary on MPTStore. A summary of an interesting approach by the Fedora guys to storing lots of triples, fast.

The real motivation behind experimenting with a new triplestore, however, was the NSDL use case. The National Science Digital Library (NSDL) is a moderately large repository (4.7 million objects, 250 million triples) with a lot of write activity (driven by periodic OAI harvests; primarily mixed ingests and datastream modifications). The NSDL data model also includes existential/referential integrity constraints that must be enforced. Querying the RI to determine correct repository state proved to be difficult: Kowari aggressively buffers triples, sometimes for on the order of seconds, before writing them to disk. Flushing the buffer after every write is also computationally expensive (hence the drive to use buffers in the first place).


Based on this observation, their solution, called “Mapped Predicate Tables,” creates a table for every predicate in the triplestore. This has several advantages: triple adds and deletes have a low computational cost, queries for known predicates are fast, complex queries benefit from the relatively mature RDBMS planner having finer-granularity statistics and query plans, and flexible data partitioning helps address scalability. The solution also comes with several disadvantages: one needs to manage the predicate-to-table mapping, complex queries crossing many predicates require more effort to formulate, and, with a naive approach, simple unbound queries scale linearly with the number of predicates.
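To make the table-per-predicate idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not MPTStore's actual code or schema: the class name PredicateTableStore, the pred_map table, the t<N> table naming and the method names are all assumptions made up for illustration. It shows the mapping that has to be managed, the cheap single-table adds, deletes and known-predicate lookups, and the naive unbound-predicate query that has to visit every predicate table.

```python
import sqlite3


class PredicateTableStore:
    """Toy triplestore that keeps one (s, o) table per predicate.

    Hypothetical illustration only; names and schema are assumptions,
    not MPTStore's real API.
    """

    def __init__(self, conn):
        self.conn = conn
        # Mapping table: predicate URI -> numeric id used in the table name.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pred_map ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "predicate TEXT UNIQUE NOT NULL)"
        )

    def _table_for(self, predicate, create=False):
        """Return the table name for a predicate, optionally creating it."""
        row = self.conn.execute(
            "SELECT id FROM pred_map WHERE predicate = ?", (predicate,)
        ).fetchone()
        if row is not None:
            return f"t{row[0]}"
        if not create:
            return None
        cur = self.conn.execute(
            "INSERT INTO pred_map (predicate) VALUES (?)", (predicate,)
        )
        # The table name is built from an integer id, never from user input.
        table = f"t{cur.lastrowid}"
        self.conn.execute(
            f"CREATE TABLE {table} "
            "(s TEXT NOT NULL, o TEXT NOT NULL, UNIQUE (s, o))"
        )
        return table

    def add(self, s, p, o):
        """Adding a triple is a single-row insert into one small table."""
        table = self._table_for(p, create=True)
        self.conn.execute(
            f"INSERT OR IGNORE INTO {table} (s, o) VALUES (?, ?)", (s, o)
        )

    def delete(self, s, p, o):
        """Deleting a triple is a single-row delete from one table."""
        table = self._table_for(p)
        if table is not None:
            self.conn.execute(
                f"DELETE FROM {table} WHERE s = ? AND o = ?", (s, o)
            )

    def objects(self, s, p):
        """Known-predicate query: touches exactly one table."""
        table = self._table_for(p)
        if table is None:
            return []
        rows = self.conn.execute(
            f"SELECT o FROM {table} WHERE s = ? ORDER BY o", (s,)
        )
        return [r[0] for r in rows]

    def triples_about(self, s):
        """Unbound-predicate query: naively visits every predicate table."""
        out = []
        mapping = self.conn.execute(
            "SELECT id, predicate FROM pred_map ORDER BY id"
        ).fetchall()
        for pid, predicate in mapping:
            rows = self.conn.execute(
                f"SELECT o FROM t{pid} WHERE s = ? ORDER BY o", (s,)
            )
            out.extend((s, predicate, o) for (o,) in rows)
        return out


if __name__ == "__main__":
    store = PredicateTableStore(sqlite3.connect(":memory:"))
    store.add("obj:1", "rel:memberOf", "coll:a")
    store.add("obj:1", "dc:title", "Example")
    print(store.objects("obj:1", "rel:memberOf"))
    print(store.triples_about("obj:1"))
```

The loop in triples_about is the "naive approach" mentioned above: its cost grows with the number of predicate tables, whereas objects only ever reads the one table for the predicate it was given.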


They achieved basically the same performance with either asynchronous or synchronous modification.

The project is available on SourceForge, including slides and Javadoc (the API has a similar design to JRDF, except with no blank nodes).