Tuesday, October 26, 2004

Scalable OWL Lite

An Evaluation of Knowledge Base Systems for Large OWL Datasets: "In this paper, we present an evaluation of four knowledge base systems (KBS) with respect to use in large OWL applications."

"DLDB-OWL [23], is a repository for processing, storing, and querying large amounts of OWL data. Its major feature is the extension of a relational database system with description logic inference capabilities. Specifically, DLDB-OWL uses Microsoft Access® as the DBMS and FaCT [16] as the OWL reasoner. It uses the reasoner to precompute subsumption and employs relational views to answer extensional queries based on the implicit hierarchy that is inferred."
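The precompute-then-view approach can be sketched in a few lines. This is a minimal illustration, not DLDB-OWL's code: SQLite stands in for Access, a hand-written transitive closure over told subClassOf axioms stands in for FaCT, and the one-table-per-class schema and the (loosely LUBM-flavoured) class names are hypothetical.

```python
import sqlite3

# Hypothetical toy ontology: each class maps to its direct (told) superclasses.
# DLDB-OWL obtains the hierarchy from the FaCT reasoner; this sketch just
# takes the subClassOf axioms as given.
SUBCLASS_OF = {
    "Person": [],
    "Student": ["Person"],
    "GraduateStudent": ["Student"],
    "Faculty": ["Person"],
    "Professor": ["Faculty"],
}


def subclasses_of(cls):
    """Return cls plus every class transitively subsumed by it (precomputed once)."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, supers in SUBCLASS_OF.items():
            if sub not in result and any(s in result for s in supers):
                result.add(sub)
                changed = True
    return result


def build_store(conn):
    # One base table per class holds only the explicitly asserted instances.
    for cls in SUBCLASS_OF:
        conn.execute(f"CREATE TABLE {cls}_asserted (uri TEXT)")
    # One view per class unions the base tables of the class and all its
    # subclasses, so extensional queries see the inferred memberships.
    for cls in SUBCLASS_OF:
        union = " UNION ".join(
            f"SELECT uri FROM {sub}_asserted" for sub in sorted(subclasses_of(cls))
        )
        conn.execute(f"CREATE VIEW {cls} AS {union}")


conn = sqlite3.connect(":memory:")
build_store(conn)
conn.execute("INSERT INTO GraduateStudent_asserted VALUES ('http://example.org/alice')")
conn.execute("INSERT INTO Professor_asserted VALUES ('http://example.org/bob')")
conn.execute("INSERT INTO Person_asserted VALUES ('http://example.org/carol')")

people = sorted(row[0] for row in conn.execute("SELECT uri FROM Person"))
students = sorted(row[0] for row in conn.execute("SELECT uri FROM Student"))
print(people)    # all three URIs
print(students)  # only alice
```

Because the hierarchy is folded into the view definitions at load time, a query such as `SELECT uri FROM Person` needs no reasoner at query time; the trade-off in this sketch is that the views must be rebuilt whenever the ontology changes.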

"...we were surprised to see that Sesame-Memory could load up to 10 universities, and was able to do it in 5% of the time of the next fastest system. However, for 20 or more universities, Sesame-Memory also succumbed to memory limitations...The result reveals an apparent problem for Sesame-DB: it does not scale in data loading...As an example, it took over 300 times longer to load the 20-university data set than the 1-university data set, although the former set contains only about 25 times more instances than the latter...Sesame is a forward-chaining reasoner, and in order to support statement deletions it uses a truth maintenance system to track all deductive dependencies between statements."
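The idea of forward chaining plus dependency tracking can be illustrated with a toy store. This Python sketch is not Sesame's implementation: it assumes just two RDFS-style rules (subClassOf transitivity and type propagation), plain tuples as triples, and hypothetical names (`TinyStore`, `ex:...`). Deletion uses a simple over-delete-then-re-derive strategy over the recorded dependencies.

```python
from collections import defaultdict
from itertools import product

TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


class TinyStore:
    """Toy forward-chaining triple store with dependency tracking (hypothetical, not Sesame)."""

    def __init__(self):
        self.explicit = set()               # statements added by the user
        self.inferred = set()               # statements derived by the rules
        self.dependents = defaultdict(set)  # premise -> conclusions derived from it

    def all(self):
        return self.explicit | self.inferred

    @staticmethod
    def _apply_rules(facts):
        """Yield (conclusion, premises) for two RDFS-style rules."""
        subclass = [t for t in facts if t[1] == SUBCLASS]
        types = [t for t in facts if t[1] == TYPE]
        # subClassOf is transitive: A sub B, B sub C => A sub C
        for (a, _, b), (c, _, d) in product(subclass, subclass):
            if b == c:
                yield (a, SUBCLASS, d), ((a, SUBCLASS, b), (c, SUBCLASS, d))
        # type propagation: x type A, A sub B => x type B
        for (x, _, a), (c, _, b) in product(types, subclass):
            if a == c:
                yield (x, TYPE, b), ((x, TYPE, a), (c, SUBCLASS, b))

    def _saturate(self):
        # Naive forward chaining to a fixpoint, recording every one-step
        # derivation so deletions can later find what depended on what.
        changed = True
        while changed:
            changed = False
            facts = self.all()
            for conclusion, premises in self._apply_rules(facts):
                for p in premises:
                    self.dependents[p].add(conclusion)
                if conclusion not in facts:
                    self.inferred.add(conclusion)
                    changed = True

    def add(self, triple):
        self.explicit.add(triple)
        self.inferred.discard(triple)
        self._saturate()

    def remove(self, triple):
        self.explicit.discard(triple)
        # Over-delete: drop every inferred statement that (transitively)
        # depended on the removed one...
        stack = [triple]
        while stack:
            for dep in self.dependents.pop(stack.pop(), set()):
                if dep in self.inferred:
                    self.inferred.discard(dep)
                    stack.append(dep)
        # ...then re-derive whatever the remaining statements still support.
        self._saturate()


store = TinyStore()
store.add(("ex:alice", TYPE, "ex:GraduateStudent"))
store.add(("ex:GraduateStudent", SUBCLASS, "ex:Student"))
store.add(("ex:Student", SUBCLASS, "ex:Person"))
print(("ex:alice", TYPE, "ex:Person") in store.all())  # True
store.remove(("ex:Student", SUBCLASS, "ex:Person"))
print(("ex:alice", TYPE, "ex:Person") in store.all())  # False
```

Every derivation leaves an entry in the dependency map, so this bookkeeping grows with the inferred data. For scale: if load time grew as n^k, the quoted figures (over 300 times the time for about 25 times the instances) would put k at roughly ln 300 / ln 25 ≈ 1.77 or more, i.e. close to quadratic.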

"From our analysis, of the systems tested: DLDB is the best for large data sets where an equal emphasis is placed on query response time and completeness."

Lehigh University Benchmark.

3 comments:

Danny said...

Nice find. Nearly choked when I saw Microsoft Access® though - somehow I might be tempted to use Kowari instead...

Andrew said...

While most people who have used Access think of the Jet engine, for the last few years Access has also been able to use something called MSDE (Microsoft Data Engine). This is basically SQL Server with some restrictions:
http://techrepublic.com.com/5102-6313-5031981.html
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnacc2k/html/acmsdeop.asp

J1 said...

Interesting. A few comments on the Sesame scalability findings: obviously the main memory store is limited by the amount of RAM available. I expect the speed decrease is caused by the JVM being at the limit of its allocated heap, which causes not only lots of swapping, but also more frequent sweeps of the garbage collector. This can tremendously slow down the process.

As for Sesame-DB (the database backend of Sesame): IMHO the performance degradation is partly caused by the inferencer and partly by the (non)configuration of MySQL. As the database grows, larger buffers are needed. The default configuration of MySQL hits a threshold at about 10 million triples. Tweaking MySQL's config parameters can make a big difference.
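As an illustration of the kind of tuning meant here, a my.cnf fragment might raise buffer and cache sizes like the ones below. The comment does not say which parameters were changed, so this selection and the values are assumptions for illustration only; which settings matter depends on whether the Sesame tables use MyISAM or InnoDB and on how much RAM the machine has.

```ini
# Hypothetical my.cnf excerpt: example values only, size them to the
# machine's RAM and to the storage engine the Sesame tables actually use.
[mysqld]
# Index cache for MyISAM tables
key_buffer_size         = 256M
# Data and index cache for InnoDB tables
innodb_buffer_pool_size = 512M
# Per-connection buffer used for ORDER BY / GROUP BY sorts
sort_buffer_size        = 8M
# In-memory temporary tables are limited by the smaller of these two
tmp_table_size          = 64M
max_heap_table_size     = 64M
```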