I've been spending some time looking at the querying part of JRDF. And it's quite bad. How bad? Well I've been profiling it and noticing that a lot of time was spent comparing attribute value pairs. An attribute in JRDF consists of a name (variable or position in a triple) and type (position in a triple or literal, URI Reference or blank node). Comparisons are done during most operations (like joins) and they are done on sorted attribute values. This is incredibly dumb. What's much better is to have a map of attributes to values. No sorting required and O(1) lookup - hurrah. The code around the comparisons also got a lot simpler and is obviously better. I think there's at least one other case of this at a different level and potentially room for about an order of magntitude speed up over the current release. Test queries are already 2-3 times faster.
The main reason for this though, is that currently the FILTER in JRDF runs about 10 times more slowly than a query using triple matching (this isn't a complexity measurement - it's based on a rather small set of triples). So a query with "?a <some:value> 'foo'" is much slower than "?a <some:value> ?b . FILTER(str(?b) = 'foo')". The queries aren't the same but their performance shouldn't be that much slower. However, in order to get to a stage of improving FILTER's performance the code has to be refactored - hopefully simpler and faster.
FILTER is nicely functionaly - it seems a shame to implement it in Java - it's eye poppingly bad at the moment. I was thinking functional Java but instead of taking the gateway drug I was think of just going to the hard stuff. FILTER is operated by creating different operations within a relation - which allows you to put ANDed FILTERs vertically across a relation (columns) and ORed FILTERs horizontally (rows). I don't know if anyone else implements it this way - it might be another bad idea over time.
No comments:
Post a Comment