Sunday, January 02, 2005

Agile Databases Again

Let them eat layer cake: flexibility versus clarity in data "Recently I've had some experience at the non-planetary scale of trading off between the extreme flexibility of technology like RDF versus a domain model that a person coming after could reasonably be expected to understand...It turns out that RDF is surprisingly cheap stuff to generate. The downside was that for purposes of communicating intent, ongoing maintenance and adding functionality against the collected data, RDF is not very pleasant to work with, at least not compared to SQL, Objects or XML. This is especially so at the presentation layer. It's also a different paradigm, and by using it you're technologically committed to yet another data model, directed graphs, along the usual suspects - objects, markup and relations. The cost of introducing a new model should not be underestimated. As a result RDF has been useful but not as cheap to manipulate as one would like."

Also, "This is another area where RDF falls down. Yes, there is Sparql and before that other SQL like languages, but again you're left iterating over raw RDF graph result sets, which is not always ideal."

As linked to previously, SPARQL results can be either a sub-graph or variable bindings.
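To make the distinction concrete, here is a minimal sketch in plain Python rather than real SPARQL. The toy graph, the `ex:` and `foaf:` names, and the `select` and `construct` helpers are all hypothetical, chosen only to mimic the two result shapes: a list of variable bindings, or a sub-graph of matching triples.

```python
# Toy in-memory graph: a set of (subject, predicate, object) triples.
# All names here are illustrative assumptions, not a real SPARQL engine.
GRAPH = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
}


def is_var(term):
    """Pattern terms starting with '?' are treated as variables."""
    return term.startswith("?")


def match(pattern, graph):
    """Yield (bindings, triple) for every triple matching a single pattern."""
    for triple in sorted(graph):  # sorted only to make the order repeatable
        bindings = {}
        for p, t in zip(pattern, triple):
            if is_var(p):
                # A variable used twice must bind to the same value.
                if bindings.get(p, t) != t:
                    break
                bindings[p] = t
            elif p != t:
                break
        else:
            yield bindings, triple


def select(pattern, graph):
    """Variable-bindings result: one dict per match."""
    return [b for b, _ in match(pattern, graph)]


def construct(pattern, graph):
    """Sub-graph result: the set of triples that matched."""
    return {t for _, t in match(pattern, graph)}


pattern = ("?person", "foaf:name", "?name")
rows = select(pattern, GRAPH)
subgraph = construct(pattern, GRAPH)
```

With bindings you iterate over rows much as you would over a SQL result set; with a sub-graph you get triples back and can keep treating the result as a graph.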

"Arguably we could have used an RDF store such as Kowari, the in built persistence mappings of Jena, or even XQuery, along with the RDF interchange. The reality is there's only so much new technology you can apply in one go without taking on too much risk, especially in a short time frame, whereas we had a good idea of what we were getting into with a relational store."

Are triplestores good databases? "The Sparql language may not quite be finished but it’s certainly comparable to SQL...Overall this would suggest that RDF stores are potentially good DBs, on many points potentially much better than regular RDBMSs (or XML DBs) because of the more flexible model. But for this to be practicable it assumes the performance can be brought to a comparable level as RDBMSs, which if it hasn’t already been done would I think would only be a small matter of programming."

I've been fairly negative about SPARQL in the past because of its lack of counting and sorting. It also doesn't make sense to me to have DISTINCT: everything should be distinct, it's a graph.

There are also some comments about Kowari using Lucene for the database and not having transactions, which are just wrong. Kowari uses NIO and AVL trees for its store and has done since the beginning. It's much faster than JDBM (which we looked at for storing our blank node map and discarded because of its speed).
