Monday, December 13, 2010

What's a Dual?

Erik Meijer has been giving a talk at YOW (in Melbourne) and San Francisco at QCON about Key Value Stores being the dual of SQL.

The obvious question is what is a dual? And like most really good questions you already know the answer. Set operations AND and OR are dual. Which means that if A OR F = A is true, by replacing AND with OR and T with F (or as Erik said switching the arrows), you get A AND T = A. Likewise, A OR T = T becomes A AND F = F.

This is the bit where Maths almost becomes magical. Why should these properties hold? Why doesn't this work for multiplication and addition?

In his talk, Erik quoted Benjamin Black, that the noSQL movement took away the relational model and gave nothing back. His talk also seems to be about the beginnings of a noSQL/coSQL algebra and a project to write a noSQL database within Microsoft (noSQL Server perhaps?). He's not the only one thinking about these kinds of things obviously.

He also mentioned, closure as an important property of numbers and other things. That is you plug 1 + 1 in, you get 2 and you can plug 2 into + 1 and get 3 and so on - the answer of one operation can be used for the next. You can't do that with SQL or SPARQL, which in my mind is a big fail. Erik agrees - and suggested that noSQL (or coSQL) should support this property. He's not the only one to think that noSQL should support the Unix philosophy (which is not just closure but is an important part of it). And he's certainly not going to be the last.

He described these properties in the same way as I discovered them in RDF, which was through graphs. The object model in LINQ has a graph of objects - objects which don't have a globally identifiable name - much like blank nodes. The cool thing about LINQ, well one of the many cool things, is that it supports closure - you query your object graph, you get back another object graph and can continue to plug away at it as you narrow your query. This certainly makes a lot of sense from a user querying data perspective (among others) - where you start with all students, you ask for all male students, then you ask for all male students with a grade point average about 5 or whatever. Narrower and narrower is one typical way people find the answer they're looking for.

The other thing he mentions is that SQL is a nightmare due to normalization, the complexity of queries, null semantics and computing over them and the difference between objects and tables. Erik showed that objects and relations are dual.

This is the bit where I introduce monads and say it's like pizza delivery or something. Screw that. It's just good in that if your operations follow laws then they can be mapped to other things and inherit any work done to make them good - like optimisation or scaling. In Erik's talk that meant finding out cool things like noSQL being dual to SQL and LINQ can be done as MapReduce (see this slide of Erik's). This works because LINQ operations are associative and MapReduce operations are also associative. It's not just a good idea it's a monadic law. If you can, read 10 pages on category theory (I'm only up to page 8 though).

Obviously, not everything is peaches. The noSQL/coSQL people say don't do joins - they don't scale - you can't self join on the web for example. In a MapReduce/noSQL system you either end up querying multiple times or you end up processing the data again. I'd suggest that someone will write a library that does multiple queries and joins over noSQL databases so that you don't have to do it yourself - maybe locally and in memory at first, and then with disk based structures for scalability later - I guess this is where LINQ would fit in.

There were lots of dualing properties in the relational vs noSQL/coSQL talk. I only remember a few SQL vs noSQL/coSQL: closed world vs open world, identity by value vs identity by reference (or properties), and (I might be wrong) transactional vs non-transactional.

So with our arrows in hand, ignoring how cool the world will be without JOINs and UNIONs and our mighty knowledge of set theory, what would JOIN and UNION look like if they were a dual and if such a thing existed as 0 and U for relations. U JOIN R = R becomes 0 UNION R = R and likewise U UNION R = U becomes 0 JOIN R = 0 (which matches my existing prejudices).

Just to put the cherry on this pig of a blog post (to mix metaphors), objects and functions look like duals too (given that people claim there's no meaning behind what an object is and what a dual is I feel safe anyway):

But from another perspective, the apply "method" of a closure can be used as a low-level method dispatch mechanism, so closures can be, and are, used to implement very effective objects with multiple methods...With closures seen as a building block with which to implement objects, it's clear that objects are a poor man's closures...If your language only has these restricted closures, and you're forced to build an object system on top of them, it's clear that closures are a poor man's objects.

Erik via Eugene Wigner said there's an unreasonable effectiveness of mathematics and I have to agree. Monads, duals, properties for LINQ, noSQL, relational databases, SPARQL, and so on. Very unreasonable and very cool.

Post a Comment