Saturday, January 19, 2008

Who Killed Functional Programming?

"MapReduce: A major step backwards" has drawn two good responses, "The Great MapReduce Debate" and "Relational Database Experts Jump The MapReduce Shark":
MapReduce is not a data storage or management system — it’s an algorithmic technique for the distributed processing of large amounts of data.

MapReduce has the same relationship to RDBMSs as my motorcycle has to a snowplow — it’s a step backwards in snowplow technology if you look at it that way.

I don’t think the authors understand distributed processing and distributed file systems when they think reduce must rely on FTP. The Wikipedia article says “each [reduce] node is expected to report back periodically with completed work and status updates,” pretty much the opposite of the “pull” DeWitt and Stonebraker criticize.


It's a fairly specious argument that more features means better, and the comparisons are not equivalent. It would be better to compare BigTable or HBase with databases, not MapReduce, and even that would be silly, because these guys are column-database proponents, which is (roughly) the same thing as HBase/BigTable (they do say it lacks views, but the idea Google uses is copying rather than mutating). Nutch is an example of using MapReduce to do indexing. The distributed filesystem is important, and it should be clear by now, you'd hope, that storing the data is not the problem; processing it is.
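To make "processing, not storage" concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases building an inverted index, the kind of job Nutch uses MapReduce for. This is not Nutch's or Hadoop's API; `map_phase`, `shuffle`, `reduce_phase` and `build_index` are hypothetical names chosen for illustration, and whitespace tokenization is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_phase(doc_id: str, text: str) -> Iterable[Tuple[str, str]]:
    """Map: emit a (word, doc_id) pair for every distinct word in one document."""
    for word in set(text.lower().split()):  # assumption: naive whitespace tokenizer
        yield word, doc_id

def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce: collapse all postings for one word into a sorted, de-duplicated list."""
    return word, sorted(set(doc_ids))

def build_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    """Run map over every document, shuffle by word, then reduce each word's group."""
    intermediate = [pair for doc_id, text in docs.items()
                    for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(w, ids) for w, ids in shuffle(intermediate).items())

if __name__ == "__main__":
    index = build_index({"a": "map reduce map", "b": "reduce data"})
    print(index["reduce"])  # ['a', 'b']
```

In a real cluster the map calls run in parallel next to the data in the distributed filesystem, and the shuffle moves intermediate pairs across the network; nothing here needs a schema, views or a query planner, which is the point of comparing it with a processing technique rather than a database.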

These types of comparisons, "X does something Y doesn't, therefore X is better", remind me of "Who Killed the Electric Car?". One of the reasons the EV1 was said to have failed was because "lacking an engine, it saves the driver the cost of replacement parts, motor oil, filters, and spark plugs" (and more); all those moving parts that had to be maintained were gone. How can you change gears if you don't have a gearbox?

Does that remind you of a database or other piece of software you use? Isn't the software industry just like the car industry? There appear to be market forces ensuring that things stay complicated and expensive to maintain.

Meanwhile, Google is getting on with it by providing services for terabytes of data (via Wired), "Building on the company's acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features."

Update: It's been noted that you should bet on cheap and rickety if it's 10 times better (that post also mentions Hypertable), and that their comments are not even wrong.

Update 2: Part of my rant was going to be about how Apple designs things with as little in them as possible; what they leave out is what makes them the leader. It's the same gist as what I read at "Heavier than Air".
