Saturday, December 29, 2007


I couldn't be more impressed. For the first time in about 25 years (I think the last time was H.E.R.O. for the Atari 2600) my mum has sat down (well, stood up too) and played a computer game - Wii golf, bowling and tennis. While I remained victorious, it wasn't by much (especially at golf). It reminded me of Nolan Bushnell's description of the arrival of Pong, where women would hustle men (induce them to gamble over the outcome of a game) in bars.

Monday, December 24, 2007

Fat Controller

When does MapReduce make sense and what architecture is appropriate? I don't really know, whereas Tom has some ideas. I like the idea of MapReduce in the Google architecture (as cloned by Hadoop). I like the use of a distributed file system to evenly smear the data across the nodes in the cluster. However, these choices don't always seem appropriate.

The easiest and most obvious example of where this causes a problem is when you want to ensure a single point of entry or exit in your system (to control when processing starts or stops). Another obvious example is where the overhead of network latency overwhelms any gain from parallelizing the processing. Beyond that, it seems that if you can order things globally then the processing should be more efficient, but it's unclear to me where the line is between that and the more distributed MapReduce processing (and whether adding that complexity is always worth it).

Google's third lecture (slides) answers some of the basic questions, such as why use a distributed file system. It also lists design decisions: read optimization, mutation handling (serializing writes and atomic appends), no caching (no need, due to large file sizes), fewer but larger files (64MB chunks), and file handling that is essentially garbage collection. They implemented appends because appending is a frequent operation. This is something that Hadoop has yet to do, and it can be an issue, especially for databases (see the requirements for appends and truncates attached to that issue).

There is obviously some adaptation needed for algorithms to run on a MapReduce cluster. Lecture 5 gives some examples of MapReduce algorithms, including a breadth first search of a graph and PageRank. Breadth first is chosen so there doesn't need to be any backtracking. For graph searching, they suggest creating an adjacency matrix - a 1 in the matrix indicating a link and a 0 indicating no link. To transfer it efficiently they use a sparse representation (where you only record the links - very much like column databases, of course). The MapReduce formulation is similar, with the processing split so that each map task handles a single row (one page). For both of these algorithms there is a non-MapReduce component; for example, in the PageRank algorithm a separate process determines the convergence of the page values.
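To make the sparse-row idea and the convergence loop concrete, here's a minimal single-machine sketch in Java. The graph, damping factor and threshold are made up for illustration (and it assumes every page has at least one outgoing link); in the MapReduce version each map task would process one row, but running the whole loop locally makes the non-MapReduce part - the convergence test - easy to see:

```java
import java.util.Arrays;

// A minimal, single-machine sketch of PageRank over a sparse adjacency
// list. The graph, damping factor and threshold below are assumptions
// for illustration, not taken from the lectures.
public class PageRankSketch {

    // links[i] holds the pages that page i links to (the sparse rows).
    // Assumes every page has at least one outgoing link.
    static double[] compute(int[][] links) {
        int n = links.length;
        double damping = 0.85, epsilon = 1e-9;

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        double delta = Double.MAX_VALUE;
        while (delta > epsilon) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - damping) / n);
            // The "map" step: each row i shares its rank among its links.
            for (int i = 0; i < n; i++) {
                for (int target : links[i]) {
                    next[target] += damping * rank[i] / links[i].length;
                }
            }
            // The non-MapReduce part: stop once the ranks stop moving.
            delta = 0;
            for (int i = 0; i < n; i++) {
                delta += Math.abs(next[i] - rank[i]);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Three pages: 0 links to 1 and 2, 1 links to 2, 2 links back to 0.
        System.out.println(Arrays.toString(compute(new int[][] { {1, 2}, {2}, {0} })));
    }
}
```

In a real MapReduce job the inner loop over rows would be the map phase, the additions into `next` would be the reduce phase, and the driver would rerun the job until the delta check passes.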

Sunday, December 23, 2007


Apple to tweak 'Stacks' in Mac OS X Leopard 10.5.2 Update. Getting better - now all they have to do is fix Spaces and Java.

Friday, December 14, 2007

Timeless Software

Barry Boehm (it's pronounced "beam", despite how it reads), in "A View of 20th and 21st Century Software Engineering", lists the timeless qualities of a good programmer that have been discovered through the decades. They include: don't neglect the sciences, avoid sequential processing, avoid cowboy programming, what's good for products is good for process, and adaptability trumps repeatability.

The PowerPoint for the presentation is also online.

Update: I have notes from this talk but little time to review them. There are three things that stick out in my mind though.

The first is slide 5, which lists the most common causes of project failure. Most of these are not technical but centre around project management and people. I think it can be successfully argued that the remaining, mainly technical, issues are people problems too, especially things like "lack of executive support".

Slide 9 has a Hegelian view of the progress made in writing software. That is, there is a process of advancement that involves theses, antitheses and syntheses. For example, software as a craft was an overreaction to software being considered analogous to hardware development and agile methods are a reaction (some may say overreaction) to the waterfall methodology.

Thirdly, slide 30 has what initially looks like a good diagram to help decide how agile your project should be. I had issues with this. Much of my experience with agile projects is that they are actually better planned than most traditionally run projects. I don't see how the criticality-of-defects axis is appropriate either, as defects are fewer and less critical on well run agile projects too. Then again, maybe I've never been on a well run planned project.

This made me wonder: why does software development still seem so far removed from science? Or, to put it another way, why is it so dependent on personal experience? OO is quite frequently vilified, but the reason people supported it, and continue to do so, is that reuse increased and costs decreased - people were just following the numbers. Before OO, it was structured programming. In contrast, what seemed like good ideas, such as proving the correctness of programs, have not been widely adopted. Barry points out that the reasons, since its inception in the 1970s, are that it did not prevent defects occurring in the specification and it was not scalable. This led to the ideas of prototyping and RAD, which were developed to overcome these perceived failures.

Scala Tipping the Scales

Scala has been getting attention, with the announcement of the Scala book and a couple of posts on Reddit: "The awesomeness of Scala is implicit" and "Why Scala?". This coincided with me reading up on what the Workingmouse people (like Tony) were up to with Scalaz.

I've only read Chapter 1 of the book, but it's fairly compelling reading, covering how Scala is OO, functional, high level, verifiable, and so on. It also gives examples of how succinct Scala is compared to Java:

boolean nameHasUpperCase = false;
for (int i = 0; i < name.length(); ++i) {
    if (Character.isUpperCase(name.charAt(i))) {
        nameHasUpperCase = true;
    }
}

Compared to:

val nameHasUpperCase = name.exists(_.isUpperCase)

This works for me much like Ruby and its books did: you can understand where you came from and how the new thing is better. It's much more compelling than a sermon about how all Java users are fornicators of the devil who corrupt young children. To continue the religious theme, it seems that most of the industry, when selecting a computer language, acts like the born-again - jumping from one state to another: "I used to be a sinner, managing my own memory and using pointer arithmetic, and now I have seen the one true, pure language".

Update: There's also an interview with Bill Venners on Scala. He compares it with LINQ, explains why keeping static typing is important, how using the Option type removes nulls, and how Scala is both more OO than Java (it removes static methods and primitives) while being fully functional; he also describes singleton objects and how Scala achieves its goal as a scalable language. One of the interesting points he makes is that static types make you solve problems in a particular way.
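To make the Option point concrete, here's a hand-rolled sketch of the idea in Java. The names (Option, some, none, getOrElse) follow Scala's, but this is an illustration of the concept, not Scala's implementation - the lookup function is invented for the example:

```java
// A sketch of Scala's Option idea in Java: instead of returning null,
// a lookup returns a present or an empty Option, so the caller must
// handle the missing case explicitly.
abstract class Option<T> {
    abstract boolean isDefined();
    abstract T get();

    // The common pattern: supply a fallback instead of null-checking.
    T getOrElse(T fallback) { return isDefined() ? get() : fallback; }

    static <T> Option<T> some(final T value) {
        return new Option<T>() {
            boolean isDefined() { return true; }
            T get() { return value; }
        };
    }

    static <T> Option<T> none() {
        return new Option<T>() {
            boolean isDefined() { return false; }
            T get() { throw new IllegalStateException("None.get"); }
        };
    }
}

public class OptionSketch {
    // A lookup that can "fail" without ever returning null.
    static Option<String> find(String key) {
        return "scala".equals(key) ? Option.some("a language")
                                   : Option.<String>none();
    }

    public static void main(String[] args) {
        System.out.println(find("scala").getOrElse("unknown"));
        System.out.println(find("linq").getOrElse("unknown"));
    }
}
```

The point Venners makes falls out of the types: a `String` in a signature is always there, and an `Option<String>` is always suspect, so the compiler rather than the runtime reminds you about the missing case.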

Wednesday, December 12, 2007

LINQed Data

I did a quick search for LINQ and RDF and there are two LINQ providers for .NET: