Sunday, August 26, 2007

JRDF 0.5.0

As mentioned, this version of JRDF supports a Resource interface, datatype support, persistence (via Berkeley DB), and Java 6. It does run under Java 5, but the RDF/XML writer now uses the StAX API, so it requires Woodstox or some other JSR 173 implementation.
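For the curious, this is roughly what streaming XML through StAX looks like (a minimal sketch of the javax.xml.stream API, not JRDF's actual writer code):

import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxSketch {
  public static void main(String[] args) throws XMLStreamException {
    // Under Java 5 this needs a JSR 173 implementation (e.g. Woodstox)
    // on the classpath; Java 6 ships one in the JDK.
    StringWriter out = new StringWriter();
    XMLStreamWriter w =
        XMLOutputFactory.newInstance().createXMLStreamWriter(out);
    w.writeStartDocument();
    w.writeStartElement("rdf", "RDF",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
    w.writeNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
    // ... triples would be streamed out here as rdf:Description elements ...
    w.writeEndElement();
    w.writeEndDocument();
    w.close();
    System.out.println(out);
  }
}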

A few things are in the works, such as an Elmo-like API, globalized nodes, and MSG support.

SourceForge download.

Better Chess

Higher Games
On the 10-year anniversary of Deep Blue beating Kasparov, some thought-provoking suggestions:
It is interesting in this regard to contemplate the suggestion made by Bobby Fischer, who has proposed to restore the game of chess to its intended rational purity by requiring that the major pieces be randomly placed in the back row at the start of each game (randomly, but in mirror image for black and white, with a white-square bishop and a black-square bishop, and the king between the rooks). Fischer Random Chess would render the mountain of memorized openings almost entirely obsolete, for humans and machines alike, since they would come into play much less than 1 percent of the time. The chess player would be thrown back onto fundamental principles; one would have to do more of the hard design work in real time.


Fischer Random Chess, or Chess960, removes one of the reasons I've always disliked chess (I never really could apply myself to remembering openings).
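The constraints in the quote pin the variant down completely, and they're simple enough to sketch in code. Here's one way to generate a legal Chess960 back row (a toy sketch; the class and approach are mine):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class Chess960 {
  public static void main(String[] args) {
    char[] row = new char[8];
    Random rnd = new Random();
    // Bishops: one on a random dark square, one on a random light square.
    row[rnd.nextInt(4) * 2] = 'B';     // squares 0, 2, 4, 6
    row[rnd.nextInt(4) * 2 + 1] = 'B'; // squares 1, 3, 5, 7
    // Queen and knights take three of the five remaining squares.
    List<Integer> free = new ArrayList<Integer>();
    for (int i = 0; i < 8; i++) {
      if (row[i] == 0) {
        free.add(i);
      }
    }
    Collections.shuffle(free, rnd);
    row[free.remove(0)] = 'Q';
    row[free.remove(0)] = 'N';
    row[free.remove(0)] = 'N';
    // Rook, king, rook fill the last three squares left to right,
    // which guarantees the king sits between the rooks.
    Collections.sort(free);
    row[free.get(0)] = 'R';
    row[free.get(1)] = 'K';
    row[free.get(2)] = 'R';
    System.out.println(new String(row));
  }
}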

Why isn't it just as nice--or nicer--to think that we human beings might succeed in designing and building brainchildren that are even more wonderful than our biologically begotten children? The match between Kasparov and Deep Blue didn't settle any great metaphysical issue, but it certainly exposed the weakness in some widespread opinions. Many people still cling, white-knuckled, to a brittle vision of our minds as mysterious immaterial souls, or--just as romantic--as the products of brains composed of wonder tissue engaged in irreducible noncomputational (perhaps alchemical?) processes. They often seem to think that if our brains were in fact just protein machines, we couldn't be responsible, lovable, valuable persons.

Via wonderTissue.

Saturday, August 25, 2007

Beautiful Engineering

One Bridge Doesn’t Fit All
American bridge engineering largely overlooks that efficiency, economy and elegance can be mutually reinforcing ideals. This is largely because engineers are not taught outstanding examples that express these ideals.

A 2000 report by the Federal Highway Administration indicated that an average of about 2,500 new bridges are completed each year; each could be an opportunity for better design. The best will be elegant and safe while being economical to build.

The key is to require that every bridge have one engineer who makes the conceptual design, understands construction and has a strong aesthetic motivation and personal attachment to the work. This will require not only a new ethos in professional practice, but also a new focus in the way engineers are educated, one modeled on the approach of those Swiss professors, Wilhelm Ritter and Pierre Lardy.

Via Bridges and code.

Wednesday, August 22, 2007

A Bunch of IntelliJ Goodness

I've been languishing without perhaps the best IntelliJ plugin of all time. I mention it to everyone I work with, and recently one of them said, "Hey, I've come across this plugin you might be interested in." It's ToggleTest: a simple plugin that lets you switch between tests and production code. Never let your hands leave the keyboard again.

Other plugins from the same author include Hot Plugin (the best meta-plugin) and Cruise Watcher.

Result

The previous issue of IEEE Software is available to Xplore subscribers. It is dedicated to TDD and agile development.

Mock Objects: Find Out Who Your Friends Are highlights the oft-discussed criticisms of test-driven development, including breaking encapsulation, hard-to-read tests, increased code fragility, and whether it is a valuable use of developer time. Of course, these differences can be resolved in a typical XP way:
My point is that plenty of well-written code is produced without relying on mocks or Tell, Don’t Ask. On the other hand, I also encounter plenty of TDD novices who produce poor code and poor tests. Can Steve and Nat’s approach consistently help novices write better code and tests? It could be. At this point, I remain open minded and am ready to pair-program with Steve or Nat to learn more.
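For anyone who hasn't seen the style being debated, here's a toy illustration of mock-driven "Tell, Don't Ask" (the Account/Auditor names are made up and the mock is hand-rolled; Steve and Nat's own examples use jMock):

interface Auditor {
  void recordWithdrawal(int amountInCents);
}

class Account {
  private final Auditor auditor;
  private int balanceInCents;

  Account(int openingBalanceInCents, Auditor auditor) {
    this.balanceInCents = openingBalanceInCents;
    this.auditor = auditor;
  }

  void withdraw(int amountInCents) {
    balanceInCents -= amountInCents;
    auditor.recordWithdrawal(amountInCents); // tell the collaborator
  }
}

class AccountTest extends junit.framework.TestCase {
  public void testWithdrawalIsAudited() {
    final int[] recorded = new int[1];
    Auditor mockAuditor = new Auditor() { // hand-rolled mock
      public void recordWithdrawal(int amountInCents) {
        recorded[0] = amountInCents;
      }
    };
    new Account(10000, mockAuditor).withdraw(2500);
    assertEquals(2500, recorded[0]);
  }
}

The test checks the conversation between objects (the account tells the auditor) rather than the account's internal state--which is exactly the property critics say makes such tests fragile.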

The second article, TDD: The Art of Fearless Programming, is about the studied effect of TDD on the quality of code. Pretty much every study showed increased effort (15-60%) and increased quality (5-267%). The ratio of effort to quality is about 1:2, so it seems to pay. Example definitions of quality were "reduction in defects" or "increase in functional tests passing". In general:
All researchers seem to agree that TDD encourages better task focus and test coverage. The mere fact of more tests doesn’t necessarily mean that software quality will be better, but the increased programmer attention to test design is nevertheless encouraging. If we view testing as sampling a very large population of potential behaviors, more tests mean a more thorough sample. To the extent that each test can find an important problem that none of the others can find, the tests are useful, especially if you can run them cheaply.

Via Mock Objects.

Saturday, August 18, 2007

Off to See the Wizard

Microsoft comes up with a quite elegant and well-integrated language extension for relational and XML data. IBM fires back two years later with a wizard that auto-generates beans and Java code with embedded SQL. The article also makes assertions like:
The most popular way to objectize [and] programmatically access and manipulate relational data has been through special APIs and wrappers that provide one or more SQL statements written as text strings.

The wizard does seem to give you some benefits when writing SQL, but it's simply not LINQ.
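The "SQL as text strings" style the quote describes is the familiar JDBC pattern (a minimal sketch; the table, columns, and in-memory HSQLDB URL are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class EmbeddedSqlSketch {
  public static void main(String[] args) throws Exception {
    Class.forName("org.hsqldb.jdbcDriver"); // any JDBC driver will do
    Connection conn =
        DriverManager.getConnection("jdbc:hsqldb:mem:demo", "sa", "");
    // The query lives in a string the compiler cannot check -- the gap
    // that LINQ closes by making the query part of the language itself.
    PreparedStatement stmt =
        conn.prepareStatement("SELECT name FROM customers WHERE id = ?");
    stmt.setInt(1, 42);
    ResultSet rs = stmt.executeQuery();
    while (rs.next()) {
      System.out.println(rs.getString("name"));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}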

Thursday, August 16, 2007

Everything You Know is Wrong

Wednesday, August 08, 2007

Can't wait until they find the Astronaut Helmet

Lego giant emerges from sea: "Workers at a drinks stall rescued the 2.5-metre (8-foot) tall model with a yellow head and blue torso.

"We saw something bobbing about in the sea and we decided to take it out of the water," said a stall worker. "It was a life-sized Lego toy.""

Naming

Relatively recently, UniProt (a protein sequence database) announced it was moving away from LSIDs (which are URNs) to URIs. The cons of LSIDs seem to outweigh the pros. More generally, there seems to be much discussion as to which is more appropriate and what can and cannot be done with URIs versus URNs. Even some of LSID's proponents are saying that using them at the moment is not a good idea.

An LSID has the form urn:lsid:ubio.org:namebank:11815, which can be resolved using http://lsid.tdwg.org/summary/urn:lsid:ubio.org:namebank:11815. A couple of IBM articles describe it in more detail: "Build a life sciences collaboration network with LSID" and "LSID best practices". Part of the problem seems to be that an LSID needs a resolving service, much like a web service, to return the data for a given LSID. A URI, on the other hand, can just use a bit of content negotiation to return either RDF data or human-readable HTML. It's not a well-known feature that a web client can tell a web server what data it can accept. So a Semantic Web client would say "give me RDF" and a normal web client would say "give me HTML".
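The negotiation itself is just an HTTP Accept header. A minimal sketch in Java (the URL is a placeholder; any server doing content negotiation works the same way):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ConnegSketch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://example.org/resource/11815"); // placeholder URI
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // A Semantic Web client says "give me RDF"; a browser would send
    // something like "text/html" here instead.
    conn.setRequestProperty("Accept", "application/rdf+xml");
    System.out.println("Got back: " + conn.getContentType());
    InputStream body = conn.getInputStream(); // RDF/XML if the server negotiates
    body.close();
  }
}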

An alternative is to use an existing standard such as THTTP (RFC 2169), which shows how to turn a URN into a URL by providing a REST-based service: requesting a URL for the URN "urn:foo:12345-54321" becomes the HTTP request "GET /uri-res/N2L?urn:foo:12345-54321 HTTP/1.0". This is a bit like the bio2rdf.org approach of "http://bio2rdf.org/namespace:id". Having dereferenceable URIs is part of the Banff Manifesto.
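Resolving through N2L is equally plain HTTP. A sketch, using the example URN from above (the resolver host is a placeholder, and per RFC 2169 the answer typically comes back as a redirect):

import java.net.HttpURLConnection;
import java.net.URL;

public class N2LSketch {
  public static void main(String[] args) throws Exception {
    String urn = "urn:foo:12345-54321";
    // THTTP maps the URN onto a fixed path on a resolver host.
    URL req = new URL("http://resolver.example.org/uri-res/N2L?" + urn);
    HttpURLConnection conn = (HttpURLConnection) req.openConnection();
    conn.setInstanceFollowRedirects(false); // we want to see the Location header
    System.out.println("Status:   " + conn.getResponseCode());
    System.out.println("Location: " + conn.getHeaderField("Location"));
  }
}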

Creating GUIDs is an interesting problem in a distributed environment. One of the other life science groups compared Handle, DOI, LSID and PURLs (persistent URLs).

The mention of the Handle System brought back ideas from previous digital library work and from using URNs to name RDF graphs (which I later discovered wasn't entirely novel).

Friday, August 03, 2007

Game Changer

Wag the dog
Arguing that EC2 has no intrinsic business value, is like arguing that an electrical grid or a telephone network has no intrinsic business value. Speculation: one reason business systems can't adapt is because the assumptions about what the business used to do, are embedded deep in the code. Very deep, not easy to pull out. And not just in the code but in the physical architecture the system is running on. Business "logic" is like bindweed - by the time you've pulled it out, you've ripped out half the garden as well.


So I missed lunch with a friend earlier this year because he was stuck in an Exchange upgrade. This was around the same time I was looking into Google's architecture, and it struck me that there's no real upgrade or data conversion process that needs hand-holding in their architecture. It was at that point I thought that a lot of the jobs system administrators currently do will be greatly simplified or removed with the right software. The ratio of system administrators to servers at Google has to be much smaller (i.e. more servers per person), because they need so many more servers and there just aren't that many system administrators to look after software written in the usual way.

It came up again when I got roped into a meeting with a bunch of guys who administer several clusters. I was quite happy to stay quiet, but it came up that I was looking at cluster techniques. I explained EC2 and how you can run a server for as little as 10 cents per instance-hour and run experiments for $2 (at that rate, $2 buys 20 instance-hours, say 20 nodes for an hour). I think it came as a bit of a shock to them: first the cheapness (obviously) and second the availability (basically, it removes the gatekeeping of resources). The second benefit really removes a lot of the politics, which is not to be underestimated.

Thursday, August 02, 2007

More Hadoop

A couple of days late, but OSCON apparently had a focus on Hadoop:
The freedom to take your data away rather than controlling the rights to your source code might dominate the Web 3.0 meme. Hadoop! "I got a call from David Filo (co-founder of Yahoo!) last night saying Yahoo! is really behind this."

Most of the attention was on the Doug Cutting and Eric Baldeschwieler talk.

The whole shared-nothing programming model is certainly infectious, and it seems scarily appropriate for Semantic Web data (see the talk for more information). Yahoo!'s Hadoop research cluster is now apparently up to 2x1000 nodes.
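To make "shared nothing" concrete, here is a rough sketch of counting predicates across a pile of N-Triples as a Hadoop job (my own toy example, written against the old org.apache.hadoop.mapred API; exact signatures vary between Hadoop versions):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class PredicateCount {
  // Each mapper sees its own split of the input and shares nothing with
  // the others: one line of N-Triples in, (predicate, 1) out.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    public void map(LongWritable key, Text line,
        OutputCollector<Text, LongWritable> out, Reporter reporter)
        throws IOException {
      // Naive split; real N-Triples parsing is more involved.
      String[] parts = line.toString().split("\\s+");
      if (parts.length >= 3) {
        out.collect(new Text(parts[1]), ONE);
      }
    }
  }

  // The framework groups by predicate; the reducer just sums the counts.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, LongWritable, Text, LongWritable> {
    public void reduce(Text predicate, Iterator<LongWritable> counts,
        OutputCollector<Text, LongWritable> out, Reporter reporter)
        throws IOException {
      long sum = 0;
      while (counts.hasNext()) {
        sum += counts.next().get();
      }
      out.collect(predicate, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(PredicateCount.class);
    conf.setJobName("predicate-count");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}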