Wednesday, November 28, 2007

Academic Software and BioMANTA

On Monday I gave a presentation at the ACB all-hands meeting on the BioMANTA project. It covered the basics: the integration process, ontology design and the architecture.

Some of the presentations were incomprehensible to me. But of the ones I did understand, the lipid raft modeling (which looked a bit like Conway's Game of Life) was perhaps the coolest. There were quite a few software presentations that amounted to: "we did it this way, the specifications changed, next time we'll do it right because what we have at the moment is a mess". It's very frustrating to see that change is still not accepted as the norm.

Two posts I read recently reminded me of this too: "Architecture and innovation" and "Why is it so difficult to develop systems?". Both describe the poor state of academic software. Many projects suffer from this problem, not just academic ones. Academic projects do seem especially prone to adding technology because it's cool/trendy/whatever, which may help get the work published but usually obscures the really novel aspects (or worse, there is nothing there except the cool or trendy technologies).

But my impression of the last 10-15 years (especially W3C and Grid/eScience projects) is that they rapidly become overcomplicated and overextended, and fail to get people using them.

Ultimately much of the database and repository technology is too complicated for what we need at the start of the process. I am involved in one project where the database requires an expert to spend six months tooling it up. I thought DSpace was the right way to go to reposit my data but it wasn’t. I (or rather Jim) put 150,000+ molecules into it but they aren’t indexed by Google and we can’t get them out en masse. Next time we’ll simply use web pages.

By contrast we find that individual scientists, if given the choice, revert to two or three simple, well-proven systems:

* the hierarchical filesystem
* the spreadsheet

A major reason these hide complexity is that they have no learning curve, and have literally millions of users or years’ experience. We take the filesystem for granted, but it’s actually a brilliant invention. The credit goes to Dennis Ritchie in ca. 1969. (I well remember my backing store being composed of punched tape and cards.)

If you want differential access to resources, record locking, audit trails, rollback and integrity of committal, and you are building it all from scratch, it will be a lot of work. And you lose sight of your users.

So we’re looking seriously at systems based on simpler technology than databases - such as RDF triple stores coupled to the filesystem and XML.
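To make "triple store coupled to the filesystem" a little more concrete, here is a rough sketch in Python of one way it could look. It is only an illustration, not what either project actually does: the predicate names (`ex:name`, `ex:sizeBytes`, `ex:inDirectory`) and both function names are made up for this example, and the "store" is just an in-memory set of tuples rather than a real RDF library.

```python
import os
from pathlib import Path

# Hypothetical predicate names, invented for this illustration only.
PRED_NAME = "ex:name"
PRED_SIZE = "ex:sizeBytes"
PRED_PARENT = "ex:inDirectory"


def triples_from_tree(root):
    """Walk a directory tree and describe each file as (subject, predicate, object) tuples."""
    root = Path(root).resolve()  # as_uri() needs an absolute path
    triples = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            subject = path.as_uri()  # the file's own file:// URI is the subject
            triples.add((subject, PRED_NAME, filename))
            triples.add((subject, PRED_SIZE, path.stat().st_size))
            triples.add((subject, PRED_PARENT, Path(dirpath).as_uri()))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
```

Calling `match(triples_from_tree("data"), p=PRED_NAME)` would list the name triple for every file under a (hypothetical) `data` directory. The point is that the filesystem stays the system of record, and the triples are a thin, regenerable layer of metadata on top of it.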
