Friday, September 29, 2006

Explaining Continuations

Continuations, functions and jumps explains that a function call consists of two jumps: one from the caller to the callee and one back to the caller. It then shows how continuation-passing style affects the semantics of a language.
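To make the two jumps concrete, here is a minimal Java sketch. It is not code from the linked post, and the class and method names are made up for illustration. In direct style the callee's `return` is the implicit jump back to the caller. In continuation-passing style the callee never returns a value: it "jumps" onward by calling an explicit continuation `k`, so the rest of the computation becomes an ordinary argument.

```java
import java.util.function.IntConsumer;

final class CpsDemo {
    // Direct style: the result travels back to the caller via return.
    static int sumOfSquares(int a, int b) {
        return a * a + b * b;
    }

    // CPS: instead of returning, each function passes its result to k.
    static void squareCps(int x, IntConsumer k) {
        k.accept(x * x);
    }

    static void addCps(int a, int b, IntConsumer k) {
        k.accept(a + b);
    }

    // Every intermediate result is handed to a continuation; the order of
    // evaluation is now explicit in the nesting of the lambdas.
    static void sumOfSquaresCps(int a, int b, IntConsumer k) {
        squareCps(a, aSquared ->
            squareCps(b, bSquared ->
                addCps(aSquared, bSquared, k)));
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3, 4));            // prints 25
        sumOfSquaresCps(3, 4, r -> System.out.println(r)); // prints 25
    }
}
```

Java has no tail-call elimination, so long chains of continuations grow the call stack; languages built around CPS usually compile those calls into plain jumps.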

With Infinite Money

Life at Google and the Talent Myth "I've talked to people who've come to Microsoft from Google (e.g. Danny Thorpe) and it definitely is as chaotic as it sounds there. For some reason, the description of life at Google by Steve Yegge reminds me a bit of Microsoft where there were two huge money making projects (Office & Windows in the case of Microsoft and AdWords & AdSense in the case of Google) and then a bunch of good to mediocre projects full of smart people dicking around. Over the years I've seen a reduction of the 'smart people dicking around' type projects over here and more focus on shipping code."

Someone seems to have forgotten that real artists ship.

I read the original article, "Good Agile, Bad Agile", expecting an in-depth analysis of Agile (or agile) methods. Instead it says things like, "Most of us in our industry are date-driven. There's always a next milestone, always a deadline, always some date-driven goal to it.

The only exceptions I can think of to this rule are:

1) Open-source software projects.
2) Grad school projects.
3) Google."

All of these cases are without time or money constraints.

Another post, "Stupid is as stupid does", says this methodology explains the continual betaness of Google: "Unasked by Joel, and left unexplained by Steve: everything at Google stays in beta, pretty much forever. Hmm. Why do you suppose that is? Well, you get a bunch of "really smart" people together, don't put any product/project management together, and let them move around at will... what do you get? You get a bunch of projects that end up being 80% done (i.e., all of the technically "interesting" pieces are done, but that boring "polish" stuff isn't).".

Monday, September 25, 2006

JRDF SPARQL Performance

I gathered some performance figures for JRDF's implementation of SPARQL, using some of the FOAF data from Mindswap.

Average for Query 1, 100,000 triples:
* Jena (ARQ 0.9.2) - 14685 ms
* JRDF and JRDF using Tuple Subsumption - 3652 ms

There is only one UNION implementation in JRDF, so tuple subsumption makes no difference to Query 1.

Average for Query 2, 100,000 triples:
* Jena (ARQ 0.9.2) - 22872 ms
* JRDF - 6615 ms
* JRDF using Tuple Subsumption - 3750 ms

Average for Query 3, 100,000 triples:
* Jena (ARQ 0.9.2) - 15306 ms
* JRDF - 8019 ms
* JRDF using Tuple Subsumption - 4780 ms
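The relative speedups are easier to compare than raw milliseconds. This small Java program only divides the averages listed above (no new measurements); the class and array names are mine.

```java
final class SpeedupTable {
    // Average times in ms, copied from the results above.
    static final String[] QUERIES = {"Query 1", "Query 2", "Query 3"};
    static final long[] JENA = {14685, 22872, 15306};
    static final long[] JRDF = {3652, 6615, 8019};
    // Query 1 has the same figure for both JRDF variants (single UNION implementation).
    static final long[] JRDF_TS = {3652, 3750, 4780};

    // How many times faster the candidate is than the baseline.
    static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        for (int i = 0; i < QUERIES.length; i++) {
            System.out.printf("%s: JRDF %.2fx, JRDF + tuple subsumption %.2fx vs Jena; subsumption alone %.2fx%n",
                QUERIES[i],
                speedup(JENA[i], JRDF[i]),
                speedup(JENA[i], JRDF_TS[i]),
                speedup(JRDF[i], JRDF_TS[i]));
        }
    }
}
```

Worked through by hand, plain JRDF is roughly 4.0x, 3.5x and 1.9x faster than Jena on the three queries, and tuple subsumption on its own cuts JRDF's time by a factor of about 1.76 on Query 2 and 1.68 on Query 3.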

The point is not that it was faster than Jena (although yay!), it's that the relational optimisations had a positive effect on querying speed. The current downloadable version of the JRDF SPARQL GUI (0.2) has very slow hashCode and equals methods for performance-sensitive classes like AttributeValuePair. It still shows the benefit of the optimisations, but answering the queries takes about 10 times longer. The modified versions of these classes are only available in the JRDF Subversion repository.
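JRDF's actual classes aren't reproduced here. As a hypothetical sketch of the usual fix for slow equals/hashCode on an immutable value class: make the fields final, compute the hash once in the constructor, and compare the cheap cached hash first in equals (equal objects always have equal hashes, so differing hashes mean "not equal"). `AttributeValuePairSketch` and its two String fields are stand-ins, not JRDF's API.

```java
import java.util.Objects;

// Hypothetical stand-in for a performance-sensitive value class such as AttributeValuePair.
final class AttributeValuePairSketch {
    private final String attribute;
    private final String value;
    private final int hash; // computed once; safe because the fields never change

    AttributeValuePairSketch(String attribute, String value) {
        this.attribute = Objects.requireNonNull(attribute);
        this.value = Objects.requireNonNull(value);
        this.hash = 31 * attribute.hashCode() + value.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof AttributeValuePairSketch)) {
            return false;
        }
        AttributeValuePairSketch other = (AttributeValuePairSketch) o;
        // Cheap int comparison first; only compare the strings when the hashes match.
        return hash == other.hash
            && attribute.equals(other.attribute)
            && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return hash; // no recomputation on every HashMap/HashSet lookup
    }
}
```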

Friday, September 22, 2006

Ultimate Mashup

Thursday, September 21, 2006

A Better Way

There are a couple of papers I've been reading about fusing functional programming with the relational model, as well as the influential "Can Programming Be Liberated From the von Neumann Style? A Functional Style and its Algebra of Programs".
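Backus's paper builds programs by combining functional forms rather than by assigning to variables. Its best-known example defines inner product as Def IP ≡ (/+)∘(α×)∘Trans: transpose the pair of vectors, multiply each resulting pair, then insert + between the products. The Java below is my own rough rendering, not code from the paper; `trans`, `applyToAll` and `insert` are hypothetical helper names, and `Function.andThen` composes left to right where Backus's ∘ reads right to left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

final class BackusFP {
    // Trans: turn <<x1..xn>, <y1..yn>> into <<x1,y1> .. <xn,yn>>.
    static <A> Function<List<List<A>>, List<List<A>>> trans() {
        return rows -> {
            List<List<A>> columns = new ArrayList<>();
            for (int i = 0; i < rows.get(0).size(); i++) {
                List<A> column = new ArrayList<>();
                for (List<A> row : rows) {
                    column.add(row.get(i));
                }
                columns.add(column);
            }
            return columns;
        };
    }

    // ApplyToAll (Backus writes "alpha f"): apply f to every element.
    static <A, B> Function<List<A>, List<B>> applyToAll(Function<A, B> f) {
        return xs -> {
            List<B> out = new ArrayList<>();
            for (A x : xs) {
                out.add(f.apply(x));
            }
            return out;
        };
    }

    // Insert (Backus writes "/f"): fold the sequence from the right with op.
    // Backus leaves /f undefined on an empty sequence; here the caller supplies a unit.
    static <A> Function<List<A>, A> insert(BinaryOperator<A> op, A unit) {
        return xs -> {
            A acc = unit;
            for (int i = xs.size() - 1; i >= 0; i--) {
                acc = op.apply(xs.get(i), acc);
            }
            return acc;
        };
    }

    // Def IP = (/+) o (alpha x) o Trans
    static final Function<List<List<Integer>>, Integer> IP =
        BackusFP.<Integer>trans()
            .andThen(BackusFP.<List<Integer>, Integer>applyToAll(pair -> pair.get(0) * pair.get(1)))
            .andThen(BackusFP.<Integer>insert(Integer::sum, 0));
}
```

Applying `IP` to `[[1, 2, 3], [6, 5, 4]]` gives 1*6 + 2*5 + 3*4 = 28. Note that IP never names its arguments, which is the "function-level" style the paper argues for.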

There is also Dijkstra's "On the cruelty of really teaching computing science". Firstly, he states that people are by nature conservative and often use previous experience as the basis for learning something new: "...radical novelties are so disturbing that they tend to be suppressed or ignored, to the extent that even the possibility of their existence in general is more often denied than admitted."

Computing is one of those radical novelties, yet people continue to use the wrong metaphors and analogies: "The practice is pervaded by the reassuring illusion that programs are just devices like any others, the only difference admitted being that their manufacture might require a new type of craftsmen, viz. programmers. From there it is only a small step to measuring "programmer productivity" in terms of "number of lines of code produced per month". This is a very costly measuring unit because it encourages the writing of insipid code, but today I am less interested in how foolish a unit it is from even a pure business point of view. My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

Besides the notion of productivity, also that of quality control continues to be distorted by the reassuring illusion that what works with other devices works with programs as well. It is now two decades since it was pointed out that program testing may convincingly demonstrate the presence of bugs, but can never demonstrate their absence. After quoting this well-publicized remark devoutly, the software engineer returns to the order of the day and continues to refine his testing strategies, just like the alchemist of yore, who continued to refine his chrysocosmic purifications."

His solution is to teach students properly: the basics are not to be found in Pascal or C or Java but in logic and mathematics: "Right from the beginning, and all through the course, we stress that the programmer's task is not just to write down a program, but that his main task is to give a formal proof that the program he proposes meets the equally formal functional specification. While designing proofs and programs hand in hand, the student gets ample opportunity to perfect his manipulative agility with the predicate calculus. Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs. And this concludes the sketch of my proposal for an introductory programming course for freshmen."

Monday, September 18, 2006

Unicode Maths Font

Does Your Browser Support Multi-language? is possibly the only available source of a font covering Miscellaneous Mathematical Symbols-A, the Unicode block that has the left outer join symbol. I downloaded about half a dozen different fonts and none of them had it. Many thanks to Simon.
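For reference, the outer join symbols are U+27D5 LEFT OUTER JOIN, U+27D6 RIGHT OUTER JOIN and U+27D7 FULL OUTER JOIN, all in the Miscellaneous Mathematical Symbols-A block (U+27C0 to U+27EF). A small Java program can confirm what the JDK's Unicode tables say about them; whether the glyphs actually display still depends on the installed fonts, which is the real problem above.

```java
final class OuterJoinGlyphs {
    public static void main(String[] args) {
        int[] codePoints = {0x27D5, 0x27D6, 0x27D7};
        for (int cp : codePoints) {
            // Character.getName (Java 7+) gives the official Unicode character name.
            System.out.printf("U+%04X %s %s, block %s%n",
                cp,
                new String(Character.toChars(cp)),
                Character.getName(cp),
                Character.UnicodeBlock.of(cp));
        }
    }
}
```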

Sunday, September 17, 2006

Identity with UUID

Don't Let Hibernate Steal Your Identity "This example uses the object id as the definition of equals() and to derive hashCode(). This is much simpler. However, to make this work we need two things. First, we need a way to ensure every object has an id even before it is saved. This example assigns the id a value as soon as the id variable is declared. Second, we need a way to determine if this is a newly created object, or a previously saved object. In our original example, Hibernate checked whether the id field was null to determine if the object was new. Obviously this won't work anymore since our object id is never null. We can easily solve this by configuring Hibernate to check whether the version field, rather than the id field, is null. The version field is a much more appropriate indicator of whether your object has been previously saved."

"Furthermore, the new definition of equals() and hashCode() is universal for all objects that contain an object id. That means we can move those methods to an abstract parent class. We no longer need to re-implement equals() and hashCode() for every domain object, and we no longer need to think through which combination of fields is both unique and immutable for each class. Instead, we simply extend the abstract parent class. Of course, we don't want to force our domain objects to extend from a parent class, so we'll also define an interface to keep things flexible."

"We now have a simple and effective way to create domain objects. They extend AbstractPersistentObject, which automatically gives them an id when they're first created and properly implements equals() and hashCode(). They also get a reasonable default implementation of toString() that they can optionally override. If this is a test object or an example object for a query-by-example the id can be changed or set to null. Otherwise it should not be altered. If for some reason we need to create a domain object that extends some other class, it can implement the PersistentObject interface rather than extend the abstract class."
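A minimal sketch of the pattern the article describes, without any Hibernate mapping. It assumes a String id generated with `java.util.UUID` (the article's own id generator may differ) and a nullable `Integer` version. `AbstractPersistentObject` and `PersistentObject` follow the article's names, but their exact members here are my guess, and `Person` is a hypothetical domain class. Telling Hibernate to treat a null version, rather than a null id, as "unsaved" (for instance through the version mapping's `unsaved-value`) is configuration and isn't shown.

```java
import java.util.Objects;
import java.util.UUID;

// Interface so a domain class that must extend something else can still take part.
interface PersistentObject {
    String getId();
    void setId(String id);
    Integer getVersion();
    void setVersion(Integer version);
}

abstract class AbstractPersistentObject implements PersistentObject {
    // Assigned at declaration, so every object has an id before it is ever saved.
    private String id = UUID.randomUUID().toString();
    // Null until the object has been saved; this, not the id, marks a new object.
    private Integer version;

    public String getId() { return id; }
    // Only meant to be changed for test objects or query-by-example objects.
    public void setId(String id) { this.id = id; }
    public Integer getVersion() { return version; }
    public void setVersion(Integer version) { this.version = version; }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PersistentObject)) {
            return false;
        }
        // Identity is the id alone; a null id never equals another object.
        return id != null && id.equals(((PersistentObject) o).getId());
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 for a null id, still consistent with equals
    }

    @Override
    public String toString() {
        return getClass().getSimpleName() + "[id=" + id + ", version=" + version + "]";
    }
}

// Hypothetical domain object: gets an id, equals, hashCode and toString for free.
final class Person extends AbstractPersistentObject {
    private final String name;
    Person(String name) { this.name = name; }
    String getName() { return name; }
}
```

Two `Person` objects with the same name are still different entities until they share an id, which is the point: no per-class hunt for a unique, immutable combination of fields.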

DS Hacks

* Frodo c64 emulator for the DS.
* Doom.
* Homebrew for DS.
* DSLinux.
* M3 Adapter SD Version Slim + Passcard 3.
* 40GB hard drive.

Friday, September 08, 2006

Iron VMs

Will It Be JRuby vs. IronPython? "On the heels of Microsoft's release of IronPython 1.0 comes the news that Sun has hired the two primary JRuby developers.

I don't know if Sun intended it, but the juxtaposition of these two news item gives the effect that the two managed execution environments—The JVM and .NET—have each chosen an anointed champion scripting language for their platform.

The orphan that's left out of the spotlight is Jython. I feel kind of sad for Jython and its lone publicly known active maintainer Frank Wierzbicki."

JRuby Steps Into the Sun "The potential for Ruby on the JVM has not escaped notice at Sun, and so we'll be focusing on making JRuby as complete, performant, and solid as possible. We'll then proceed on to help build out broader tool support for Ruby, answering calls by many in the industry for a "better" or "smarter" Ruby development experience."

Thursday, September 07, 2006

The Myth of Web 2.0

All We Got Was Web 1.0, When Tim Berners-Lee Actually Gave Us Web 2.0 refers to an interview with Tim Berners-Lee, "Yes, the original vision of Berners-Lee is now apparently happening, so he's right in a sense there while glossing over the reality of the early Web. But though his vision was largely possible since the advent of the first forms-capable browser, at first we only got what we could call "Web 1.0"; simple Web sites that were largely read-only or at least would only take your credit card. The essential draw of mountains of valuable user generated content just wasn't there. And the millions of people with the skills and attitudes weren't there either. Even the techniques for making good emergent, self-organizing communities and two-way software were in their very infancy or were misunderstood. An example: How long did it take the lowly editable Web page (aka wikis) to be popular and widespread? Nearly a decade. The fact is, most of us know that innovation is all too likely to race ahead of where society is. I run into folks from Web 1.0 startups fairly often that bitterly complain about how they were building Web 2.0 software in 2000, but nobody came."

Tim also mentions those boring things like the Semantic Web, SPARQL and Enquire (his hypertext program from before the Web; his first Web browser, WorldWideWeb, was both an editor and a browser).

Wednesday, September 06, 2006

The 7th dimension is infinity

Possible Worlds: The Fifth Dimension The numbering starts at 0, so I think the 5th dimension is labelled "4" in the presentation and the 6th dimension is actually all possible worlds. That also means there are 11 dimensions, even though the last one is called the 10th. Using Wikipedia's definition of dimension, dimension 0 is the first defined dimension.

Relational.OWL

Relational.OWL - A Data and Schema Representation Format Based on OWL "In this paper we introduce a Web Ontology Language (OWL)-based (Miller & Hendler 2004) representation format for relational data and schema components, which is particularly appropriate for exchanging items among remote database systems. OWL, originally created for the Semantic Web enables us to represent not only the data itself, but also its interpretation, i.e. knowledge about its format, its origin, its usage, or its original embedment in specific frameworks.

Hence, remote databases are instantly able to understand each other without having to arrange an explicit exchange format - the usage of OWL on both sides is sufficient. This would be impossible using present XML formats."

"Bizer introduced in (Bizer 2003) a mapping language between relational data and RDF, particularly between specific relational query results and RDF. Contrary to our approach, D2R MAP converts the stored data into ”real” RDF objects, i.e. an address would be represented as a RDF address object. This approach takes into account, that the original database cannot be reconstructed using this kind of data representation anymore, since it does not contain information concerning the original schema of the database. As a result, the data represented with the D2R MAP language looses its relationship to the original database. Tracing the data to its original storage position is thus hardly possible."
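The contrast with D2R MAP comes down to whether the schema survives the conversion. As a hypothetical sketch (the namespaces and the `storedIn` property are made up for illustration and are not Relational.OWL's actual vocabulary), a row can be turned into triples whose predicates name the original columns and which keep a link back to the source table, so the data can still be traced to where it was stored:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RowToTriples {
    // Hypothetical namespaces, for illustration only.
    static final String DATA = "http://example.org/data/";
    static final String SCHEMA = "http://example.org/schema/";

    // Each triple is {subject, predicate, object}.
    static List<String[]> toTriples(String table, String primaryKey, Map<String, String> row) {
        List<String[]> triples = new ArrayList<>();
        String subject = DATA + table + "/" + primaryKey;
        // Keep the link back to the source table.
        triples.add(new String[] {subject, SCHEMA + "storedIn", SCHEMA + table});
        for (Map.Entry<String, String> cell : row.entrySet()) {
            // The predicate names the original column, not a domain concept such as "address".
            triples.add(new String[] {subject, SCHEMA + table + "." + cell.getKey(), cell.getValue()});
        }
        return triples;
    }
}
```

A D2R-style mapping would instead produce something like an "address" resource with domain properties, which reads more naturally as RDF but no longer records which table and columns it came from.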

The homepage has other papers (such as "Database to Semantic Web Mapping using RDF Query Languages" and the good "Bringing Relational Data into the Semantic Web using SPARQL and Relational.OWL") and documentation, along with beta versions of the software, which is written in Java (using Jena).

D2R Server 0.3 was released recently too.

Monday, September 04, 2006

Modular Predicate Dispatch

Tom's post on interfaces reminded me of a paper and software I looked at in 2004 that implemented multiple dispatch using predicate inference in order to execute the correct method on a class. It was interesting at the time as it seemed like a natural evolution of logic-based technologies such as the Semantic Web. It also included a quite neat implementation of the Visitor pattern.

A new paper released early this year, "Modularly Typesafe Interface Dispatch in JPred", describes how they've successfully added support for predicate dispatch on interfaces. As far as I can tell, this is much better than most previous approaches. Interestingly, the type inferencing in their compiler was able to help debug the compiler itself.

They make reference to the MultiJava project, which adds open classes and symmetric multiple dispatch but, like the original JPred, is limited to classes rather than interfaces. Open classes are a way of getting some of the benefits of the Visitor pattern without modifying as much code. More details are in this paper.

The code looks like:
class C {
    void m(Object o) {
        System.out.println("got a C and an Object");
    }
}
class D extends C {
    void m(final Object o) {
        System.out.println("got a D and an Object");
        super.m(o);
        this.resend(o); // resend is not standard Java; it comes from the language extension
    }
}
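For comparison, predicate dispatch can be roughly emulated in plain Java with a table of guarded cases. This is a hypothetical sketch, not JPred's implementation: `PredicateDispatcher`, `when` and `dispatch` are made-up names, and the cases are simply tried in registration order. JPred instead orders methods by predicate implication and checks statically that every call has exactly one most-specific applicable method, which is exactly what this sketch cannot do. The guards can test interfaces such as `CharSequence`, the case the 2006 paper adds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class PredicateDispatcher<T, R> {
    private static final class Case<A, B> {
        final Predicate<? super A> guard;
        final Function<? super A, ? extends B> body;

        Case(Predicate<? super A> guard, Function<? super A, ? extends B> body) {
            this.guard = guard;
            this.body = body;
        }
    }

    private final List<Case<T, R>> cases = new ArrayList<>();

    // Register cases most specific first; nothing here checks that the order
    // respects predicate implication or that the cases are exhaustive.
    PredicateDispatcher<T, R> when(Predicate<? super T> guard, Function<? super T, ? extends R> body) {
        cases.add(new Case<>(guard, body));
        return this;
    }

    R dispatch(T arg) {
        for (Case<T, R> c : cases) {
            if (c.guard.test(arg)) {
                return c.body.apply(arg);
            }
        }
        // A statically checked language rules this out; the sketch fails at run time.
        throw new IllegalArgumentException("no applicable case for " + arg);
    }
}

final class DispatchDemo {
    static final PredicateDispatcher<Object, String> DESCRIBE = new PredicateDispatcher<Object, String>()
        .when(o -> o instanceof CharSequence && ((CharSequence) o).length() == 0, o -> "empty text")
        .when(o -> o instanceof CharSequence, o -> "some text")
        .when(o -> true, o -> "an Object");
}
```

Here `DispatchDemo.DESCRIBE.dispatch("")` picks the first case, a `StringBuilder` or `String` with content falls through to the second, and anything else reaches the catch-all, much like the `Object` fallback methods in the example above.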


The 2006 paper also references Featherweight Java, which seems to have opened up a whole range of academic possibilities.

Saturday, September 02, 2006

Attacking the Boundary of Ignorance


  • The Ontology Integration Problem "The reason I think a mapping tool is a critical need is that I think while in theory it's a nice idea to imagine ontologists reusing ontologies from one another, in practice many ontologists (especially those working on large complex ontologies) would rather write their own internally consistent ontologies and map them to other ontologies rather than importing other ontologies into what they are making and then having to deal with all the inconsistencies and confusion that arises from doing that."

  • Language Wars "I know Paul told you that he made his app in Lisp and then he made millions of dollars because he made his app in Lisp, but honestly only two people ever believed him and, a complete rewrite later, they won't make that mistake again.

    The safe answer, for the Big Enterprisy Thing where you have no interest in being on the cutting edge, is C#, Java, PHP, or Python..."

  • Circles of knowledge and boundaries of ignorance "...learning can be viewed as either expanding your circle of knowledge or as increasing your boundary of ignorance. So, the more you learn the more you know, but also the more you know that you don’t know. Depending on your temperament, this can be either encouraging or discouraging to your efforts to continue learning."

  • Bigtable: A Distributed Storage System for Structured Data From Google Research Papers

  • JUnit 4 vs. TestNG Covers a few advantages of TestNG, such as test dependencies and providing test data (even complex objects).
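The Bigtable paper linked above describes its data model as a sparse, distributed, persistent, multidimensional sorted map indexed by (row:string, column:string, time:int64) → string, with cell versions stored newest first. The toy Java class below models only that data model in a single process, using nested `TreeMap`s; it has nothing of the real system's tablets, storage or locking, and the class and method names are my own. The row and column keys in the usage are modelled on the paper's web-page example.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;

final class ToyBigtable {
    // row -> column ("family:qualifier") -> timestamp -> value; every level is sorted.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            // Versions sorted newest first, so the latest one is read first.
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Most recent version of a cell, or null for an absent cell (the map is sparse).
    String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        NavigableMap<Long, String> versions = columns == null ? null : columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Rows are kept in lexicographic order, so a range scan is just a sorted view.
    NavigableSet<String> rowKeysFrom(String startRow) {
        return rows.navigableKeySet().tailSet(startRow, true);
    }
}
```

Storing rows under reversed domain names such as `com.cnn.www` keeps pages from the same site next to each other in that sorted order, which is why the paper uses them.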