More News: 2006

Saturday, December 23, 2006

Thank Goodness

THANK GOODNESS! About Dan Dennett's recent brush with death.

"Yes, I did have an epiphany. I saw with greater clarity than ever before in my life that when I say "Thank goodness!" this is not merely a euphemism for "Thank God!" (We atheists don't believe that there is any God to thank.) I really do mean thank goodness! There is a lot of goodness in this world, and more goodness every day, and this fantastic human-made fabric of excellence is genuinely responsible for the fact that I am alive today. It is a worthy recipient of the gratitude I feel today, and I want to celebrate that fact here and now."

"Do I worship modern medicine? Is science my religion? Not at all; there is no aspect of modern medicine or science that I would exempt from the most rigorous scrutiny, and I can readily identify a host of serious problems that still need to be fixed. That's easy to do, of course, because the worlds of medicine and science are already engaged in the most obsessive, intensive, and humble self-assessments yet known to human institutions, and they regularly make public the results of their self-examinations."

"One thing in particular struck me when I compared the medical world on which my life now depended with the religious institutions I have been studying so intensively in recent years. One of the gentler, more supportive themes to be found in every religion (so far as I know) is the idea that what really matters is what is in your heart: if you have good intentions, and are trying to do what (God says) is right, that is all anyone can ask. Not so in medicine! If you are wrong—especially if you should have known better—your good intentions count for almost nothing. And whereas taking a leap of faith and acting without further scrutiny of one's options is often celebrated by religions, it is considered a grave sin in medicine. A doctor whose devout faith in his personal revelations about how to treat aortic aneurysm led him to engage in untested trials with human patients would be severely reprimanded if not driven out of medicine altogether."

"In other words, whereas religions may serve a benign purpose by letting many people feel comfortable with the level of morality they themselves can attain, no religion holds its members to the high standards of moral responsibility that the secular world of science and medicine does!"

Wednesday, December 20, 2006

Some Design Issues

Semantic Web Road map "It is clearly important that the query language be defined in terms of RDF logic. For example, to query a server for the author of a resource, one would ask for an assertion of the form "x is the author of p1" for some x. To ask for a definitive list of all authors, one would ask for a set of authors such that any author was in the set and everyone in the set was an author. And so on."
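As a rough illustration of the first kind of query in that quote ("x is the author of p1" for some x), here is a minimal Python sketch of matching one triple pattern against an in-memory set of triples. The data, the isAuthorOf predicate name and the match function are all made up for the example; this is not any real RDF query API, and it says nothing about the harder "definitive list" case.

```python
# Hypothetical data: (subject, predicate, object) triples.
TRIPLES = {
    ("alice", "isAuthorOf", "p1"),
    ("bob", "isAuthorOf", "p1"),
    ("carol", "isAuthorOf", "p2"),
}


def match(pattern, triples):
    """Return one binding dict per triple that matches the pattern.

    Pattern terms starting with '?' are variables; any other term
    must equal the corresponding term of the triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# "x is the author of p1" for some x.
authors_of_p1 = sorted(b["?x"] for b in match(("?x", "isAuthorOf", "p1"), TRIPLES))
print(authors_of_p1)
```

The result only lists the authors this particular store knows about, which is exactly why the quote treats a definitive list as a separate kind of question.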

Relational Databases on the Semantic Web "Is the RDF model an entity-relationship model? Yes and no. It is great as a basis for ER-modelling, but because RDF is used for other things as well, RDF is more general. RDF is a model of entities (nodes) and relationships. If you are used to the "ER" modelling system for data, then the RDF model is basically an opening of the ER model to work on the Web. A typical ER model involves entity types, and for each entity type there is a set of relationships (slots in the typical ER diagram). The RDF model is the same, except that relationships are first class objects: they are identified by a URI, and so anyone can make one. Furthermore, the set of slots of an object is not defined when the class of an object is defined."

Linked Data "The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data."

"So statements which relate things in the two documents must be repeated in each. This clearly is against the first rule of data storage: don't store the same data in two different places: you will have problems keeping it consistent. This is indeed an issue with browsable data. A set of of completely browsable data with links in both directions has to be completely consistent, and that takes coordination, especially if different authors or different programs are involved."

Tuesday, December 19, 2006

A Sick Industry

Respecting the whole person "There's a discussion going on about women and IT, starting with a post by Richard Jones: Why there's few women in IT. Phillip Eby responded in Is porn driving women away from the computer industry?, The Real Reasons There Are Few Women In IT -- And What YOU Can Do About It and Why (Most) Men Don't Get It -- and I personally find his analysis much more useful.

Richard's original post talks about a case where someone used porn in a presentation, and he's basically saying that is "why there's few women in IT". I don't buy it."

Links to HOWTO Encourage Women in Linux.

The real question is why people (men and women) aren't going into IT - it's not just women. The IT Workforce Conundrum "...college student enrollment in computer science programs is down substantially from the 1990s -- around 20 percent according to the Computing Research Association. In fact, the peak numbers reached in the last decade reflected the Internet boom and represented twice as many CS students as there were in the 1970s. Outsourcing is helping to meet the demand of some types of IT jobs, but many companies are still hard-pressed to find qualified tech workers.

A recent PricewaterhouseCoopers (PwC) report predicts that competition for high-tech talent is going to become even more severe over the next few years as globalization absorbs the remaining technology workers around the world. The report says that while companies have been forced to look offshore in order to gain access to larger pools of talent, even this resource is not bottomless. European and Asian executives anticipate a severe shortage of tech talent within the next three years. And, according to the report, worker compensation is being equalized globally; even India and China are not considered low-cost anymore."

The PwC report says, "In an era of scarcity, technology companies will need to focus more acutely on their talent management. But today, in many areas of human capital management, executives’ self-assessments show little or no confidence in their companies’ abilities."

X for Vista

Windows Vista Vista is not a copy of OS X. The search box is in the completely opposite corner (bottom left as opposed to top right), it has gadgets not widgets, and the 3D chess game in Vista has porcelain as a board type (innovative).

Vista Wins on Looks. As for Lacks ... suggests that the copying wasn't quite perfect, "And then there’s that Sidebar, the floating layer of mini-programs. If you close one of the gadgets, you lose its contents forever: your notes in the Post-it Notes gadget, your stock portfolio in the Stocks gadget, and so on. You couldn’t save them if you wanted to. How could Microsoft have missed that one."

Thursday, December 14, 2006

Relational OWL

As a sequel (not SQL) to relational SPARQL there's relational OWL, "In this paper we analyzed the similarities and differences between relational databases and description logics, in particular with respect to the role of schema constraints. Our analysis reveals more similarities than differences since, in both cases, constraints are just (restricted) first-order theories. Furthermore, reasoning about the schema is in both systems performed under standard first-order semantics, and it even employs closely related reasoning algorithms. The differences between relational databases and description logics become apparent only if one considers the problems related to reasoning about data. In relational databases, answering queries and constraint satisfaction checking correspond to model checking, whereas, in description logics they correspond to entailment problems."

Via All Problems Solved.

No Chance

Yangtse dolphin 'is extinct', a victim of economic explosion "Instead, the dolphin was driven to destruction by the noise of the traffic, which disrupted the sonic waves it used to navigate — it was virtually blind, its tiny eyes useless in the murky, sedimented water of the river.

In addition, overfishing had cut its food supplies by half, while the huge Three Gorges Dam had interfered with current and sandbank patterns. The World Conservation Union will only declare a species formally extinct once there is "no reasonable doubt", but Mr Pfluger said it was too late to hold out any more hope."

There's actually a blog that covers these, based on Douglas Adams's "Last Chance to See", called "Another Chance to See". As he said, "There is one last reason for caring, and I believe that no other is necessary. It is certainly the reason why so many people have devoted their lives to protecting the likes of rhinos, parakeets, kakapos, and dolphins. And it is simply this: the world would be a poorer, darker, lonelier place without them."

The whole expedition is available in blog form at Blog Baji.org, where the race goes on to stop the extinction of another Yangtze inhabitant, the finless porpoise. They say, "The baiji is Functionally extinct. Lipotes vexillifer is the first species of cetacean – whales, dolphins and porpoises – to disappear from our globe in modern times…the first large mammal to go extinct as a result of man’s destruction of their natural habitat and resources."

At least there's still the Kakapo to be seen.

Monday, December 11, 2006

The Difference is Antijoin

I made a mistake about making a mistake. I got caught up with the fact that SPARQL's diff is not the same as relational difference, but that's not how left outer join is defined. So I'm correct that relational set difference is not compatible with SPARQL's difference as defined in "The SPARQL Algebra", but wrong that this means the definitions for left outer join are not equivalent.

So antijoin is:
( R1 difference (project(R1)(R1 join R2)) )

So using the previous example:
{{ (?x = 2, ?y = 3) }} \ {{ (?y = 3) }} is {} in SPARQL.

Now, as I said using plain difference it's:
{{ (?x = 2, ?y = 3) }} - {{ (?y = 3) }} is {{ (?x = 2, ?y = 3) }} in relational algebra.

But using antijoin the right hand side is really:

project ({{ (?x = 2, ?y = 3) }}) ({{ (?x = 2, ?y = 3) }} join {{ (?y = 3) }})
project ({{ (?x = 2, ?y = 3) }}) ({{ (?x = 2, ?y = 3) }})
{{ (?x = 2, ?y = 3) }}. Which matches the left hand side.
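
The same steps can be replayed in a few lines of Python. This is only a sketch with made-up helper names (tup, join, project, heading, antijoin); it is not JRDF's code, and relations are modelled simply as sets of frozen attribute/value pairs.

```python
def tup(**attrs):
    """Build an immutable tuple (row) from attribute=value pairs."""
    return frozenset(attrs.items())


def join(r1, r2):
    """Natural join: merge every pair of tuples that agree on shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def heading(r):
    """The attribute names used by the tuples of a relation."""
    return {a for t in r for a, _ in t}


def project(attrs, r):
    """Keep only the named attributes of every tuple."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def antijoin(r1, r2):
    """R1 difference (project(R1)(R1 join R2))."""
    return r1 - project(heading(r1), join(r1, r2))


r1 = {tup(x=2, y=3)}
r2 = {tup(y=3)}

plain_difference = r1 - r2       # tuples of different type never match
anti = antijoin(r1, r2)          # the joined tuple is removed from r1

print(plain_difference == r1)    # True: plain difference leaves r1 unchanged
print(anti)                      # set(): same answer as SPARQL's diff
```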

This is why JRDF looked like it was producing the right result - it was doing the same thing - it's just the operations were further decomposed. I'm sure I've done this before but for some reason I was certain I was wrong the other day. I was told that antijoin and SPARQL's diff were equivalent when it was changed but I had to question it.

There still remains an outstanding issue around null joins but maybe that's not needed either. I know that the NULLs in "SPARQL RULES!" are required for Logic Programs but it doesn't need to be there for relational algebra as long as you have the null accepting/untyped join.

So SPARQL in JRDF looks fine again.

JSF to XHTML to SVG or PDF

I haven't looked at JSF since 2004, when I thought it would be cool to combine Swing and JSF. Coming back to JSF after nearly two years, I came across Combine JSF Facelets and the Flying Saucer XHTML Renderer, which takes XHTML content and renders it using Swing; this is then transformed into an image, PDF or SVG. In the past I've used iReport, and iText directly, to generate reports (PDF and Excel). The thing at the back of my mind is always that this seems to be reproducing a lot of what HTML does, and people only really want PDF because it prints well and Excel because it has macros. Until there are macros built into browsers to do manipulation of data, that is.

Thursday, December 07, 2006

Microsoft does RDF

Microsoft Connected Services Framework "The Profile Manager component provides profile management services. Connected Services Framework uses Profile Manager to store custom information about users and their preferences. Profile information is held in a Resource Description Framework (RDF) store, which is implemented by a Microsoft SQL Server database. Profile Manager provides facilities for creating and managing user profile information and for propagating profile information to Web services that cooperate in a service-oriented application."

Behold MS SPARQL: "

string sparqlQuery =
    @"PREFIX ex: 
    CONSTRUCT {
     ex:fullName ?Name
    }
    WHERE {
     ex:fullName ?Name
    }
  ";

"

Via, links for 2006-12-07. Microsoft using RDF and SPARQL, "...according to timbl Vista will be using XMP (RDF inside) to store metadata about photos etc."

Also, I recently noticed their implementation of EXCEPT (SQL's set difference) EXCEPT and INTERSECT (Transact-SQL).

Marx Smurf

A followup to, Papa Smurf is a Communist. Wikipedia has an article about The Smurfs and communism "Papa Smurf has a wide beard, which some feel looks like Karl Marx's. He also wears red slacks and a red cap, displaying the stereotypical color of Communism throughout the world. Despite the society's communal nature, Papa Smurf does have the ultimate authority, often overruling Brainy Smurf when he oversteps his boundaries. In several episodes when Papa Smurf is not present, the Smurf Village's utopian system destabilizes entirely."

Also, "Communism fell in Russia around the time that The Smurfs were lost from TV syndication and comic publication."

Gummi Bears, what was their game?

Wednesday, December 06, 2006

JRDF GUI Takedown

The other day I decided to remove the JRDF GUI from Sourceforge. Subsequently, there has been a need to download it again. :-) It's here for the time being:
http://jrdf.sf.net/jrdf-gui-0.3.jar

Like I've said, none of the operations implemented in JRDF match SPARQL's. In order not to throw away the work I've done, I'm going to spend some time soon renaming it and changing the syntax (to be more like Tutorial D). The name might be UQL (Unknown Query Language), RRQL (Relational RDF Query Language) or DRQL (tutorial D like RDF Query Language). Any suggestions?

Wired on Religion

So. Many. Letters "We would have sworn that our November cover story on New Atheism was going to generate a firestorm of reader criticism, delivered with a side order of brimstone just for good measure. Wrong. You all just started calmly talking. And talking. And talking. We got more responses to this article than to any piece in memory. Brimstone quotient: low."

"So we posted every last letter to our website."

Sunday, December 03, 2006

SPARQL Favours the Brave

SPARQL RULES! "Now, as opposed to [17], we define three notions of compatibility between substitutions:

Two substitutions O1 and O2 are bravely compatible (b-compatible) when for all x <- dom(O1) Intersection dom(O2) either xO1 = null or xO2 = null or xO1 = xO2 holds. i.e., when O1 Union O2 is a substitution over dom(O1) Union dom(O2).
Two substitutions O1 and O2 are cautiously compatible (c-compatible) when they are b-compatible and for all x <- dom(O1) Intersection dom(O2) it holds that xO1 = xO2.
Two substitutions O1 and O2 are strictly compatible (s-compatible) when they are c-compatible and for all x in dom(O1)Intersection dom(O2) it holds that x(O1 Union O2) /= null.

"

They make the conclusion that only c-joins operate correctly with idempotency for join and, "Following the definitions from the SPARQL specification, in fact, the b-joining semantics is the only admissible definition, which is why [17] does not consider null values at all. There are still advantages for gradually defining alternatives towards traditional relational algebra treatment. On the one hand, as we have seen in the examples above, the brave view on joining unbound variables might have partly surprising results, on the other hand, as we will see, the c- and s-joining semantics allow for a more effective implementation in terms of Datalog rules."
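
To keep the three notions straight, here is a small Python sketch that transcribes the quoted definitions directly. Substitutions are plain dicts, None stands in for null, and the function names and sample values are my own illustration rather than anything from the paper.

```python
def b_compatible(s1, s2):
    """Bravely compatible: shared variables agree, or at least one side is null."""
    return all(
        s1[x] is None or s2[x] is None or s1[x] == s2[x]
        for x in s1.keys() & s2.keys()
    )


def c_compatible(s1, s2):
    """Cautiously compatible: b-compatible and shared variables are equal."""
    return b_compatible(s1, s2) and all(
        s1[x] == s2[x] for x in s1.keys() & s2.keys()
    )


def s_compatible(s1, s2):
    """Strictly compatible: c-compatible and no shared variable is null."""
    return c_compatible(s1, s2) and all(
        s1[x] is not None for x in s1.keys() & s2.keys()
    )


def report(s1, s2):
    return (b_compatible(s1, s2), c_compatible(s1, s2), s_compatible(s1, s2))


# Hypothetical substitutions, listed as (brave, cautious, strict).
print(report({"name": "Alice"}, {"name": "Alice"}))  # (True, True, True)
print(report({"name": None}, {"name": "Alice"}))     # (True, False, False)
print(report({"name": None}, {"name": None}))        # (True, True, False)
print(report({"name": "Alice"}, {"name": "Bob"}))    # (False, False, False)
```

The second and third rows are where the notions part ways: a brave join accepts a null on either side, a cautious join only accepts two nulls, and a strict join accepts neither.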

I'm surprised at how badly I've understood the proposed SPARQL algebra; I didn't think that joining on nulls was being proposed (see Example 2.4 on page 9 of the PDF). Again, this conflicts with relational algebra and JRDF's implementation.

A join in relational algebra does match the suggested s-compatible one: you can't join on NULLs (because they are considered unbound values; they aren't "real", they don't exist and can't be compared for equality). As shown in the paper, it wouldn't successfully join a null name and would return only the row containing "Alice".

In SQL's 3VL logic, "NULL literally means that the value is unknown or indeterminate. One side effect of the indeterminate nature of NULL value is it cannot be used in a calculation or a comparison." This means you can't join across NULLs in SQL or use them for aggregate functions except for COUNT. NULLs are also handled differently across SQL implementations. One example that I've come across before is the handling of strings and NULL values between DB2 and Oracle.
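
This is easy to demonstrate with Python's built-in sqlite3 module. The tables and rows below are invented for the demonstration, and the behaviour shown is SQLite's; as noted above, other SQL implementations can differ in how they treat NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER, name TEXT);
    CREATE TABLE emails (name TEXT, email TEXT);
    INSERT INTO people VALUES (1, 'Alice'), (2, NULL);
    INSERT INTO emails VALUES ('Alice', 'alice@example.org'),
                              (NULL, 'unknown@example.org');
    """
)

# Comparing NULL with NULL is neither true nor false: it is NULL (None).
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The two NULL names do not join with each other; only Alice's row survives.
joined = conn.execute(
    "SELECT p.id, e.email FROM people p JOIN emails e ON p.name = e.name"
).fetchall()

# COUNT(*) counts rows, while COUNT(name) skips the NULL name.
count_all, count_names = conn.execute(
    "SELECT COUNT(*), COUNT(name) FROM people"
).fetchone()

print(null_equals_null)        # None
print(joined)                  # [(1, 'alice@example.org')]
print(count_all, count_names)  # 2 1
```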

It reminds me how much I hate IEEE citations (I think that's the standard) where you use [17] for the reference to Perez in this paper and it's [22] in another. What's wrong with [Perez2006] or something?

Anyway, they also suggest: a set difference or minus operator (which I've suggested SPARQL would be incomplete without, even though you can do it with FILTER, and iTQL has had it for a while before that), nested queries using ASK and using SPARQL as a rules language.

This will probably be my last post on SPARQL for a while - I'm a bit sick of continually, publicly displaying my ignorance (which I guess has occurred over the last year). I could change JRDF to fit any of the proposed implementations (and every other permutation) but I'm not sure it makes sense. I feel very far away from understanding what's going on and I doubt I will (or should) be commenting about it too much in the future.

Update: I seem to have confused s and c semantics. I've updated the paragraph related to relational algebra - it does match s-compatibility.

Good Models - My Super-Turtle is Better than Yours

Standards and Pseudo Standards from Celebrating OWL interoperability and spec quality. Can a standard be based on a pseudo standard? Another posting also points to "An Investigation into the Feasibility of the Semantic Web" about the feasibility of integration using ontologies.

Beyond Belief 2006 Session 5 includes Paul Davies (asks why the universe should even be understandable to human beings, why he's no longer a Platonist and levitating super-turtles) and Session 9 on why religion may have a place.

Agile Atheism "I am not agile because I don't believe in the agile religion and I don't accept its dogma. I like the engineering and planning practices that agile teams use - in the same way that I like people who do nice things (even when they do it because of fear of divine retribution). The difference is I don't want to be constrained by dogma into only doing those sensible things which are prescribed by agile. In the same way I don't like being prevented from doing sensible social things because of religious beliefs."

Jane's Rule for Loading Dishwashers "Compare this to test-driven development. There may be a little more effort when writing code, because you are writing programmer-tests to drive writing that code. It's an "unnatural" process, like sorting silverware into sub-bins when loading the dishwasher. But the true benefit comes later in the project (possibly just minutes or hours later), when you can rely on those programmer-tests to make refactoring safer. And since you fixed bugs during TDD, you have much less work fixing bugs later when it comes time to ship your product."

How can I get financial market information updated automatically to my spreadsheets?

10 most intelligent / least intelligent dogs My dog is apparently the stupidest breed with regards to obedience. I'm not sure I'd call obedience a sign of intelligence (the complete opposite really).

Saturday, December 02, 2006

Strong Typing without the Typing

Functions, Types, Function Types, and Type Inference "One of the most important things to recognize about Haskell's type system is that it's based on type inference. What that means is that in general, you don't need to provide type declarations. Based on how you use a value, the compiler can usually figure out what type it is. The net effect is that in many Haskell programs, you don't write any type declarations, but your program is still carefully type-checked."

"If we look at that type, and think about what the factorial function actually does, there's a problem. That type isn't correct, because factorial is only defined for integers, and if we pass it a non-integer value as a parameter, it will never terminate! But Haskell can't figure that out for itself - all it knows is that we do three things with the parameter to our function: we compare it to zero, we subtract from it, and we multiply by it. So Haskell's most general type for that is a general numeric type. So since we'd like to prevent anyone from mis-calling factorial by passing it a fraction..."

Currently I'm enjoying the print version of "The Haskell Road to Logic, Math and Programming" (links to the PDF version). I haven't appreciated the writing style of a math textbook this much since "Introduction to Graph Theory". I'm still waiting for Haskell to reach 0.1% popularity.
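
The factorial problem in the quote above isn't specific to Haskell. Python has no type inference, so the sketch below is only a loose analogue: a factorial written with the same three operations (compare to zero, subtract, multiply) happily accepts a fraction and never reaches its base case, which Python reports as a RecursionError. The safe_factorial and terminates helpers are hypothetical names, standing in for what a restrictive type declaration does in Haskell.

```python
from fractions import Fraction


def factorial(n):
    # Only compares to zero, subtracts and multiplies, so any numeric
    # type is accepted, including ones that never reach zero.
    if n == 0:
        return 1
    return n * factorial(n - 1)


def safe_factorial(n):
    """Reject anything that is not a non-negative integer before recursing."""
    if not isinstance(n, int):
        raise TypeError("factorial is only defined for integers")
    if n < 0:
        raise ValueError("factorial is only defined for non-negative integers")
    return factorial(n)


def terminates(func, arg):
    """Report whether the call finishes before hitting the recursion limit."""
    try:
        func(arg)
        return True
    except RecursionError:
        return False


print(factorial(5))                          # 120
print(terminates(factorial, Fraction(1, 2))) # False: 1/2, -1/2, ... never hits 0
```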

Breaking Project

SPARQL Basic Graph Pattern Matching Andy writes, "Example data:

  :a :p 1 .
  :a :p 2 .

Example query pattern:

 { ?x :p [] }

How many answers? Blank nodes are existential variables in RDF; named variables (the regular ?x ones) are universal variables. Queries don't return the binding of an existential; queries can return the binding of a named variable and the bound value of a named variable can be passed to other parts of the query (FILTERs, OPTIONALs etc) via the algebra.

In the absence of a DISTINCT in the query, are the solutions to this pattern:

1 : ?x = :a
2 : ?x = :a , ?x = :a
Either 1 or 2

"
I would say there is a fourth option here, see this example, which would be:

{ { ?x:subject = :a, p1:predicate1 = :p, o1:object1 = 1 } , { ?x:subject = :a, p1:predicate1 = :p, o1:object1 = 2 } }

The DAWG test case, rdfSemantics-bNode-type-var, redefines the meaning of project.

If you keep track of where the variables were bound you can still count the distinct instances without redefining project.
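
A rough Python sketch of that idea, using Andy's two triples. The matcher, the "_:b0" label for the blank node and the row representation are assumptions made for the illustration; this is neither JRDF's nor ARQ's implementation.

```python
DATA = [(":a", ":p", 1), (":a", ":p", 2)]


def is_variable(term):
    """Named variables (?x) and blank node labels (_:b0) both get bound."""
    return isinstance(term, str) and (term.startswith("?") or term.startswith("_:"))


def match(pattern, triples):
    """Return one binding dict per triple matching the pattern."""
    rows = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if is_variable(term):
                binding[term] = value
            elif term != value:
                break
        else:
            rows.append(binding)
    return rows


# { ?x :p [] } with the blank node kept as a hidden column.
solutions = match(("?x", ":p", "_:b0"), DATA)

# Set semantics with the hidden column retained: two distinct rows.
tracked_rows = {frozenset(row.items()) for row in solutions}

# A set projection onto ?x alone collapses them into a single answer.
projected_set = {row["?x"] for row in solutions}

# Reporting only ?x from the tracked rows still yields two answers.
reported = [dict(row)["?x"] for row in tracked_rows]

print(len(tracked_rows), len(projected_set), reported)  # 2 1 [':a', ':a']
```

Because the rows differ in where the object was bound, a plain set-based project never has to be redefined to arrive at two answers.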

Thursday, November 30, 2006

Different Strokes

I feel I should make a correction to comments I made in regards to the proposed SPARQL algebra.

In that post and in my thesis I've said that the definitions of OPTIONAL by Pérez et al (and used by Andy Seaborne) are compatible with the relational definitions I used. I'm now about as certain as I can be that that's not the case.

The definition in Pérez et al for set difference is (in ASCII):
O1 \ O2 = {u <- O1 | for all u' <- O2, u and u' are not compatible}.

The key here is the definition of compatibility. I didn't find it clear in the original paper (my fault and I'll explain a bit more about this later) but it's explicit in "Semantics of SPARQL" where it says, "Two mappings with disjoint domains are always compatible, and the empty mapping u0 is compatible with any other mapping. Intuitively, u1 and u2 are compatibles if u1 can be extended with u2 to obtain a new mapping, and vice versa."

A relational join is defined as the set union of the headings and matching values. The heading consists of the attributes (X, Y, Z), where X is the attributes of the left hand side relation, Y contains the attributes that match the left and right hand side relations and Z are the attributes of the right hand side relation. The body consists of tuples with the values matching X, Y, and Z (Date writes it as { X x, Y y, Z z}). I think that this is compatible with the Pérez et al definition.

But it's different for difference. The difference operator in relational algebra requires the relations to be the same type to be removed (this is the definition I've used). I'll use "\" to represent SPARQL's set difference and "-" to represent the relational difference.

For example:
{{ (?x = 2, ?y = 3) }} \ {{ (?y = 3) }} is {} in SPARQL
{{ (?x = 2, ?y = 3) }} - {{ (?y = 3) }} is {{ (?x = 2, ?y = 3) }} in the modified Galindo-Legaria relational algebra that JRDF uses.

The reason it is unchanged in relational algebra is because the definition of equality requires them to be the same type (the same attributes). So a binding with one value can't be equal to a binding with two values. The Pérez et al definition is a much looser definition and many more matches are made.
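
The two definitions are short enough to write out side by side. The following Python sketch uses dicts for mappings and lists for multisets; the function names are mine, and compatible follows the "Semantics of SPARQL" wording quoted above (mappings agree on every variable they share, so disjoint domains are always compatible).

```python
def compatible(u1, u2):
    """Two mappings are compatible when they agree on all shared variables."""
    return all(u1[v] == u2[v] for v in u1.keys() & u2.keys())


def sparql_diff(o1, o2):
    """O1 \\ O2: keep u from O1 only if it is compatible with nothing in O2."""
    return [u for u in o1 if all(not compatible(u, u2) for u2 in o2)]


def relational_diff(r1, r2):
    """R1 - R2: remove only tuples that are equal, attributes included."""
    return [t for t in r1 if t not in r2]


left = [{"?x": 2, "?y": 3}]
right = [{"?y": 3}]

print(sparql_diff(left, right))      # []
print(relational_diff(left, right))  # [{'?x': 2, '?y': 3}]
```

Note that a right-hand mapping over entirely different variables would also empty the SPARQL result, since disjoint domains count as compatible; that is how much looser this definition is.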

As an aside, I was fairly sure that set difference was a requirement for OWL inferencing. I might be wrong here too. If it is the case, though, I'd think it'd be nice (an efficient use of operations) if SPARQL operations could be reused for OWL ones. This would allow the difference operator by itself to be expressed in SPARQL too.

So I think I understand the issue now. At the moment I'm trying to work out how JRDF gets what seems to be the right results because it doesn't really look like it should. The other thing that was in my paper was a mapping to SQL and I'm not sure how a SQL based SPARQL implementation would work either (based on the definitions I used). Lots of questions anyway.

I am trying to work out whether this definition of compatibility is a good one. At the moment I don't like it or dislike it because I don't understand it yet.

The trigger was the removal of the term antijoin from the SPARQL algebra - that is a good idea because I understood it to mean it was compatible with relational algebra (using the same terms I guess). Up until that point I'd assumed difference in SPARQL was compatible with difference in relational algebra.

Update: Paul mentions in the comments, that the SPARQL difference operator is the one needed for OWL. Another win for that definition then too.

It's apparent that the relational model really isn't appropriate to model SPARQL in. I don't think there's a single definition of an operation in relational algebra that hasn't been changed.

Update 2: I've since come to realize that joins are not compatible either. This is due to the suggestion that you can join on NULL values. This contradicts relational algebra (taking either the Date or Codd approach on NULL) and SQL joins (not that I think that's worth much but it's easy to demonstrate).

Update 3: Last update for this blog I hope. While relational difference is not equivalent, antijoin is equivalent. Which is what I was told when antijoin was renamed diff in the first place. More information, "The Difference is Antijoin".

Tuesday, November 28, 2006

B, b, b boca gives you an enterprise ready RDF store

IBM Semantic Layered Research Platform announced the release of Boca: "Boca is designed to make it possible to build multi-user, distributed RDF applications" Supports named graphs, security, replication, revision history and JMS notification of changes.

The users guide lists some other interesting features such as a client stack with "a fair amount of compatibility with HP's Jena API" and text indexing using Lucene. Based on the configuration it looks like it requires DB2 or Apache Derby as well as Java 5.

Via, IBM SLRP Release.

Thursday, November 23, 2006

The Web is to Blame

It is the year 2000, we were promised flying cars.

While posting YouTube links, I also appreciated U2 & Green Day - The Saints are Coming.

Wednesday, November 22, 2006

Another SPARQL Algebra

A couple of days ago Andy Seaborne posted a link to The SPARQL Algebra, "The SPARQL Algebra defines the semantics of a SPARQL query execution. The algebraic expression is derived from a query string by parsing and transforming the abstract syntax tree. The result of a query can then be calculated by the evaluation rules. This gives the correct results of query -- it does not imply the actual execution of a query must be performed in this manner."

This seems influenced by Jorge Pérez's work (or at least it shares a common basis as the definitions are very similar) and looks like it's compatible with the stuff that I did (it even has antijoin in there).

And in other SPARQL news, Danny Ayers has published (based on discussions between Max Völkel and Richard Cyganiak) a SPARQL Update Language for insertion and deletion of triples.

Update: Andy Seaborne has posted about his work on this algebra, a version of ARQ is available which implements this algebra (it's post 1.4). The use of the FILTER command is also discussed. Via SPARQL will be formalized as an algebra.

Tuesday, November 21, 2006

The Search for the Levitating Super-Turtle

Atheists: The New Gays "Prior to 9/11, it would have been career suicide for a public figure to come right out and say God is a fairy tale. Now it’s a feature of popular culture. You can see it on cable of course, in shows such as BullSh*t, Real Time, The Daily Show, and Southpark. But it’s also a feature of network TV. The main character on House is written as the most brilliant human on the planet, and he’s an atheist."

"Ask a deeply religious Christian if he’d rather live next to a bearded Muslim that may or may not be plotting a terror attack, or an atheist that may or may not show him how to set up a wireless network in his house. On the scale of prejudice, atheists don’t seem so bad lately."

A good preview of Dawkins' book is here.

I've recently finished reading both "The God Delusion" and "The Goldilocks Enigma" and I found both books quite good. Seeing as though so many people commented last time I thought I'd post a bit more about what I think this time.

Much like the disappointment of revisiting old television shows of your youth, Dawkins' book is great for pointing out how truly bad those stories taught at Sunday school were, including Noah, Lot and Abraham. Although I must admit, even as a child I found the story of the flood and "The A-Team" both rather unbelievable. There are other interesting topics, like coming up with morals without religion, but I think these are better covered elsewhere.

One of the main things I got out of this book is that progress is about consciousness raising. Most improvements have come about when a society becomes aware of a problem and goes about trying to solve it. Historically this includes human rights, more recently global warming, and causes yet to fully take hold, like animal rights. The other thing I got out of it is that I don't have as many problems with religion as Dawkins.

I found Dawkins the least convincing when he diverges from his areas of expertise, especially when he tries to cover cosmology. This is especially apparent when you compare his counterarguments against teleology (things look like they were designed, therefore there must be a designer) in the two fields. Dawkins' explanation in relation to biology is clear and concise, but for cosmology it's rather glossed over and there seems to be a bit of hand waving. He doesn't provide a good argument why evolution on a universal scale is well founded. This is where Paul Davies' book provides some better arguments for a rational creation of the universe.

Davies is actually a little bit more open to the idea of God than Dawkins which, when he chooses a different explanation, makes his arguments more convincing. The possible explanations of the universe he discusses include: absurd (no real cause), unique (there are no free parameters for the universe to be the way it is), the multiverse (String theory), intelligent design (God or Gods), the life principle and the self-explaining universe. He says he prefers the latter two explanations. I found the most interesting explanation given is the self-explaining universe. It uses quantum mechanics, causal loops and the requirement for the universe to understand itself.

The last chapter of the book is certainly the best and I wish he had spent the whole book on the ideas in it instead. His description of the infinite regress as the levitating super-turtle is great. He also describes how Platonism is incorrect, especially at the beginning of the universe, and how the laws of physics have emerged over time.

Thursday, November 16, 2006

JRDF GUI 0.3 Released

An important bug in JRDF's SPARQL project operation was found so I've released JRDF GUI 0.3 to fix it. There are also a couple of minor enhancements to do with some code cleanup including an increase in speed (about 25% for some queries). I've also started testing it against the new DAWG test cases and the relevant OPTIONAL and UNION ones have passed so far (I haven't tested them all).

RDF and OWL are Yahoo's Secret Weapons

Semantic Web Hype "At just over 1 year into my work at Yahoo! we have reached the first use in production of OWL and RDF in a small way and are looking at further steps, none of which I’m revealing...we should show how you don’t need to boil the ocean of semantics such as applying a big pile of technologies or requiring something fragile like a single web-wide ontology - wrong, wrong wrong. Start from concrete data-centric approaches that build up to use layers of technology solutions to different problems as they emerge, only if needed and demonstrating usefulness at each stage."

Web 2.0 is the new Applets

Web "Me2.0" -- Exploding the Myth of Web 2.0 "Web 2.0 is a myth -- there is no Web 2.0...I've seen this before -- it happened just over 10 years ago in the early days of Java applets. I should know, I launched http://www.gamelan.com -- which was THE portal for Java apps. Well guess what -- 10 years later, what remains of all those Java applets? Not the hundreds of thousands of little applets that people made (even though some were actually quite wonderful). No. They are almost all gone. Instead, what really survived and continues to grow is the Java platform itself, and large Java application platforms and development tools. That's where the real value is."

Nova has a bunch of interesting postings recently including: "Web 3.0 Versus Web 2.0", "Does the Semantic Web = Web 3.0?" and "New York Times Article About the Emerging Semantic Web" (all about the recent NY Times article on the Semantic Web, the hype and misconceptions), "What is the Semantic Web, Actually?" and "The Meaning and Future of the Semantic Web".

Wednesday, November 15, 2006

Simplicity, Good Design and Refactoring

Questions for Paul Graham - Simplicity and beauty in art, science and programming "Perhaps we're at a point where art and science and programming diverge. Science (often) aims at finding the simplicity behind the apparent complexity of the universe. Engineering usually aims at efficient solutions, excluding the extraneous which introduces cost and more paths to failure. Art doesn't always aim at simplicity. It just as frequently tries to expose the complexity of what looked simple. Thus, perhaps the union of art, science and engineering maintained by Paul's essay isn't fundamental, although there are certainly historical periods in which they align."

Video of interview of Paul Graham and original article Taste for Makers.

Tuesday, November 14, 2006

The Classpath Exception

So I had almost exactly the same discussion with someone today about the GPL and Java that David Wood is alluding to.

Tim Bray has the answer: "Unmodified GPL2 for our SE, ME, and EE code. GPL2 + Classpath exception for the SE libraries. Javac and HotSpot and JavaHelp code drops today. The libraries to follow, with pain expected fighting through the encumbrances. Governance TBD, but external committers are a design goal. No short-term changes in the TCK or JCP."

Friday, November 10, 2006

Mocking the Inspector

TDD Anti-Patterns "The Mockery - Sometimes mocking can be good, and handy. But sometimes developers can lose themselves and in their effort to mock out what isn’t being tested. In this case, a unit test contains so many mocks, stubs, and/or fakes that the system under test isn’t even being tested at all, instead data returned from mocks is what is being tested."

"The Inspector - A unit test that violates encapsulation in an effort to achieve 100% code coverage, but knows so much about what is going on in the object that any attempt to refactor will break the existing test and require any change to be reflected in the unit test."

Semantic Web 2.0

Some recent articles about people discussing the same ideas as the Semantic Web. The Great Database in the Sky "He went on to describe his vision of a skype for database access, combining my data, your data and public data into the next generation OLAP, running a trillion transactions per day. An example could be weather data and he asked what if you could run a SQL statement across all the data sources in the world"

"This is where it became evident that there is a deep disconnect between the traditional database community and the semantic web community. Mårten’s response was rather vague, that this wasn’t as broad as the semantic web and that the semweb includes unstructured data so wasn’t appropriate."

CEO of MySQL "Invents" the Semantic Web! "I have to say, his talk was both a validation of what we have all been working towards, and as Ian Davis explains, it is also a clear sign that the W3C and the Semantic Web community have not found a way to get the message accross."

And being able to move your data around, to own it, seems to be another overlooked aspect of the Semantic Web.

WEB 2.0: Google CEO: Take your data and run "The more we can, for example, let users move their data around, never trap the data of an end user, let them move it if they don't like us, the better."

And my mind boggles at the idea of taking the proposed Australian Access card and their integration problems and fusing it with RDF. Some interesting points: "...the Access Card will be owned by the cardholder and not by the issuer...The effect of the issuer retaining ownership is that they control the card and the purpose for which it is used."

"In Centrelink alone we have a massive 275 kilometres of files...Medicare has to measure its records in a similar way. They have more than 3 square kilometres of storage space for forms with signatures."

"We collect, and almost never reuse, this information."

"The new card will finally put an end to this waste of time. We will be able to reuse the information that you have given us before, but only for the purposes for which you gave it to us. We can then pre-populate forms and take a lot of the pain out of the claim process."

Thursday, November 09, 2006

Mr Sparkle

Just posted "Applying the relational model to SPARQL" to the DAWG comments list. It describes the work done adding SPARQL support (or small bits of it anyway) to JRDF. An HTML version to follow.

Direct link (PDF ~500K).

BTW, if anyone notices any grammatical errors please let me know and I'll fix them right up. I get to the point where I can't read what I've written anymore so there's bound to be a few errors.

Update: I've fixed some small typos, the formatting, and the example relations (some S1 -> Supplier1 conversions that I missed).

Update 2: The HTML version is now available. Many thanks to Eric Prud'hommeaux for the work he did converting the original PDF.

Wednesday, November 08, 2006

Languages - Open vs Closed World

What did Javascript have that Java did not? "Languages based on static inheritance and typing are good for building complex silo (i.e. closed world) based applications. However, a global scale Architecture of Participation requires more dynamic structures like that found in prototype based inheritance and dynamic typing. In such a massively open world, the distinction between metadata, configuration and instances is simply impossible to pin down into well defined classes and configuration files."

"Javascript by its design is fundamentally messy, however that is its advantage over Java. The path to any sanity may just be what Google has shown, when building silo apps, hide the messy details under a clean well defined Java facade. Never forget though, that these facades are abstractions that leak. Always afford your applications the ability to escape into the Web and Javascript when necessary."

Bruce Tate has recently had a new article published, "Crossing borders: Delayed binding": "The more you dig into type and binding strategies, the more you find that waiting until run time to bind to an invocation or type fundamentally changes the programming process, opening a whole new world of possibilities. True, you find less safety. But you also find less repetition, more power, and more flexibility with fewer lines of code."

Why Semijoin is Better than Join

Query Processing in a System for Distributed Databases (SDD-1) "We prefer semijoins to joins for three reasons. First, Ri(A = B) Rj subsumes Ri, and so semijoins monotonically reduce the size of the database. By contrast, joins can increase the size of the database; in the worst case...Second, semijoins can be computed with less intersite data transfer than joins...we need only transmit a projection of a relation...the third advantage of semijoins is that the “reductive effect” of any single join can be attained by two semijoins, usually at lower cost..."
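To make the first two reasons in that quote concrete, here is a minimal sketch using in-memory lists as stand-ins for relations held at two sites. Everything in it (the class name, the supplier and shipment data, the choice of column 0 as the join attribute) is made up for illustration and is not taken from the SDD-1 paper. It only shows that a semijoin needs a projection of the other relation and never grows its input, whereas a join can.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a relation is a list of int[] rows and the join
// attribute is column 0 of each row. All names and data are hypothetical.
class SemijoinSketch {

    // "Site 1": (supplierId, cityCode)
    static List<int[]> suppliers() {
        return Arrays.asList(new int[]{1, 10}, new int[]{2, 20}, new int[]{3, 30});
    }

    // "Site 2": (supplierId, partId)
    static List<int[]> shipments() {
        return Arrays.asList(new int[]{1, 100}, new int[]{1, 101},
                             new int[]{1, 102}, new int[]{2, 200});
    }

    // Projection onto the join column: the only data that would have to
    // be sent to the other site to compute a semijoin.
    static Set<Integer> project(List<int[]> relation) {
        Set<Integer> keys = new HashSet<>();
        for (int[] row : relation) {
            keys.add(row[0]);
        }
        return keys;
    }

    // Semijoin: keep the rows of `left` whose key appears in `rightKeys`.
    // The result is always a subset of `left`.
    static List<int[]> semijoin(List<int[]> left, Set<Integer> rightKeys) {
        List<int[]> result = new ArrayList<>();
        for (int[] row : left) {
            if (rightKeys.contains(row[0])) {
                result.add(row);
            }
        }
        return result;
    }

    // Equijoin: one output row per matching pair, so the result can be
    // larger than either input.
    static List<int[]> join(List<int[]> left, List<int[]> right) {
        List<int[]> result = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : right) {
                if (l[0] == r[0]) {
                    int[] combined = new int[l.length + r.length];
                    System.arraycopy(l, 0, combined, 0, l.length);
                    System.arraycopy(r, 0, combined, l.length, r.length);
                    result.add(combined);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> keys = project(shipments());
        System.out.println("keys shipped:  " + keys.size());
        System.out.println("semijoin rows: " + semijoin(suppliers(), keys).size());
        System.out.println("join rows:     " + join(suppliers(), shipments()).size());
    }
}
```

With this sample data the three suppliers are reduced to two by the semijoin, while the join produces four rows. The third reason in the quote (two semijoins achieving the reductive effect of a join) is not covered by this sketch.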

This is one of the first times in many years that I've had to resort to a search engine other than Google (Yahoo) to find a paper.

Tuesday, November 07, 2006

ISWC 2006

I noticed a few papers (I haven't looked at them all yet) were available from the current ISWC:

Semantics and Complexity of SPARQL. The early version of this paper was used in my thesis.

A Model Driven Approach for Building OWL DL and OWL Full Ontologies.

MultiCrawler: A Pipelined Architecture for Crawling and Indexing Semantic Web Data

Provenance Explorer -- Tailored Provenance Views Using Semantic Inferencing

Can OWL and Logic Programming Live Together Happily Ever After?

The best paper nominees are here, including one that I hope to read in the near future, "Querying the Semantic Web with Preferences".

Monday, November 06, 2006

Tupelo, Tupelo

Tupelo is a project to create a content repository to support data-driven scientific applications. More specifically it supports JSR 170 (Content Repository API), WebDAV and URIQA (which are the extended HTTP verbs for getting RDF graphs). It implements a per object ACL and version control system.

It uses Kowari, 3Store, and Jena as backing stores. It also has a rather interesting API for dealing with triples called Context (more information in the cookbook). There are some smaller things too like using 1.5's varargs to add triples, transitive closures on queries, object to resource mapping and something that I'd meant to do for JRDF a while ago, have a ResourceVisitor that actually visits resources.

The presentation lists similar projects: SRB, Slide/SAM, Fedora, DSpace and Jackrabbit. All of which (except for SRB) I'm pretty familiar with.

And ActiveRDF, a Ruby API for accessing RDF, went 1.0!

Sunday, November 05, 2006

A Perfectly Functional Blog

So now I can get my fill of functional rants, complete with how it is applicable to every possible situation: Tony Morris’ blog.

Update: The first bit of content is up now, "Have you ever wanted to do this?", which is all about how easy it is to redeclare equality in Haskell vs Java or .NET.

Quick links

Monad page on the Haskell wiki. Including the spacemen and space ship tutorial. Another interesting metaphor for monads, there's a monster in my Haskell!

Mozilla to completely remove RDF support "Brendan Eich wants to remove RDF completely for Mozilla Firefox 3.0 aka Mozilla Platform 2.0."

E.O. Wilson + Daniel Dennett "Shortsightedness is natural. Hypertension is natural. Obesity is natural if you eat too much. There are many things that have deep evolutionary roots that are natural. But one of the glories of civilizations is we've learned to adjust things that may be natural but we don't like them. So I think natural cuts two ways."

Tim Berners-Lee Announces Web Science Initiative - Studying the Social Web "Tim Berners-Lee is leading the program, which is essentially about formalizing a new kind of scientific discipline called Web Science. The goal is to understand the deeper structure of the social Web and how people are using it. But as well as studying the Web, they also hope to shape the future of the Web."

The Fifth Element

Outcomes over Features - the fifth agile value? "An agile project – in fact any project – should focus on a set of outcomes. How we get there is less important than actually getting there. A CIO or business sponsor wants to solve a specific problem, by a specific date, and has an idea of how much money they are prepared to spend achieving that. By driving out a comprehensive list of stories, estimating them, and then rolling it all up into a Big Release Plan, you run the risk of focusing attention on the features (stories) and defining Success as the delivery of all these stories. This is exacerbated when the release plan is shown to people who weren’t part of the original exercise. They see the release plan and naturally assume that it is the primary “output” of the process.

In the manner of the Agile Manifesto, whilst there is value in delivering a list of features, I value achieving the outcomes of the project higher. This feels complementary to the other four agile manifesto values."

Via, Valuing Outcomes over Features.

Saturday, November 04, 2006

Decertification

Another Nail in the IT Certification Coffin ""Across all 253 skills we survey, the value of noncertified skills is growing at a rate five times greater than certification pay. And there's no sign that this is going to change any time soon," said Foote.

Certifications are losing value because employers are looking for more in their workers than the ability to pass an exam; they want business-articulate IT pros. "

Friday, November 03, 2006

Win Some, Lose Some

In Rocking with Ruby, I made the outlandish claim that: "Continuations are going to be a key piece of infrastructure." So maybe not, with the recent news that Seaside is moving away from them: "One thing I’d like to do is reduce the dependence of Seaside on continuations - they drove a lot of the initial interest in the framework but they’re becoming (or seeming) much less important over time, and the use cases to which they’re best suited are these days often addressed with AJAX instead."

The other claim that seems to have worked out though was, "That future languages and platforms will probably be deployed on .NET and Java VMs. The competition between the two seems to have a positive impact on both - locking out any competitors. That means, there's something to look forward to in Java 7 and .NET 3." This follows the news about JRuby: "JRuby has been getting more and more attention from folks within Sun, Rubyists around the world, and especially from Java developers anxious to escape from their Java-only prisons."

Thursday, November 02, 2006

The Four Parts of Future Languages

Convergence in Language Design: A Case of Lightning Striking Four Times in the Same Place "In all four research projects, the programming language has a layered structure. In its most general form, the language has four layers."

1. "The inner layer is a strict functional language. All four projects start with this layer."

2. "The second layer adds deterministic concurrency. Deterministic concurrency is sometimes called declarative or dataflow concurrency."

3. "The third layer adds asynchronous message passing. This leads to a simple message-passing model in which concurrent entities send messages asynchronously."

4. "The fourth layer adds global mutable state. Three of the four projects have global mutable state as a final layer, provided for different reasons, but always with the understanding that it is not used as often as the other layers. In the Erlang project, the mutable state is provided as a persistent database with a transactional interface. In the network transparency project, the mutable state is provided as an object store with a transactional interface and as a family of distributed protocols that is used to guarantee coherence of state across the distributed system."

When Garbage Collection isn't Enough

There is an error that people frequently make when allocating resources. Everyone seems to understand that you use a try/finally block but how to do it is usually unclear. Should you put all the resource deallocation in one finally block? How do you handle the exception that may be thrown attempting to deallocate the resource?

I've previously advocated using a non-final assignment and then checking in the finally block if the object is not null and then closing it. The better way is:

final Connection conn = ...;
try {
  ...
} finally {
  close(conn);
}

This idiom is detailed here. It gives reasons behind using this idiom and the rule of thumb: "place one "try-finally" directly after each resource allocated".
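As a rough sketch of how that rule of thumb plays out with two resources, the example below nests one try/finally directly after each allocation. The `Resource` class is a hypothetical stand-in for a Connection or Statement, and the `close` helper that records (rather than rethrows) a failure while closing is my own assumption about how to avoid masking an exception from the body. It is not necessarily what the linked article recommends.

```java
import java.util.ArrayList;
import java.util.List;

class TryFinallySketch {

    // Hypothetical stand-in for a Connection, Statement, stream, etc.
    static class Resource {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        void close() {
            log.add("close " + name);
        }
    }

    // Assumed helper: a failure while closing is recorded instead of
    // rethrown, so it cannot hide an exception thrown by the body.
    static void close(Resource resource, List<String> log) {
        try {
            resource.close();
        } catch (RuntimeException e) {
            log.add("close failed: " + e.getMessage());
        }
    }

    // One try/finally directly after each allocation. If the second
    // allocation or the body throws, the first resource is still closed.
    static List<String> run(boolean failInBody) {
        List<String> log = new ArrayList<>();
        try {
            final Resource conn = new Resource("conn", log);
            try {
                final Resource stmt = new Resource("stmt", log);
                try {
                    if (failInBody) {
                        throw new IllegalStateException("boom");
                    }
                    log.add("work");
                } finally {
                    close(stmt, log);
                }
            } finally {
                close(conn, log);
            }
        } catch (IllegalStateException e) {
            log.add("caught " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Both resources are closed in reverse order of allocation whether or not the body throws, and there is no null check anywhere because each variable is final and assigned before its try block starts.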

This was written in 2005 based on a previous Javalobby thread. I continue to see resource leaks caused by not closing resources correctly. I know of people who have made considerable money consulting to fix these kinds of bugs in large systems which I find fairly depressing.

Friday, October 27, 2006

Dynamic vs Static

A Semantic Web Primer for Object-Oriented Software Developers "In contrast to object-oriented systems, where objects normally cannot change their type, applications based on Semantic Web technology can follow a formal, yet dynamic typing system. RDF and OWL classes themselves are also dynamic, it is possible to create and manipulate them at runtime. For example, one could define a temporary class that is formally represented as an OWL expression and then ask the reasoner about the instances of this class. This means that reasoners can be compared to rich query answering systems. These queries can not only be asked at ontology design time, but also at execution time."

Monday, October 23, 2006

Sheeple

Happy sheep (mp3) "There are more than 110-million sheep in Australia. Almost 3-and-a-half million are exported overseas every year. In a bid to allay public concern about sheep welfare, a group of CSIRO scientists in NSW is working on a rather unusual project.

They're trying to work out how to tell if a sheep is happy."

Firstly, they have been able to train sheep to press a lever when they were discontent and they also found that sheep have a sense of time.

Now to the point, when sheep are in a paddock without shade they will just be distributed throughout the paddock. However, if they are hot (discontent) and there are trees they will obviously move to the shade. So it's difficult to tell the difference between a discontent sheep that has no other option and a content sheep. If there is an alternative and they are discontent then they will move to the better alternative.

Friday, October 20, 2006

Two Steps Further

Clearing the Air - More Languages that Suck takes "sucks" and other words in combination with computer languages to the next level: Fortran sucks the least and Javascript sucks the most. There was also a correlation between hackiness and suckiness.

Update: Javadevelopers f*** the least "Just to add a further number to the study -- because Andrew unexplicably omitted Python -- here's the data: about 196,000 files / 200 occurences -> 980. That's the second highest result, placing it between Java and Perl (note that the higher the number, the less f***s -- I would have normalized that by taking it 1/n, but, fuck, there's always something to complain)."

Update: Using Google Code search to find the programming language most likely to drive you mad "From this we can clearly see that C is leading the pack, with TCL obviously a pretty mind-bending second place."

Tuesday, October 17, 2006

OO is a trick, functional programming is a lie and C# sucks

Good Ideas, Through the Looking Glass (PDF) "Enough has been said and written about this non-feature [goto] to convince almost everyone that it is a primary example of a bad idea. The designer of Pascal retained the goto statement (as well as the if statement without closing end symbol). Apparently he lacked the courage to break with convention and made wrong concessions to traditionalists. But that was in 1968. By now, almost everybody has understood the problem, but apparently not the designers of the latest commercial programming languages, such as C#."

"To postulate a state-less model of computation on top of a machinery whose most eminent characteristic is state, seems to be an odd idea, to say the least. The gap between model and machinery is wide, and therefore costly to bridge. No hardware support feature can wash this fact aside: It remains a bad idea for practice. This has in due time also been recognized by the protagonists of functional languages. They have introduced state (and variables) in various tricky ways. The purely functional character has thereby been compromised and sacrificed. The old terminology has become deceiving."

"...this writer believes that a more effective way to let a system make good use of parallelism is provided by object-orientation, each object representing its own behaviour in the form of a “private” process."

And on Object Oriented Programming: "Objects are records, classes are types, methods are procedures, and sending a method is equivalent to calling a procedure...Was this change of terminology expressing an essential paradigm shift, or was it a vehicle for gaining attention, a “sales trick”? "

The State of the Semantic Web

TripCom define their project as: "We will improve the ideas of Tuple Space computing by adding semantics by use of a graph-based data-model to rely on Triples. The Triple Space serves as a persistent publication system for semantically linked information in semantically clustered subspaces. We will develop a scalable and linkable Triple Space storage, based on improving and combining current RDF Stores and Tuple Space infrastructures."

I came across their "State of the art and Requirements Analysis", which mentions YARS, Jena, Sesame, 3store, Kowari, JRDF, Edutella and Oracle. They seem especially keen on YARSQL and its support for provenance. They also seem interested in extending SPARQL to support subqueries and inserting and deleting statements.

Sunday, October 15, 2006

I'm Sick of the Anti-Thor Rhetoric

The flying spaghetti monster "Why do you call yourself an atheist? Why not an agnostic?

Well, technically, you cannot be any more than an agnostic. But I am as agnostic about God as I am about fairies and the Flying Spaghetti Monster. You cannot actually disprove the existence of God. Therefore, to be a positive atheist is not technically possible. But you can be as atheist about God as you can be atheist about Thor or Apollo. Everybody nowadays is an atheist about Thor and Apollo. Some of us just go one god further.

When you're talking about God, are you really talking about the God of the Bible -- Yahweh of the Old Testament?

Well, as it happens, I am because I have an eye to the audience who's likely to be reading my book. Nobody believes in Thor and Apollo anymore so I don't bother to address the book to them. So, in practice, it's addressed to believers in the Abrahamic God."

More provocatively, "My sense is that you don't just think religion is dishonest. There's something evil about it as well.

Well, yes. I think there's something very evil about faith, where faith means believing in something in the absence of evidence, and actually taking pride in believing in something in the absence of evidence. And the reason that's dangerous is that it justifies essentially anything. If you're taught in your holy book or by your priest that blasphemers should die or apostates should die -- anybody who once believed in the religion and no longer does needs to be killed -- that clearly is evil. And people don't have to justify it because it's their faith. They don't have to say, "Well, here's a very good reason for this." All they need to say is, "That's what my faith says." And we're all expected to back off and respect that. Whether or not we're actually faithful ourselves, we've been brought up to respect faith and to regard it as something that should not be challenged. And that can have extremely evil consequences. The consequences it's had historically -- the Crusades, the Inquisition, right up to the present time where you have suicide bombers and people flying planes into skyscrapers in New York -- all in the name of faith."

Precedence in SPARQL

A recent "bug report" on the Jena mailing list was sent to me. Basically, it's saying that ARQ doesn't execute SPARQL the way that was expected and JRDF was being held up as correct.

This has to do with the "Nested OPTIONALs" problem, as described by Richard Cyganiak. It basically boils down to the fact that ARQ interprets queries in a left to right, top to bottom fashion; which is the way the SPARQL standard defines it. For example, { <pattern 1> opt { <pattern 2> opt { <pattern 3> } } } should be executed as: take the results of <pattern 1> opt <pattern 2> and then take the results and optionally match <pattern 3>.

This is contradictory to the expression: 6 / (3 - 2), where you perform what's most deeply nested first. To me SPARQL is saying the correct answer is 0 (2 - 2) rather than 6 (6 / 1).
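To see why the grouping matters, here is a small sketch that evaluates OPTIONAL as a left outer join over sets of variable bindings, once left to right and once innermost first. The bindings for the three patterns are invented for the example and the `leftJoin` method is a simplified stand-in (no filters, no graph matching). It is not code from ARQ or JRDF.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OptionalOrderSketch {

    // Builds one solution (variable -> value) from alternating keys and values.
    static Map<String, String> row(String... keyValues) {
        Map<String, String> row = new TreeMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            row.put(keyValues[i], keyValues[i + 1]);
        }
        return row;
    }

    // Two solutions are compatible if every shared variable has the same value.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // "left OPTIONAL right": extend each left solution with every compatible
    // right solution, or keep it unchanged when nothing on the right matches.
    static List<Map<String, String>> leftJoin(List<Map<String, String>> left,
                                              List<Map<String, String>> right) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> l : left) {
            boolean matched = false;
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    Map<String, String> merged = new TreeMap<>(l);
                    merged.putAll(r);
                    result.add(merged);
                    matched = true;
                }
            }
            if (!matched) {
                result.add(l);
            }
        }
        return result;
    }

    // Made-up results for the three patterns: pattern 2 matches nothing.
    static List<Map<String, String>> p1() { return Collections.singletonList(row("x", "1")); }
    static List<Map<String, String>> p2() { return Collections.emptyList(); }
    static List<Map<String, String>> p3() { return Collections.singletonList(row("x", "1", "z", "3")); }

    // (pattern 1 opt pattern 2) opt pattern 3
    static List<Map<String, String>> leftToRight() {
        return leftJoin(leftJoin(p1(), p2()), p3());
    }

    // pattern 1 opt (pattern 2 opt pattern 3)
    static List<Map<String, String>> innermostFirst() {
        return leftJoin(p1(), leftJoin(p2(), p3()));
    }

    public static void main(String[] args) {
        System.out.println("left to right:   " + leftToRight());
        System.out.println("innermost first: " + innermostFirst());
    }
}
```

With these bindings the left to right reading binds z (pattern 3 gets a chance to match even though pattern 2 found nothing), while the innermost first reading leaves z unbound because the empty inner group contributes nothing. Same query text, different answers.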

For JRDF to meet SPARQL's requirements I'd either have to reorder the query or change to another compiler compiler, as SableCC is only an LALR parser. Again, not currently a problem for my current JRDF work but if I really want to stamp it SPARQL, when there are test cases that will fail, it will have to be fixed.

Update: This still seems to be an ongoing issue with the DAWG, "Issues with evaluating optional: Commutativity of AND".

Wednesday, October 11, 2006

How to Win Against Terrorists? Don't Fight a War

Fighting terrorism with justice "What is remarkable is how American policy-makers have refused to learn from the historical experiences of Great Britain, a country that faced terror threats for decades in Northern Ireland, from no surrender zealots motivated by religious and nationalist fervor. In the case of the IRA, several hundred terrorists were organized after 1978 into secret cells that could wage terror pretty much indefinitely. The IRA was embedded within a global terror network that included Marxist guerrillas in South America, ETA in Spain and foreign governments, such as Libya. Not so different from al Qaeda.

Even a passing familiarity with the history of Northern Ireland would provide us with pointers on how not to deal with Islamist terrorists. First, between 1761 and 1972 the British government passed 26 legislative acts containing features designed to combat Irish nationalists, including measures seen in the US Patriot Act such as detention without trial and the suspension of habeas corpus. In the 1970s, the war model led British security services to use harsh interrogation techniques such as hooding, subjecting suspects to loud noises, sleep deprivation, prolonged standing, slaps to the face and slow starvation of detainees. When this severe ill treatment eventually [as it always does] becomes public knowledge, it undermined the legitimacy of government counter-terror policies, and created new constituencies of sympathizers in Northern Ireland. Repressive British government policy in the 1970s-from massacre of 'Bloody Sunday' to the internment of terror suspects without trial and a 'shoot-to-kill' policy in the 1980s, opened up a wellspring of support for a militaristic and Marxist-inspired IRA, which may not have existed otherwise.

In the end, it was only in the 1980s, late 1980's, when the British government moved from a 'war on terror' model to a law enforcement model that it began to win the struggle against Irish nationalist terror groups."

A BBC article supports this: "...the most recent example happened in 1971, when the British Government introduced the internment of hundreds of republican suspects in an attempt to shut down the IRA. The tactic was abandoned four years later and is thought to have increased support for the IRA."

See previous post from 2002, "Using the American model to solve all acts of terrorism".

Agile Bigger than Jesus

Software Development: It's a Religion "But software development is, and has always been, a religion. We band together into groups of people who believe the same things, with very little basis for proving any of those beliefs. Java versus .NET. Microsoft versus Google. Static languages versus Dynamic languages. We may kid ourselves into believing we're "computer scientists", but when was the last time you used a hypothesis and a control to prove anything? We're too busy solving customer problems in the chosen tool, unbeliever!"

Tuesday, October 10, 2006

Robert Glass

Author Interview: Robert Glass "If you had to sum up today's state of the art from the perspective of someone who experienced software development in the sixties, seventies, and eighties, what would you say are our best and worst traits.

Bob: Best traits? The depth and quality of available tools. The Agile belief in people over process. The Open Source focus on fun over duty.

Worst traits? The "us vs. them" mentality which causes today's programmers to see themselves as a separate and competing breed from yesterday's programmers. The tendency to reinvent wheels. The belief in Agile processes as being good for all problems. The hyped belief in Open Source as the best of all possible ways of building software."

Developer.* has some of his work, "Software Maintenance is a Solution, Not a Problem", "Success/Failure Criteria: Some Surprises", and "The Many Flavors of Testing".

Egomania Itself

"How many 2-word anagrams of "Agile Manifesto" do you think there are?" Follow up to Good Agile, Bad Agile.

"In any case, the whole discussion about whether Google's approach is viable for tech companies in other domains is a red herring. Most companies don't use Agile Methodologies, or if they do, it's only a handful of teams, maybe 10% or fewer, I'd guess. At least it's true at the companies I know lots of people from - Sun, Microsoft, Yahoo, Amazon, Google, Blizzard, and other places like them: industry leaders who write kick-ass software. They do it almost entirely without Agile. It's not just Google. It's everyone."

Friday, October 06, 2006

WTF

It's easy to prove what rules vs what sucks using Google. So using the new Code Search it's also easy to find out which language is most likely to have the word "fuck" in its code - the ultimate test of how good a language is (or how well things are going).
So here it is (the higher the better):

C++ ~835,000 files / ~2000 fucks = 417.5.
Java ~766,000 files / ~500 fucks = 1532.
Ruby ~254 files / ~60 fucks = 4.2.
Perl ~186,000 files / ~400 fucks = 465.
PHP ~195,000 files / ~2000 fucks = 97.5.
Lisp ~400 files / ~100 fucks = 4.
Scheme ~400 files / ~50 fucks = 8.

Wednesday, October 04, 2006

Benefits of Spicy Languages

"Functional Programming For The Rest of Us" seems to pretty much cover most topics that I've come across with functional programming, except for monads. There's also an interesting presentation by Shriram Krishnamurthi about Web programming and the suitability of some ideas from functional programming (continuations mainly) and the applicability of the MVC pattern. He also states that languages should incorporate things that people have to do over and over again like garbage collection.

"Since every symbol in FP is final, no function can ever cause side effects. You can never modify things in place, nor can one function modify a value outside of its scope for another function to use (like a class member or a global variable). That means that the only effect of evaluating a function is its return value and the only thing that affects the return value of a function is its arguments.

This is a unit tester's wet dream. You can test every function in your program only worrying about its arguments. You don't have to worry about calling functions in the right order, or setting up external state properly."
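
The testing claim in the quote can be illustrated with a minimal Java sketch. The class and method names here are hypothetical and not from the quoted article; the point is only that a side-effect-free function can be checked using nothing but its arguments.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example; the names are illustrative, not from the quoted article.
final class PureFunctionSketch {
    private PureFunctionSketch() {}

    // Pure: the result depends only on the argument and nothing outside is modified.
    static int sumOfSquares(List<Integer> values) {
        return values.stream().mapToInt(v -> v * v).sum();
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        // No external state to set up and no required call order.
        System.out.println(sumOfSquares(input)); // 1 + 4 + 9 = 14
        // Calling it again with the same argument gives the same answer.
        System.out.println(sumOfSquares(input) == sumOfSquares(input)); // true
    }
}
```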

"A functional program is ready for concurrency without any further modifications. You never have to worry about deadlocks and race conditions because you don't need to use locks! No piece of data in a functional program is modified twice by the same thread, let alone by two different threads."

"An interesting property of functional languages is that they can be reasoned about mathematically. Since a functional language is simply an implementation of a formal system, all mathematical operations that could be done on paper still apply to the programs written in that language."

"Most people I've met have read the Design Patterns book by the Gang of Four. Any self respecting programmer will tell you that the book is language agnostic and the patterns apply to software engineering in general, regardless of which language you use. This is a noble claim. Unfortunately it is far removed from the truth.

Functional languages are extremely expressive. In a functional language one does not need design patterns because the language is likely so high level, you end up programming in concepts that eliminate design patterns all together. One such pattern is an Adapter pattern (how is it different from Facade again? Sounds like somebody needed to fill more pages to satisfy their contract). It is eliminated once a language supports a technique called currying."
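
A rough sketch of the currying point, written in Java with hypothetical names (`curry`, `pow` and `square` are my own, and the exponent-first argument order is an assumption made so the exponent can be fixed first). Java does not curry natively, so the helper emulates it; the idea is that fixing one argument adapts a two-argument function to a one-argument interface without writing an Adapter class.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch; Java only emulates currying through nested functions.
final class CurryingSketch {
    private CurryingSketch() {}

    // Turns a two-argument function into a chain of one-argument functions.
    static <A, B, R> Function<A, Function<B, R>> curry(BiFunction<A, B, R> f) {
        return a -> b -> f.apply(a, b);
    }

    // An ordinary two-argument function: base raised to a non-negative exponent.
    static int pow(int exponent, int base) {
        int result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    // "Adapts" pow to a one-argument interface by fixing the exponent at 2.
    static Function<Integer, Integer> square() {
        BiFunction<Integer, Integer, Integer> power = CurryingSketch::pow;
        return curry(power).apply(2);
    }

    public static void main(String[] args) {
        System.out.println(square().apply(7)); // 49
    }
}
```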

Tuesday, October 03, 2006

SPARQL to SQL

Semantics Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns "...we proposed: (1) an efficient algorithm, BGPtoSQL, that generates an SQL query equivalent to an input SPARQL basic graph pattern to be evaluated over the triple store, and (2) a generic query translation strategy and an efficient algorithm, SPARQLtoSQL, that translates an input SPARQL query with a basic graph pattern or an optional graph pattern to an equivalent SQL query. To our best knowledge, our algorithmic solution for the optional graph pattern query mapping problem is the first published in the literature."
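
To give a feel for the problem, here is a naive sketch of translating a basic graph pattern into SQL self-joins. It is not the paper's BGPtoSQL algorithm and it ignores optional graph patterns entirely. The single `triples(s, p, o)` table, the class name and the convention that terms starting with `?` are variables are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Naive illustration only; assumes one table triples(s, p, o).
final class NaiveBgpToSql {
    private static final String[] COLUMNS = {"s", "p", "o"};

    private NaiveBgpToSql() {}

    // Each pattern is {subject, predicate, object}; terms starting with '?' are variables.
    static String translate(List<String[]> patterns) {
        Map<String, String> firstUse = new LinkedHashMap<>(); // variable -> first column it binds
        List<String> tables = new ArrayList<>();
        List<String> conditions = new ArrayList<>();
        for (int i = 0; i < patterns.size(); i++) {
            String alias = "t" + i; // one alias of the triples table per pattern
            tables.add("triples " + alias);
            for (int c = 0; c < COLUMNS.length; c++) {
                String term = patterns.get(i)[c];
                String column = alias + "." + COLUMNS[c];
                if (!term.startsWith("?")) {
                    // Constant term: compare the column with a quoted literal.
                    conditions.add(column + " = '" + term.replace("'", "''") + "'");
                } else if (firstUse.containsKey(term)) {
                    // Repeated variable: join on the column where it first appeared.
                    conditions.add(column + " = " + firstUse.get(term));
                } else {
                    firstUse.put(term, column);
                }
            }
        }
        List<String> projections = new ArrayList<>();
        for (Map.Entry<String, String> e : firstUse.entrySet()) {
            projections.add(e.getValue() + " AS " + e.getKey().substring(1));
        }
        String sql = "SELECT " + (projections.isEmpty() ? "1" : String.join(", ", projections))
                + " FROM " + String.join(", ", tables);
        return conditions.isEmpty() ? sql : sql + " WHERE " + String.join(" AND ", conditions);
    }

    public static void main(String[] args) {
        List<String[]> bgp = Arrays.asList(
                new String[] {"?person", "foaf:name", "?name"},
                new String[] {"?person", "foaf:knows", "?friend"});
        System.out.println(translate(bgp));
    }
}
```

Each triple pattern becomes one alias of the table, constants become equality tests and a shared variable becomes a join condition. The hard part the paper addresses, preserving SPARQL semantics for OPTIONAL, is exactly what this sketch leaves out.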

The Common Misunderstanding of Design Patterns

A little while ago, I responded to Andrae talking about how patterns were just language defects. As I pointed out, "Design patterns are not reliant on OO, Java, C++ or any particular language or programming paradigm." and as an example the singleton pattern disappears when you use Spring et al.

Similarly, a much more popular group of bloggers is talking about the same thing, with "Ralph Johnson on design patterns" and a response from one of the authors of "Design Patterns", "Design patterns and language design".

Mark Dominus writes,

"If you're a language designer, and a "pattern" comes to your attention, then you have a great opportunity. The programmers using your language have a recurring problem. They have to implement the same solution to it, over and over. Clearly, this is a good place to try to expend some design effort; perhaps you can trade off a little simplicity for some functionality and fix the language so that the problem is a problem no longer.

Getting rid of one recurring design problem might create new ones. But if the new problems are operating at a higher level of abstraction, you may have a win. Getting rid of the need for "subroutine call" pattern in assembly language opened up all sorts of new problems: when and how do I do recursion? When and how do I do coroutines?"

And Ralph Johnson,

"No matter how complicated your language will be, there will always be things that are not in the language. These things will have to be patterns. So, we can eliminate one set of patterns by moving them into the language, but then we'll just have to focus on other patterns. We don't know what patterns will be important 50 years from now, but it is a safe bet that programmers will still be using patterns of some sort."

So this goes to the heart of what good design is. Putting in double dispatch or multiple inheritance is not necessarily an overall win in language design whereas something like closures may be. But like most things, especially related to design, proficient users are usually too close to the problem to make the right call.

Friday, September 29, 2006

Explaining Continuations

Continuations, functions and jumps explains that a function call consists of two jumps: one from the caller to the callee and one back to the caller. It then shows how continuation passing style affects the semantics of a language.
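
A small Java sketch of the difference, with hypothetical names of my own. Java still performs ordinary returns underneath, so this only mimics the shape of continuation passing style: each step hands its result to a function `k` representing the rest of the computation, instead of jumping back to its caller with a value.

```java
import java.util.function.Function;

// Hypothetical sketch; the names are illustrative and not from the linked article.
final class CpsSketch {
    private CpsSketch() {}

    // Direct style: calling jumps to the callee, "return" jumps back to the caller.
    static int squareOfSumDirect(int a, int b) {
        int sum = a + b;
        return sum * sum;
    }

    // CPS: the result is passed forward to the continuation k.
    static <R> R addCps(int a, int b, Function<Integer, R> k) {
        return k.apply(a + b);
    }

    static <R> R squareCps(int x, Function<Integer, R> k) {
        return k.apply(x * x);
    }

    // (a + b) squared, with "what happens next" made explicit at every step.
    static <R> R squareOfSumCps(int a, int b, Function<Integer, R> k) {
        Function<Integer, R> thenSquare = sum -> squareCps(sum, k);
        return addCps(a, b, thenSquare);
    }

    public static void main(String[] args) {
        Function<Integer, String> report = result -> "result = " + result;
        String viaCps = squareOfSumCps(2, 3, report);
        System.out.println(viaCps);                  // result = 25
        System.out.println(squareOfSumDirect(2, 3)); // 25
    }
}
```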

With Infinite Money

Life at Google and the Talent Myth "I've talked to people who've come to Microsoft from Google (e.g. Danny Thorpe) and it definitely is as chaotic as it sounds there. For some reason, the description of life at Google by Steve Yegge reminds me a bit of Microsoft where there were two huge money making projects (Office & Windows in the case of Microsoft and AdWords & AdSense in the case of Google) and then a bunch of good to mediocre projects full of smart people dicking around. Over the years I've seen a reduction of the 'smart people dicking around' type projects over here and more focus on shipping code."

Someone seems to have forgotten that real artists ship.

I read the original article, "Good Agile, Bad Agile", expecting in-depth analysis of Agile (or agile) methods. Instead it says things like, "Most of us in our industry are date-driven. There's always a next milestone, always a deadline, always some date-driven goal to it.

The only exceptions I can think of to this rule are:

1) Open-source software projects.
2) Grad school projects.
3) Google."

All of these cases are without time or money constraints.

Another post, "Stupid is as stupid does", says their methodology explains the continual betaness of Google: "Unasked by Joel, and left unexplained by Steve: everything at Google stays in beta, pretty much forever. Hmm. Why do you suppose that is? Well, you get a bunch of "really smart" people together, don't put any product/project management together, and let them move around at will... what do you get? You get a bunch of projects that end up being 80% done (i.e., all of the technically "interesting" pieces are done, but that boring "polish" stuff isn't)."

Monday, September 25, 2006

JRDF SPARQL Performance

I gathered some performance figures for JRDF's implementation of SPARQL using some of the FOAF data from Mindswap.

Average for Query 1, 100,000 triples:
* Jena (ARQ 0.9.2) - 14685 ms
* JRDF and JRDF using Tuple Subsumption - 3652 ms

There is only one UNION implementation in JRDF.

Average for Query 2, 100,000 triples:
* Jena (ARQ 0.9.2) - 22872 ms
* JRDF - 6615 ms
* JRDF using Tuple Subsumption - 3750 ms

Average for Query 3, 100,000 triples:
* Jena (ARQ 0.9.2) - 15306 ms
* JRDF - 8019 ms
* JRDF using Tuple Subsumption - 4780 ms

The point is not that it was faster than Jena (although yay!), it's that the relational optimisations had a positive effect on querying speed. The current downloadable version of the JRDF SPARQL GUI (0.2) has very slow versions of the hashCode and equals methods for performance-sensitive classes like AttributeValuePair. It still showed the benefit of the optimisations but was 10 times slower to answer the queries. The modified version of these classes is only available in the JRDF Subversion repository.

Friday, September 22, 2006

Ultimate Mashup

Showing how prescient Douglas Adams was (and well researched - Vannevar Bush and Ted Nelson get nods, for example), Hyperland is much like Apple's Knowledge Navigator, with Tom Baker as the software agent (all agents must have bow ties apparently). Via A Hitchhiker’s Guide to the Semantic Web.
Aperture is now open source; it crawls file systems, web sites and mailboxes, extracts the metadata and provides a GUI for searching. Via Aduna goes open source and Aperture to the Semantic Desktop.
The ultimate mashup -- Web services and the semantic Web, Part 3: Understand RDF and RDFs.

Thursday, September 21, 2006

A Better Way

There are a couple of papers I've been reading about fusing functional programming and the relational model, as well as the influential "Can Programming Be Liberated From the von Neumann Style? A Functional Style and its Algebra of Programs".

There is also Dijkstra's "On the cruelty of really teaching computing science". First, he states that people are by nature conservative and often use previous experience as the basis for learning something new: "...radical novelties are so disturbing that they tend to be suppressed or ignored, to the extent that even the possibility of their existence in general is more often denied than admitted."

So computing is generally quite different but people continue to use the wrong metaphors and analogies: "The practice is pervaded by the reassuring illusion that programs are just devices like any others, the only difference admitted being that their manufacture might require a new type of craftsmen, viz. programmers. From there it is only a small step to measuring "programmer productivity" in terms of "number of lines of code produced per month". This is a very costly measuring unit because it encourages the writing of insipid code, but today I am less interested in how foolish a unit it is from even a pure business point of view. My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

Besides the notion of productivity, also that of quality control continues to be distorted by the reassuring illusion that what works with other devices works with programs as well. It is now two decades since it was pointed out that program testing may convincingly demonstrate the presence of bugs, but can never demonstrate their absence. After quoting this well-publicized remark devoutly, the software engineer returns to the order of the day and continues to refine his testing strategies, just like the alchemist of yore, who continued to refine his chrysocosmic purifications."

The solution is to teach students correctly that the basics are to be found not in Pascal or C or Java but in logic and mathematics: "Right from the beginning, and all through the course, we stress that the programmer's task is not just to write down a program, but that his main task is to give a formal proof that the program he proposes meets the equally formal functional specification. While designing proofs and programs hand in hand, the student gets ample opportunity to perfect his manipulative agility with the predicate calculus. Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs. And this concludes the sketch of my proposal for an introductory programming course for freshmen."

Monday, September 18, 2006

Unicode Maths Font

Does Your Browser Support Multi-language? is potentially the only available source of Miscellaneous Mathematical Symbols-A, which has the left outer join symbol. I downloaded about half a dozen different fonts and none of them had it. Many thanks to Simon.

Sunday, September 17, 2006

Identity with UUID

Don't Let Hibernate Steal Your Identity "This example uses the object id as the definition of equals() and to derive hashCode(). This is much simpler. However, to make this work we need two things. First, we need a way to ensure every object has an id even before it is saved. This example assigns the id a value as soon as the id variable is declared. Second, we need a way to determine if this is a newly created object, or a previously saved object. In our original example, Hibernate checked whether the id field was null to determine if the object was new. Obviously this won't work anymore since our object id is never null. We can easily solve this by configuring Hibernate to check whether the version field, rather than the id field, is null. The version field is a much more appropriate indicator of whether your object has been previously saved."

"Furthermore, the new definition of equals() and hashCode() is universal for all objects that contain an object id. That means we can move those methods to an abstract parent class. We no longer need to re-implement equals() and hashCode() for every domain object, and we no longer need to think through which combination of fields is both unique and immutable for each class. Instead, we simply extend the abstract parent class. Of course, we don't want to force our domain objects to extend from a parent class, so we'll also define an interface to keep things flexible."

"We now have a simple and effective way to create domain objects. They extend AbstractPersistentObject, which automatically gives them an id when they're first created and properly implements equals() and hashCode(). They also get a reasonable default implementation of toString() that they can optionally override. If this is a test object or an example object for a query-by-example the id can be changed or set to null. Otherwise it should not be altered. If for some reason we need to create a domain object that extends some other class, it can implement the PersistentObject interface rather than extend the abstract class."

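The scheme described in the quotes above could look roughly like the following. This is a sketch reconstructed from the description, not the article's actual code: `PersistentObject` and `AbstractPersistentObject` are named in the quote, but the method names, the `Person` class and the use of `java.util.UUID` are my assumptions, and the Hibernate mapping (including the version-based unsaved check) is left out.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Interface for domain objects that cannot extend the abstract class.
interface PersistentObject {
    String getId();
    Integer getVersion();
}

abstract class AbstractPersistentObject implements PersistentObject {
    // Assigned as soon as the field is declared, so the id exists before saving.
    private String id = UUID.randomUUID().toString();
    // Stays null until the object is saved; the quote uses this to detect new objects.
    private Integer version;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; } // only for tests or query-by-example
    public Integer getVersion() { return version; }
    public void setVersion(Integer version) { this.version = version; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PersistentObject)) return false;
        return id != null && id.equals(((PersistentObject) o).getId());
    }

    @Override
    public int hashCode() {
        return id == null ? 0 : id.hashCode();
    }

    @Override
    public String toString() {
        return getClass().getSimpleName() + "[id=" + id + "]";
    }
}

// Hypothetical domain object: no equals() or hashCode() of its own is needed.
final class Person extends AbstractPersistentObject {
    final String name;
    Person(String name) { this.name = name; }
}

final class IdentityDemo {
    public static void main(String[] args) {
        Person unsaved = new Person("Ada");
        Set<Person> people = new HashSet<>();
        people.add(unsaved);
        unsaved.setVersion(1); // simulate a save; the id does not change
        System.out.println(people.contains(unsaved)); // true
    }
}
```

Because the hash code never depends on a database-assigned key, an object put into a set before it is saved can still be found there afterwards.
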
DS Hacks

* Frodo c64 emulator for the DS.
* Doom.
* Homebrew for DS.
* DSLinux.
* M3 Adapter SD Version Slim + Passcard 3.
* 40GB hard drive.

Friday, September 08, 2006

Iron VMs

Will It Be JRuby vs. IronPython? "On the heels of Microsoft's release of IronPython 1.0 comes the news that Sun has hired the two primary JRuby developers.

I don't know if Sun intended it, but the juxtaposition of these two news item gives the effect that the two managed execution environments—The JVM and .NET—have each chosen an anointed champion scripting language for their platform.

The orphan that's left out of the spotlight is Jython. I feel kind of sad for Jython and its lone publicly known active maintainer Frank Wierzbicki."

JRuby Steps Into the Sun "The potential for Ruby on the JVM has not escaped notice at Sun, and so we'll be focusing on making JRuby as complete, performant, and solid as possible. We'll then proceed on to help build out broader tool support for Ruby, answering calls by many in the industry for a "better" or "smarter" Ruby development experience."

Thursday, September 07, 2006

The Myth of Web 2.0

All We Got Was Web 1.0, When Tim Berners-Lee Actually Gave Us Web 2.0 is a reference to an interview of Tim Berners-Lee, "Yes, the original vision of Berners-Lee is now apparently happening, so he's right in a sense there while glossing over the reality of the early Web. But though his vision was largely possible since the advent of the first forms-capable browser, at first we only got what we could call "Web 1.0"; simple Web sites that were largely read-only or at least would only take your credit card. The essential draw of mountains of valuable user generated content just wasn't there. And the millions of people with the skills and attitudes weren't there either. Even the techniques for making good emergent, self-organizing communities and two-way software were in their very infancy or were misunderstood. An example: How long did it take the lowly editable Web page (aka wikis) to be popular and widespread? Nearly a decade. The fact is, most of us know that innovation is all too likely to race ahead of where society is. I run into folks from Web 1.0 startups fairly often that bitterly complain about how they were building Web 2.0 software in 2000, but nobody came."

Tim also mentions those boring things like the Semantic Web, SPARQL and Enquire (his pre-Web hypertext program; his first browser, WorldWideWeb, was both an editor and a browser).

Wednesday, September 06, 2006

The 7th dimension is infinity

Possible Worlds: The Fifth Dimension. The numbering starts at 0, so I think the 5th dimension is labelled "4" in the presentation and the 6th dimension is actually all possible worlds. There are 11 dimensions but it says the 10th. Using Wikipedia's definition of dimension, the 0 dimension is the first defined dimension.

Relational.OWL

Relational.OWL - A Data and Schema Representation Format Based on OWL "In this paper we introduce a Web Ontology Language (OWL)-based (Miller & Hendler 2004) representation format for relational data and schema components, which is particularly appropriate for exchanging items among remote database systems. OWL, originally created for the Semantic Web enables us to represent not only the data itself, but also its interpretation, i.e. knowledge about its format, its origin, its usage, or its original embedment in specific frameworks.

Hence, remote databases are instantly able to understand each other without having to arrange an explicit exchange format - the usage of OWL on both sides is sufficient. This would be impossible using present XML formats."

"Bizer introduced in (Bizer 2003) a mapping language between relational data and RDF, particularly between specific relational query results and RDF. Contrary to our approach, D2R MAP converts the stored data into ”real” RDF objects, i.e. an address would be represented as a RDF address object. This approach takes into account, that the original database cannot be reconstructed using this kind of data representation anymore, since it does not contain information concerning the original schema of the database. As a result, the data represented with the D2R MAP language looses its relationship to the original database. Tracing the data to its original storage position is thus hardly possible."

Homepage, other papers (such as "Database to Semantic Web Mapping using RDF Query Languages" and the good "Bringing Relational Data into the Semantic Web using SPARQL and Relational.OWL") and documentation with beta versions of the software written in Java (uses Jena).

D2R Server 0.3 was released recently too.

Monday, September 04, 2006

Modular Predicate Dispatch

Tom's post on interfaces reminded me of a paper and software I looked at in 2004 that implemented multiple dispatch using predicate inference in order to execute the correct method on a class. It was interesting at the time as it seemed like a natural evolution of propositional logic technologies, including things like the Semantic Web. Not to mention an implementation of the Visitor pattern that was quite neat.

A new paper released early this year, "Modularly Typesafe Interface Dispatch in JPred", describes how they've successfully added support for predicate dispatch on interfaces. This is much better than most previous approaches, as far as I can tell. It's interesting that the type inferencing in their compiler was able to help debug the compiler itself.

They make reference to the MultiJava project, which adds open classes and symmetric multiple dispatch but is limited to classes rather than interfaces, as JPred originally was. Open classes are a way of getting some of the benefits of the Visitor pattern without modifying as much code. More details are in this paper.

The code looks like:

class C {
  void m(Object o) {
    System.out.println("got a C and an Object");
  }
}
class D extends C {
  void m(final Object o) {
    System.out.println("got a D and an Object");
    super.m(o);
    this.resend(o);
  }
}
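
For comparison, this is roughly what the Visitor pattern looks like in plain Java, which is the boilerplate that open classes and multiple dispatch aim to reduce. It is a generic sketch with hypothetical names (`Shape`, `AreaVisitor` and so on), not code from the JPred or MultiJava papers.

```java
// One method per concrete type; adding a type means touching every visitor.
interface ShapeVisitor<R> {
    R visitCircle(Circle circle);
    R visitSquare(Square square);
}

interface Shape {
    // First dispatch: on the runtime type of the shape.
    <R> R accept(ShapeVisitor<R> visitor);
}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }

    @Override
    public <R> R accept(ShapeVisitor<R> visitor) {
        return visitor.visitCircle(this); // second dispatch: on the visitor
    }
}

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }

    @Override
    public <R> R accept(ShapeVisitor<R> visitor) {
        return visitor.visitSquare(this);
    }
}

// A new operation added without changing Circle or Square.
final class AreaVisitor implements ShapeVisitor<Double> {
    @Override
    public Double visitCircle(Circle circle) {
        return Math.PI * circle.radius * circle.radius;
    }

    @Override
    public Double visitSquare(Square square) {
        return square.side * square.side;
    }
}

final class VisitorSketch {
    private VisitorSketch() {}

    static double area(Shape shape) {
        return shape.accept(new AreaVisitor());
    }

    public static void main(String[] args) {
        System.out.println(area(new Square(2.0))); // 4.0
    }
}
```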

The 2006 paper also references "Featherweight Java", which seems to have opened up a whole range of academic possibilities.