Thursday, November 30, 2006

Different Strokes

I feel I should make a correction to comments I made in regards to the proposed SPARQL algebra.

In that post and in my thesis I've said that the definitions of OPTIONAL by Pérez et al (and used by Andy Seaborne) are compatible with the relational definitions I used. I'm now about as certain as I can be that that's not the case.

The definition of set difference in Pérez et al is (in ASCII):
O1 \ O2 = {u <- O1 | for all u' <- O2, u and u' are not compatible}.

The key here is the definition of compatibility. I didn't find it clear in the original paper (my fault, and I'll explain a bit more about this later) but it's explicit in "Semantics of SPARQL" where it says, "Two mappings with disjoint domains are always compatible, and the empty mapping u0 is compatible with any other mapping. Intuitively, u1 and u2 are compatibles if u1 can be extended with u2 to obtain a new mapping, and vice versa."

A relational join is defined as the set union of the headings and matching values. The heading consists of the attributes (X, Y, Z), where X is the attributes found only in the left hand side relation, Y is the attributes common to the left and right hand side relations and Z is the attributes found only in the right hand side relation. The body consists of tuples with the values matching X, Y, and Z (Date writes it as { X x, Y y, Z z}). I think that this is compatible with the Pérez et al definition.

But it's different for difference. The difference operator in relational algebra requires the two relations to be the same type before any tuples are removed (this is the definition I've used). I'll use "\" to represent SPARQL's set difference and "-" to represent the relational difference.

For example:
{{ (?x = 2, ?y = 3) }} \ {{ (?y = 3) }} is {} in SPARQL
{{ (?x = 2, ?y = 3) }} - {{ (?y = 3) }} is {{ (?x = 2, ?y = 3) }} in the modified Galindo-Legaria relational algebra that JRDF uses.

It is unchanged in relational algebra because the definition of equality requires the tuples to be the same type (the same attributes). So a binding with one value can't be equal to a binding with two values. The Pérez et al definition is much looser, so many more matches are made.
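To make the example above concrete, here is a minimal Java sketch of the two operators. It is an illustration only, not JRDF's code or anything from Pérez et al: the class name `DifferenceSketch`, the method names, and the use of `Map<String, Integer>` for a mapping are all assumptions made for the example. It follows the quoted definition of compatibility (shared variables must agree) and treats relational equality as "same attributes, same values".

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: mappings are modelled as Map<variable name, value>.
class DifferenceSketch {

    // Two mappings are compatible when every variable they share has the same value.
    // Mappings with disjoint domains are therefore always compatible.
    static boolean compatible(Map<String, Integer> u1, Map<String, Integer> u2) {
        for (Map.Entry<String, Integer> entry : u1.entrySet()) {
            Integer other = u2.get(entry.getKey());
            if (other != null && !other.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    // O1 \ O2: keep u from O1 only if it is incompatible with every u' in O2.
    static Set<Map<String, Integer>> sparqlDiff(Set<Map<String, Integer>> o1,
                                                Set<Map<String, Integer>> o2) {
        Set<Map<String, Integer>> result = new HashSet<>();
        for (Map<String, Integer> u : o1) {
            boolean incompatibleWithAll = true;
            for (Map<String, Integer> uPrime : o2) {
                if (compatible(u, uPrime)) {
                    incompatibleWithAll = false;
                    break;
                }
            }
            if (incompatibleWithAll) {
                result.add(u);
            }
        }
        return result;
    }

    // O1 - O2: remove only the tuples that are equal (same attributes, same values)
    // to a tuple in O2.
    static Set<Map<String, Integer>> relationalDiff(Set<Map<String, Integer>> o1,
                                                    Set<Map<String, Integer>> o2) {
        Set<Map<String, Integer>> result = new HashSet<>(o1);
        result.removeAll(o2);
        return result;
    }

    public static void main(String[] args) {
        Set<Map<String, Integer>> left = Set.of(Map.of("x", 2, "y", 3));
        Set<Map<String, Integer>> right = Set.of(Map.of("y", 3));
        System.out.println(sparqlDiff(left, right).size());     // prints 0
        System.out.println(relationalDiff(left, right).size()); // prints 1
    }
}
```

With the bindings from the example, (?x = 2, ?y = 3) is compatible with (?y = 3), so the SPARQL-style difference drops it, while the equality-based difference leaves it alone because the two bindings do not have the same attributes.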

As an aside, I was fairly sure that set difference was a requirement for OWL inferencing. I might be wrong here too. If it is the case, though, I'd think it'd be nice (an efficient use of operations) if SPARQL operations could be reused for OWL ones. This would allow the difference operator by itself to be expressed in SPARQL too.

So I think I understand the issue now. At the moment I'm trying to work out how JRDF gets what seems to be the right results because it doesn't really look like it should. The other thing that was in my paper was a mapping to SQL and I'm not sure how a SQL based SPARQL implementation would work either (based on the definitions I used). Lots of questions anyway.

I am trying to work out whether this definition of compatibility is a good one. At the moment I don't like it or dislike it because I don't understand it yet.

The trigger was the removal of the term antijoin from the SPARQL algebra - that is a good idea because I understood it to mean it was compatible with relational algebra (using the same terms I guess). Up until that point I'd assumed difference in SPARQL was compatible with difference in relational algebra.

Update: Paul mentions in the comments, that the SPARQL difference operator is the one needed for OWL. Another win for that definition then too.

It's apparent that the relational model really isn't appropriate to model SPARQL in. I don't think there's a single definition of an operation in relational algebra that hasn't been changed.

Update 2: I've since come to realize that joins are not compatible either. This is due to the suggestion that you can join on NULL values. This contradicts relational algebra (taking either the Date or Codd approach on NULL) and SQL joins (not that I think that's worth much but it's easy to demonstrate).

Update 3: Last update for this blog I hope. While relational difference is not equivalent, antijoin is equivalent, which is what I was told when antijoin was renamed diff in the first place. More information: "The Difference is Antijoin".

Tuesday, November 28, 2006

B, b, b boca gives you an enterprise ready RDF store

IBM Semantic Layered Research Platform announced the release of Boca: "Boca is designed to make it possible to build multi-user, distributed RDF applications" Supports named graphs, security, replication, revision history and JMS notification of changes.

The users guide lists some other interesting features such as a client stack with "a fair amount of compatibility with HP's Jena API" and text indexing using Lucene. Based on the configuration it looks like it requires DB2 or Apache Derby as well as Java 5.

Via, IBM SLRP Release.

Wednesday, November 22, 2006

Another SPARQL Algebra

A couple of days ago Andy Seaborne posted a link to The SPARQL Algebra, "The SPARQL Algebra defines the semantics of a SPARQL query execution. The algebraic expression is derived from a query string by parsing and transforming the abstract syntax tree. The result of a query can then be calculated by the evaluation rules. This gives the correct results of query -- it does not imply the actual execution of a query must be performed in this manner."

This seems influenced by Jorge Pérez's work (or at least it shares a common basis as the definitions are very similar) and looks like it's compatible with the stuff that I did (it even has antijoin in there).

And in other SPARQL news, Danny Ayers has published (based on discussions between Max Völkel and Richard Cyganiak) a SPARQL Update Language for insertion and deletion of triples.

Update: Andy Seaborne has posted about his work on this algebra; a version of ARQ is available which implements it (it's post 1.4). The use of the FILTER command is also discussed. Via, SPARQL will be formalized as an algebra.

Tuesday, November 21, 2006

The Search for the Levitating Super-Turtle

Atheists: The New Gays "Prior to 9/11, it would have been career suicide for a public figure to come right out and say God is a fairy tale. Now it’s a feature of popular culture. You can see it on cable of course, in shows such as BullSh*t, Real Time, The Daily Show, and Southpark. But it’s also a feature of network TV. The main character on House is written as the most brilliant human on the planet, and he’s an atheist."

"Ask a deeply religious Christian if he’d rather live next to a bearded Muslim that may or may not be plotting a terror attack, or an atheist that may or may not show him how to set up a wireless network in his house. On the scale of prejudice, atheists don’t seem so bad lately."

A good preview of Dawkins' book is here.

I've recently finished reading both "The God Delusion" and "The Goldilocks Enigma" and I found both books quite good. Seeing as though so many people commented last time I thought I'd post a bit more about what I think this time.

Much like the disappointment of revisiting old television shows of your youth, Dawkins' book is great for pointing out how truly bad those stories taught at Sunday school were, including Noah, Lot and Abraham. Although I must admit, even as a child I found the story of the flood and "The A-Team" both rather unbelievable. There are other interesting topics, like coming up with morals without religion, but I think these are better covered elsewhere.

One of the main things I got out of this book is that progress is about consciousness raising. Most improvements have come about when a society becomes aware of a problem and goes about trying to solve it. Historically this includes human rights, more recently global warming, and ones yet to fully take hold like animal rights. The other thing I got out of it is that I don't have as many problems with religion as Dawkins.

I found Dawkins the least convincing when he diverges from his areas of expertise, especially when he tries to cover cosmology. This is most apparent when you compare his counterarguments against teleology (things look like they were designed, therefore there must be a designer) in the two fields. Dawkins' explanation in relation to biology is clear and concise, but for cosmology it's rather glossed over and there seems to be a bit of hand waving. He doesn't provide a good argument for why evolution on a universal scale is well founded. This is where Paul Davies' book provides some better arguments for a rational creation of the universe.

Davies is actually a little bit more open to the idea of God than Dawkins which, when he chooses a different explanation, makes his arguments more convincing. The possible explanations of the universe he discusses include: absurd (no real cause), unique (there are no free parameters for the universe to be the way it is), the multiverse (String theory), intelligent design (God or Gods), the life principle and the self explaining universe. He says he prefers the latter two explanations. I found the most interesting explanation given is the self explaining universe. It uses quantum mechanics, causal loops and the requirement for the universe to understand itself.

The last chapter of the book is certainly the best and I wish he spent the whole book on the ideas in it instead. His description of the infinite regress as the levitating super-turtle is great. He also describes how Platonism is incorrect, especially at the beginning of the universe, and how the laws of physics have emerged over time.

Thursday, November 16, 2006

JRDF GUI 0.3 Released

An important bug in JRDF's SPARQL project operation was found so I've released JRDF GUI 0.3 to fix it. There are also a couple of minor enhancements to do with some code cleanup including an increase in speed (about 25% for some queries). I've also started testing it against the new DAWG test cases and the relevant OPTIONAL and UNION ones have passed so far (I haven't tested them all).

RDF and OWL are Yahoo's Secret Weapons

Semantic Web Hype "At just over 1 year into my work at Yahoo! we have reached the first use in production of OWL and RDF in a small way and are looking at further steps, none of which I’m revealing...we should show how you don’t need to boil the ocean of semantics such as applying a big pile of technologies or requiring something fragile like a single web-wide ontology - wrong, wrong wrong. Start from concrete data-centric approaches that build up to use layers of technology solutions to different problems as they emerge, only if needed and demonstrating usefulness at each stage."

Web 2.0 is the new Applets

Web "Me2.0" -- Exploding the Myth of Web 2.0 "Web 2.0 is a myth -- there is no Web 2.0...I've seen this before -- it happened just over 10 years ago in the early days of Java applets. I should know, I launched http://www.gamelan.com -- which was THE portal for Java apps. Well guess what -- 10 years later, what remains of all those Java applets? Not the hundreds of thousands of little applets that people made (even though some were actually quite wonderful). No. They are almost all gone. Instead, what really survived and continues to grow is the Java platform itself, and large Java application platforms and development tools. That's where the real value is."

Nova has a bunch of interesting postings recently including: "Web 3.0 Versus Web 2.0", "Does the Semantic Web = Web 3.0?" and "New York Times Article About the Emerging Semantic Web" (all about the recent NY Times article on the Semantic Web, the hype and misconceptions), "What is the Semantic Web, Actually?" and "The Meaning and Future of the Semantic Web".

Wednesday, November 15, 2006

Simplicity, Good Design and Refactoring

Questions for Paul Graham - Simplicity and beauty in art, science and programming "Perhaps we're at a point where art and science and programming diverge. Science (often) aims at finding the simplicity behind the apparent complexity of the universe. Engineering usually aims at efficient solutions, excluding the extraneous which introduces cost and more paths to failure. Art doesn't always aim at simplicity. It just as frequently tries to expose the complexity of what looked simple. Thus, perhaps the union of art, science and engineering maintained by Paul's essay isn't fundamental, although there are certainly historical periods in which they align."

Video of interview of Paul Graham and original article Taste for Makers.

Tuesday, November 14, 2006

The Classpath Exception

So I had almost exactly the same discussion with someone today about the GPL and Java that David Wood is alluding to.

Tim Bray has the answer: "Unmodified GPL2 for our SE, ME, and EE code. GPL2 + Classpath exception for the SE libraries. Javac and HotSpot and JavaHelp code drops today. The libraries to follow, with pain expected fighting through the encumbrances. Governance TBD, but external committers are a design goal. No short-term changes in the TCK or JCP."

Friday, November 10, 2006

Mocking the Inspector

TDD Anti-Patterns "The Mockery - Sometimes mocking can be good, and handy. But sometimes developers can lose themselves and in their effort to mock out what isn’t being tested. In this case, a unit test contains so many mocks, stubs, and/or fakes that the system under test isn’t even being tested at all, instead data returned from mocks is what is being tested."

"The Inspector - A unit test that violates encapsulation in an effort to achieve 100% code coverage, but knows so much about what is going on in the object that any attempt to refactor will break the existing test and require any change to be reflected in the unit test."

Semantic Web 2.0

Some recent articles about people discussing the same ideas as the Semantic Web. The Great Database in the Sky "He went on to describe his vision of a skype for database access, combining my data, your data and public data into the next generation OLAP, running a trillion transactions per day. An example could be weather data and he asked what if you could run a SQL statement across all the data sources in the world"

"This is where it became evident that there is a deep disconnect between the traditional database community and the semantic web community. Mårten’s response was rather vague, that this wasn’t as broad as the semantic web and that the semweb includes unstructured data so wasn’t appropriate."

CEO of MySQL "Invents" the Semantic Web! "I have to say, his talk was both a validation of what we have all been working towards, and as Ian Davis explains, it is also a clear sign that the W3C and the Semantic Web community have not found a way to get the message accross."

Moving your data around and owning it seems to be another overlooked aspect of the Semantic Web.

WEB 2.0: Google CEO: Take your data and run "The more we can, for example, let users move their data around, never trap the data of an end user, let them move it if they don't like us, the better."

And my mind boggles at the idea of taking the proposed Australian Access card and their integration problems and fusing it with RDF. Some interesting points: "...the Access Card will be owned by the cardholder and not by the issuer...The effect of the issuer retaining ownership is that they control the card and the purpose for which it is used."

"In Centrelink alone we have a massive 275 kilometres of files...Medicare has to measure its records in a similar way. They have more than 3 square kilometres of storage space for forms with signatures."

"We collect, and almost never reuse, this information."

"The new card will finally put an end to this waste of time. We will be able to reuse the information that you have given us before, but only for the purposes for which you gave it to us. We can then pre-populate forms and take a lot of the pain out of the claim process."

Thursday, November 09, 2006

Mr Sparkle

Just posted, "Applying the relational model to SPARQL" to the DAWG comments list. It describes the work done adding SPARQL support (or small bits of it anyway) to JRDF. An HTML version to follow.

Direct link (PDF ~500K).

BTW, if anyone notices any grammatical errors please let me know and I'll fix them right up. I get to the point where I can't read what I've written anymore so there's bound to be a few errors.

Update: I've updated it. Some small typos, the formatting and the example relations (some S1 -> Supplier1 conversions that I missed) were fixed up.

Update 2: The HTML version is now available. Many thanks to Eric Prud'hommeaux for the work he did converting the original PDF.

Wednesday, November 08, 2006

Languages - Open vs Closed World

What did Javascript have that Java did not? "Languages based on static inheritance and typing are good for building complex silo (i.e. closed world) based applications. However, a global scale Architecture of Participation requires more dynamic structures like that found in prototype based inheritance and dynamic typing. In such a massively open world, the distinction between metadata, configuration and instances is simply impossible to pin down into well defined classes and configuration files."

"Javascript by its design is fundamentally messy, however that is its advantage over Java. The path to any sanity may just be what Google has shown, when building silo apps, hide the messy details under a clean well defined Java facade. Never forget though, that these facades are abstractions that leak. Always afford your applications the ability to escape into the Web and Javascript when necessary."

Bruce Tate has recently had a new article published, "Crossing borders: Delayed binding": "The more you dig into type and binding strategies, the more you find that waiting until run time to bind to an invocation or type fundamentally changes the programming process, opening a whole new world of possibilities. True, you find less safety. But you also find less repetition, more power, and more flexibility with fewer lines of code."

Why Semijoin is Better than Join

Query Processing in a System for Distributed Databases (SDD-1) "We prefer semijoins to joins for three reasons. First, Ri(A = B) Rj subsumes Ri, and so semijoins monotonically reduce the size of the database. By contrast, joins can increase the size of the database; in the worst case...Second, semijoins can be computed with less intersite data transfer than joins...we need only transmit a projection of a relation...the third advantage of semijoins is that the “reductive effect” of any single join can be attained by two semijoins, usually at lower cost..."

This is one of the first times in many years that I had to resort to a search engine other than Google (Yahoo) to find a paper.
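The first two reasons quoted above can be seen in a small Java sketch. This is my own illustration, not the SDD-1 algorithm: the class `SemijoinSketch`, the row representation (`Map<String, Integer>`) and the attribute names are assumptions, and it assumes the two relations have no attribute names in common. The semijoin only needs the projection of Rj on B and can never return more rows than Ri has, while the join emits one row per matching pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a relation is a list of rows, a row is Map<attribute, value>.
class SemijoinSketch {

    // Semijoin of ri with rj on a = b: keep the rows of ri whose a value
    // appears as a b value somewhere in rj. The result is a subset of ri.
    static List<Map<String, Integer>> semijoin(List<Map<String, Integer>> ri, String a,
                                               List<Map<String, Integer>> rj, String b) {
        // Only this projection of rj would have to be sent to ri's site.
        Set<Integer> projection = new HashSet<>();
        for (Map<String, Integer> row : rj) {
            Integer value = row.get(b);
            if (value != null) {
                projection.add(value);
            }
        }
        List<Map<String, Integer>> result = new ArrayList<>();
        for (Map<String, Integer> row : ri) {
            if (projection.contains(row.get(a))) {
                result.add(row);
            }
        }
        return result;
    }

    // Equijoin of ri with rj on a = b: every matching pair of rows becomes one
    // combined row, so the result can be larger than either input.
    static List<Map<String, Integer>> join(List<Map<String, Integer>> ri, String a,
                                           List<Map<String, Integer>> rj, String b) {
        List<Map<String, Integer>> result = new ArrayList<>();
        for (Map<String, Integer> left : ri) {
            for (Map<String, Integer> right : rj) {
                Integer value = left.get(a);
                if (value != null && value.equals(right.get(b))) {
                    Map<String, Integer> combined = new HashMap<>(left);
                    combined.putAll(right);
                    result.add(combined);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> ri = List.of(Map.of("A", 1, "P", 10), Map.of("A", 2, "P", 20));
        List<Map<String, Integer>> rj = List.of(
                Map.of("B", 1, "Q", 100), Map.of("B", 1, "Q", 200), Map.of("B", 1, "Q", 300));
        System.out.println(semijoin(ri, "A", rj, "B").size()); // prints 1
        System.out.println(join(ri, "A", rj, "B").size());     // prints 3
    }
}
```

In this example Ri has two rows: the semijoin reduces it to one, while the join grows to three.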

Monday, November 06, 2006

Tupelo, Tupelo

Tupelo is a project to create a content repository to support data-driven scientific applications. More specifically it supports JSR 170 (Content Repository API), WebDAV and URIQA (which are the extended HTTP verbs for getting RDF graphs). It implements a per object ACL and version control system.

It uses Kowari, 3Store, and Jena as backing stores. It also has a rather interesting API for dealing with triples called Context (more information in the cookbook). There are some smaller things too like using 1.5's varargs to add triples, transitive closures on queries, object to resource mapping and something that I'd meant to do for JRDF a while ago, have a ResourceVisitor that actually visits resources.

The presentation lists similar projects: SRB, Slide/SAM, Fedora, DSpace and Jackrabbit. All of which (except for SRB) I'm pretty familiar with.

And ActiveRDF, a Ruby API for accessing RDF, went 1.0!

Sunday, November 05, 2006

A Perfectly Functional Blog

So now I can get my fill of functional rants complete with how it is applicable to every possible situation, Tony Morris’ blog.

Update: The first bit of content is up now, "Have you ever wanted to do this?". Which is all about how easy it is to redeclare equality in Haskell vs Java or .NET.

Quick links

The Fifth Element

Outcomes over Features - the fifth agile value? "An agile project – in fact any project – should focus on a set of outcomes. How we get there is less important than actually getting there. A CIO or business sponsor wants to solve a specific problem, by a specific date, and has an idea of how much money they are prepared to spend achieving that. By driving out a comprehensive list of stories, estimating them, and then rolling it all up into a Big Release Plan, you run the risk of focusing attention on the features (stories) and defining Success as the delivery of all these stories. This is exacerbated when the release plan is shown to people who weren’t part of the original exercise. They see the release plan and naturally assume that it is the primary “output” of the process.

In the manner of the Agile Manifesto, whilst there is value in delivering a list of features, I value achieving the outcomes of the project higher. This feels complementary to the other four agile manifesto values."

Via, Valuing Outcomes over Features.

Saturday, November 04, 2006

Decertification

Another Nail in the IT Certification Coffin ""Across all 253 skills we survey, the value of noncertified skills is growing at a rate five times greater than certification pay. And there's no sign that this is going to change any time soon," said Foote.

Certifications are losing value because employers are looking for more in their workers than the ability to pass an exam; they want business-articulate IT pros."

Friday, November 03, 2006

Win Some, Lose Some

In, Rocking with Ruby, I made the outlandish claim that: "Continuations are going to be a key piece of infrastructure.". So maybe not, with the recent news that Seaside is moving away from them: "One thing I’d like to do is reduce the dependence of Seaside on continuations - they drove a lot of the initial interest in the framework but they’re becoming (or seeming) much less important over time, and the use cases to which they’re best suited are these days often addressed with AJAX instead."

The other claim that seems to have worked out though was, "That future languages and platforms will probably be deployed on .NET and Java VMs. The competition between the two seems to have a positive impact on both - locking out any competitors. That means, there's something to look forward to in Java 7 and .NET 3." This follows the news about JRuby: "JRuby has been getting more and more attention from folks within Sun, Rubyists around the world, and especially from Java developers anxious to escape from their Java-only prisons."

Thursday, November 02, 2006

The Four Parts of Future Languages

Convergence in Language Design: A Case of Lightning Striking Four Times in the Same Place "In all four research projects, the programming language has a layered structure. In its most general form, the language has four layers."

1. "The inner layer is a strict functional language. All four projects start with this layer."

2. "The second layer adds deterministic concurrency. Deterministic concurrency is sometimes called declarative or dataflow concurrency."

3. "The third layer adds asynchronous message passing. This leads to a simple message-passing model in which concurrent entities send messages asynchronously."

4. "The fourth layer adds global mutable state. Three of the four projects have global mutable state as a final layer, provided for different reasons, but always with the understanding that it is not used as often as the other layers. In the Erlang project, the mutable state is provided as a persistent database with a transactional interface. In the network transparency project, the mutable state is provided as an object store with a transactional interface and as a family of distributed protocols that is used to guarantee coherence of state across the distributed system."

When Garbage Collection isn't Enough

There is an error that people frequently make when allocating resources. Everyone seems to understand that you use a try/finally block but how to do it is usually unclear. Should you put all the resource deallocation in one finally block? How do you handle the exception that may be thrown attempting to deallocate the resource?

I've previously advocated using a non-final assignment, then checking in the finally block that the object is not null before closing it. The better way is:
    final Connection conn = ...;
    try {
        ...
    } finally {
        close(conn);
    }
This idiom is detailed here. It gives reasons behind using this idiom and the rule of thumb: "place one "try-finally" directly after each resource allocated".
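As a rough sketch of how the rule of thumb plays out with more than one resource, here is a self-contained Java example. Everything in it is made up for illustration: the `Resource` class stands in for something like a connection, and the `close` helper is only a guess at what the `close(conn)` call above might do. In this sketch it reports a failed close rather than throwing, so that an exception from the try body is not masked; that is an assumption on my part and not necessarily what the linked idiom recommends.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "one try-finally directly after each resource allocated".
class TryFinallySketch {

    // Made-up resource that records when it is opened and closed.
    static class Resource implements Closeable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Assumed helper: a failed close is reported but never thrown, so it
    // cannot hide an exception raised inside the try body.
    static void close(Closeable resource) {
        try {
            resource.close();
        } catch (IOException e) {
            System.err.println("close failed: " + e.getMessage());
        }
    }

    // Each allocation is immediately followed by its own try-finally, so there is
    // no null check and no way to reach a finally block for an unallocated resource.
    static List<String> run(boolean failInBody) {
        List<String> log = new ArrayList<>();
        try {
            final Resource in = new Resource("in", log);
            try {
                final Resource out = new Resource("out", log);
                try {
                    if (failInBody) {
                        throw new IllegalStateException("work failed");
                    }
                    log.add("work");
                } finally {
                    close(out);
                }
            } finally {
                close(in);
            }
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [open in, open out, work, close out, close in]
        System.out.println(run(true));  // prints [open in, open out, close out, close in, caught]
    }
}
```

Both resources are closed, in reverse order of allocation, whether or not the body throws.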

This was written in 2005 based on a previous Javalobby thread. I continue to see resource leaks caused by not closing resources correctly. I know of people who have made considerable money consulting to fix these kinds of bugs in large systems which I find fairly depressing.