Thursday, April 24, 2008

Update from WWW2008

The HCLS workshop was very good. I especially enjoyed Mark Wilkinson's talk about BioMoby 2.0 (very Larry Lessig-esque) and Chris Baker's. There was also some definite interest from a number of people in my talk.

The keynote of the first day was from the Vice President of Engineering at Google, Kai-Fu Lee. In my talk I said that IBM had noted that scale-out architecture gives you a 4 times performance benefit for the same cost. He said that Google gets around 33 times or more from a scale-out architecture. The whole cloud thing is really interesting in that it's not only about better value but about doing things you just can't do with more traditional computing architectures. It's amazing how many people I've overheard saying they haven't been able to get their email working because they're using some sort of client/server setup. I mean, what's there to get working these days when you can just use GMail?

The SPARQL BOF was interesting as well (Eric took notes). The time frame seems to be around 2009 before work gets started on the next generation of SPARQL. What sticks out in my mind is the discussion around free text searching - adding something like Lucene. There were also aggregates, negation, starting a SPARQL query from a blank node, transitive relationships and following owl:sameAs. I was pretty familiar with all of these so it was interesting just to listen for a change. With both aggregates and free text you are creating a new variable. Lucene gives you a score back, and I remember in Kowari we had that information, but I don't think it was ever visible in a variable (maybe I'm wrong, I don't really remember). It would be nice to be able to bind new variables somehow from things in the WHERE clause for this - that would also allow you to filter based on counts greater than some value (without needing a HAVING clause) or on documents that match your Lucene query with a score above a certain threshold. Being able to do transitive relationships on just a subset of the subclass relationship (like only the subclasses of mammal, rather than inferring over the whole tree of life) seemed to be met with some reluctance. I didn't really understand this, but the argument seemed to be that it was the store's responsibility to control this and not up to the user to specify.
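None of this was standard SPARQL at the time, but as a rough sketch, binding a free text relevance score to a new variable might look something like the following (the ex:textMatch predicate and the data are made up for illustration; a count-based filter would work along the same lines):

    PREFIX ex: <http://example.org/>

    SELECT ?doc ?score
    WHERE {
      # hypothetical free text predicate that matches a Lucene-style
      # query and binds the relevance score to a new variable
      ?doc ex:textMatch ("semantic web" ?score) .
      FILTER (?score > 0.5)
    }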

The other thing that was mentioned was transactions. It seems that transactions probably won't be part of SPARQL, given the difficulty of coordinating distributed transactions across the Web.

There was one paper on the first day that really stood out. I don't know what it is about logicians giving talks, but they generally really appeal to me. It was "Structured Objects in OWL: Representation and Reasoning", presented by Bernardo Cuenca Grau. It takes the structural parts of an OWL ontology and creates a graph to represent them. This prevents DL reasoning from unravelling into an infinite tree and instead produces a bounded graph. This is cool for biology - the makeup of a cell, for example - but it also speeds up reasoning and allows modelling errors to be found.

The other interesting part was the linked data area. I was a bit concerned that it was going to create a read-only Semantic Web. A lot of the work, such as DBpedia, which converts Wikipedia to RDF, seems a bit odd to me, as you can only edit the Semantic Web indirectly through documents. But in the Linked Data Workshop a paper was presented called "Tabulator Redux: Browsing and Writing Linked Data", which of course adds write capabilities. I spoke to Chris Bizer (who gave a talk on how the linked data project now has around 2 billion triples) about whether you could edit DBpedia this way, and he said probably not yet. It's going to be interesting to see where that goes.
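For a sense of what writing linked data looks like, here's a minimal SPARUL insert, assuming a store that accepts SPARQL updates (the graph name and triple are made up for illustration):

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    # add a single triple to a named graph on an update-capable store
    INSERT DATA INTO <http://example.org/people>
    {
      <http://example.org/people/alice> foaf:name "Alice" .
    }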

I am just going off memory rather than notes, so I'll probably flesh this out a bit more later.

1 comment:

Kingsley Idehen said...

Andrew,

As DBpedia is hosted by Virtuoso (from OpenLink Software) I can tell you a thing or two about updates.

First off, Virtuoso has always supported the ability to perform SPARQL updates via SPARUL. The issue with DBpedia is not updating per se; it's more to do with how far we go:

1. deltas from DBpedia to Wikipedia
2. deltas from Wikipedia to DBpedia

Option 2 is the only viable route at the current time. We've long planned to create a pinger service for select partners for updating DBpedia from Wikipedia deltas.
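For illustration, applying such a delta with SPARUL might look like the two statements below (the resource, property, values and graph name are all made up):

    PREFIX dbpprop: <http://dbpedia.org/property/>

    # remove the stale value captured in the Wikipedia delta...
    DELETE DATA FROM <http://dbpedia.org>
    {
      <http://dbpedia.org/resource/Brisbane> dbpprop:population "1763131" .
    }

    # ...and insert the new one
    INSERT DATA INTO <http://dbpedia.org>
    {
      <http://dbpedia.org/resource/Brisbane> dbpprop:population "1857594" .
    }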

Kingsley