Wednesday, June 27, 2007

Moving Up

I really appreciate blogs sometimes. I started Al Gore's "The Assault on Reason" but put it down halfway through because it was depressing (as bad as "Sicko" and "A Crude Awakening" in that respect). Lawrence Lessig, on the other hand, has decided to move from the copyright brick wall to the corruption brick wall.

In one of the handful of opportunities I had to watch Gore deliver his global warming Keynote, I recognized a link between the problem that he was describing and the work that I have been doing during this past decade. After talking about the basic inability of our political system to reckon the truth about global warming, Gore observed that this was really just part of a much bigger problem. That the real problem here was (what I will call a "corruption" of) the political process. That our government can't understand basic facts when strong interests have an interest in its misunderstanding.


The answer is a kind of corruption of the political process. Or better, a "corruption" of the political process. I don't mean corruption in the simple sense of bribery. I mean "corruption" in the sense that the system is so queered by the influence of money that it can't even get an issue as simple and clear as term extension right. Politicians are starved for the resources concentrated interests can provide. In the US, listening to money is the only way to secure reelection. And so an economy of influence bends public policy away from sense, always to dollars.

Scalability, Scalability, Scalability, Scalability

Dare Obasanjo has four postings on the recent Google Scalability Conference.

Google Scalability Conference Trip Report: MapReduce, BigTable, and Other Distributed System Abstractions for Handling Large Datasets:

The talk was about the three pillars of Google's data storage and processing platform: GFS, BigTable and MapReduce.

A developer only has to write their specific map and reduce operations for their data sets, which could run as low as 25 - 50 lines of code, while the MapReduce infrastructure deals with parallelizing the task and distributing it across different machines, handling machine failures and error conditions in the data, performing optimizations such as moving computation close to the data to reduce I/O bandwidth consumed, providing system monitoring, and making the service scalable across hundreds to thousands of machines.

Currently, almost every major product at Google uses MapReduce in some way. There are 6,000 MapReduce applications checked into the Google source tree, with hundreds of new applications that utilize it being written per month. To illustrate its ease of use, a graph of new MapReduce applications checked into the Google source tree over time shows a spike every summer as interns show up and create a flood of new MapReduce applications.
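To make that "25 - 50 lines" point concrete, here's a toy word count in that shape. This is plain Java standing in for Google's internal (C++) framework, with the shuffle and distribution, i.e. the part the infrastructure does for you, faked inline:

```java
import java.util.*;
import java.util.function.BiConsumer;

// The two functions a MapReduce developer actually writes, plus a fake
// single-machine "infrastructure" in main() that shuffles and reduces.
public class WordCount {
    // Map: for each input record, emit (word, 1) pairs.
    static void map(String line, BiConsumer<String, Integer> emit) {
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) emit.accept(word, 1);
        }
    }

    // Reduce: sum all the counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog jumps the fence");
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : input) {
            map(line, (k, v) -> shuffled.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        shuffled.forEach((word, counts) ->
                System.out.println(word + "\t" + reduce(word, counts)));
    }
}
```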


Google Scalability Conference Trip Report: Using MapReduce on Large Geographic Datasets:

A common pattern across a lot of Google services is creating a lot of index files that point at the actual data and loading them into memory to make lookups fast. This is also done by the Google Maps team, which has to handle massive amounts of data (e.g. there are over a hundred million roads in North America).

Q: Where are intermediate results from map operations stored?
A: In BigTable or GFS
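The pattern itself is simple enough to sketch: build a sorted index offline, keep it in memory, and answer lookups with a binary search. A minimal version (the names are mine, not the Maps team's):

```java
import java.util.*;

// "Precompute an index, load it in memory": keys sorted once at build
// time, then O(log n) lookups at serving time.
public class RoadIndex {
    private final long[] roadIds;      // sorted at build time
    private final int[] fileOffsets;   // where each road's data lives on disk

    RoadIndex(long[] sortedIds, int[] offsets) {
        this.roadIds = sortedIds;
        this.fileOffsets = offsets;
    }

    // Returns the offset of a road's record, or -1 if unknown.
    int lookup(long roadId) {
        int i = Arrays.binarySearch(roadIds, roadId);
        return i >= 0 ? fileOffsets[i] : -1;
    }

    public static void main(String[] args) {
        RoadIndex idx = new RoadIndex(new long[]{101, 205, 333}, new int[]{0, 4096, 8192});
        System.out.println(idx.lookup(205));  // 4096
        System.out.println(idx.lookup(999));  // -1
    }
}
```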


Google Scalability Conference Trip Report: Lessons in Building Scalable Systems:

The most important lesson the Google Talk team learned is that you have to measure the right things. Questions like "how many active users do you have" and "how many IM messages does the system carry a day" may be good for evaluating market share but are not good questions from an engineering perspective if one is trying to get insight into how the system is performing.

Specifically, the biggest strain on the system actually turns out to be displaying presence information.

Giving developers access to live servers (ideally public beta servers, not main production servers) will encourage them to test and try out ideas quickly. It also gives them a sense of empowerment. Developers end up making their systems easier to deploy, configure, monitor, debug and maintain when they have a better idea of the end-to-end process.


And finally, Google Scalability Conference Trip Report: Scaling Google for Every User, which had some interesting ideas about search engine usage.

Update: links for 2007-07-06 includes links to the Google talk on MapReduce tasks on large datasets and other goodies (scalable b-trees, hashing, etc).

Update 2: Greg Linden has a much better list of videos and commentary.

DERI at Google

The Semantic Web at Google (I found the movie here).

Starting from the end, Stefan was asked about inferencing and relationships. He basically responded that linked data is more practical and immediately useful and that the effect of Description Logics has been overestimated.

The highlight for me was the demo on how to construct user interfaces automatically (from about 20 minutes in). The algorithms are described in more detail in "Extending faceted navigation for RDF data".
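The paper's algorithms rank facets with navigability metrics, but the kernel of the idea is that the facets and their value counts fall straight out of the data. A toy version of that kernel (the triples are made up):

```java
import java.util.*;

// Derive facets (predicates) and value counts directly from RDF-ish
// triples -- the starting point for generating a faceted UI from data.
public class Facets {
    record Triple(String s, String p, String o) {}

    static Map<String, Map<String, Integer>> facets(List<Triple> triples) {
        Map<String, Map<String, Integer>> out = new TreeMap<>();
        for (Triple t : triples)
            out.computeIfAbsent(t.p(), k -> new TreeMap<>())
               .merge(t.o(), 1, Integer::sum);
        return out;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("doc1", "author", "smith"),
            new Triple("doc2", "author", "smith"),
            new Triple("doc2", "year", "2006"));
        System.out.println(facets(data)); // {author={smith=2}, year={2006=1}}
    }
}
```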

They also talked a little about how Ruby is a good language for Semantic Web applications and referenced "ActiveRDF: Object-Oriented Semantic Web Programming".

The applications and tools demoed: SIOC Project, ActiveRDF, JeromeDL (digital library), BrowseRDF (automatically generated, faceted UI) and S3B (social semantic search and browsing).

Tuesday, June 26, 2007

The History of Trans and Walk

Paul recently started talking about the history of trans and walk in Kowari/Mulgara. I did a bit of searching in old Tucana emails and feature requests.

I also vaguely remember trying to create an abstract class for walk and the base transitive classes (like the Exhaustive and Direct transitive closure classes). One of the feature requests also mentions the idea of "set depth" to reduce the recursion on unbounded traversal, which would also be handy for finding the shortest path between two nodes.
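For what it's worth, the "set depth" idea is just depth-limited transitive closure. A sketch (the graph representation here is illustrative, not Kowari/Mulgara's):

```java
import java.util.*;

// Follow a relation from a start node, but stop after maxDepth hops so
// unbounded graphs can't recurse forever. Tracking the depth at which a
// node is first seen would also give the shortest path length.
public class BoundedTrans {
    static Set<String> closure(Map<String, List<String>> edges,
                               String start, int maxDepth) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier)
                for (String target : edges.getOrDefault(node, List.of()))
                    if (seen.add(target)) next.add(target);
            frontier = next;
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> partOf = Map.of(
            "finger", List.of("hand"), "hand", List.of("arm"),
            "arm", List.of("body"));
        System.out.println(closure(partOf, "finger", 2)); // [hand, arm]
    }
}
```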

Friday, June 22, 2007

Getting up to Speed With OWL 1.1

Bijan's "Two Interesting Quotes" really got me started on understanding the new features of OWL 1.1. The addition of the RBox (to ABox and TBox), using SROIQ instead of SHOIN, is the obvious one.

The second interesting quote was from "Describing chemical functional groups in OWL-DL for the classification of chemical compounds", which gives some clear examples using the new features of OWL 1.1, including qualified cardinality, negation and property chains. The first two are analogous to statements like "this chemical has two bonds of a given kind" and "this chemical has exactly one bond".
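To make those three features concrete, here are illustrative SROIQ-style axioms in my own shorthand (the class and property names are made up, not the paper's):

```latex
% Qualified cardinality: exactly one bond of a particular kind.
Aldehyde \sqsubseteq\; =\!1\; hasBond.DoubleBond

% Negation: a class defined by what it lacks.
AcyclicCompound \equiv Compound \sqcap \lnot\exists hasPart.Ring

% Property chain: composing two roles implies a third.
locatedIn \circ partOf \sqsubseteq locatedIn
```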

Property chains are another interesting feature, which is highlighted in the above paper, in "The OBO to OWL mapping, GO to OWL 1.1!" and in the first paper Bijan spoke about. In one way this is a bit disappointing because the OBO work replicates what we have been doing in the BioMANTA project (I hate duplication in all its forms) - we based it on the Protégé plugin. Our relationships are quite simple, mainly is-a, so OWL DL was enough. We hadn't yet encountered some of the problems such as reflexive and anti-symmetric relationships. It also links to a web page mapping OBO to OWL 1.1 and OWL DL semantics.

As noted in "Igniting the OWL 1.1 Touch Paper: The OWL API", the addition of an object model along with the OWL 1.1 specification makes it obvious what an OWL 1.1 API should look like. I haven't yet used the OWL API much, except looking at how it integrates with Pellet and how closely it matches the OWL specification. The use of axioms made the addition of the RBox a bit easier to understand (or misunderstand if I've got it wrong). Hopefully, punning will be made clear to me eventually too (I've been too stupid to understand it just based on the explanations and too lazy to look into it). Pellet 1.5 also introduces incremental inferencing, which is hopefully as good as it sounds.
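For instance, the property chain axiom from earlier comes out quite directly as axiom objects. A sketch against the OWL API (class and method names here follow the later 3.x-style releases, not the 2007 one, so treat this as the shape of the thing rather than period-accurate code):

```java
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import java.util.List;

// Build the RBox axiom "locatedIn o partOf -> locatedIn" as an object.
public class ChainAxiom {
    public static void main(String[] args) throws OWLOntologyCreationException {
        String ns = "http://example.org/onto#";  // hypothetical namespace
        OWLOntologyManager m = OWLManager.createOWLOntologyManager();
        OWLDataFactory f = m.getOWLDataFactory();
        OWLOntology ont = m.createOntology(IRI.create(ns));

        OWLObjectProperty locatedIn = f.getOWLObjectProperty(IRI.create(ns + "locatedIn"));
        OWLObjectProperty partOf = f.getOWLObjectProperty(IRI.create(ns + "partOf"));

        // The chain is just another axiom added to the ontology.
        OWLAxiom chain = f.getOWLSubPropertyChainOfAxiom(
            List.of(locatedIn, partOf), locatedIn);
        m.addAxiom(ont, chain);
        System.out.println(ont.getAxiomCount());  // 1
    }
}
```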

The other papers that were of interest: "Representing Phenotypes in OWL" (again making use of qualified cardinality) and "A survey of requirements for automated reasoning services for bio-ontologies in OWL".

And just to round off, I found a very good paper, "Towards a Quantitative, Platform-Independent Analysis of Knowledge Systems", about all the mistakes that can be made during modeling (errors, assumptions, etc.) and other types of failures such as language (under- or over-expressive), management, learning, reasoning, querying, and others.

And I know I'll need to come back to this, which lists the fragments of OWL 1.1 that can be used to keep things computable in polynomial time.

Friday, June 08, 2007

Bottom Up Semantics

A couple of recent DevX articles by Rod Coffin explore how a bit of semantics (and Semantic Web technology) can help improve your application's ability to understand what people are trying to do. The first, "Let Semantics Bring Sophistication to Your Applications", introduces the concepts of ontologies and querying. The second, "Use Semantic Language Tools to Better Understand User Intentions", shows how you can use WordNet to supplement existing applications and improve search results.
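The WordNet trick boils down to query expansion. A toy version (the synonym map is hardcoded here; in a real application WordNet's synsets would supply it):

```java
import java.util.*;

// Widen a search query with synonyms before matching, so "fast car"
// also finds documents that say "quick automobile".
public class SynonymSearch {
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "car", List.of("automobile", "auto"),
        "fast", List.of("quick", "rapid"));

    static Set<String> expand(String query) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : query.toLowerCase().split("\\s+")) {
            terms.add(t);
            terms.addAll(SYNONYMS.getOrDefault(t, List.of()));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(expand("fast car"));
        // [fast, quick, rapid, car, automobile, auto]
    }
}
```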

Update: DevX now has a Semantic Web Zone.

Thursday, June 07, 2007

0 dollars

SCO's Linux Billions?
For the second quarter of 2007, SCO reported a total of zero dollars of revenue from its SCOsource licensing program. In the first quarter of 2007, SCO reported SCOsource licensing revenue of $34,000, which is somewhat less than the billions McBride had been expecting.

Wednesday, June 06, 2007

All the Web is a Stage

Google Gears' WorkerPool - Message Passing, Shared Nothing Concurrency

By now, you've probably already seen Google Gears, Google's solution for dragging AJAX applications offline. The one thing that jumped out at me was their WorkerPool component. This is a very nice solution for concurrency in Javascript.
In short: if you have any long running task, you can create a WorkerPool, which is basically a group of Javascript processes. Note: they're not threads! The workers in a WorkerPool share nothing with each other, which makes them like OS processes or Erlang's lightweight processes (actually, they're more like the latter, as they're likely to run in the same address space).

And now guess how these workers, these processes, communicate? Yep: messages, formatted as strings. Important to remember: if you format objects as JSON strings, you can even send objects and structures along. The handler that receives messages also gets the ID of the sender, so if the sender implements a handler too, it's possible to return results asynchronously.

If you're reminded of Erlang or the old Actor concept, you're right. I wonder what the Google Apps will do with this new concurrency approach (well, new for Javascript… yes, I'm ignoring Javascript Rhino).

I still hope that AJAX will die a quick death, like Java Applets, just for being so damn ugly and horrible to implement. But… things like this tell me that this will probably not happen. Good ideas like Google Gears will help paint over the ugly details of a solution, and it's all hip now, so it's easy to ignore many of the problems.
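The model is easy to mimic in any language. Here's a shared-nothing sketch in Java rather than Javascript (the real Gears API lives in the browser): workers own an inbox, exchange only strings, and see the sender's ID so they can reply asynchronously.

```java
import java.util.concurrent.*;

// Shared-nothing, message-passing workers in the WorkerPool spirit:
// no shared state, only string messages tagged with the sender's id.
public class WorkerPoolSketch {
    record Message(int senderId, String body) {}

    static class Worker implements Runnable {
        final int id;
        final BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();
        final Worker[] pool;
        Worker(int id, Worker[] pool) { this.id = id; this.pool = pool; }

        void send(int to, String body) { pool[to].inbox.add(new Message(id, body)); }

        public void run() {
            try {
                Message m = inbox.take();  // block until a message arrives
                System.out.println("worker " + id + " got: " + m.body());
                if (m.body().startsWith("ping")) send(m.senderId(), "pong from " + id);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    public static void main(String[] args) {
        Worker[] pool = { null, null };
        pool[0] = new Worker(0, pool);
        pool[1] = new Worker(1, pool);
        for (Worker w : pool) new Thread(w).start();
        pool[0].send(1, "ping");  // worker 1 replies to worker 0 asynchronously
    }
}
```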


The same idea is also being implemented in Scala with its Actors.

Linked Together

This demo shows what can be achieved with the correct use of metadata from Flickr images. If nothing else, this answers why you would bother with metadata (as if there needs to be a justification these days). By correctly annotating your content you gain an advantage in leveraging a network of data. Via "Semantically-linked Interactive Imagery: Wow! The Emergent Web in Action" and the demo "Blaise Aguera y Arcas: Photosynth demo".

Update: What? I've edited this entry. Weird, I'm sure it made sense at the time.

Tuesday, June 05, 2007

JRDF 0.4.1.1 Released

JRDF 0.4.1.1. A bug fix release for an error in the wiring: it was wiring up the join code rather than the union code when it was supposed to do union operations. Both Derby and Hadoop were removed for this release (which reduces the GUI jar by 2 MB).

Why Fahrenheit 451 is not like 1984

Ray Bradbury: Fahrenheit 451 Misinterpreted

Now, Bradbury has decided to make news about the writing of his iconographic work and what he really meant. Fahrenheit 451 is not, he says firmly, a story about government censorship. Nor was it a response to Senator Joseph McCarthy, whose investigations had already instilled fear and stifled the creativity of thousands.

This, despite the fact that reviews, critiques and essays over the decades say that is precisely what it is all about. Even Bradbury’s authorized biographer, Sam Weller, in The Bradbury Chronicles, refers to Fahrenheit 451 as a book about censorship.


The article points to Ray Bradbury at Home, which has movies on many topics including stories and politics.

Sunday, June 03, 2007

4:34

I was struck by an interview with Ayaan Hirsi Ali. In it she says a moderate version of Islam cannot exist without disregarding parts of the Koran (and without thinking, questioning and debating). I'd always assumed this was true and thought it wouldn't bother me, but I was surprised when it did. About halfway through the interview she was prompted to give support for her claim that Islam does promote violence. She says there is a criterion for determining whether something is true, and she quoted Sura 4:34. Without dancing around the subject, it says that when faced with a disobedient wife: shout at her, ignore her, and finally beat her. And as far as I can tell this is not up for dispute. It's a bit hard to describe, and I'm sure I'd heard it before, but for some reason it finally got to me.

So, like Timothy 2:12, this is a religious statement, without allegory, that is clearly in conflict with modern laws and morals (assuming that morals can exist outside of religion).

I guess I'd always assumed people ignore bits that don't make sense. But as someone reading Timothy said, "What do you do when you are confronted with a finding in scripture that either goes against what you've always believed or at least contradicts what you would like to believe? There are really only two choices. Understand it, accept it and conform to it or reject it and go on doing whatever you want."