Tuesday, September 30, 2008

More Data

Neurocommons have released their integrated RDF datasets. The release is composed of several modules, or bundles, including MeSH, Medline, OBO and others.

Wednesday, September 24, 2008

Who Holds the Power?

On Monday I went to see John Wilbanks talk on "Publishing in Today's Environment" (I can't seem to find a decent URL, but a good overview of some of the topics discussed is "The Open Access Interviews: John Wilbanks"). I didn't take many notes but a few things have stuck with me:
  • Libraries taking on a role as repositories of data (both public and private).

  • The idea of "free as in puppy" in relation to digital curation - a great metaphor. When someone gives you a puppy it is initially free but the upkeep of it is anything but.

  • The Queensland Government has its own Creative Commons initiative and the Australian Government is following suit.

  • There's a power struggle between researchers and publishers over making data, papers and the like freely available. The power used to be on the side of publishers, but as the producers band together (by country, university, faculty and so on) the power is shifting back to them.

  • Creative Commons are continuing to fight, in cunning ways, to get back to decent copyright law.

The kind of behavior publishers have exhibited appears to be on the way out, and things seem to be moving the other way: toward databases that interoperate with each other, papers that link to the original data, and tools, data and papers that are all part of an integrated experience. The linking and integration of papers and other artifacts seems to be a one-way process; it's hard to imagine a developer, scientist, arts graduate or anyone else spending huge amounts of money and time finally wresting control from one group of people only to hand it to another.

This leads me to the T-Mobile G1 announcement. The announcement itself was very underwhelming: people spent the first 10-15 minutes saying what a wonderful job they had done, and one presenter talked about how good the phone was for playing Pac-Man. The more interesting part is the open platform. Much as in publishing, open access will be a massive differentiator. It reminded me a bit of the discussion of how AOL, Prodigy and the other closed networks quickly died when faced with the open web:
By contrast, the proprietary networks of CompuServe, AOL, Prodigy, and Minitel were out beating the bushes for content, arranging to provide it through the straightforward economic model of being paid by people who would spend connect time browsing it. If anything, we would expect the proprietary networks to offer more, and for a while they did. But they also had a natural desire to act as gatekeepers—to validate anything appearing on their network, to cut individual deals for revenue sharing with their content providers, and to keep their customers from affecting the network’s technology. These tendencies meant that their rates of growth and differentiation were slow.

The closed versus open network story is not quite the whole picture when it comes to the iPhone versus Android. While there is competition between iTunes and Amazon, and between Street View and plain Google Maps, and there will be other content battles, I don't think that's the source of the real innovation. I think it was probably the technical innovation the online providers couldn't match that proved decisive, and the same will be true for phones:
The software driving these communities was stagnant: subscribers who were both interested in the communities’ content and technically minded had few outlets through which to contribute technical improvements to the way the communities were built. Instead, any improvements were orchestrated centrally. As the initial offerings of the proprietary networks plateaued, the Internet saw developments in technology that in turn led to developments in content and ultimately in social and economic interaction: the Web and Web sites, online shopping, peer-to-peer networking, wikis, and blogs.

Is developing on the iPhone going to lead to more technical innovation than developing on Android? Does the ability to have open source code on Android beat Apple's NDAs?

Apple will probably recognize that it's the developers who will ultimately have the power but, as in publishing, it depends on the actions of both parties. At the moment both of these new phone platforms are a little limited: the ability to innovate on the iPhone around iTunes and Mail is ruled out, but things don't seem terribly better on Android (although being DRM-free is a good start). If history is any guide, it would seem to favor open development over closed.

Update: A similar, more succinct explanation.

Update: No Pragmatic book due to NDA - the outrage!

Friday, September 19, 2008

Tuesday, September 16, 2008

Make Every Web Site a Semantic Web Site

This was news back in March but I completely missed it. Dapper has a rather nice way of turning web sites into data (XML, JSON, etc.), and it also includes a Semantify service. This uses existing namespaces supported by Yahoo's search engine (FOAF, GSS, Creative Commons, Media RSS and Dublin Core).

This is covered in more depth in Semantify Hacks - Creating your own RDF schema using Dapper:
So now, building a Dapp means you also built your own RDF compatible schema, that you can use wherever by just pointing to the webservice:

http://www.dapper.net/websiteServices/dapp-scheme.php?dappName=MYDAPP


The example given is MSN's search engine, whose schema you can see in all its RDF/XML glory.
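The schema URL for any Dapp follows directly from the endpoint quoted above. The sketch below only builds that URL; `dapp_schema_url` and `fetch_dapp_schema` are hypothetical helpers, it assumes `dappName` is the only parameter the service needs, and the service itself may well no longer be running.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint quoted in the post; assumed to take a single dappName query parameter.
DAPP_SCHEME_ENDPOINT = "http://www.dapper.net/websiteServices/dapp-scheme.php"


def dapp_schema_url(dapp_name: str) -> str:
    """Build the URL of the RDF schema generated for a Dapp (hypothetical helper)."""
    # urlencode escapes spaces and other reserved characters in the Dapp name.
    return DAPP_SCHEME_ENDPOINT + "?" + urlencode({"dappName": dapp_name})


def fetch_dapp_schema(dapp_name: str, timeout: float = 10.0) -> str:
    """Download the RDF/XML schema; assumes the response is UTF-8 encoded."""
    with urlopen(dapp_schema_url(dapp_name), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(dapp_schema_url("MYDAPP"))
```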

ReadWriteWeb has step-by-step instructions.

Monday, September 08, 2008

Into Thick Air

In Chrome, JavaScript, and Flash: Two (Mostly) Opposing Views, the second comment closely follows a recent update on CounterNotions, Google Chrome: Bad news for Adobe:
But a full-fledged browser. One that behaves, however, as a platform to host applications best tied to cloud computing with built-in local persistence for offline computing. Sure, in its current form Chrome can’t compete with Silverlight or Flex/AIR for what Adobe calls “expressiveness,” meme-speak for rich graphics, animations, integrated video and other visual UI goodies.

Chrome may shut it off for good. It’s possible that various open source Chrome technologies could melt into Safari and Firefox. But –– whether as a stand-alone product or a progenitor of fast, powerful and expressive browsers –– Chrome signals to anybody but the diehard Microsoft constituents that the browser itself, not a proprietary plug-in or a separate runtime, is the future of RIAs. With its huge ecosystem, Microsoft can live with that. At least until its enterprise monopoly seriously erodes. But Adobe cannot.

In a world where the online pie is divided among the .NET army of Microsoft, the browser-gang of Apple+Mozilla+Google, and the lone Adobe, it’s not difficult to predict whose share will shrink into insignificance. If the exclusion of Flash from the iPhone wasn’t a wake-up call for Adobe, Chrome should certainly be one.


Most of the commentary focuses on the browser-as-operating-system angle (although one of the Easter eggs is a familiar screensaver). I think it's more helpful to concentrate on the fact that browsers are getting rich enough to remove the need for applications embedded within them. A lot of functionality has already been developed or is being developed, such as SVG and storage (part of HTML 5). Chrome ships with both Gears and WebKit; to see how HTML 5 and Gears relate, see Aaron Boodman's talk on implementing HTML 5 in Gears. They create two namespaces, one for implementing the standard APIs and one for non-standard APIs, which suggests quite a solid development process behind it. Is there really much reason left to support these proprietary applications within applications?

More MapReduce Groovy

A very good post on Cascading (covered previously): GOODBYE MAPREDUCE, HELLO CASCADING

Cascading’s logical model abstracts away MapReduce into a convenient tuples, pipes, and taps model. Data is represented as “Tuples”, a named list of objects. For example, I can have a tuple (”url”, “stats”), where “url” is a Hadoop “Text” object and “stats” is my own “UrlStats” complex object, containing methods for getting “numberOfHits” and “averageTimeSpent”. Tuples are kept together in “streams”, and all tuples in a stream have the exact same fields.

An operation on a stream of tuples is called a “Pipe”. There are a few kinds of pipes, each encompassing a category of transformations on a tuple stream. For instance, the “Each” pipe will apply a custom function to each individual tuple. The “GroupBy” pipe will group tuples together by a set of fields, and the “Every” pipe will apply an “aggregator function” to all tuples in a group at once.
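To make the tuple and pipe vocabulary concrete, here is a toy in-memory model in Python. It is not Cascading's Java API: a stream is assumed to be a tuple of field names plus a list of value tuples, and `each`, `group_by` and `every` are hypothetical functions that mimic the roles of the Each, GroupBy and Every pipes described in the quote.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Toy model (not Cascading's API): a stream is a tuple of field names plus
# a list of value tuples, so every tuple in a stream has the same fields.
Fields = Tuple[str, ...]
Stream = Tuple[Fields, List[tuple]]


def each(stream: Stream, fn: Callable[[dict], dict]) -> Stream:
    """Apply fn to every tuple individually, like an "Each" pipe."""
    fields, rows = stream
    out = [fn(dict(zip(fields, row))) for row in rows]
    # Output fields come from the first result; an empty stream keeps its fields.
    new_fields = tuple(out[0]) if out else fields
    return new_fields, [tuple(r[f] for f in new_fields) for r in out]


def group_by(stream: Stream, keys: Fields) -> Dict[tuple, List[dict]]:
    """Group tuples by the values of the key fields, like a "GroupBy" pipe."""
    fields, rows = stream
    groups: Dict[tuple, List[dict]] = defaultdict(list)
    for row in rows:
        record = dict(zip(fields, row))
        groups[tuple(record[k] for k in keys)].append(record)
    return dict(groups)


def every(groups: Dict[tuple, List[dict]], keys: Fields, name: str,
          agg: Callable[[List[dict]], object]) -> Stream:
    """Apply an aggregator to all tuples in each group at once, like "Every"."""
    return keys + (name,), [key + (agg(members),) for key, members in groups.items()]


# Usage: average time spent per URL, with deliberately naive lower-casing as the Each step.
hits: Stream = (("url", "seconds"),
                [("http://a.example/", 30), ("HTTP://A.EXAMPLE/", 10), ("http://b.example/", 5)])
normalized = each(hits, lambda t: {"url": t["url"].lower(), "seconds": t["seconds"]})
per_url = every(group_by(normalized, ("url",)), ("url",), "avg_seconds",
                lambda ts: sum(t["seconds"] for t in ts) / len(ts))
print(per_url)  # (('url', 'avg_seconds'), [('http://a.example/', 20.0), ('http://b.example/', 5.0)])
```

Keeping the field names alongside the rows is what lets every step check that all tuples in a stream share the same shape.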

One of the most powerful features of Cascading is the ability to fork and merge pipes together.
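Forking and merging can be sketched with the same kind of toy model, again not Cascading's API: a fork feeds one stream into several branches, and a merge concatenates branches that share identical fields. `fork`, `merge` and `keep` are hypothetical names, and the stream representation (field names plus value tuples) is an assumption.

```python
from typing import Callable, List, Tuple

# Toy stream: field names plus value tuples (an assumption, not Cascading's API).
Fields = Tuple[str, ...]
Stream = Tuple[Fields, List[tuple]]


def fork(stream: Stream, *branches: Callable[[Stream], Stream]) -> List[Stream]:
    """Feed the same input stream into several independent branches."""
    return [branch(stream) for branch in branches]


def merge(*streams: Stream) -> Stream:
    """Concatenate streams; all must have identical fields, as in the quoted model."""
    fields = streams[0][0]
    rows: List[tuple] = []
    for other_fields, other_rows in streams:
        if other_fields != fields:
            raise ValueError(f"cannot merge {other_fields} into {fields}")
        rows.extend(other_rows)
    return fields, rows


def keep(pred: Callable[[dict], bool]) -> Callable[[Stream], Stream]:
    """A hypothetical filtering branch that keeps tuples matching pred."""
    return lambda s: (s[0], [row for row in s[1] if pred(dict(zip(s[0], row)))])


# Usage: split a request log into two branches, then merge them back together.
log: Stream = (("url", "status"), [("/a", 200), ("/b", 404), ("/c", 500), ("/d", 503)])
server_errors, missing = fork(log, keep(lambda t: t["status"] >= 500),
                              keep(lambda t: t["status"] == 404))
problems = merge(missing, server_errors)
print(problems)  # (('url', 'status'), [('/b', 404), ('/c', 500), ('/d', 503)])
```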

Once you have constructed your operations into a “pipe assembly”, you then tell Cascading how to retrieve and persist the data using an abstraction called “Tap”. “Taps” know how to convert stored data into Tuples and vice versa, and have complete control over how and where the data is stored. Cascading has a lot of built-in taps - using SequenceFiles and Text formats via HDFS are two examples. If you want to store data in your own format, you can define your own Tap. We have done this here at Rapleaf and it has worked seamlessly.
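A tap is essentially a two-way converter between stored data and tuples. The sketch below is a hypothetical CSV-backed tap for a toy stream model (field names plus value tuples); it is not one of Cascading's built-in taps, and it assumes the first CSV row holds the field names.

```python
import csv
import os
import tempfile
from typing import List, Tuple

# Toy stream: field names plus value tuples (an assumption, not Cascading's API).
Fields = Tuple[str, ...]
Stream = Tuple[Fields, List[tuple]]


class CsvTap:
    """Hypothetical tap: reads and writes a tuple stream as a CSV file."""

    def __init__(self, path: str):
        self.path = path

    def read(self) -> Stream:
        # The header row supplies the field names; every other row becomes a tuple.
        with open(self.path, newline="") as f:
            reader = csv.reader(f)
            fields = tuple(next(reader))
            return fields, [tuple(row) for row in reader]

    def write(self, stream: Stream) -> None:
        # Persist the field names first so read() can rebuild the stream.
        fields, rows = stream
        with open(self.path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(fields)
            writer.writerows(rows)


# Usage: round-trip a small stream through a temporary file.
tap = CsvTap(os.path.join(tempfile.mkdtemp(), "hits.csv"))
tap.write((("url", "hits"), [("/a", 3), ("/b", 1)]))
print(tap.read())  # (('url', 'hits'), [('/a', '3'), ('/b', '1')])
```

Everything comes back as strings because CSV carries no types; a production tap would likely also need to deal with schemas and storage locations such as HDFS.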

Senior Re-Searcher

Someone who continues to search (re-search) Google until he finds what is required. Senior in that he remembers a time before Google.

Wednesday, September 03, 2008

Stats for Nerds

I read "Is google chrome under / partially reporting RAM usage ?", which seems to describe the typical problem with Microsoft's Task Manager. A better way to compare Chrome with other browsers is to hit Shift-Escape (which opens Chrome's own task manager) and click on "Stats for Nerds", which shows a comparison.

Tuesday, September 02, 2008

Antimetabole

Bill Clinton had an interesting one recently in his Democratic convention speech: "People the world over have always been more impressed by the power of our example than by the example of our power."

Monday, September 01, 2008

Thesis and Antithesis

David Anderson on Agile. He introduces his talk by discussing some very broad ideas, including how agility has been applied to other areas. He talks about how the agile manifesto seems to have become more about belief and superstition than a scientific model. Few people click through to see the principles behind the manifesto.

The agile community has found it is better to develop in a failure-tolerant environment. This was a reaction to the focus on ever more accurate estimates, which led to analysis patterns (or rather antipatterns). They are antipatterns because the analysis was never detailed enough to provide sufficiently accurate estimates or to deal with the unexpected.

Throughout his talk he listed a bunch of thought bubbles on software development:
* Value is providing functionality fastest.
* Knowledge work is perishable.
* Perfect is the enemy of good enough.
* Develop high trust and high social capital.
* Have a highly collaborative culture.
* Reflect and adjust.
* Sustainable pace.
* Craftsmanship.
* Value is contextual, context is temporal.
* Waste over scale.

Traditional industries have relied on bigger batch sizes to provide better economies of scale. In software development, the transaction costs cause large batch sizes to work against you. Smaller batches allow you to release more business value more quickly and reduce waste (waste over scale).
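A small worked example of the "release value sooner" point, using made-up numbers: 10 features, one finished per week, each worth 1 unit of value per week once released, measured over a 20-week horizon. `value_delivered` is a hypothetical helper, and the model deliberately ignores transaction costs.

```python
def value_delivered(n_features: int, batch_size: int, horizon_weeks: int) -> int:
    """Total feature-weeks of value earned by the end of the horizon.

    Hypothetical assumptions: one feature is finished per week, each feature
    earns 1 unit of value per week once released, and a batch is released
    only when its last feature is finished.
    """
    total = 0
    for finished in range(1, n_features + 1):
        # Ceiling division gives the week the feature's batch is complete;
        # a final partial batch ships when the last feature is done.
        release_week = min(-(-finished // batch_size) * batch_size, n_features)
        total += max(horizon_weeks - release_week, 0)
    return total


for size in (1, 5, 10):
    print(size, value_delivered(10, size, 20))
# 1 145
# 5 125
# 10 100
```

Under these assumptions, releasing each feature as soon as it is done earns 145 units against 100 for a single release of all ten.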

He introduces kanban, which is built on ideas such as pull, flow and regulating work in progress. It does not use time-boxed iterations and involves little or no planning or estimation (at least compared with a traditional agile approach). It still has constant improvement and delivery. It drops the idea of a generalist approach to labor, which is why some see it as waterfall in disguise. This comes about because at enterprise scale it's not feasible to hire lots of experts who are also excellent generalists: the labor pool does not exist. This leads to a tension with the typical lean principle of reducing waste by having generalists.
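The pull and work-in-progress ideas can be sketched as a toy board. This is a hypothetical model, not Anderson's method: each column has a WIP limit, items move only when the next column pulls them, work in intermediate columns is assumed finished by the next tick, and the last column completes one item per tick.

```python
from collections import deque
from typing import Deque, Dict, List


class KanbanBoard:
    """Toy kanban board: work moves only when the next column pulls it."""

    def __init__(self, limits: Dict[str, int], backlog: List[str]):
        # Columns are ordered by insertion order of `limits`; values are WIP limits.
        self.columns = list(limits)
        self.limits = limits
        self.backlog: Deque[str] = deque(backlog)
        self.board: Dict[str, Deque[str]] = {c: deque() for c in self.columns}
        self.done: List[str] = []

    def _pull(self, column: str) -> bool:
        """Pull one item into `column` from upstream, respecting its WIP limit."""
        if len(self.board[column]) >= self.limits[column]:
            return False  # at the limit: no new work starts here
        i = self.columns.index(column)
        upstream = self.backlog if i == 0 else self.board[self.columns[i - 1]]
        if not upstream:
            return False
        self.board[column].append(upstream.popleft())
        return True

    def tick(self) -> None:
        """One time step: finish downstream work first, then pull right to left."""
        last = self.board[self.columns[-1]]
        if last:
            self.done.append(last.popleft())
        for column in reversed(self.columns):
            while self._pull(column):
                pass


board = KanbanBoard({"dev": 2, "test": 1}, ["A", "B", "C", "D", "E"])
for _ in range(7):
    board.tick()
print(board.done)  # ['A', 'B', 'C', 'D', 'E']
```

Pulling from the rightmost column first means capacity freed downstream is what lets new work start upstream, rather than work being pushed in regardless of capacity.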

He sees two main changes coming in the future: software factories (software product lines, which include DSLs) and cloud services (deploying web services in the cloud). Architecture and modeling will come back into fashion as a value chain creates incentives for delivering common behavior. He's talking about a 100-fold improvement in productivity from using these ideas.

Finally, he talks about how CMM/CMMI is actually quite a good indicator of the ability of software projects, and of the industry as a whole, to succeed. He mentions a report on how agile and CMMI can work together, the idea being that not only is organizational maturity a good idea but it actually means agile methods can be implemented better (it's called "CMMI or Agile: Why Not Embrace Both?!" - I only found a few references, for example Agile+CMMI Panel @ SEPG).

Via.

Still have Richard P. Gabriel's talk to go through too.