Wednesday, August 08, 2007

Naming

Relatively recently, UniProt (a protein sequence database) announced they were moving away from LSIDs (which are URNs) to URIs. The cons of LSIDs seem to be outweighing the pros. More generally, there seems to be much discussion as to what is more appropriate and what can and cannot be done with URIs vs URNs. And even some of LSIDs proponents are saying that using them at the moment is not a good idea.

The form of a URN is: urn:lsid:ubio.org:namebank:11815 which can be resolved using: http://lsid.tdwg.org/summary/urn:lsid:ubio.org:namebank:11815. A couple of IBM articles describe it in more detail: "Build a life sciences collaboration network with LSID" and "LSID best practices". Part of the problem seems to be that the LSID needs a resolving service, much like a web service, to return the data for a given LSID. The URI on the other hand can just use a bit of content negotiation to return either RDF data or human readable HTML. It's not a well known feature that a web client can tell a web server what data it can accept. So a Semantic Web client would say "give me RDF" and a normal web client says "give me HTML".

An alternative is using an existing standard, such as THTTP, which shows how to turn a URN into a URL by providing a REST based service. Where, requesting a URL for the urn "urn:foo:12345-54321" it becomes an HTTP request "GET /uri-res/N2L?urn:foo:12345-54321 HTTP/1.0". This is a bit like the biordf.org approach of "http://bio2rdf.org/namespace:id". Having de-referencable URIs is part of the Banff Manifesto.

Creating GUIDs is an interesting problem in a distributed environment. One of the other life science groups compared Handle, DOI, LSID and PURL (persistent URLs).

The mentioning of the Handle System brought back ideas from previous digital library work and using URNs to name RDF graphs (which I later to discovered wasn't entirely novel).

No comments: