Friday, December 12, 2008

Metadata for Fork Sake

Once in a while, I get excited and frustrated about the state of metadata - particularly in file systems. Previously, I've written about rumors of OS X doing metadata better and the wonders of BeOS. A resource fork is something that Macs have had for ages, although it has appeared in other systems before and since:
The concept of a resource manager for graphics objects, to save memory, originated in the OOZE package on the Alto in Smalltalk-76...Although the Windows NT NTFS can support forks (and so can be a file server for Mac files), the native feature providing that support, called an alternate data stream, has never been used extensively...Early versions of the BeOS implemented a database within the filesystem...Though not strictly a resource fork, AmigaOS stores meta data in files known as .info files...NeXT operating systems NeXTSTEP and OPENSTEP, and its successor, Mac OS X, and other systems like RISC OS implemented another solution. Under these systems the resources are left in an original format, for instance, pictures are included as complete TIFF files instead of being encoded into some sort of container.


This links to the Grand Unified Model 1 and Grand Unified Model 2, which are also good for a few quotes:
Almost every piece of data in the Macintosh ended up being touched by the Grand Unified Model. Even transient data, data being cut and pasted within and between applications, did not escape. The Scrap Manager labeled each piece of data on the clipboard with a resource type. In another Mac innovation, multiple pieces of data, each of a different type, could be stored on the clipboard simultaneously, so that applications could have a choice of representation of the same data (for example, storing both plain and styled text). And since this data could easily be stored on disk in a resource file, we were able to provide cutting and pasting of relatively large chunks of data by writing a temporary file called the Clipboard.

Since resource objects were typed, indicating their internal data format, and had ID's or names, it seemed that files should be able to be typed in the same way. There should be no difference between the formats of an independent TEXT file, stored as a standalone file, and a TEXT resource, stored with other objects in a resource file. So I decided we should give files the same four-byte type as resources, known as the type code. Of course, the user should not have to know anything about the file's type; that was the computer's job. So Larry Kenyon made space in the directory entry for each file for the type code, and the Mac would maintain the name as a completely independent piece of information.
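The four-character type code described above is just four ASCII bytes packed into a 32-bit integer. A minimal sketch in Java of how such a code could be packed (the class and method names are my own illustration, not any Apple or JRDF API):

```java
public class TypeCode {
    // Pack four ASCII characters (e.g. "TEXT") into a 32-bit integer,
    // first character in the most significant byte, as classic Mac OS did.
    static int typeCode(String fourChars) {
        int code = 0;
        for (char c : fourChars.toCharArray()) {
            code = (code << 8) | (c & 0xFF);
        }
        return code;
    }

    public static void main(String[] args) {
        // 'T'=0x54, 'E'=0x45, 'X'=0x58, 'T'=0x54
        System.out.println(Integer.toHexString(typeCode("TEXT"))); // 54455854
    }
}
```

Storing the code in the directory entry, rather than in the file's name or contents, is what let the Finder keep the type and the file name as independent pieces of information.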

Tuesday, December 09, 2008

Restlet Talk

I spoke last night at the Java Users Group about Restlet. It was a basic introduction to both Restlet and linking data across web sites. I wasn't very happy with the example - it was basically stolen from a Rails introduction. At least I could answer the question about why you would allow your data to be searched (to sell adverts on your recipe web site). I think it went down okay: most Java developers are used to large frameworks and complicated APIs to do what Restlet does (so it's impressive), the Rails developers already knew some of the concepts, and while most are wary of RDF, SPARQL, OWL and the Semantic Web stack, it was a fairly incremental addition to achieve something reasonably powerful.

Thursday, December 04, 2008

Getting Groovy with JRDF

In an effort to speed up and improve the test coverage in JRDF I've started writing some of the tests in Groovy. It's been a good experience so far - so much so that I probably won't go back to writing tests in Java.

One of the things I wanted to try was an RdfBuilder, which is similar to Groovy's NodeBuilder.

There are a couple of things that make it a bit tricky. When parsing or debugging builders I haven't yet found a way to discover the methods and properties available, even using MetaClass. And of course, when the magic goes wrong it's a bit harder to debug Groovy than Java.

It certainly smartens up the creation of triples, for example (bits from the NTriples test case):
def rdf = new RdfBuilder(graph)
rdf.with {
    namespace("eg", "http://example.org/")
    namespace("rdfs", "http://www.w3.org/2000/01/rdf-schema#")
    "eg:resource1" "eg:property":"eg:resource2"
    "_:anon" "eg:property":"eg:resource2"
    "eg:resource1" "eg:property":"_:anon"
    (3..6).each {
        "eg:resource$it" "eg:property":"eg:resource2"
    }
    "eg:resource7" "eg:property":'"simple literal"'
    "eg:resource17" ("eg:property":['"\\u20AC"',
        '"\\uD800\\uDC00"', '"\\uD84C\\uDFB4"', '"\\uDBFF\\uDFFF"'])
    "eg:resource24" "eg:property":'"<a></a>"^^rdfs:XMLLiteral'
    "eg:resource31" "eg:property":'"chat"@en'
}
The first two lines define the two namespaces used. The third line shows the general pattern for RDF in Groovy, and it works out well: an RDF predicate and object map to an attribute and value in Groovy. The next two lines show how to refer to the same blank node across two statements, and the following lines show using ranges and creating different types of literals. The third-last line creates four triples with the same subject and predicate but different objects.

Using the builder results in a file that's smaller than the test case file. You could remove some duplication by creating a method that takes in a number and the object and generates "eg:resource$number" "eg:property" "$object", but doing that may actually make it harder to read.

If you stick to only using URIs you can do things like:
rdf.with {
    urn.foo6 {
        urn.bar {
            urn.baz1
            urn.baz2
        }
    }
}
This produces two triples: "urn:foo6, urn:bar, urn:baz1" and "urn:foo6, urn:bar, urn:baz2".

I expect that JRDF will only become more Groovy-friendly in the future.