Thursday, October 02, 2003

Arguing about Metadata

Metadata, Semiotics, and the Tower of Babel

"Anyone who has built or been a heavy user of text databases or similar systems has run into this problem repeatedly. Fancy lexical or statistical processing does help, as does the integration of information like link patterns, but at the end there is always a significant and irreduceable noise, that means one either has a certain amount of garbage in the output (from the view of the user), or is dropping some amount of useful information. Nor is there a way out by using an artificial set of symbols, e.g., 'controlled terms', taxonomies and the like. In the large, with a heterogenous set of users, these perform no better than grinding up plain text."

Some of this is true: given a large enough set of users and a large enough set of documents, the use of one word to mean one thing is lost. Not a big deal. But what Google is trying to do, and has been for a long time (with the country-based search engines, Froogle, etc.), is not only to give documents context through things like link analysis but also to reduce the data searched by making the search engine aware of the user's context.

Mathematicians, lawyers and falconers all have their own vocabulary and context. So you can describe your context when you're searching the Semantic Web. In fact, one of the early use cases for RDF was with P3P and automatic negotiation, requiring both client and server side descriptions of privacy policies.
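To make the point about context a bit more concrete, here is a minimal sketch in plain Python (not a real RDF library). The namespaces, document names and the `match` helper are all made up for illustration. The idea is only that the same word, "proof", becomes two different identifiers once each community's vocabulary has its own namespace, so a search can state which context it means.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    """One statement: subject, predicate, object (all plain URI strings here)."""
    subject: str
    predicate: str
    obj: str

# Hypothetical namespaces, one per community vocabulary (assumptions, not real URIs).
MATH = "http://example.org/math#"
LAW = "http://example.org/law#"
DOC = "http://example.org/docs/"
TOPIC = "http://example.org/terms#topic"

# Two documents that a plain-text search for "proof" could not tell apart.
triples = [
    Triple(DOC + "lemma-notes", TOPIC, MATH + "Proof"),
    Triple(DOC + "evidence-memo", TOPIC, LAW + "Proof"),
]

def match(store: list[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> list[Triple]:
    """Return every triple that agrees with the fields given; None matches anything."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

# A lawyer's search names the legal sense of the word, not the string "proof".
legal_hits = match(triples, predicate=TOPIC, obj=LAW + "Proof")
```

This doesn't remove ambiguity inside a single vocabulary, of course. It only shows how naming the context narrows what has to be searched.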

"Now why should we suspect that taking character strings, and wrapping them in XML or RDF is going to change all of this? The syntactic sugar is all wonderful, and indeed a better mousetrap from the POV of systems integration, but the real basis for the blue sky claims that we're approaching Semantic Web nirvana is bound up in the signifiers, the symbols, that are to be wrapped in that sugar. Is there some magic in angle brackets that was not found in LISP parentheses, that will repeal human nature and semiotics? I think not. Call it a taxonomy, a controlled vocabulary, a metadata dictionary, it's all the same thing: yet another language, either small and brittle, or large and ambiguous. Either way, just another layer on the Tower of Babel."

"Coming soon: One place where the French and the Chicago school agree: economic reasons why the Semantic Web is a crock."

It's hard to disagree with the main arguments of the piece: that all language is symbolic and that meaning is based on context. That's fine. Not sure what that's got to do with RDF though. Saying that RDF will fail because it's based on language is making an argument in exactly the wrong direction. RDF will succeed precisely because it removes most of the complexities of language.

The same points that Cory lists can be used to support the case that RDF can work. It pains me that this is still used as a good example of why metadata/RDF/Semantic Web will fail (I've commented on this before).

People lie - People also pay for (and sometimes get for free) reliable information from trustworthy sources (or sources they consider trustworthy).
People are lazy - Yes, and the best way to be lazy is to do the thing that requires the least effort. People want to find their book, document, music or video quickly and are willing either to do it themselves or to get someone else to do it for them (sometimes for money), because having the right metadata leads to less effort.
People are stupid - The web, search engines, classification tools, the Semantic Web, computers, etc. are all trying to help people become smarter - they are tools that help everyone think. In fact, it's a way to make the reliance on individual intelligence less important - it doesn't matter because you can look it up on Google. Many of the rules and requirements of language have been stripped out of RDF. You can't even make a false statement. It's easier to produce good RDF than to produce good English (luckily for me).

Good metadata gives one law firm an advantage over another, it lets you find that song in your MP3 collection when you're jogging, it stops your teeth from falling out, etc. It's wrong to say that because it's not going to be perfect it can't work. It's also wrong when, plainly, people already rely on creating and using good metadata.

Joi Ito responds as well.
