Wednesday, January 01, 2003

Tag Soup

In Tag Soup TNG, Mark Pilgrim writes that the Semantic Web will never succeed because people lie. He also says that it is machine-readable only, which is just not so. As Tim Berners-Lee wrote in Nature: "The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help users communicate with each other." The Semantic Web is an addition to the Web, not a replacement for it.

In the Semantic Web, as in life, you don't have to believe everything everybody says forever. If you are deeply cynical, then don't believe what anyone says. Even so, Semantic Web technologies can still be of use. You could use your own text-mining tool to extract metadata from a document, rather than relying on the data in its own meta tags. You can then combine this with other documents' metadata to create your own ontology, and use that ontology to classify further documents or sets of documents.
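To make the text-mining idea a little more concrete, here's a minimal sketch in Python. It assumes the crudest possible "tool": count word frequencies, skip a small hand-picked stop-word list, and treat the most frequent words as the document's subjects. The names extract_keywords and to_statements are hypothetical, and "dc:subject" is the Dublin Core subject property written as a prefixed string rather than a full URI. A real extractor would be far more sophisticated.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; a real tool would need a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "that", "for", "on", "with", "as", "be", "are", "not"}


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent words in text, ignoring stop words
    and words of two letters or fewer (hypothetical helper)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Highest count first; ties broken alphabetically so results are repeatable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def to_statements(doc_uri, keywords):
    """Express each keyword as a (subject, predicate, object) statement about doc_uri."""
    return [(doc_uri, "dc:subject", kw) for kw in keywords]


text = "RDF statements describe resources. RDF tools merge statements from many sources."
print(to_statements("http://example.org/doc1", extract_keywords(text, top_n=2)))
# → [('http://example.org/doc1', 'dc:subject', 'rdf'), ('http://example.org/doc1', 'dc:subject', 'statements')]
```

The point is that these statements are yours, derived from the text itself, so they don't depend on whatever the author chose to put in the meta tags.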

If you can trust other people and their metadata extraction, things get better still, potentially exponentially better, as you trust more sources (people, groups, companies, etc.) and/or use more tools (Semantic Web-enabled P2P clients, Google, etc.).
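One simple way to picture combining metadata from people you trust: gather the statements each source makes, ignore the sources you don't trust, and count how many trusted sources back each statement. This is only a sketch; the source names, the merge_trusted helper and the min_support threshold are illustrative assumptions, not part of any Semantic Web standard.

```python
from collections import defaultdict


def merge_trusted(statements_by_source, trusted, min_support=1):
    """Merge statements from several sources (hypothetical helper).

    Statements from sources not in `trusted` are ignored. A statement is kept
    only if at least `min_support` trusted sources asserted it. Returns a dict
    mapping each kept statement to the set of trusted sources behind it.
    """
    support = defaultdict(set)
    for source, statements in statements_by_source.items():
        if source not in trusted:
            continue
        for statement in statements:
            support[statement].add(source)
    return {st: srcs for st, srcs in support.items() if len(srcs) >= min_support}


sources = {
    "me":      [("doc1", "dc:subject", "rdf")],
    "alice":   [("doc1", "dc:subject", "rdf"), ("doc1", "dc:subject", "p2p")],
    "spammer": [("doc1", "dc:subject", "casino")],
}
# Trusting more sources adds statements; raising min_support demands agreement.
print(merge_trusted(sources, trusted={"me", "alice"}))
print(merge_trusted(sources, trusted={"me", "alice"}, min_support=2))
```

Adding a source to the trusted set can only add statements, while raising min_support trades coverage for agreement.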

RDF lets you record statements made by a document or its author, by you, or by anyone else. The lies can then be categorized and pruned at your (or anyone else's) whim.
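Here's a sketch of recording who said what and then pruning. RDF itself would express this provenance through reification or a similar mechanism; here a plain Python tuple with an asserted_by field stands in for that, and the Statement, categorize and prune names are hypothetical. The pruning policy is just a function you pass in, which is the "at your whim" part.

```python
from collections import defaultdict
from typing import Callable, NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    asserted_by: str  # provenance: who made this statement


def categorize(statements):
    """Group statements by whoever asserted them."""
    groups = defaultdict(list)
    for st in statements:
        groups[st.asserted_by].append(st)
    return dict(groups)


def prune(statements, keep: Callable[[Statement], bool]):
    """Keep only the statements that your own policy accepts."""
    return [st for st in statements if keep(st)]


statements = [
    Statement("doc1", "dc:subject", "cheap pills", "doc1"),  # the document about itself
    Statement("doc1", "dc:subject", "rdf", "me"),            # my own text-mining tool
    Statement("doc1", "dc:creator", "Alice", "bob"),         # a third party
]
print(sorted(categorize(statements)))  # → ['bob', 'doc1', 'me']
# One possible whim: ignore what documents say about themselves.
cleaned = prune(statements, lambda st: st.asserted_by != st.subject)
print(len(cleaned))  # → 2
```

Because the lie is still stored alongside who told it, someone with a different policy can prune differently from the same data.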

I really hope that no one, or at least very few people, will ever have to produce RDF/XML by hand. If you're a programmer, use a library; if you're not, use an application. If you do want to write it yourself, it's just like code or English: if it doesn't parse, it can't be understood.
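As an example of "use a library", this sketch emits a tiny RDF/XML document with Python's standard xml.etree.ElementTree rather than by string concatenation, then checks that the result parses. It only handles literal-valued Dublin Core properties on a single resource; the to_rdf_xml and parses helpers and the example.org URI are hypothetical, and a dedicated RDF library would handle the general case.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Tell ElementTree which prefixes to use when serializing these namespaces.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(about, properties):
    """Serialize (dc_property_name, literal_value) pairs describing one resource."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": about})
    for name, value in properties:
        prop = ET.SubElement(desc, f"{{{DC_NS}}}{name}")
        prop.text = value  # ElementTree escapes &, < and > for us
    return ET.tostring(root, encoding="unicode")


def parses(xml_text):
    """True if xml_text is well-formed XML; if it doesn't parse, nobody can use it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


doc = to_rdf_xml("http://example.org/tag-soup",
                 [("title", "Tag Soup"), ("subject", "Semantic Web")])
print(doc)
print(parses(doc))          # True
print(parses("<rdf:RDF>"))  # False: unbound prefix and an unclosed element
```

The library takes care of namespace declarations and escaping, which are exactly the details that make hand-written RDF/XML fail to parse.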

The schema problem does not seem that hard either. People already deal with it for XML and database schemas. Conversion tools can be as general or as specific as you want and can involve as much or as little human intervention as you want. The OntoShare paper addresses some of these problems, such as how to evolve a shared ontology and how to use tools (in their case ViewSum) to extract key concepts consistently. Semantic Gossiping outlines the problems involved in doing this in a peer-to-peer environment.
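A conversion tool can be as simple as a property mapping table plus a queue for anything the table doesn't cover, which is where the human intervention comes in. The vocabulary below (a made-up myschema: prefix mapped onto Dublin Core) and the convert helper are assumptions for illustration only.

```python
def convert(statements, mapping):
    """Rewrite predicates from one vocabulary into another (hypothetical helper).

    `mapping` maps source predicates to target predicates. Statements whose
    predicate has no mapping are not guessed at; they are returned separately
    so a person can decide what to do with them.
    """
    converted, needs_review = [], []
    for subject, predicate, obj in statements:
        if predicate in mapping:
            converted.append((subject, mapping[predicate], obj))
        else:
            needs_review.append((subject, predicate, obj))
    return converted, needs_review


# Hypothetical in-house vocabulary mapped onto Dublin Core.
mapping = {"myschema:author": "dc:creator", "myschema:topic": "dc:subject"}
statements = [
    ("doc1", "myschema:author", "Alice"),
    ("doc1", "myschema:topic", "rdf"),
    ("doc1", "myschema:mood", "grumpy"),
]
done, review = convert(statements, mapping)
print(done)    # → [('doc1', 'dc:creator', 'Alice'), ('doc1', 'dc:subject', 'rdf')]
print(review)  # → [('doc1', 'myschema:mood', 'grumpy')]
```

Making the tool more general means a bigger mapping table or smarter matching; making it more automatic means a shorter review queue.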
