Sunday, February 09, 2003

From the Semantic to the Pedantic

Where to place Agora? "Moreover, in light of my semantic web involvement, I'm getting more and more uncomfortable with RDF (see my semantic web fight club pictures in boston in the gallery at http://www.betaversion.org/~stefano/) and I'm heading more and more toward the concept of 'data emergence', where you don't go around bothering people to mark up their data as *you* like it, but *you* make an effort to collect their data and make sense of it. I'm starting to call it the 'pedantic web' myself :)

Google showed how much value can be gained from harvesting simple information (hyperlinks) that locally has no apparent global meaning. So do email replies and IP logs for CVS logins.

There is potentially huge value in fostering research on data emergence, especially if it is related to reasonably sized and well-logged communities like ours."
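To make "value from simple hyperlinks" concrete, here is a minimal sketch of PageRank-style power iteration, one well-known way to let a global ranking emerge from purely local link data. It is an illustration only, not a claim about how Google works today. The `pagerank` function, its `damping` and `iterations` parameters, and the tiny link graph are hypothetical examples chosen for this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a link graph.

    links maps each page to the list of pages it links to (hypothetical data).
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # every page gets the "random jump" share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page passes its rank evenly along its outgoing links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # dangling page: spread its rank evenly over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical community site: nobody links to "blog", everyone links "home".
links = {
    "home": ["docs", "wiki"],
    "docs": ["home"],
    "wiki": ["home", "docs"],
    "blog": ["home"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

No page declares its own importance; the ordering emerges only from who links to whom, which is the "data emergence" idea in miniature.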

After looking at companies such as Covera, Endeca, FAST and others, I can see that centralization is a valid concern and strategy. Still, I think RDF gives you the best model. Faceted or not, XML is still a hierarchy hamburger.

This comes up in various references, such as: Zen, flow and emergence in information models. XML is still limited by its model.

There is a Happy Ending:
"Nevertheless, there are some features of RDF that may fit well with the emerging Infoset-centric XML processing model.

* RDF gives us a model for namespace mixing and data merging
There is no algorithm for merging two XML Infosets, to enable us to pool knowledge acquired from diverse sources. The RDF information model, by contrast, was designed with data aggregation (rather than structured documents) in mind. Merging RDF data is trivial: add the triples extracted from two RDF/XML documents, and store them in a new one.
* RDF views of the Infoset are explicit about the information we can throw away
Transforming Infosets into their RDF graph allows us to throw away irrelevant information, such as the aspects of the Infoset concerned with preserving a representation of document ordering. When we define transformations from an XML Infoset into RDF, we show XML processors which parts of the Infoset can be discarded without losing the essence of the message encoded in that XML."
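Both points in the quote can be illustrated with a small sketch. It assumes a hypothetical mapping (not RDF/XML and not any standard) in which each `<person id="...">` element becomes a subject and each child element becomes a (subject, predicate, object) triple. Storing triples in a Python set discards child order, and merging two graphs is a set union. The element names and sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def xml_to_triples(xml_text):
    """Hypothetical XML-to-triples mapping.

    Each <person id="..."> element is a subject; each child element becomes
    a (subject, predicate, object) triple. Returning a set deliberately
    throws away document order.
    """
    root = ET.fromstring(xml_text)
    triples = set()
    for person in root.iter("person"):
        subject = person.get("id")
        for child in person:
            triples.add((subject, child.tag, (child.text or "").strip()))
    return triples


# Made-up sample documents from two "sources".
doc_a = """<people>
  <person id="p1"><name>Alice</name><project>Agora</project></person>
</people>"""

doc_b = """<people>
  <person id="p1"><project>Gallery</project><name>Alice</name></person>
</people>"""

# Same information as doc_a, with the child elements in a different order.
doc_a_reordered = """<people>
  <person id="p1"><project>Agora</project><name>Alice</name></person>
</people>"""

graph_a = xml_to_triples(doc_a)
graph_b = xml_to_triples(doc_b)

# Merging is a union of triple sets; the shared (p1, name, Alice)
# triple appears only once in the result.
merged = graph_a | graph_b

print(graph_a == xml_to_triples(doc_a_reordered))  # True
for triple in sorted(merged):
    print(triple)
```

One caveat: this works as a plain union only because every subject has a global identifier. In real RDF, blank nodes from the two sources must be kept distinct (renamed apart) before the triples are combined.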
