Four more updates to the recent Always On article:
Now on Slashdot: "A research group at University of Maryland has published a blog describing the latest approach for finding and indexing Semantic Web Documents. They have published it in reaction to Peter Norvig's (director of search quality at Google) view on the Semantic Web..."
On finding semantic web documents, about who was looking for Semantic Web documents: "As of this writing, I'd guess there are at least two million SWDs accessible on the web. Most of these are FOAF or RSS documents... There are lots of other uses of RDF content: embedded RDF in HTML documents, in other document types (e.g., PDF, JPG), in databases, etc."
It'd be valid to say "the Semantic Web is 40 million triples" or something similar if everything had to be RDF to be useful. But things like MP3s, file systems, SQL databases, etc. can all be viewed as RDF, whether without conversion, with on-the-fly conversion, or whatever. Some things may be viewed as RDF but never stored as RDF.
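To make the "viewed as RDF but never stored as RDF" idea concrete, here is a rough sketch of exposing a SQL table as triples on the fly. Everything named here is my own assumption for illustration: the `http://example.org/` namespace, the `track` table, and the `rows_as_triples` helper are all hypothetical, and the row-to-triple mapping is just one simple convention (row becomes subject, column becomes predicate), not any standard tool's behaviour.

```python
import sqlite3
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace, chosen only for this illustration.
BASE = "http://example.org/"


def rows_as_triples(conn: sqlite3.Connection, table: str, key: str) -> Iterator[Triple]:
    """Yield (subject, predicate, object) triples for each row, computed on demand.

    Nothing is stored as RDF; the triples exist only while iterating.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    key_index = columns.index(key)
    for row in cursor:
        subject = f"<{BASE}{table}/{row[key_index]}>"
        for column, value in zip(columns, row):
            if column == key or value is None:
                continue  # the key is already in the subject; NULLs yield no triple
            # Minimal escaping so the literal stays readable as an N-Triples-style string.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            yield (subject, f"<{BASE}{table}#{column}>", f'"{literal}"')


def demo() -> List[Triple]:
    """Build a tiny in-memory table and return its triple view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT, artist TEXT)")
    conn.execute("INSERT INTO track VALUES (1, 'Blue Train', 'John Coltrane')")
    return list(rows_as_triples(conn, "track", "id"))
```

Calling `demo()` gives two triples for the single row, one per non-key column. The data stays in SQL the whole time, which is why counting stored triples undersells how much content could be treated as RDF.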
Semantic Web: A different perspective on what works and what doesn't
"More importantly, the promise of Semantic Web is closely tied to having the tools for semantic annotations of heterogeneous content, i.e., create semantic metadata automatically. This is much easier to do when you have high quality domain ontologies that bound the scope of automatic extraction."
"Commercial technologies (example) can process millions of pages per day and extract semantic metadata, and all these can be represented as RDF (and that is a good idea because of the benefits esp. for high end semantic applications such as analytics)."
"These types of ontologies routinely have millions of instances (look at SWETO, NCI ontology, GlycO..."
What also works... (which made me do a triple take; I thought Danny was linking to some new Semantic Web blog). It talks about how the Semantic Web and LSI are helpful but not tied to one another, and points to Semantic Web != Text Analysis; Semantic Web != Controlled Vocabularies.