More Metadata Than Data?
I was reading this in David's whitepaper:
"Tucana works by representing large amounts of information with a (typically) smaller amount of metadata. In the case of a word-processing document or an electronic mail message, for example, metadata might include the author, the recipients, the subject, keywords, concepts addressed, people named, dates or places mentioned, etc. Metadata is stored in a lingua franca to enable sharing across applications and geographic boundaries. Metadata is represented in the World Wide Web Consortium's Resource Description Framework (RDF), an international standard for the representation of metadata. RDF is part of the W3C's Semantic Web project."
Now, my feeling has always been that there will be more metadata than data and that trying to store it *all* is going to be very hard or impossible.
Even running a single tool over a document can produce more metadata than data. Then you run the rest of the tools over it. You could even have different runs of the same tool over the same data, each of which takes up more space.
I even found evidence of this previously:
"Metadata itself isn't new: the Romans had it, and medieval legal manuscripts have more metadata than data."
In fact, you can demonstrate this with an HTML page, where the tags can take up more space than the actual content.
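To make the HTML claim concrete, here is a rough sketch of how you might measure it. It uses Python's standard html.parser to pull out the text and treats everything else (tags, attributes) as markup. The names TextCollector, markup_vs_content and SAMPLE are made up for this illustration, and the sample page is invented, so treat the result as a demonstration rather than evidence about real pages.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text found between tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text outside of a tag.
        self.chunks.append(data)


def markup_vs_content(html):
    """Return (markup_chars, content_chars), ignoring all whitespace."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.chunks)
    content_chars = len("".join(text.split()))
    total_chars = len("".join(html.split()))
    # Whatever is not text is counted as markup.
    return total_chars - content_chars, content_chars


# An invented sample page, not taken from any real site.
SAMPLE = (
    '<html><head><title>Hi</title></head><body>'
    '<p class="note"><b>Hello</b>, '
    '<a href="http://example.org/">world</a>!</p>'
    '</body></html>'
)

if __name__ == "__main__":
    markup, content = markup_vs_content(SAMPLE)
    print("markup characters: ", markup)
    print("content characters:", content)
```

Whitespace is ignored on both sides so that indentation does not skew the count either way. On a small, heavily tagged page like this one the markup easily outweighs the text; a long page of plain paragraphs would come out the other way.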