Traditionally, reference implementations (i.e. traditional source code) have been the way to do this. "Running code" is the final arbiter.
Maybe this is as good as it gets? Unfortunately, a full-blown word processor runs to many, many thousands of lines of code, and the semantic devil is buried way down in the details...
Until we can convince (or force) web sites to embrace and standardize on Open Data formats — XML, JSON, or even CSV, as appropriate — we will be in some ways even more locked in than we were in the bad old desktop days.
Similarly, how much value do you think there is to be had from a snapshot of the source code for eBay or Facebook being made available? This is one area where Open Source offers no solution to the problem of vendor lock-in. In addition, the fact that we are increasingly moving to a Web-based world means that Open Source will be less and less effective as a mechanism for preventing vendor lock-in in the software industry. This is why Open Source is dead: it will cease to be relevant in a world where most consumers of software actually use services, as opposed to installing and maintaining software that is "distributed" to them.
The point Sean is making is that even if we achieve what Dave is suggesting, we still haven't solved the semantic problem. Making the semantics explicit and non-proprietary is not something XML, JSON, or CSV gives you; these formats just aren't descriptive enough. And having running code is all fine, but it's not generic enough: it will be tied to Java or C# or whatever.
The answer is of course both: a data format that is descriptive enough (like RDF/OWL) and open source stores that have the ability to process large quantities of it (because you will have vast quantities of your own data in the future, and you won't want one company to own it).
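To make the "descriptive enough" point concrete, here is a minimal sketch in Python (the record, identifiers, and URIs are hypothetical, chosen for illustration) contrasting a CSV row, whose column meanings live outside the data, with the same fact expressed as RDF-style triples, where each predicate is a URI naming a shared vocabulary:

```python
import csv
import io

# A CSV row: the semantics are implicit. Nothing in the file itself
# says what "dob" means or which conventions the values follow --
# that knowledge lives in out-of-band documentation (or someone's head).
csv_data = "id,name,dob\n42,Alice,1970-01-01\n"
rows = list(csv.DictReader(io.StringIO(csv_data)))

# The same fact as RDF-style (subject, predicate, object) triples.
# Each predicate is a URI pointing at a published vocabulary (FOAF,
# schema.org), so the meaning of each field travels with the data and
# can be looked up by any consumer, in any language.
triples = [
    ("http://example.org/person/42",
     "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/person/42",
     "https://schema.org/birthDate", "1970-01-01"),
]

for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

The CSV consumer has to be told what each column means; the triple consumer can dereference the predicate URIs and discover the definitions for itself. That self-description is what RDF/OWL adds over plain XML, JSON, or CSV.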