Tuesday, March 04, 2008

Save Ontologies from the Ontologists

I presented a talk at InterOntology08 last week (list of slides presented). It was only 15 minutes so there wasn't room for much content. What I think is the most important slide was number 11, about how the BioMANTA project is attempting to produce ontologies as agile, engineering artefacts that are verified in reality (through experiments being performed, provenance being tracked, and data analysis on the quality of the provenance to filter out irrelevant or incorrect data).

There were some good things to come out of it. Thinking about how to describe problems with ontologies to other people in terms of inconsistencies - what will be inferred that contradicts your ontology - was very useful. The work done by Werner Ceusters, Nicola Guarino and Yu Lin was the closest to our work. One of the speakers gave what I think is a succinct description of the difference between top down vs. bottom up ontology development: "what to expect" vs. "what to extract". I also met a lot of great people who I hope to meet again and Japan was very cool.

Easily the best talk, in terms of being the most thought provoking, was Barry Smith's, "The Evaluation of Ontologies: Editorial Review vs Democratic Ranking". This discussed the work of the Gene Ontology and the OBO Foundry. He cited the Gene Ontology as the most useful and most used ontology, and one which has been developed using a top down process. It allows comparable data to be produced, it removes data silos and he compared it to the creation of standard measures (the metric system). He said that in order to achieve this standardisation you need editorial committees. An ontology becomes part of the peer review, journal process. He introduced the OBO Foundry, which has many principles: an ontology must be open, use a formal language, be collaborative, have orthogonal components, be versioned, be well documented and have data before it can be accepted.

The alternative view he offered was attributed to Mark Musen. It's a bottom up approach based on the annotation of ontologies, and many of the slides were taken from a previous talk. Mark believes that ontologies are still a cottage industry and that it is often difficult to ascertain the quality of an ontology just by inspection. He said it is also true that we may wish to use parts of ontologies even if they are not well designed. He questions whether a top down approach can scale. He is developing BioPortal, which offers a way to upload and rate various ontologies. The key question about BioPortal is whether it will generate enough interest to reach a critical mass of reviews.

I had many problems with this talk. Firstly, the way it was characterised as one vs. the other - why can't they both work? What stops peer reviews of popular ontologies, or popular ratings of peer reviewed ontologies? Barry mentioned that a selection approach works for refrigerators (where peer review designs the function of the refrigerator and colour is selected by the masses) but questioned whether this should work for science. This is an obviously negative view of what mass selection can do - we choose representatives in a democracy or successful products in a capitalist market, and surely these are very important things that are left to the masses. Are ontologies any more important than these things?

Beyond that, both of these methods seem to suggest a certain centralisation. Doesn't this encourage gatekeepers and people holding onto power? Hasn't the web (and governments, capitalism, science etc.) shown that decentralisation is better? I see science as a competition of ideas: out of many possible models, the one chosen is the one that best fits existing data and predicts new observations.

One of the OBO Foundry principles is that you can't reuse an ontology. That is, if you're outside the OBO Foundry and you make a change you can't redistribute or use the same identifiers. This just seems wrong. I must be misunderstanding this part, because it is supported by people who I would expect to support the idea of reusing ideas and, most importantly, sharing them with others.

Many of these arguments seem to be around whether an ontology is attempting to create or represent reality or if it's an engineering artefact. I see it as a bit of both but its primary utility, I'd suggest, is as an engineering artefact. It represents a (hopefully working) system.

A simple example is our "fixing" of BioPAX. BioPAX uses string literals for certain properties and this prevents them being used as subjects in RDF. I would like to link, maybe dereference them and do other cool things with them that you can only do with URIs. So I'd like to make a change now, get something working and distribute my software with these changes.
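A minimal sketch of why this matters, using only the Python standard library rather than a real RDF toolkit. The `Graph` class, the `http://example.org/` namespace and the `locatedIn` and `label` properties are all hypothetical and are not taken from BioPAX; the only rule modelled is the RDF one that a literal may appear as the object of a triple but never as its subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


class Graph:
    """Toy triple store (hypothetical, not a real RDF library)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        # RDF only allows literals in the object position.
        if isinstance(subject, Literal):
            raise TypeError("a literal cannot be the subject of a triple")
        self.triples.add((subject, predicate, obj))

    def about(self, subject):
        # Everything the graph says about one subject.
        return {(p, o) for s, p, o in self.triples if s == subject}


EX = "http://example.org/"  # assumed namespace, for illustration only
protein = URI(EX + "protein1")
located_in = URI(EX + "locatedIn")
label = URI(EX + "label")

g = Graph()

# The string-literal style: the value is a dead end in the graph.
g.add(protein, located_in, Literal("nucleus"))
try:
    g.add(Literal("nucleus"), label, Literal("cell nucleus"))
    literal_as_subject_ok = True
except TypeError:
    literal_as_subject_ok = False

# The "fixed" style: the value is a URI, so more can be said about it.
nucleus = URI(EX + "nucleus")
g.add(protein, located_in, nucleus)
g.add(nucleus, label, Literal("cell nucleus"))
```

Once the value is a URI it can carry its own statements, be linked to from other data and, if it resolves, be dereferenced.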

I do think that ontologies should be well documented but documentation can be a barrier when you want to change something, try it out, make more changes, try it out again - the documentation is potentially going to be missing or wrong. The whole process seems to be trying to do too much upfront - which is terrible for the few, overworked ontologists that there are.

I don't necessarily want to wait while my ontology gets peer reviewed - the chance of the right person finding a mistake really doesn't sit with a committee or voting process - I'd like it to include everyone. I'd like to do it cheaply both in time and money; if for no other reason than to see whether it works well. If it doesn't work then it's not a big deal, I can just change it back. It seems that if this were part of a big process then it would be less likely to happen.

Both methods also lack verification (or at least it wasn't discussed). There's nothing to say that a bunch of people in the OBO Foundry or a voting process will necessarily achieve certain modelling objectives - something that is right for me or for everyone. Like most systems, ontologies will have contradictory requirements such as flexibility and completeness or security and privacy - there really isn't one true answer. I'd prefer a process that quickly adapts to changing requirements which can then be verified.

1 comment:

Unknown said...

Can I correct one statement in your blog? "One of the OBO Foundry principles is that you can't reuse an ontology. That is, if you're outside the OBO Foundry and you make a change you can't redistribute or use the same identifiers."

This is not the OBO or GO position. First, let me state the reasons. There is a widely used ontology, say the Gene Ontology, maintained by the GO Consortium. It has an id space (GO) and each term has a unique numeric identifier. It is freely available without license, let or hindrance.

Now THIRD PARTY BLOBLO downloads the GO, _changes_ its content, for example adds new terms, changes the text of existing terms, using the GO ID space and then makes this available to the public _as_ "The Gene Ontology". Now we have TWO versions of the GO out there, that of the GO Consortium and that of THIRD PARTY BLOBLO. The result, _utter_ confusion.

THAT is all we are trying to avoid.
If BLOBLO takes the GO, makes changes, releases it as the BLOBLO Ontology with a _different_ id space - fine. If it turns out to be better than the GO and people use it in preference to the GO, also fine.

But what we do NOT want is two artifacts with the same name and ID space.

So, our stricture is purely pragmatic and is not sinister. It applies to members of the OBO Foundry or anyone else equally.

Michael