OWL, OBO, JSON? Base, simple, full, basic? What should you use, and why?
For a reference on the more technical aspects of release artefacts, please see the documentation on Release Artefacts.
Ontologies come in different serialisations, formalisms, and variants. For example, there are a full 9 (!) different release files associated with an ontology released using the default settings of the Ontology Development Kit, which causes a lot of confusion for current and prospective users.
Note: In the OBO Foundry pages, "variant" is currently referred to as "product", i.e. the way we use "variant" here is synonymous with the notion of "product".
Overview of the relevant concepts
- A formalism or formal language can be used to describe entities and their relationships in an ontology. The most important formalisms we have are:
- Web Ontology Language (OWL): OWL is by far the dominant formalism in the biomedical domain due to its inference capabilities.
- RDF(S): A generally weaker language than OWL, but widely used by triple stores and other SPARQL engines. RDF(S) lacks some of the strong logical guarantees that come with OWL and should only be used in scenarios where scalability (computation time) is the primary concern.
- OBO: OBO used to be the dominant language in the biomedical domain before the advent of OWL. It also used to have its own specific semantics associated with it. OBO semantics have since been mapped into OWL semantics, so that for all practical purposes we now consider "OBO" a dialect of OWL. This means that when you hear "OBO format" today, we are generally referring to the serialisation (see below), NOT the formalism. Note that when we say "OBO ontologies" we mean literally Open Biomedical and Biological Ontologies, and NOT ontologies in OBO format.
Some people also like to list SHACL and ShEx as ontology languages and formalisms. Formalisms define syntax (e.g. grammar rules) and semantics (what does each expression mean?). The analogue in the real world would be natural languages, like English or Greek.
- A format, or serialisation of a language is used to write down statements of a formal language in some way. Formats are not formalisms - they simply enable statements in a formalism to be expressed in some (usually textual) way. The most common formats in our domains are:
- RDF/XML: This is the default serialisation language of the OWL flavours of OBO ontologies. It is a pretty ugly format and really hard for most users to understand, but it has one advantage: it is widely understood, both by RDF-focused tools like rdflib and by OWL-focused tools like those based on the OWL API.
- OWL Functional Syntax: This is a very common syntax for editing ontologies in OWL, because it looks nice in diff tools such as `git diff`, i.e. changes to ontologies in functional syntax are much easier to review. RDF/XML is not suitable for manual review, due to its verbosity and complexity.
- OWL Manchester Syntax: This is the default language for OWL tutorials and for writing class expressions in editors such as Protege.
- OBO Format: The easiest to read of all the serialisations. In many ontologies such as Mondo and Uberon, we still use OBO as the editors' format (as opposed to OWL Functional Syntax, which is more widespread). OBO format looks clear and beautiful in diffs such as git diffs, and therefore continues to be widely used. OBO Format does not cover all of OWL, and should only be used in conjunction with ontologies that stay within the limits of the OBO format specification.
- OBO Graphs JSON: A simple JSON serialisation of ontologies. This format roughly reflects the capabilities of the OBO format, but is intended for consumption by tools. Again, it does not cover all of OWL, but it does cover the parts that are relevant in 99% of the use cases.
The real-world analogue of a serialisation or format is the script, i.e. Latin or Cyrillic script (although the analogy is not perfect).
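As a rough illustration of why OBO Graphs JSON is convenient for tools, the sketch below reads a tiny hand-written document with Python's standard `json` module. It assumes the usual OBO Graphs JSON layout (a top-level `graphs` list, nodes with `id` and `lbl`, edges with `sub`, `pred` and `obj`); the two example classes, their IRIs and the helper name `labelled_edges` are made up for this example and do not come from a real release file.

```python
import json

# A tiny, hand-written document in the OBO Graphs JSON layout (illustrative only).
DOC = """
{
  "graphs": [
    {
      "id": "http://example.org/toy.json",
      "nodes": [
        {"id": "http://example.org/TOY_1", "lbl": "cell", "type": "CLASS"},
        {"id": "http://example.org/TOY_2", "lbl": "neuron", "type": "CLASS"}
      ],
      "edges": [
        {"sub": "http://example.org/TOY_2", "pred": "is_a", "obj": "http://example.org/TOY_1"}
      ]
    }
  ]
}
"""


def labelled_edges(doc_text):
    """Return (subject label, predicate, object label) triples for every edge."""
    doc = json.loads(doc_text)
    triples = []
    for graph in doc.get("graphs", []):
        # Map each node id to its label, falling back to the id itself.
        labels = {n["id"]: n.get("lbl", n["id"]) for n in graph.get("nodes", [])}
        for edge in graph.get("edges", []):
            triples.append(
                (
                    labels.get(edge["sub"], edge["sub"]),
                    edge["pred"],
                    labels.get(edge["obj"], edge["obj"]),
                )
            )
    return triples


print(labelled_edges(DOC))
```

No OWL parsing or interpretation of axioms is involved: the relationships are already present as plain subject, predicate, object records.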
- A variant is a version of the ontology exported for a specific purpose. The most important variants are:
- Edit: The variant of the ontology that is edited by ontology curators. Its sole purpose is to be used by ontology editors, and it should not be used by any other application. In an ODK-style repository, the edit file is typically tucked away from view, e.g. `src/ontology/cl-edit.owl`.
- Full: The ontology with all its imports merged in, and classified using a reasoner, see docs. The Full variant should be used by users that require the use of reasoners and a guarantee that all the inferences work as intended by the ontology developers. This is the default variant of most OBO ontologies.
- Base: The axioms belonging to the ontology, excluding any axioms from imported ontologies, see docs. Base variants are used by ontology repository developers to combine the latest versions of all ontologies in a way that avoids problems due to conflicting versions. Base files should not be used by users that want to use the ontology in downstream tools, such as annotation tools or scientific databases, as they are incomplete, i.e. not fully classified.
- Simple: A version of the ontology that contains only a subset of the ontology (only the direct relations, see docs). The simple variant should be used by most users that build tools that use the ontology, especially when serialised as OBO Graphs JSON. This variant should probably be avoided by power users working with reasoners, as many of the axioms that drive reasoning are missing.
- Basic: A variant of Simple that is reduced to only a specific set of relations, such as `subClassOf` and `partOf`. Users that require the ontology to correspond to an acyclic graph, or that deliberately want to focus only on a set of core relations, will want to use this variant (see docs). The formal definition of the basic variant can be found here.
- Other variants: Some variants, like "non-classified" (see docs), are still used but should be avoided. Others like base-plus, a variant that corresponds to base + the inferred axioms, are still under development, and will be explained here when they are fully developed.
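To make the idea behind the basic variant more concrete, here is a minimal sketch that restricts a toy edge list to a chosen set of relations and then checks that the result is acyclic. This is not how the basic variant is actually produced, and it is not its formal definition; the relation names, the toy edges and the helper functions are all assumptions made for the example.

```python
from collections import defaultdict

# Relations kept in this sketch; placeholder names, not a formal definition.
KEEP = {"subClassOf", "partOf"}

# Toy (subject, predicate, object) edges, invented for illustration.
EDGES = [
    ("finger", "partOf", "hand"),
    ("hand", "partOf", "arm"),
    ("arm", "hasPart", "hand"),
    ("hand", "subClassOf", "anatomical structure"),
]


def restrict_to_relations(edges, keep=KEEP):
    """Keep only the edges whose predicate is in the allowed set."""
    return [edge for edge in edges if edge[1] in keep]


def is_acyclic(edges):
    """Return True if the directed graph formed by the edges has no cycle."""
    graph = defaultdict(list)
    for sub, _pred, obj in edges:
        graph[sub].append(obj)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = defaultdict(int)

    def visit(node):
        state[node] = GREY
        for nxt in graph.get(node, []):
            if state[nxt] == GREY:
                return False  # reached a node already on the current path
            if state[nxt] == WHITE and not visit(nxt):
                return False
        state[node] = BLACK
        return True

    return all(visit(node) for node in list(graph) if state[node] == WHITE)


print(is_acyclic(EDGES))
print(is_acyclic(restrict_to_relations(EDGES)))
```

In this toy data the inverse relation `hasPart` introduces a cycle between "arm" and "hand", and dropping everything outside the core relations removes it.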
Best practices
- Tool developers developing tools that use the ontology (and do not need reasoners), such as database curation tools, web browsers and similar, should typically use OBO Graphs JSON and avoid using OBO format or any of the OWL-focussed serialisations (Functional, Manchester or RDF/XML). OWL-focussed serialisations contain a great deal of axiomatic content that makes no sense to most users and can lead to a variety of mistakes. We have seen many times that software developers try to interpret OWL axioms somehow to extract relations. Do not do that! Work with the ontologies to ensure they provide the relationships you need in the appropriate form.
- Tool developers building tools to work with ontologies should typically ensure that they can read and write RDF/XML - this is the most widely understood serialisation. "Work with ontologies" here means "enable operations that change the content of the ontology".
- Tool developers building infrastructure to query across ontologies should consider using base variants - these ensure that you can always use the latest version of each ontology and avoid most of the common version clashes. It is important that such users are keenly aware of the role of OWL reasoning in such a process.
- Many users of ontologies who think they need a reasoner actually don't. Make sure you consult with an expert before building a system that relies on OWL reasoners to deliver user-facing services.
- As an ontology developer, it is great practice to provide the above variants in the common serialisations. The Ontology Development Kit provides defaults for all of these.
- As an ontology developer, you should avoid publishing your ontology with `owl:imports` statements - these are easily ignored by your users and make the intended "content" of the ontology quite non-transparent.
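A quick way to spot remaining `owl:imports` statements in an RDF/XML release file is sketched below, using only Python's standard library. It assumes the imports are written in the common `<owl:imports rdf:resource="..."/>` form; the sample document, its IRIs and the helper name `find_imports` are hypothetical, and a thorough check would use a proper RDF or OWL parser instead.

```python
import xml.etree.ElementTree as ET

# Fully qualified names of the element and attribute we are looking for.
OWL_IMPORTS = "{http://www.w3.org/2002/07/owl#}imports"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# A tiny, hand-written RDF/XML document (illustrative only).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/toy.owl">
    <owl:imports rdf:resource="http://example.org/imported.owl"/>
  </owl:Ontology>
</rdf:RDF>
"""


def find_imports(rdfxml_text):
    """Return the IRIs referenced by owl:imports elements in an RDF/XML string."""
    root = ET.fromstring(rdfxml_text)
    return [element.get(RDF_RESOURCE) for element in root.iter(OWL_IMPORTS)]


print(find_imports(SAMPLE))
```

An empty list suggests that, at least in this common form, the published file carries no import statements that users would have to resolve themselves.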