Introduction to ontologies¶
Based on CL editors training by David Osumi-Sutherland
Why do we need ontologies¶
We can't find what we're looking for¶
- Too flexible, multiple options, no enforcement standards
- Data at varying depth or granularity
For example, trying to refer to feces, in NCBI BioSample:
|Stool NOT faeces||21,798|
|Stool NOT feces||18,314|
We don't know what we're talking about¶
Because we have no single reference glossary, the result is millions of statements that are not obviously related.
In the male genitalia of a gasteruptiid wasp, five different structures have each been labeled "paramere" by different people, each studying different hymenopteran lineages. How do we know what "paramere" means when it is referred to?
Controlled vocabulary (CV)¶
Any closed, prescribed list of terms.
- Terms are not usually defined
- Relationships between the terms are not usually defined
- Simplest form is a list
Example using wines¶
- Pinot noir
Hierarchical controlled vocabulary¶
Any controlled vocabulary that is arranged in a hierarchy.
- Terms are not usually defined
- Relationships between the terms are not usually named or defined
- Terms are arranged in a hierarchy
Example using wines (Taxonomy of wine)¶
- Pinot Noir
- Pinot Gris
Taxonomy – a hierarchical CV in which hierarchy = classification
Querying Hierarchical CV¶
A hierarchical CV allows queries that use inference from the hierarchy. For example:
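A minimal Python sketch of such a query, assuming a toy wine hierarchy. The intermediate terms "red wine" and "white wine" and the annotated bottles are hypothetical and added only for illustration. Asking for a general term also returns items annotated with any more specific term beneath it.

```python
# Toy hierarchical controlled vocabulary: each term maps to its single parent.
# The intermediate terms ("red wine", "white wine") are assumptions for illustration.
PARENT = {
    "red wine": "wine",
    "white wine": "wine",
    "Pinot Noir": "red wine",
    "Pinot Gris": "white wine",
}


def ancestors(term):
    """Return every term above `term` in the hierarchy, nearest first."""
    result = []
    while term in PARENT:
        term = PARENT[term]
        result.append(term)
    return result


def query(term, annotations):
    """Return items annotated with `term` or with any term below it."""
    return sorted(
        item
        for item, tag in annotations.items()
        if tag == term or term in ancestors(tag)
    )


# Hypothetical annotated records.
ANNOTATIONS = {
    "bottle A": "Pinot Noir",
    "bottle B": "Pinot Gris",
    "bottle C": "red wine",
}

print(query("red wine", ANNOTATIONS))  # ['bottle A', 'bottle C']
print(query("wine", ANNOTATIONS))      # ['bottle A', 'bottle B', 'bottle C']
```

A flat list could only match "red wine" literally. The hierarchy is what lets the query also find the bottle annotated as Pinot Noir.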
Being precisely vague¶
Ontologies allow annotation at varying levels of precision. For example, if the entity being annotated is a subtype of glial cell, but you don't know which type of glial cell, you can just annotate with 'glial cell'.
Ontologies allow for polyhierarchies, in which a term can have multiple relationship types and hence be classified under multiple terms. Multiple relationship types are useful for grouping and for being precisely vague. See example for cardiac glial cell:
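As a rough sketch of a polyhierarchy, the Python below stores typed edges and collects every grouping term that a term falls under. The specific edges are illustrative assumptions, not taken from the Cell Ontology. Following `is_a` and `part_of` edges in the same traversal is a simplification made for grouping purposes.

```python
from collections import defaultdict

# (child, relation, parent) edges. The specific edges are illustrative assumptions.
EDGES = [
    ("glial cell", "is_a", "cell"),
    ("cardiac glial cell", "is_a", "glial cell"),
    ("cardiac glial cell", "part_of", "heart"),
    ("heart", "part_of", "circulatory system"),
]

PARENTS = defaultdict(list)
for child, relation, parent in EDGES:
    PARENTS[child].append(parent)


def groupings(term):
    """Collect every term reachable from `term` by following any relation upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(groupings("cardiac glial cell")))
# ['cell', 'circulatory system', 'glial cell', 'heart']
print(sorted(groupings("glial cell")))
# ['cell']
```

An annotation made with the vaguer term 'glial cell' is still found under 'cell'. It makes no claim about the heart, which is what being precisely vague means here.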
What is an ontology?¶
- A queryable store of knowledge
- A classification
- Terms are defined
- Terms are richly annotated:
- Textual definitions, references, synonyms, links, cross-references
- Relationships between terms are defined, allowing logical inference and sophisticated queries as well as graphs
- Terms are arranged in a classification hierarchy
- Expressed in a knowledge representation language such as RDFS, OBO, or OWL
- Examples: Gene Ontology, Uberon, Cell Ontology, EFO, SNOMED
Non-logical parts of ontologies¶
Terminology can be ambiguous, so text definitions, references, synonyms and images are key to helping users understand the intended meaning of a term.
Using nonmeaningful identifiers¶
Identifiers that do not hold any inherent meaning are important to ontologies. If you ever need to change the names of your terms, you're going to need identifiers that stay the same when the term name changes.
A microglial cell is also known as: hortega cell, microglia, microgliocyte and brain resident macrophage.
In the Cell Ontology, however, it is referred to by a single unique identifier: CL:0000129.
These identifiers are short ways of referring to IRIs (e.g., CL:0000129 = http://purl.obolibrary.org/obo/CL_0000129)
This IRI is a unique, resolvable identifier on the web.
A group of ontologies, loosely coordinated through the OBO Foundry, have standardised their IRIs (e.g. http://purl.obolibrary.org/obo/CL_0000129 for a term in the Cell Ontology; http://purl.obolibrary.org/obo/cl.owl for the Cell Ontology itself).
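The mapping between a short identifier and its IRI can be sketched in a few lines of Python. The function names `curie_to_iri` and `iri_to_curie` are hypothetical helpers, and the sketch assumes the standard OBO PURL pattern shown above (base URL, then the prefix and local ID joined by an underscore).

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_iri(curie):
    """Expand an OBO-style short identifier such as 'CL:0000129' into its full IRI."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"


def iri_to_curie(iri):
    """Contract an OBO PURL back into its short identifier."""
    if not iri.startswith(OBO_BASE):
        raise ValueError(f"not an OBO PURL: {iri}")
    # Split on the last underscore: the local ID itself contains none.
    prefix, local_id = iri[len(OBO_BASE):].rsplit("_", 1)
    return f"{prefix}:{local_id}"


# Labels and synonyms may change over time; the identifier stays the same.
LABELS = {"CL:0000129": "microglial cell"}

iri = curie_to_iri("CL:0000129")
print(iri)                        # http://purl.obolibrary.org/obo/CL_0000129
print(LABELS[iri_to_curie(iri)])  # microglial cell
```

Because data is keyed on the identifier and not the label, renaming the term only means updating one entry in `LABELS`.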
IRIs? URIs? URLs?¶
- URI: Uniform Resource Identifier - a string of characters, following a standard specification, that unambiguously identifies a particular (web) resource.
- IRI: Internationalised Resource Identifier - a URI that can use characters in multiple languages
- URL: Uniform Resource Locator - a web-resolvable URI
Building scalable ontologies¶
OBO ontologies are mostly written in OWL2 or OBO syntax.
For a more in-depth explanation of formats (OWL, OBO, RDF etc.), refer to the explainer on OWL format variants.
An ontology as a classification¶
The ontology also functions as a classification. Below you will see a classification of parts of the insect and how it is represented in the ontology.
We use a SubClassOf (or is_a in obo format) to represent entities that are fully encapsulated by the parent class. For example:
OWL: hindwing SubClassOf wing
OBO: hindwing is_a wing
We use a relation part_of to represent an entity that is a part of a whole entity.
English: all (insect) legs are part of a thoracic segment
OWL: 'leg' SubClassOf part_of some 'thoracic segment'
OBO: 'leg'; relationship: part_of thoracic segment
Note the existential quantifier some in OWL format -- it is interpreted as "there exists", "there is at least one", or "for some".
Note that the direction of a SubClassOf axiom matters:
'wing' SubClassOf part_of some 'thoracic segment' is correct
'thoracic segment' SubClassOf has_part some 'wing' is incorrect, as it implies that every thoracic segment has a wing.
'claw' SubClassOf connected_to some 'tarsal segment' is correct
'tarsal segment' SubClassOf connected_to some 'claw' is incorrect, as it implies that every tarsal segment is connected to a claw (some tarsal segments are connected only to other tarsal segments).
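One way to see why the direction matters is to read "A SubClassOf r some B" as "every A has relation r to at least one B" and test that reading against a small model. The Python sketch below does this for a single toy insect. The instance names are made up, and the check is closed-world (it treats the listed facts as complete), whereas OWL reasoning is open-world, so this illustrates the reading and is not a reasoner.

```python
# Toy model of one insect: instance -> class, plus part_of assertions between instances.
# The instance names are made up for illustration.
INSTANCE_OF = {
    "prothorax_1": "thoracic segment",
    "mesothorax_1": "thoracic segment",
    "metathorax_1": "thoracic segment",
    "forewing_1": "wing",
    "hindwing_1": "wing",
}
PART_OF = {("forewing_1", "mesothorax_1"), ("hindwing_1", "metathorax_1")}
# has_part is the inverse of part_of.
HAS_PART = {(whole, part) for part, whole in PART_OF}


def all_have_some(subject_class, relation, object_class):
    """Check 'subject_class SubClassOf relation some object_class' against the toy model."""
    return all(
        any(s == subj and INSTANCE_OF[o] == object_class for s, o in relation)
        for subj, cls in INSTANCE_OF.items()
        if cls == subject_class
    )


print(all_have_some("wing", PART_OF, "thoracic segment"))   # True
print(all_have_some("thoracic segment", HAS_PART, "wing"))  # False: the prothorax has no wing
```

The second statement fails on a single counterexample, the wingless prothorax, which is why the axiom is asserted from the wing's side.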
There are many ways to classify things. For example, a neuron can be classified by structure, electrophysiology, neurotransmitter, lineage, etc.
Manually maintaining these multiple inheritances (that occur through multiple classifications) does not scale.
Problems with maintaining multiple inheritance classifications by hand
- Doesn’t scale
  - When adding a new class, how are human editors to know
    - all of the relevant classifications to add?
    - how to rearrange the existing class hierarchy?
- It is bad for consistency
  - Reasons for existing classifications often opaque
  - Hard to check for consistency with distant superclasses
- Doesn’t allow for querying
  - A formalized ontology can be queried for classes with arbitrary sets of properties. A manual classification can not.
The knowledge an ontology contains can be used to automate classification. For example:
English: Any sense organ that functions in the detection of smell is an olfactory sense organ
OWL: 'olfactory sense organ' EquivalentTo 'sense organ' that capable_of some 'detection of smell'
If we then have an entity 'nose' that is SubClassOf 'sense organ' and capable_of some 'detection of smell', it will be automatically classified as an olfactory sense organ.
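The idea can be sketched in Python as a very small pattern matcher. This is a toy illustration, not how an OWL reasoner works: it only matches directly asserted superclasses and restrictions. The 'eye' class and 'detection of light' are assumptions added to give a non-matching case.

```python
# Asserted facts: each class has named superclasses and existential restrictions.
# A (relation, filler) pair stands for "relation some filler".
ASSERTED = {
    "nose": {
        "is_a": {"sense organ"},
        "some": {("capable_of", "detection of smell")},
    },
    # Hypothetical second class, added as a non-matching case.
    "eye": {
        "is_a": {"sense organ"},
        "some": {("capable_of", "detection of light")},
    },
}

# Equivalence axioms: defined class -> (genus, required restrictions).
DEFINITIONS = {
    "olfactory sense organ": ("sense organ", {("capable_of", "detection of smell")}),
}


def classify(asserted, definitions):
    """Return, for each asserted class, the defined classes whose conditions it meets."""
    inferred = {}
    for name, facts in asserted.items():
        inferred[name] = {
            defined
            for defined, (genus, restrictions) in definitions.items()
            if genus in facts["is_a"] and restrictions <= facts["some"]
        }
    return inferred


print(classify(ASSERTED, DEFINITIONS))
# {'nose': {'olfactory sense organ'}, 'eye': set()}
```

Nobody asserted that a nose is an olfactory sense organ. The classification follows from the definition, so adding a new defined class reclassifies every matching term without manual edits.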
- David Osumi-Sutherland (original creator of slides)
- Nicole Vasilevsky (OHSU), Alex Diehl (Buffalo), Nico Matentzoglu, Matt Brush, Matt Yoder, Carlo Torniai, Simon Jupp
- Chris Mungall (LBNL), Melissa Haendel (OSU), Jim Balhoff (RENCI), James Overton - slides, ideas & discussions
- Terry Meehan - who edited CL more than anyone
- Helen Parkinson (EBI)
- Michael Ashburner