Using Ontologies and Ontology Terms¶
These materials are under construction and may be incomplete.
- Sign up for a free GitHub account
What is delivered as part of the course¶
Description: Using ontology terms for annotations and structuring data.
- Explain why ontologies are useful
- Find good ontologies: ontology repositories, OBO
- Find terms using ontology browsers
- Assess ontologies for use: license, quality
- Map local terminology to ontology terms
- Identify missing terms
- Make term requests to existing ontologies
- Understand the differences between IRIs, CURIEs, and labels
Additional materials and resources¶
- How to select and request terms from ontologies - Blog post by Chris Mungall
- Guidelines for writing definitions in Ontologies (paper)
- OntoTips - A guide by Chris Mungall covering various aspects of ontology engineering.
1. Why ontologies are useful¶
Ontologies provide a logical classification of information in a particular domain or subject area. They can be used for annotating data, structuring disparate data types, classifying information, and inferencing and reasoning across data and computational analyses.
Difference between a terminology and an ontology¶
A terminology is a collection of terms; a term can have a definition and synonyms.
An ontology contains a formal classification of terminology in a domain that provides textual and machine readable definitions, and defines the relationships between terms. An ontology is a terminology, but a terminology is not (necessarily) an ontology.
2. Finding good ontologies¶
Numerous ontologies exist. Some recommended sources for finding community-developed, high-quality, and frequently used ontologies are listed below.
- OBO Foundry. Read more below
- The Ontology Lookup Service (OLS). The OLS contains over 200 ontologies.
- BioPortal. BioPortal aggregates almost 900 biomedical ontologies, and provides a search interface to look up terms. It is a popular repository for ontologies, but as only a fraction of the ontologies are reviewed by the OBO Foundry, you should carefully review any ontologies found on BioPortal before committing to use them.
- Ontobee. Ontobee indexes all 200+ OBO Foundry ontologies and is the default browser for OBO. For example, when you click http://purl.obolibrary.org/obo/IAO_0000112, you will be redirected to a page in the Ontobee browser that describes the annotation property ‘example of usage’.
3. Ontology repositories¶
The OBO Foundry is a community of ontology developers who are committed to developing a library of ontologies that are open, interoperable, logically well-formed, and scientifically accurate. OBO Foundry participants follow and contribute to the development of an evolving set of principles including open use, collaborative development, non-overlapping and strictly-scoped content, and common syntax and relations, based on ontology models that work well, such as the Gene Ontology (GO).
The OBO Foundry is overseen by an Operations Committee with Editorial, Technical and Outreach working groups.
Find terms using ontology browsers¶
Various ontology browsers are available. We recommend using one of the ontology browsers listed below.
4. Assessing ontologies for use¶
Some considerations for determining which ontologies to use include the license and quality of the ontology.
Licenses define how an ontology can legally be used or reused. One requirement for OBO Foundry Ontologies is that they are open, meaning that the ontologies are openly and freely available for use with acknowledgement and without alteration. OBO ontologies are required to be released under a Creative Commons CC-BY license version 3.0 or later, OR released into the public domain under CC0. The license should be clearly stated in the ontology file.
Some criteria that can be applied to determine the quality of an ontology include:
- Is there an ontology tracker to report issues? All open ontologies should have some form of an issue tracker to report bugs, make new term requests or request other changes to the ontology. Many ontologies use GitHub to track their issues.
- Is it currently active? Are there a large number of open tickets on the ontology tracker that have not been commented on or otherwise addressed? Are the tickets very old, or have they been sitting for years?
- Community involvement: On the issue tracker, is there evidence of community involvement, such as issues and comments from outside community members?
- Scientifically sound: Does the ontology accurately represent the domain in a scientifically sound way?
How to determine which is the right ontology to use?¶
- Multiple ontologies exist. Start by selecting the appropriate ontology, then restrict your search to that ontology.
- We recommend using ontologies that are open and interoperable. OBO Foundry ontologies are a good place to start.
- Make an informed decision about which ontology to use.
- If the ontology you want to use does not have the term you need, make a term request to that ontology.
5. Mapping local terminology to ontology terms¶
Data can be mapped to ontology terms manually, using spreadsheets, or via curation tools such as:
- BioPortal Annotator
- Canto - a web-based literature curation tool
- Textpresso - designed for C. elegans curation
- OntoBrowser - an online collaborative curation tool
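For small vocabularies, a first pass at mapping can also be scripted. The sketch below is only an illustration of the idea, not how any of the tools above work: it looks up normalized local terms in a hand-built index and reports the ones it cannot match, which are candidates for closer review or for a term request. The index contents, the `normalize` and `map_terms` helpers, and the "ear stone" synonym are assumptions made for this example; only the otolith identifier comes from this lesson.

```python
# Hypothetical index: lowercase label or synonym -> ontology term CURIE.
# "ear stone" is an invented local synonym used only for illustration.
ONTOLOGY_INDEX = {
    "otolith": "obo:UBERON_0002280",
    "ear stone": "obo:UBERON_0002280",
}


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def map_terms(local_terms, index):
    """Return (mapped, unmatched) for a list of local terms.

    mapped maps each original local term to a CURIE;
    unmatched lists the local terms with no entry in the index.
    """
    mapped, unmatched = {}, []
    for term in local_terms:
        curie = index.get(normalize(term))
        if curie is None:
            unmatched.append(term)
        else:
            mapped[term] = curie
    return mapped, unmatched


local_terms = ["Otolith", "  ear   stone ", "unknown structure"]
mapped, unmatched = map_terms(local_terms, ONTOLOGY_INDEX)
print(mapped)
print(unmatched)
```

Exact string matching like this misses spelling variants and near-synonyms, so the unmatched list should be checked by hand in an ontology browser before concluding that a term is really missing.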
6. Identifying missing terms¶
7. Making term requests to existing ontologies¶
Making a new term request to Mondo¶
- Go to Mondo GitHub tracker: Select New issue
- Pick appropriate template
- Fill in the information that is requested on the template below each header
- Please include:
  - A definition in the proper format
  - Sources/cross references for synonyms
  - Your ORCID or the URL for your ClinGen working group
- Add any additional comments at the end
- Nicole will automatically be tagged
- Please email Nicole or comment on the ticket (Nicole will be emailed) if you have any additional questions or if the ticket is high priority
Best practices guidelines¶
Note: We appreciate your contributions to extending and improving Mondo. Following these best practices is appreciated by the curators and developers, and helps them address your issue more quickly. However, we understand if you are not always able to follow them.
- New term requests should not match existing terms or synonyms
- Write a concise definition in the definition field. More info about writing definitions is here
- Synonyms - please provide a synonym scope and source/cross-reference. The synonym scopes are:
  - Exact - an exact match
  - Narrow - more specific term
  - Broad - more general term
  - Related - a word or phrase that has been used synonymously with the primary term name in the literature, but the usage is not strictly correct
- Check OMIM for children classes (specific to new gene-related terms)
- Preferred term labels should be lowercase (unless it is a proper name or abbreviation)
- Write the request below the prompts on the template so the Markdown formatting displays properly
- Synonyms should be lowercase (with exceptions above)
- Definition source - if from PubMed, please use the format PMID:XXXXXX (no space)
- Include the Mondo ID and label for the parent term
- List the children terms with Mondo ID and label in a bulleted list
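Several of the guidelines above are mechanical enough to check before submitting a ticket. The following sketch shows one way to do that under stated assumptions: the dictionary layout, the field names, and the `check_term_request` helper are invented for this example and are not part of the Mondo issue templates, and the draft values are placeholders rather than real disease data.

```python
import re

SYNONYM_SCOPES = {"exact", "narrow", "broad", "related"}
PMID_PATTERN = re.compile(r"PMID:\d+")  # e.g. PMID:123456, no space


def check_term_request(request):
    """Return a list of warnings for a draft term request (hypothetical format)."""
    warnings = []
    label = request.get("label", "")
    if label != label.lower():
        warnings.append("label has uppercase letters; keep them only for proper names or abbreviations")
    if not request.get("definition"):
        warnings.append("definition is missing")
    source = request.get("definition_source", "")
    if source.upper().startswith("PMID") and not PMID_PATTERN.fullmatch(source):
        warnings.append("PubMed source should use the format PMID:XXXXXX (no space)")
    for synonym in request.get("synonyms", []):
        if synonym.get("scope") not in SYNONYM_SCOPES:
            warnings.append(f"synonym {synonym.get('text')!r} needs a scope")
        if not synonym.get("source"):
            warnings.append(f"synonym {synonym.get('text')!r} needs a source/cross-reference")
    if not request.get("parent"):
        warnings.append("parent term (Mondo ID and label) is missing")
    return warnings


# Placeholder draft with several problems.
draft = {
    "label": "Example Syndrome",
    "definition": "A placeholder definition.",
    "definition_source": "PMID: 123456",
    "synonyms": [{"text": "example disorder", "scope": None, "source": ""}],
    "parent": "",
}
for warning in check_term_request(draft):
    print(warning)
```

A check like this cannot tell whether a capitalized label is a legitimate proper name, or whether the requested term duplicates an existing one, so it complements rather than replaces searching Mondo before filing the request.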
Tickets that followed best practices:¶
- https://github.com/monarch-initiative/mondo/issues/1719 Note: while this ticket generally follows best practices, one thing that can be improved is defining the synonym scope. Generally, when the synonym scope is not explicitly mentioned, it is assumed to be an exact synonym.
Tickets that did not follow best practices:¶
Submitting other issues to Mondo¶
- Users may want to request other types of changes to Mondo (or any other ontology) beyond just adding a new term.
- The Mondo curation team created many issue templates for users, for specific types of requests.
- If none of the issue templates fit your issue, you can scroll to the bottom and click Open a blank issue
8. Differences between IRIs, CURIEs, and labels¶
A uniform resource identifier (URI) is a string of characters used to identify a name or a resource.
A URL is a URI that, in addition to identifying a network-homed resource, specifies the means of acting upon or obtaining the representation.
A URL such as this one:
https://github.com/obophenotype/uberon/blob/master/uberon_edit.obo
has three main parts:
1. Protocol, e.g. https
2. Host, e.g. github.com
3. Path, e.g. /obophenotype/uberon/blob/master/uberon_edit.obo
The protocol tells you how to get the resource. Common protocols for web pages are http (HyperText Transfer Protocol) and https (HTTP Secure). The host is the name of the server to contact (the where), which can be a numeric IP address, but is more often a domain name. The path is the name of the resource on that server (the what), here the Uberon anatomy ontology file.
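The same three parts can be pulled out programmatically. As a small illustration, the sketch below uses `urlsplit` from Python's standard library on the URL assembled from the protocol, host, and path given above; the standard library calls these parts `scheme`, `netloc`, and `path`.

```python
from urllib.parse import urlsplit

# URL assembled from the protocol, host, and path described in the text.
url = "https://github.com/obophenotype/uberon/blob/master/uberon_edit.obo"

parts = urlsplit(url)
print("protocol:", parts.scheme)  # how to get the resource
print("host:    ", parts.netloc)  # where the resource lives
print("path:    ", parts.path)    # what resource to fetch
```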
An Internationalized Resource Identifier (IRI) is an internet protocol standard that extends the URI by permitting characters from a wide range of scripts. While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese characters, Japanese kanji, Korean, Cyrillic characters, and so forth. It is defined by RFC 3987.
More information is available here.
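To make the difference concrete, the sketch below converts an IRI with a non-ASCII path into a plain ASCII URI by percent-encoding the UTF-8 bytes of each non-ASCII character. It is a simplified illustration rather than a full RFC 3987 implementation: the `iri_to_uri` helper and the example.org address are made up for this example, and internationalized host names, which need separate handling, are not covered.

```python
from urllib.parse import quote

# Characters that already carry meaning in a URI and must not be escaped again.
URI_SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters of an IRI as UTF-8 bytes.

    Simplification: assumes the host part is already ASCII.
    """
    return quote(iri, safe=URI_SAFE)


iri = "http://example.org/término"  # hypothetical IRI with a non-ASCII path
uri = iri_to_uri(iri)
print(uri)  # http://example.org/t%C3%A9rmino
```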
A Compact URI (CURIE) consists of a prefix and a suffix, where the prefix stands in place of a longer base IRI.
By converting the prefix and appending the suffix, we get back to the full IRI. For example, if we define the obo prefix to stand in place of the IRI http://purl.obolibrary.org/obo/, then the CURIE obo:UBERON_0002280 can be expanded to http://purl.obolibrary.org/obo/UBERON_0002280, which is the UBERON Anatomy term for ‘otolith’. Any file that contains CURIEs needs to define the prefixes in the file header.
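The expansion described above amounts to a dictionary lookup and a string concatenation. A minimal sketch, assuming a prefix map that holds only the obo prefix from the example (the `expand_curie` and `contract_iri` function names are chosen for this illustration and do not come from a particular library):

```python
# Prefix map: CURIE prefix -> base IRI it stands in for.
PREFIX_MAP = {"obo": "http://purl.obolibrary.org/obo/"}


def expand_curie(curie: str, prefix_map=PREFIX_MAP) -> str:
    """Replace the prefix with its base IRI and append the suffix."""
    prefix, separator, suffix = curie.partition(":")
    if not separator or prefix not in prefix_map:
        raise ValueError(f"no known prefix in CURIE: {curie!r}")
    return prefix_map[prefix] + suffix


def contract_iri(iri: str, prefix_map=PREFIX_MAP) -> str:
    """Turn a full IRI back into a CURIE, preferring the longest matching base."""
    by_length = sorted(prefix_map.items(), key=lambda item: len(item[1]), reverse=True)
    for prefix, base in by_length:
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    raise ValueError(f"no prefix matches IRI: {iri!r}")


iri = expand_curie("obo:UBERON_0002280")
print(iri)                # http://purl.obolibrary.org/obo/UBERON_0002280
print(contract_iri(iri))  # obo:UBERON_0002280
```

Raising an error for an unknown prefix mirrors the requirement that every prefix used in a file be declared: a CURIE whose prefix is not defined cannot be resolved to an IRI.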
A label is the textual, human-readable name that is given to a term, class, property, or instance in an ontology.