
Basic SPARQL for OBO Engineers

This tutorial introduces SPARQL with a focus on how we use it across OBO ontologies. It should give you a sense of the main use cases without going too deeply into technical details. You can find concrete tutorials on how to generate reports or QC checks with ROBOT and ODK towards the end of this page.

Preparation

SPARQL tools for OBO Engineers

  • RENCI Ubergraph Endpoint: Many key OBO ontologies are loaded here with lots of materialised inferences (docs).
  • Ontobee SPARQL endpoint: Useful to run queries across all OBO Foundry ontologies.
  • Yasgui: Yasgui is a simple and beautiful front-end for SPARQL endpoints which can be used not only to query, but also to share queries with others. For example this simple SPARQL query runs across the RENCI Ubergraph Endpoint.
  • GTF: A UI for running SPARQL queries on TTL files, either available on the web or uploaded. It appears to be based on Yasgui, as it offers the same share functionality.
  • ROBOT query: ROBOT command for generating TSV reports from SPARQL queries and for applying data transformations (--update). ROBOT uses Jena internally to execute SPARQL queries.
  • ROBOT verify: ROBOT method to run SPARQL QC queries. If the query returns a result, the QC test fails.
  • ROBOT report: ROBOT report is a more powerful approach to running OBO QC queries. The default OBO report which ships with ROBOT can be customised by changing the error level of a test, removing a test entirely, or extending the report with custom (SPARQL) checks. ROBOT report can also generate HTML reports which are easy to read.

SPARQL in the OBO-sphere

SPARQL has many uses in the OBO-sphere, the following three in particular:

  1. Quality control checking
  2. Creating summary tables for ontologies
  3. Sophisticated data transformations in ontology pipelines

We discuss each of these below and give examples. An informal discussion of SPARQL in OBO can be followed here:

Quality control checking

For us, ROBOT + SPARQL were a game changer for our quality control (QC) pipelines. This is how it works. First, we encode the error in the form of a SPARQL query (we sometimes call this an "anti-pattern", i.e. an undesirable (anti-) representation). For example, the following check looks for entities that have more than one definition:

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?entity ?property ?value WHERE {
  VALUES ?property { obo:IAO_0000115
                     obo:IAO_0000600 }
  ?entity ?property ?value .
  ?entity ?property ?value2 .
  FILTER (?value != ?value2)
  FILTER NOT EXISTS { ?entity owl:deprecated true }
  FILTER (!isBlank(?entity))
}
ORDER BY ?entity

This is a typical workflow. Think of an ontology editor working on an ontology. Often, that curator notices that the same problem happens repeatedly and tells us, the Ontology Pipeline Developers, that they would like a check to prevent the error. We then capture the erroneous situation as a SPARQL query, add it to our ontology repository, and execute it with ROBOT report or ROBOT verify (see above) in our CI pipelines, usually based on GitHub Actions or Travis. Note that the Ontology Development Kit provides a built-in framework for such queries, built on ROBOT verify and report.
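As a further illustration of the same pattern, here is a minimal sketch of a second check. It is not taken from any particular ontology's QC suite; it assumes the error we want to catch is a named, non-deprecated class that has no rdfs:label at all.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Anti-pattern (illustrative): a named class that is not deprecated
# and carries no rdfs:label.
SELECT DISTINCT ?entity WHERE {
  ?entity a owl:Class .
  FILTER NOT EXISTS { ?entity rdfs:label ?label }
  FILTER NOT EXISTS { ?entity owl:deprecated true }
  FILTER (!isBlank(?entity))
}
ORDER BY ?entity
```

Saved to a file in the repository (the file name is up to you), a query like this can be run in the same way as the definition check: every row it returns points at one offending entity, and an empty result means the check passes.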

Creating summary tables for ontologies

Many times, we need to create tabular reports of our ontologies to share with stakeholders or to help with internal reviews, e.g.:

  • create lists of ontology terms with their definitions and labels
  • create summaries of ontologies, like aggregate statistics
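A term list of the first kind could be produced by a query along the following lines. This is a sketch under the assumption that labels are recorded with rdfs:label and definitions with obo:IAO_0000115 (the same definition property used in the QC check above); the definition is optional so that terms without one still appear in the table.

```sparql
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# One row per non-deprecated named class: IRI, label, and definition (if any).
SELECT DISTINCT ?term ?label ?definition WHERE {
  ?term a owl:Class ;
        rdfs:label ?label .
  OPTIONAL { ?term obo:IAO_0000115 ?definition }
  FILTER NOT EXISTS { ?term owl:deprecated true }
  FILTER (!isBlank(?term))
}
ORDER BY ?label
```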

Sometimes using Yasgui, for example in conjunction with the RENCI Ubergraph Endpoint, is enough. Often, however, ROBOT query is the better choice, especially if you want to make sure the right version of the ontology is used (Ubergraph is occasionally out of date).
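For an aggregate statistic, a query such as the following sketch might serve as a starting point. It assumes term IRIs follow the usual OBO PURL shape (http://purl.obolibrary.org/obo/PREFIX_ID) and counts non-deprecated classes per ID prefix; it is meant to be run against a single ontology file rather than a large shared endpoint, where it could be slow.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Count non-deprecated named classes per OBO ID prefix (e.g. MONDO, GO).
SELECT ?prefix (COUNT(DISTINCT ?term) AS ?classes) WHERE {
  ?term a owl:Class .
  FILTER (isIRI(?term) && STRSTARTS(STR(?term), "http://purl.obolibrary.org/obo/"))
  FILTER NOT EXISTS { ?term owl:deprecated true }
  # Everything between the OBO base IRI and the first underscore.
  BIND (STRBEFORE(STRAFTER(STR(?term), "http://purl.obolibrary.org/obo/"), "_") AS ?prefix)
}
GROUP BY ?prefix
ORDER BY DESC(?classes)
```

On an ontology that imports terms from others, the resulting table shows at a glance how many classes come from each source.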

Using ROBOT in conjunction with a workflow automation system like GitHub Actions helps with generating up-to-date reports. Here is an example of a GitHub action that generates a few reports with ROBOT and pushes them back to the repository.

A note for Data Scientists

We are often asked how best to "load an ontology" into a Python notebook or similar. Very often the answer is to first extract the content of the ontology into tabular form, and then load that table with a library such as pandas. In this scenario, the workflow for interacting with ontologies is:

  1. Define the information you want in the form of a SPARQL query.
  2. Extract the information as a TSV table using ROBOT query.
  3. Load the information into your notebook.

If you combine this with, for example, a Makefile, you can ensure that the report generation process is fully reproducible as well.

Sophisticated data transformations in ontology pipelines

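Lastly, we use ROBOT query to implement complex ontology transformation processes. For example, the following query turns a related synonym on a Mondo class into an exact synonym when it has the same string as one of the class's existing exact synonyms and both synonyms carry cross-reference annotations: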

prefix owl: <http://www.w3.org/2002/07/owl#>
prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

DELETE {
  ?term oboInOwl:hasRelatedSynonym ?related .
  ?relax a owl:Axiom ;
       owl:annotatedSource ?term ;
       owl:annotatedProperty oboInOwl:hasRelatedSynonym ;
       owl:annotatedTarget ?related ;
       oboInOwl:hasDbXref ?xref2 .
}

INSERT {
  ?relax a owl:Axiom ;
       owl:annotatedSource ?term ;
       owl:annotatedProperty oboInOwl:hasExactSynonym ;
       owl:annotatedTarget ?related ;
       oboInOwl:hasDbXref ?xref2 .
}
WHERE
{
  {
    ?term oboInOwl:hasRelatedSynonym ?related ;
          oboInOwl:hasExactSynonym ?exact ;
          a owl:Class .
    ?exax a owl:Axiom ;
          owl:annotatedSource ?term ;
          owl:annotatedProperty oboInOwl:hasExactSynonym ;
          owl:annotatedTarget ?exact ;
          oboInOwl:hasDbXref ?xref1 .
    ?relax a owl:Axiom ;
           owl:annotatedSource ?term ;
           owl:annotatedProperty oboInOwl:hasRelatedSynonym ;
           owl:annotatedTarget ?related ;
           oboInOwl:hasDbXref ?xref2 .

    FILTER (str(?related)=str(?exact))
    FILTER (isIRI(?term) && regex(str(?term), "^http://purl.obolibrary.org/obo/MONDO_"))
  }
}

This can be a very useful tool for bulk editing the ontology, in particular where it is difficult or impossible to achieve the same result using regular expressions or other "replacement" techniques. Here are some example queries we collected to do such mass operations in Mondo.
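Update queries do not have to be as involved as the one above. As a smaller sketch of the same DELETE ... WHERE shape, the following hypothetical update removes exact synonyms that merely repeat the term's own label. It is an illustration rather than one of the collected Mondo queries, and it deliberately ignores any owl:Axiom annotations attached to the removed synonym, which a production version would need to clean up as well.

```sparql
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Illustrative bulk edit: drop exact synonyms identical to the label.
# Note: axiom annotations on the deleted synonym are NOT handled here.
DELETE {
  ?term oboInOwl:hasExactSynonym ?synonym .
}
WHERE {
  ?term rdfs:label ?label ;
        oboInOwl:hasExactSynonym ?synonym .
  FILTER (isIRI(?term))
  FILTER (str(?synonym) = str(?label))
}
```

Because the condition compares two annotation values on the same term, this kind of edit is hard to express as a text-level search and replace but straightforward as a graph pattern.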