Standardizing Prefixes, CURIEs, and URIs with the `curies` package

Uniform resource identifiers (URIs) and compact URIs (CURIEs) have become the predominant syntaxes for identifying concepts in linked data applications. Efficient, faultless, and idiomatic conversion between the two is therefore a crucial low-level utility needed across many codebases.

The quick version: 1) instantiate a "converter" object, then 2) use it to compress URIs into CURIEs and expand CURIEs into URIs:

>>> import curies
>>> # Get a converter
>>> converter = curies.get_obofoundry_converter()
>>> converter.compress("http://purl.obolibrary.org/obo/CHEBI_1")
'CHEBI:1'
>>> converter.expand("CHEBI:1")
'http://purl.obolibrary.org/obo/CHEBI_1'

Data Structures

Prefix Maps

A prefix map is a bijective mapping (i.e., no duplicate keys and no duplicate values) between CURIE prefixes and URI prefixes. Prefix maps are found in semantic web applications and formats such as XML, RDF, SPARQL, and OWL.

Example Prefix Map:

{
  "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
  "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
  "GO": "http://purl.obolibrary.org/obo/GO_"
}
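To make the bijectivity requirement and the basic expand/compress behavior concrete, here is a minimal pure-Python sketch that works on the example prefix map directly. The helpers `is_bijective`, `expand`, and `compress` are hypothetical illustrations, not part of the curies API. Choosing the longest matching URI prefix during compression is an assumption about how to resolve overlapping URI prefixes.

```python
from typing import Dict, Optional

# The example prefix map from above: CURIE prefix -> URI prefix
PREFIX_MAP = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
}


def is_bijective(prefix_map: Dict[str, str]) -> bool:
    """Dict keys are unique by construction, so only the values need checking."""
    return len(set(prefix_map.values())) == len(prefix_map)


def expand(prefix_map: Dict[str, str], curie: str) -> Optional[str]:
    """Hypothetical helper: turn 'PREFIX:local_id' into a URI, or None if the prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + local_id


def compress(prefix_map: Dict[str, str], uri: str) -> Optional[str]:
    """Hypothetical helper: turn a URI into a CURIE using the longest matching URI prefix."""
    best_prefix, best_len = None, -1
    for prefix, uri_prefix in prefix_map.items():
        if uri.startswith(uri_prefix) and len(uri_prefix) > best_len:
            best_prefix, best_len = prefix, len(uri_prefix)
    if best_prefix is None:
        return None
    return f"{best_prefix}:{uri[best_len:]}"
```

For example, `compress(PREFIX_MAP, "http://purl.obolibrary.org/obo/GO_1234567")` gives `"GO:1234567"`, and `expand` reverses it. Without bijectivity, one of these directions would be ambiguous.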

Extended Prefix Maps

Standard prefix maps and JSON-LD contexts fall short because they cannot represent synonyms for either CURIE prefixes or URI prefixes. Therefore, we need a new format that can:

include an arbitrary number of synonyms for both CURIE prefixes and URI prefixes
denote which CURIE prefix and which URI prefix are preferred
use a simple structure that can be encoded in JSON, YAML, etc.

The extended prefix map (EPM) is a new format to support this. Here's a short example that only has one record:

[
  {
    "prefix": "CHEBI",
    "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
    "prefix_synonyms": [
      "chebi",
      ...
    ],
    "uri_prefix_synonyms": [
      "https://identifiers.org/chebi:"
    ]
  }
]
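An EPM record can be read as two lookup tables. Every CURIE prefix (preferred or synonym) points to the record's preferred CURIE prefix, and so does every URI prefix (preferred or synonym). The pure-Python sketch below builds those tables from the example record, with the elided prefix synonyms left out. The names `build_lookups`, `standardize_curie`, and `compress` are hypothetical and do not describe how curies stores records internally.

```python
from typing import Dict, List, Optional, Tuple

# The example record from above; the elided prefix synonyms are omitted.
EPM = [
    {
        "prefix": "CHEBI",
        "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
        "prefix_synonyms": ["chebi"],
        "uri_prefix_synonyms": ["https://identifiers.org/chebi:"],
    }
]


def build_lookups(records: List[dict]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Map every CURIE prefix and every URI prefix to its record's preferred CURIE prefix."""
    curie_lookup: Dict[str, str] = {}
    uri_lookup: Dict[str, str] = {}
    for record in records:
        preferred = record["prefix"]
        for prefix in [preferred, *record.get("prefix_synonyms", [])]:
            curie_lookup[prefix] = preferred
        for uri_prefix in [record["uri_prefix"], *record.get("uri_prefix_synonyms", [])]:
            uri_lookup[uri_prefix] = preferred
    return curie_lookup, uri_lookup


def standardize_curie(records: List[dict], curie: str) -> str:
    """Rewrite a CURIE to use the preferred prefix (raises KeyError for unknown prefixes)."""
    curie_lookup, _ = build_lookups(records)
    prefix, _, local_id = curie.partition(":")
    return f"{curie_lookup[prefix]}:{local_id}"


def compress(records: List[dict], uri: str) -> Optional[str]:
    """Compress a URI that uses any known URI prefix, returning a CURIE with the preferred prefix."""
    _, uri_lookup = build_lookups(records)
    # Try longer URI prefixes first so the most specific match wins.
    for uri_prefix in sorted(uri_lookup, key=len, reverse=True):
        if uri.startswith(uri_prefix):
            return f"{uri_lookup[uri_prefix]}:{uri[len(uri_prefix):]}"
    return None
```

With this record, `standardize_curie(EPM, "chebi:1")` and `compress(EPM, "https://identifiers.org/chebi:1")` both produce `"CHEBI:1"`. A plain prefix map cannot express that.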

Loading a Context

There are several ways to load a converter:

From a pre-defined context, like the OBO Foundry, Bioregistry, Monarch, or others

From a prefix map

import curies

prefix_map = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}
converter = curies.load_prefix_map(prefix_map)

From an extended prefix map. This is the preferred format because it supports synonyms for both CURIE prefixes and URI prefixes (see Extended Prefix Maps above).
From a JSON-LD context

More details on each of these can be found here.
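A JSON-LD context carries the same CURIE prefix to URI prefix pairs as a simple prefix map, nested inside its `@context` object. As a rough pure-Python illustration (not how curies parses contexts), the sketch below extracts those pairs. The function name `prefix_map_from_jsonld` is hypothetical. Two further assumptions keep it simple: it skips JSON-LD keywords such as `@vocab`, and it keeps only values that start with `http://` or `https://`.

```python
import json
from typing import Dict


def prefix_map_from_jsonld(text: str) -> Dict[str, str]:
    """Hypothetical helper: extract CURIE prefix -> URI prefix pairs from a JSON-LD context."""
    context = json.loads(text).get("@context", {})
    prefix_map: Dict[str, str] = {}
    for term, value in context.items():
        if term.startswith("@"):  # skip JSON-LD keywords such as @vocab or @base
            continue
        if isinstance(value, dict):  # expanded term definition, e.g. {"@id": "...", "@prefix": true}
            value = value.get("@id")
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            prefix_map[term] = value
    return prefix_map
```

The resulting dictionary has the same shape as the prefix map used with `curies.load_prefix_map()` earlier. Because a JSON-LD context has no place for synonyms, none can be recovered from it.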

Chaining and Merging

curies implements a faultless chain operation curies.chain() that is configurable for case sensitivity and fully considers all synonyms.

curies.chain() prioritizes converters in the order they are given. If two prefix maps have the same CURIE prefix but different URI prefixes, the URI prefix from the first becomes the preferred one, and the URI prefix from the second is retained as a synonym.

>>> import curies
>>> c1 = curies.load_prefix_map({"GO": "http://purl.obolibrary.org/obo/GO_"})
>>> c2 = curies.load_prefix_map({"GO": "https://identifiers.org/go:"})
>>> converter = curies.chain([c1, c2])
>>> converter.expand("GO:1234567")
'http://purl.obolibrary.org/obo/GO_1234567'
>>> converter.compress("http://purl.obolibrary.org/obo/GO_1234567")
'GO:1234567'
>>> converter.compress("https://identifiers.org/go:1234567")
'GO:1234567'
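The priority rule can be sketched in plain Python over EPM-style records. This is a simplified model of the behavior described above, not the curies implementation. It matches records only by exact CURIE prefix and ignores case sensitivity and cross-record synonym collisions, both of which curies.chain() handles. The first record's URI prefix stays preferred, and later URI prefixes are appended as synonyms. The function name `chain_records` is hypothetical.

```python
from typing import Dict, List


def chain_records(*prefix_maps: Dict[str, str]) -> List[dict]:
    """Merge simple prefix maps, in priority order, into EPM-style records."""
    records: Dict[str, dict] = {}
    for prefix_map in prefix_maps:
        for prefix, uri_prefix in prefix_map.items():
            if prefix not in records:
                # First occurrence wins: this URI prefix becomes the preferred one.
                records[prefix] = {"prefix": prefix, "uri_prefix": uri_prefix, "uri_prefix_synonyms": []}
                continue
            record = records[prefix]
            # Later occurrences are kept as URI prefix synonyms, without duplicates.
            if uri_prefix != record["uri_prefix"] and uri_prefix not in record["uri_prefix_synonyms"]:
                record["uri_prefix_synonyms"].append(uri_prefix)
    return list(records.values())
```

Calling `chain_records({"GO": "http://purl.obolibrary.org/obo/GO_"}, {"GO": "https://identifiers.org/go:"})` yields one GO record. Its preferred URI prefix is the OBO PURL, and the Identifiers.org URI prefix is a synonym, which mirrors the doctest output above.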

curies.chain() is the ideal tool for overriding parts of an existing extended prefix map. For example, to use most of the Bioregistry but with a custom URI prefix for one resource (e.g., one from Identifiers.org), put the overrides first in the chain:

>>> import curies
>>> overrides = curies.load_prefix_map({"pubmed": "https://identifiers.org/pubmed:"})
>>> bioregistry_converter = curies.get_bioregistry_converter()
>>> # The overrides come first, so they take priority over the Bioregistry
>>> converter = curies.chain([overrides, bioregistry_converter])
>>> converter.expand("pubmed:1234")
'https://identifiers.org/pubmed:1234'

More information can be found here.

Reconciliation

Reconciliation is the high-level process of modifying an (extended) prefix map with domain-specific rules. This is important as it allows for building on existing (extended) prefix maps without having to start from scratch. Further, storing the rules to transform an existing prefix map allows for high-level discussion about the differences and their reasons.

As a specific example, the Bioregistry uses snomedct as the preferred prefix for the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), whereas the OBO Foundry community prefers SCTID. Rather than maintaining an extended prefix map separate from the Bioregistry's, the OBO Foundry community could enumerate its preferred modifications to the base (extended) prefix map, then create its prefix map by transforming the Bioregistry’s.

Similarly, a consumer of the OBO Foundry prefix map who’s implementing a resolver might want to override the URI prefix associated with the Ontology of Vaccine Adverse Events (OVAE) to point towards the Ontology Lookup Service instead of the default OntoBee.

There are two operations that are useful for transforming an existing (extended) prefix map:

Remapping is when a given CURIE prefix or URI prefix is replaced with another. See curies.remap_curie_prefixes() and curies.remap_uri_prefixes().
Rewiring is when the correspondence between a CURIE prefix and URI prefix is updated. See curies.rewire().
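The difference between the two operations can be sketched on dict-based EPM records. `remap_prefix` and `rewire` below are hypothetical stand-ins, not the signatures of curies.remap_curie_prefixes() or curies.rewire(). Remapping changes a record's CURIE prefix and leaves its URI prefix attached. Rewiring keeps the CURIE prefix and points it at a different URI prefix. Keeping the old URI prefix as a synonym during rewiring is an assumption of this sketch, and the URI prefix `https://example.net/b/` is a hypothetical value.

```python
import copy
from typing import List


def remap_prefix(records: List[dict], old: str, new: str) -> List[dict]:
    """Remapping: replace a CURIE prefix; the URI prefix stays attached to the same record."""
    result = copy.deepcopy(records)
    for record in result:
        if record["prefix"] == old:
            record["prefix"] = new
            # The old CURIE prefix survives as a synonym.
            record.setdefault("prefix_synonyms", []).append(old)
    return result


def rewire(records: List[dict], prefix: str, uri_prefix: str) -> List[dict]:
    """Rewiring: point an existing CURIE prefix at a different URI prefix (old one kept as synonym, by assumption)."""
    result = copy.deepcopy(records)
    for record in result:
        if record["prefix"] == prefix and record["uri_prefix"] != uri_prefix:
            record.setdefault("uri_prefix_synonyms", []).append(record["uri_prefix"])
            record["uri_prefix"] = uri_prefix
    return result


records = [{"prefix": "b", "uri_prefix": "https://example.org/b/"}]
remapped = remap_prefix(records, "b", "b1")
rewired = rewire(records, "b", "https://example.net/b/")  # hypothetical target URI prefix
```

Both helpers copy their input, so the original records are never mutated.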

Simple CURIE Remapping

If we start with the following extended prefix map,

[
  {
    "prefix": "a",
    "uri_prefix": "https://example.org/a/",
    "prefix_synonyms": [
      "a1"
    ]
  },
  {
    "prefix": "b",
    "uri_prefix": "https://example.org/b/"
  }
]

we can apply the remapping {"a": "a1"} to get:

[
  {
    "prefix": "a1",
    "uri_prefix": "https://example.org/a/",
    "prefix_synonyms": [
      "a"
    ]
  },
  {
    "prefix": "b",
    "uri_prefix": "https://example.org/b/"
  }
]

Notice that the old prefix a is retained as a synonym, while a1, now the preferred prefix, is no longer listed among the synonyms.
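The simple remapping above can be reproduced with a short pure-Python sketch over dict-based records. `remap_curie_prefixes_simple` is a hypothetical name, and this is not the logic inside curies.remap_curie_prefixes(). In particular, it does not check whether the new prefix is already used by a different record, which is exactly the case the next section deals with.

```python
import copy
from typing import Dict, List


def remap_curie_prefixes_simple(records: List[dict], remapping: Dict[str, str]) -> List[dict]:
    """Swap preferred CURIE prefixes in EPM-style records, keeping the old prefix as a synonym."""
    result = copy.deepcopy(records)
    for record in result:
        old = record["prefix"]
        new = remapping.get(old)
        if new is None or new == old:
            continue
        # The new preferred prefix should no longer appear among the synonyms.
        synonyms = [s for s in record.get("prefix_synonyms", []) if s != new]
        record["prefix"] = new
        record["prefix_synonyms"] = [old, *synonyms]
    return result


before = [
    {"prefix": "a", "uri_prefix": "https://example.org/a/", "prefix_synonyms": ["a1"]},
    {"prefix": "b", "uri_prefix": "https://example.org/b/"},
]
after = remap_curie_prefixes_simple(before, {"a": "a1"})
```

Here `after` matches the second extended prefix map shown above: record a becomes a1 with a as its only synonym, and record b is untouched.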

Transitive CURIE Remapping

There’s a special case of CURIE prefix remapping where one prefix is supposed to overwrite another. For example, in the Bioregistry, the Gene Expression Omnibus is given the prefix geo and the Geographical Entity Ontology is given the prefix geogeo. OBO Foundry users will want to rename the Gene Expression Omnibus record to something else, like ncbi.geo, and rename geogeo to geo. Applied independently, these two remappings would not accomplish the desired result:

Remapping with {"geo": "ncbi.geo"} would retain geo as a CURIE prefix synonym
Remapping with {"geogeo": "geo"} would not change the mapping as geo is already part of a different record.
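The failure can be demonstrated with a naive pure-Python remapper that applies one remapping at a time. Following the two points above, it refuses a remapping whose new prefix is already used by another record. `naive_remap` is a hypothetical helper written only to show the problem, not code from curies.

```python
import copy
from typing import List


def naive_remap(records: List[dict], old: str, new: str) -> List[dict]:
    """Apply one CURIE prefix remapping, refusing if `new` is already used by another record."""
    result = copy.deepcopy(records)
    target = next((r for r in result if r["prefix"] == old), None)
    if target is None:
        return result
    for record in result:
        if record is not target and (record["prefix"] == new or new in record.get("prefix_synonyms", [])):
            return result  # collision: leave everything unchanged
    target["prefix"] = new
    target["prefix_synonyms"] = [old, *(s for s in target.get("prefix_synonyms", []) if s != new)]
    return result


RECORDS = [
    {"prefix": "geo", "uri_prefix": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc="},
    {"prefix": "geogeo", "uri_prefix": "http://purl.obolibrary.org/obo/GEO_"},
]

# Applying the two remappings one after the other, in either order, leaves geogeo unrenamed.
forward = naive_remap(naive_remap(RECORDS, "geo", "ncbi.geo"), "geogeo", "geo")
backward = naive_remap(naive_remap(RECORDS, "geogeo", "geo"), "geo", "ncbi.geo")
```

In both orders, the Gene Expression Omnibus record ends up as ncbi.geo with geo as a synonym. That synonym blocks the Geographical Entity Ontology record from taking geo.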

curies.remap_curie_prefixes() implements special logic to identify scenarios where two (or more) remappings are dependent (we’re calling these transitive remappings) and applies them in the expected way:

>>> from curies import Converter, Record, remap_curie_prefixes
>>> converter = Converter([
...     Record(prefix="geo", uri_prefix="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc="),
...     Record(prefix="geogeo", uri_prefix="http://purl.obolibrary.org/obo/GEO_"),
... ])
>>> curie_remapping = {"geo": "ncbi.geo", "geogeo": "geo"}
>>> converter = remap_curie_prefixes(converter, curie_remapping)
>>> converter.records
[
    Record(
        prefix="geo",
        prefix_synonyms=["geogeo"],
        uri_prefix="http://purl.obolibrary.org/obo/GEO_",
    ),
    Record(
        prefix="ncbi.geo",
        uri_prefix="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=",
    ),
]

geogeo is kept as a CURIE prefix synonym in the Geographical Entity Ontology’s record. geo is not kept as a synonym of ncbi.geo because it now belongs to the other record. Any other synonyms of the Gene Expression Omnibus record would be retained.
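The core idea, handling dependent remappings together instead of one at a time, can be sketched in plain Python. This is a simplified illustration of the behavior described above, not the algorithm in curies.remap_curie_prefixes(). It applies all remappings at once and drops an old prefix, rather than keeping it as a synonym, when another record is taking it over. `remap_transitive` is a hypothetical name. The sketch does not check for collisions with records outside the remapping.

```python
import copy
from typing import Dict, List


def remap_transitive(records: List[dict], remapping: Dict[str, str]) -> List[dict]:
    """Apply all CURIE prefix remappings at once (simplified, hypothetical sketch)."""
    result = copy.deepcopy(records)
    # Prefixes that some record is about to take over as its new preferred prefix.
    claimed = set(remapping.values())
    for record in result:
        old = record["prefix"]
        new = remapping.get(old)
        if new is None or new == old:
            continue
        # Drop synonyms that are being claimed (this includes the new prefix itself).
        synonyms = [s for s in record.get("prefix_synonyms", []) if s not in claimed]
        # Keep the old prefix as a synonym only if no other record is taking it over.
        if old not in claimed:
            synonyms.insert(0, old)
        record["prefix"] = new
        record["prefix_synonyms"] = synonyms
    # Sort by preferred prefix for a stable, readable order.
    return sorted(result, key=lambda record: record["prefix"])


RECORDS = [
    {"prefix": "geo", "uri_prefix": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc="},
    {"prefix": "geogeo", "uri_prefix": "http://purl.obolibrary.org/obo/GEO_"},
]
remapped = remap_transitive(RECORDS, {"geo": "ncbi.geo", "geogeo": "geo"})
```

The result mirrors the records shown above. geo, with synonym geogeo, points at the OBO PURL. ncbi.geo, with no synonyms, points at the NCBI URL.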

More information about reconciliation can be found here.
