Terminology functions

The library also provides a set of functions for querying a FHIR terminology server from within your queries and transformations.

Value set membership

The member_of function tests whether a code is a member of a FHIR value set. It works with both explicit value sets (i.e. those that have been pre-defined and loaded into the terminology server) and implicit value sets (e.g. those defined using the SNOMED CT Expression Constraint Language, or ECL).

In this example, we take a list of SNOMED CT diagnosis codes and create a new column which shows which are viral infections. We use an ECL expression to define viral infection as a disease with a pathological process of "Infectious process", and a causative agent of "Virus".

from pathling import PathlingContext, to_snomed_coding, to_ecl_value_set, member_of

pc = PathlingContext.create()
# Read the header row so that the CODE and DESCRIPTION columns can be referenced by name.
csv = pc.spark.read.csv("conditions.csv", header=True)

VIRAL_INFECTION_ECL = """
<< 64572001|Disease| : (
<< 370135005|Pathological process| = << 441862004|Infectious process|,
<< 246075003|Causative agent| = << 49872002|Virus|
)
"""

csv.select(
    "CODE",
    "DESCRIPTION",
    member_of(
        to_snomed_coding(csv.CODE),
        to_ecl_value_set(VIRAL_INFECTION_ECL)
    ).alias("VIRAL_INFECTION"),
).show()

Results in:

| CODE | DESCRIPTION | VIRAL_INFECTION |
|---|---|---|
| 65363002 | Otitis media | false |
| 16114001 | Fracture of ankle | false |
| 444814009 | Viral sinusitis | true |
| 444814009 | Viral sinusitis | true |
| 43878008 | Streptococcal sore throat | false |
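
An implicit SNOMED CT value set is conventionally identified by a URI of the form http://snomed.info/sct?fhir_vs=ecl/[expression], with the expression percent-encoded. The following is a minimal sketch of that convention using only the standard library. The helper name ecl_value_set_uri is hypothetical, and the choice to trim and fully encode the expression is an assumption; it is not necessarily how to_ecl_value_set behaves internally.

```python
from urllib.parse import quote, unquote

SNOMED_URI = "http://snomed.info/sct"


def ecl_value_set_uri(ecl: str) -> str:
    """Build an implicit value set URI from an ECL expression (hypothetical helper)."""
    # Assumption: surrounding whitespace is trimmed and every reserved
    # character (spaces, pipes, angle brackets) is percent-encoded.
    return f"{SNOMED_URI}?fhir_vs=ecl/{quote(ecl.strip(), safe='')}"


uri = ecl_value_set_uri("<< 64572001|Disease|")
print(uri)
# The original expression can be recovered by decoding the part after "ecl/".
print(unquote(uri.split("ecl/", 1)[1]))
```

Encoding the whole expression keeps characters such as the pipe and the space from being misread as part of the URI structure.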

Concept translation

The translate function can be used to translate codes from one code system to another using maps that are known to the terminology server. In this example, we translate our SNOMED CT diagnosis codes into Read CTV3.

Note that the output of translate is an array of coding structs, because a translation may produce multiple results for each input coding. The example extracts the code field from each result and then splits the array into separate rows.

from pathling import PathlingContext, to_snomed_coding, translate
from pyspark.sql.functions import explode_outer

pc = PathlingContext.create()
# Read the header row so that the CODE and DESCRIPTION columns can be referenced by name.
csv = pc.spark.read.csv("conditions.csv", header=True)

translate_result = csv.withColumn(
    "READ_CODES",
    translate(
        to_snomed_coding(csv.CODE),
        concept_map_uri="http://snomed.info/sct/900000000000207008?"
                        "fhir_cm=900000000000497000",
    ).code,
)
translate_result.select(
    "CODE", "DESCRIPTION", explode_outer("READ_CODES").alias("READ_CODE")
).show()

Results in:

| CODE | DESCRIPTION | READ_CODE |
|---|---|---|
| 65363002 | Otitis media | X00ik |
| 16114001 | Fracture of ankle | S34.. |
| 444814009 | Viral sinusitis | XUjp0 |
| 444814009 | Viral sinusitis | XUjp0 |
| 43878008 | Streptococcal sore throat | A340. |
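
Because each input coding can translate to zero, one, or many targets, the example uses explode_outer rather than explode, so that codes without a mapping are kept as a row with a null target. The plain-Python sketch below mimics that behaviour on a few rows. The function explode_outer_rows and the two "hypothetical" codes are illustrative only and are not part of Pathling or Spark.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def explode_outer_rows(
    rows: Iterable[Tuple[str, Optional[List[str]]]],
) -> Iterator[Tuple[str, Optional[str]]]:
    """Mimic explode_outer on (code, array) pairs using plain Python."""
    for code, targets in rows:
        if not targets:
            # A null or empty array still yields one row, with a null target.
            yield (code, None)
        else:
            for target in targets:
                yield (code, target)


rows = [
    ("65363002", ["X00ik"]),           # one target, as in the results above
    ("hypothetical-a", ["T1", "T2"]),  # made-up code with two targets
    ("hypothetical-b", []),            # made-up code with no mapping
]
exploded = list(explode_outer_rows(rows))
for row in exploded:
    print(row)
```

With explode instead of explode_outer, the row for the unmapped code would be dropped from the result entirely.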

Subsumption testing

Subsumption testing asks the question "is this code equal to, or a subtype of, this other code?".

For example, a code representing "ankle fracture" is subsumed by another code representing "fracture". The "fracture" code is more general, and using it with subsumption can help us find other codes representing different subtypes of fracture.

The subsumes function allows us to perform subsumption testing on codes within our data. The order of the left and right operands can be reversed to query whether a code is "subsumed by" another code.

from pathling import PathlingContext, Coding, to_snomed_coding, subsumes

pc = PathlingContext.create()
# Read the header row so that the CODE and DESCRIPTION columns can be referenced by name.
csv = pc.spark.read.csv("conditions.csv", header=True)

# 232208008 |Ear, nose and throat disorder|
left_coding = Coding('http://snomed.info/sct', '232208008')
right_coding_column = to_snomed_coding(csv.CODE)

csv.select(
    'CODE', 'DESCRIPTION',
    subsumes(left_coding, right_coding_column).alias('SUBSUMES')
).show()

Results in:

| CODE | DESCRIPTION | SUBSUMES |
|---|---|---|
| 65363002 | Otitis media | true |
| 16114001 | Fracture of ankle | false |
| 444814009 | Viral sinusitis | true |
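
The semantics of a subsumption test can be illustrated without a terminology server by walking a small is-a hierarchy. The sketch below uses only the direct parent relationships listed in the "Retrieving properties" results on this page. The names PARENTS, ancestors_or_self and toy_subsumes are hypothetical, and a real terminology server evaluates subsumption over the full hierarchy rather than this fragment.

```python
from typing import Dict, Set

# Direct parents for three SNOMED CT codes, copied from the
# "Retrieving properties" results on this page (a fragment only).
PARENTS: Dict[str, Set[str]] = {
    "65363002": {"43275000", "68996008"},                 # Otitis media
    "16114001": {"125603006", "46866001"},                # Fracture of ankle
    "444814009": {"36971009", "281794004", "363166002"},  # Viral sinusitis
}


def ancestors_or_self(code: str) -> Set[str]:
    """Return the code together with every ancestor reachable via PARENTS."""
    seen = {code}
    stack = [code]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def toy_subsumes(left: str, right: str) -> bool:
    """True if `left` is equal to `right` or is one of its ancestors."""
    return left in ancestors_or_self(right)


# 46866001 |Fracture of lower limb| subsumes 16114001 |Fracture of ankle|.
print(toy_subsumes("46866001", "16114001"))
# Swapping the operands asks the "subsumed by" question instead.
print(toy_subsumes("16114001", "46866001"))
```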

Retrieving properties

Some terminologies contain additional properties that are associated with codes. You can query these properties using the property_of function.

There is also a display function that can be used to retrieve the preferred display term for each code.

from pathling import PathlingContext, to_snomed_coding, property_of, display, PropertyType

pc = PathlingContext.create()
# Read the header row so that the CODE and DESCRIPTION columns can be referenced by name.
csv = pc.spark.read.csv("conditions.csv", header=True)

# Get the parent codes for each code in the dataset.
parents = csv.withColumn(
    "PARENTS",
    property_of(to_snomed_coding(csv.CODE), "parent", PropertyType.CODE),
)
# Split each parent code into a separate row.
exploded_parents = parents.selectExpr(
    "CODE", "DESCRIPTION", "explode_outer(PARENTS) AS PARENT"
)
# Retrieve the preferred term for each parent code.
with_displays = exploded_parents.withColumn(
    "PARENT_DISPLAY", display(to_snomed_coding(exploded_parents.PARENT))
)
with_displays.show()

Results in:

| CODE | DESCRIPTION | PARENT | PARENT_DISPLAY |
|---|---|---|---|
| 65363002 | Otitis media | 43275000 | Otitis |
| 65363002 | Otitis media | 68996008 | Disorder of middle ear |
| 16114001 | Fracture of ankle | 125603006 | Injury of ankle |
| 16114001 | Fracture of ankle | 46866001 | Fracture of lower limb |
| 444814009 | Viral sinusitis | 36971009 | Sinusitis |
| 444814009 | Viral sinusitis | 281794004 | Viral upper respiratory tract infection |
| 444814009 | Viral sinusitis | 363166002 | Infective disorder of head |
| 444814009 | Viral sinusitis | 36971009 | Sinusitis |
| 444814009 | Viral sinusitis | 281794004 | Viral upper respiratory tract infection |
| 444814009 | Viral sinusitis | 363166002 | Infective disorder of head |

Retrieving designations

Some terminologies contain additional display terms for codes. These can be used for language translations, synonyms, and more. You can query these terms using the designation function.

from pathling import PathlingContext, to_snomed_coding, Coding, designation

pc = PathlingContext.create()
# Read the header row so that the CODE and DESCRIPTION columns can be referenced by name.
csv = pc.spark.read.csv("conditions.csv", header=True)

# Get the synonyms for each code in the dataset.
synonyms = csv.withColumn(
    "SYNONYMS",
    designation(to_snomed_coding(csv.CODE),
                Coding.of_snomed("900000000000013009")),
)
# Split each synonym into a separate row.
exploded_synonyms = synonyms.selectExpr(
    "CODE", "DESCRIPTION", "explode_outer(SYNONYMS) AS SYNONYM"
)
exploded_synonyms.show()

Results in:

| CODE | DESCRIPTION | SYNONYM |
|---|---|---|
| 65363002 | Otitis media | OM - Otitis media |
| 16114001 | Fracture of ankle | Ankle fracture |
| 16114001 | Fracture of ankle | Fracture of distal end of tibia and fibula |
| 444814009 | Viral sinusitis (disorder) | NULL |
| 444814009 | Viral sinusitis (disorder) | NULL |
| 43878008 | Streptococcal sore throat (disorder) | Septic sore throat |
| 43878008 | Streptococcal sore throat (disorder) | Strep throat |
| 43878008 | Streptococcal sore throat (disorder) | Strept throat |
| 43878008 | Streptococcal sore throat (disorder) | Streptococcal angina |
| 43878008 | Streptococcal sore throat (disorder) | Streptococcal pharyngitis |

Multi-language support

The library enables communication of a preferred language to the terminology server using the Accept-Language HTTP header, as described in Multi-language support in FHIR. The header may contain multiple languages, with weighted preferences as defined in RFC 9110. The server can use the header to return the result in the preferred language if it is able. The actual behaviour may depend on the server implementation and the code systems used.
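
To show how weighted preferences in an Accept-Language value are read, here is a minimal sketch of a parser that orders language tags by their q weight. The function parse_accept_language is hypothetical and deliberately simplified (it does not validate tags or handle wildcards); it is not part of Pathling, and a terminology server may interpret the header differently.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language tag, weight) pairs, highest weight first."""
    preferences = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, *params = [part.strip() for part in item.split(";")]
        weight = 1.0  # A tag without an explicit q parameter has weight 1.
        for param in params:
            if param.startswith("q="):
                weight = float(param[2:])
        preferences.append((tag, weight))
    # sorted() is stable, so tags with equal weight keep their header order.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


print(parse_accept_language("fr;q=0.9,en;q=0.5"))
print(parse_accept_language("en;q=0.5, de-DE"))
```

In the first value French is preferred over English; in the second, de-DE has no explicit weight and therefore ranks ahead of English.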

The default value for the header can be configured when creating the PathlingContext, using the accept_language or acceptLanguage parameter. A parameter of the same name can also be used to override the default value in the display() and property_of() functions.

from pathling import PathlingContext, to_loinc_coding, property_of, display

# Configure the default language preferences to prioritise French.
pc = PathlingContext.create(accept_language="fr;q=0.9,en;q=0.5")
# Read the header row so that the CODE column can be referenced by name.
csv = pc.spark.read.csv("observations.csv", header=True)

# Get the display names with default language preferences (in French).
def_display = csv.withColumn(
    "DISPLAY", display(to_loinc_coding(csv.CODE))
)

# Get the `display` property values with German as the preferred language.
def_and_german_display = def_display.withColumn(
    "DISPLAY_DE",
    property_of(to_loinc_coding(csv.CODE), "display",
                accept_language="de-DE"),
)
def_and_german_display.show()

Results in:

| CODE | DESCRIPTION | DISPLAY | DISPLAY_DE |
|---|---|---|---|
| 8302-2 | Body Height | Taille du patient [Longueur] Patient ; Numérique | Körpergröße |
| 29463-7 | Body Weight | Poids corporel [Masse] Patient ; Numérique | Körpergewicht |
| 718-7 | Hemoglobin [Mass/volume] in Blood | Hémoglobine [Masse/Volume] Sang ; Numérique | Hämoglobin [Masse/Volumen] in Blut |

Authentication

Pathling can be configured to connect to a protected terminology server by supplying a set of OAuth2 client credentials and a token endpoint.

Here is an example of how to authenticate to the NHS terminology server:

from pathling import PathlingContext

pc = PathlingContext.create(
    terminology_server_url='https://ontology.nhs.uk/production1/fhir',
    token_endpoint='https://ontology.nhs.uk/authorisation/auth/realms/nhs-digital-terminology/protocol/openid-connect/token',
    client_id='[client ID]',
    client_secret='[client secret]'
)
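
For context, the OAuth2 client credentials grant exchanges a client ID and secret for an access token by sending a form-encoded POST to the token endpoint. The sketch below only constructs such a request with the standard library and does not send it. The function build_token_request and the example.org endpoint are hypothetical, and passing the credentials in the request body is an assumption: some servers expect them in an HTTP Basic Authorization header instead. It illustrates the grant in general, not how Pathling performs the exchange internally.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_token_request(token_endpoint: str, client_id: str, client_secret: str) -> Request:
    """Construct (but do not send) a client credentials token request."""
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return Request(
        token_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder endpoint and credentials, for illustration only.
request = build_token_request("https://auth.example.org/token", "my-client", "my-secret")
print(request.get_method(), request.full_url)
print(parse_qs(request.data.decode("ascii")))
```

The access token returned by such a request is what would then accompany calls to the protected terminology server.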