Skip to main content
Version: 8.0.1

Adding display terms to codes

Healthcare data often contains coded values that are not immediately human-readable. For example, a SNOMED CT code like 444814009 represents "Viral sinusitis", but without the corresponding display term, the data is difficult to interpret. This tutorial demonstrates how to use Pathling's terminology functions along with a FHIR terminology service to add human-readable display terms to codes in your datasets. Note that this guide is not specific to FHIR data - you can use these techniques with any dataset containing coded medical information.

Why display terms matter

Codes serve as standardised identifiers for clinical concepts, but they present several challenges:

  • Interpretation difficulty: Codes like 403190006 or 72892002 are meaningless without context
  • Data analysis barriers: Analysts need readable terms to understand patterns and trends
  • Reporting requirements: Human-readable reports require display terms alongside codes
  • Quality assurance: Display terms help validate that codes match intended concepts

Pathling's terminology functions solve these challenges by connecting to FHIR terminology servers to retrieve official display terms, synonyms, and other metadata for medical codes.

Loading data with codes

For this tutorial, we'll work with a sample dataset containing medical diagnosis codes. Let's load a CSV file with condition records:

# Load sample data with medical condition codes
csv = pc.spark.read.options(header=True).csv("conditions.csv")

# Display the structure of our data
csv.show()

Our example dataset contains the following columns:

  • PATIENT: Unique patient identifier
  • ENCOUNTER: Unique encounter identifier
  • CODE: SNOMED CT diagnosis codes
  • DESCRIPTION: Original description of the condition
  • START: Start date of the condition
  • STOP: End date of the condition (if resolved)

Adding display terms for SNOMED CT codes

The most common use case is adding display terms for SNOMED CT diagnosis codes. The display() function retrieves the preferred term for each code:

# Add SNOMED CT display terms
result = csv.withColumn("DISPLAY_NAME", display(to_snomed_coding(csv.CODE)))

# Show results
result.select("CODE", "DESCRIPTION", "DISPLAY_NAME").show(truncate=False)

This produces results like:

CODEDESCRIPTIONDISPLAY_NAME
65363002Otitis mediaOtitis media
16114001Fracture of ankleFracture of ankle
444814009Viral sinusitis (disorder)Viral sinusitis
43878008Streptococcal sore throat (disorder)Streptococcal sore throat
403190006First degree burnEpidermal burn of skin

Notice how the display terms often provide cleaner, more concise terminology. For example, "(disorder)" suffixes are removed, and some codes get more precise clinical terms like "Epidermal burn of skin" instead of "First degree burn".

Working with different code systems

Pathling supports multiple terminology systems beyond SNOMED CT. You can work with any code system supported by your FHIR terminology server by using the to_coding() function to specify the system URI. Here's an example using ICD-10-NL (Dutch ICD-10) codes:

from pathling import to_coding

# Add display terms for ICD-10-NL codes
icd_enriched = icd_data.withColumn(
"ICD_DISPLAY",
display(to_coding(icd_data.CODE, "http://hl7.org/fhir/sid/icd-10-nl"))
)

Requesting display terms in different languages

Many terminology servers support multi-language display terms, allowing you to retrieve terms in the language most appropriate for your users. Pathling provides language support through the accept_language parameter, which follows the HTTP Accept-Language header format.

from pathling import to_loinc_coding

# Create sample LOINC observation data
loinc_data = pc.spark.createDataFrame([
("8302-2", "Body Height"),
("29463-7", "Body Weight"),
("718-7", "Hemoglobin [Mass/volume] in Blood"),
("8867-4", "Heart rate")
], ["CODE", "DESCRIPTION"])

# Get display terms in French
loinc_french = loinc_data.withColumn(
"DISPLAY_FR",
display(to_loinc_coding(loinc_data.CODE), accept_language="fr")
)

# Get display terms in multiple languages
loinc_multilingual = loinc_data.withColumn(
"DISPLAY_EN",
display(to_loinc_coding(loinc_data.CODE)) # Default language (English)
).withColumn(
"DISPLAY_FR",
display(to_loinc_coding(loinc_data.CODE), accept_language="fr")
).withColumn(
"DISPLAY_DE",
display(to_loinc_coding(loinc_data.CODE), accept_language="de")
)

# Show the multilingual results
loinc_multilingual.select("CODE", "DISPLAY_EN", "DISPLAY_FR",
"DISPLAY_DE").show(truncate=False)

This produces multilingual display terms like:

CODEDISPLAY_ENDISPLAY_FRDISPLAY_DE
8302-2Body HeightTaille du patient [Longueur] Patient ; NumériqueKörpergröße
29463-7Body WeightPoids corporel [Masse] Patient ; NumériqueKörpergewicht
718-7Hemoglobin [Mass/volume] in BloodHémoglobine [Masse/Volume] Sang ; NumériqueHämoglobin [Masse/Volumen] in Blut
8867-4Heart rateFréquence cardiaqueHerzfrequenz

Language configuration options

You can configure language preferences in several ways:

  1. Set a default language for the entire session:

    pc = PathlingContext.create(accept_language="fr")
  2. Override language per function call:

    display(coding, accept_language="de")
  3. Use weighted preferences (server will use the first available):

    display(coding, accept_language="fr;q=0.9,en;q=0.5")

Note that language support depends on your terminology server's capabilities and the specific code system. LOINC and SNOMED CT generally have good multi-language support, while other systems may be more limited.

Working with large datasets

Pathling includes an internal terminology request cache that automatically optimises repeated code lookups. This means you can process large datasets directly without worrying about duplicate terminology server requests.

The internal cache ensures that once a code has been looked up, subsequent requests for the same code return immediately without contacting the terminology server. This makes the simple, direct approach both performant and easy to understand.

Further reading