Version: 9.4.0

Search

The library provides functions for converting FHIR search expressions into Spark Columns. These columns can be used to filter resources based on search criteria defined in the FHIR specification.

Search parameters provide a standardised way to filter FHIR resources. For example, you can filter patients by gender, birth date, or active status using the same search syntax used in FHIR API queries.

Basic filtering

The search_to_column method of PathlingContext converts a FHIR search expression for a given resource type into a boolean Column, which can be passed to DataFrame.filter.

In this example, we filter patients by gender using a simple search parameter.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Filter patients by gender.
gender_filter = pc.search_to_column("Patient", "gender=male")
patients.filter(gender_filter).select("id", "gender", "name.family").show()

Results in:

id                                    gender  family
8ee183e2-b3c0-4151-be94-b945d6aa8c6d  male    Runte378
93ee0b14-4f22-4c1a-93e2-b4e5c0d7f0d6  male    Smith

Boolean logic

AND logic

Multiple search parameters can be combined using &, which applies AND logic. All conditions must be satisfied for a resource to match.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Filter patients by gender AND active status.
combined_filter = pc.search_to_column("Patient", "gender=male&active=true")
patients.filter(combined_filter).select("id", "gender", "active").show()

Results in:

id                                    gender  active
8ee183e2-b3c0-4151-be94-b945d6aa8c6d  male    true

OR logic

Multiple values for the same parameter can be combined using commas, which applies OR logic. A resource matches if any of the values match.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Filter patients with gender male OR female.
or_filter = pc.search_to_column("Patient", "gender=male,female")
patients.filter(or_filter).select("id", "gender").show()

Results in:

id                                    gender
8ee183e2-b3c0-4151-be94-b945d6aa8c6d  male
7b4d8c2f-9a3e-4d5b-8c1f-2e3d4c5b6a7d  female
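The structure of these expressions can be sketched in plain Python. The helpers below (parse_search and matches, both hypothetical and not part of Pathling) split an expression into &-separated clauses that must all hold, each with comma-separated values of which any one may match, and evaluate it against a resource modelled as a flat dict. Real FHIR parsing also handles escaping, modifiers and prefixes, which this sketch ignores.

```python
from __future__ import annotations


def parse_search(expression: str) -> list[tuple[str, list[str]]]:
    """Split a search expression into (parameter, values) clauses.

    Clauses separated by '&' are combined with AND; values separated
    by ',' within one clause are combined with OR.
    """
    clauses = []
    for clause in expression.split("&"):
        if not clause:
            continue  # Skip empty segments, e.g. from an empty expression.
        name, _, raw_values = clause.partition("=")
        clauses.append((name, raw_values.split(",")))
    return clauses


def matches(resource: dict[str, str], expression: str) -> bool:
    """Return True if every clause has at least one matching value."""
    return all(
        resource.get(name) in values
        for name, values in parse_search(expression)
    )


patient = {"gender": "male", "active": "true"}
print(parse_search("gender=male,female&active=true"))
print(matches(patient, "gender=male&active=true"))  # True
print(matches(patient, "gender=female,other"))      # False
```

Because all() over no clauses is True, an empty expression matches every resource in this sketch, mirroring the empty-query behaviour of search_to_column.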

Comparison prefixes

Date, number and quantity search parameters accept a comparison prefix written directly before the value (for example, birthdate=ge1990-01-01). The following prefixes are supported:

  • eq (equal, default)
  • ne (not equal)
  • lt (less than)
  • le (less than or equal)
  • gt (greater than)
  • ge (greater than or equal)
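A prefix is just the first two characters of the value, recognised only when they are one of the codes above. The sketch below uses a hypothetical helper, split_prefix, that is not part of Pathling: it separates the prefix from the value, falls back to eq when none is present, and maps each prefix to a Python comparison. FHIR also defines the sa, eb and ap prefixes, which are not in the list above and are not handled here.

```python
from __future__ import annotations

import operator

# Hypothetical lookup table: each supported prefix mapped to a comparison.
PREFIX_OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}

# Values of prefixed types (dates, numbers, quantities) start like this.
VALUE_START = set("0123456789-")


def split_prefix(value: str) -> tuple[str, str]:
    """Return (prefix, remainder), defaulting to 'eq' when no prefix is given."""
    candidate, remainder = value[:2], value[2:]
    if candidate in PREFIX_OPERATORS and remainder[:1] in VALUE_START:
        return candidate, remainder
    return "eq", value


prefix, number = split_prefix("gt0.5")
print(prefix, number)                                 # gt 0.5
print(PREFIX_OPERATORS[prefix](0.75, float(number)))  # True
```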

Date comparisons

Prefixes can be applied to date search parameters to filter based on temporal relationships.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Filter patients born on or after 1990-01-01.
date_filter = pc.search_to_column("Patient", "birthdate=ge1990-01-01")
patients.filter(date_filter).select("id", "birthDate").show()

Results in:

id                                    birthDate
93ee0b14-4f22-4c1a-93e2-b4e5c0d7f0d6  1995-06-15
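For dates, the prefix drives a calendar comparison rather than a string comparison. A minimal sketch, assuming complete YYYY-MM-DD dates on both sides; FHIR also allows partial dates such as 1990 or 1990-01, which denote ranges and are not handled here. The date_matches function is hypothetical.

```python
import operator
from datetime import date

PREFIX_OPERATORS = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}


def date_matches(field_value: str, search_value: str) -> bool:
    """Apply a prefixed date search value to a full YYYY-MM-DD field value."""
    if search_value[:2] in PREFIX_OPERATORS:
        prefix, raw = search_value[:2], search_value[2:]
    else:
        prefix, raw = "eq", search_value
    compare = PREFIX_OPERATORS[prefix]
    return compare(date.fromisoformat(field_value), date.fromisoformat(raw))


print(date_matches("1995-06-15", "ge1990-01-01"))  # True
print(date_matches("1985-03-02", "ge1990-01-01"))  # False
```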

Quantity comparisons

Prefixes also apply to quantity parameters, enabling filtering based on numeric values with units.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
observations = data_source.read("Observation")

# Filter observations with value greater than or equal to 80 mmHg.
quantity_filter = pc.search_to_column("Observation", "value-quantity=ge80|http://unitsofmeasure.org|mm[Hg]")
observations.filter(quantity_filter).select("id", "code.coding.code", "valueQuantity.value", "valueQuantity.unit").show()

Results in:

id                                    code     value  unit
1a2b3c4d-5e6f-7g8h-9i0j-1k2l3m4n5o6p  85354-9  120.0  mm[Hg]
2b3c4d5e-6f7g-8h9i-0j1k-2l3m4n5o6p7q  8480-6   90.0   mm[Hg]

Search parameter types

Different FHIR search parameter types support different matching behaviours.

Quantity parameters

Quantity parameters match numeric values with units. The syntax is [prefix]value|system|code where the system and code identify the unit.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
observations = data_source.read("Observation")

# Filter observations by quantity with specific unit.
quantity_filter = pc.search_to_column("Observation", "value-quantity=5.4|http://unitsofmeasure.org|mmol/L")
observations.filter(quantity_filter).select("id", "valueQuantity.value", "valueQuantity.unit").show()

Results in:

id                                    value  unit
3c4d5e6f-7g8h-9i0j-1k2l-3m4n5o6p7q8r  5.4    mmol/L
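The [prefix]value|system|code syntax can be taken apart with ordinary string operations. The sketch below uses a hypothetical parse_quantity helper, not Pathling's parser; missing system or code parts (as in value||code or a bare value) come back as empty strings.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal

PREFIXES = ("eq", "ne", "lt", "le", "gt", "ge")


@dataclass(frozen=True)
class QuantitySearch:
    prefix: str
    value: Decimal
    system: str
    code: str


def parse_quantity(search_value: str) -> QuantitySearch:
    """Parse '[prefix]value|system|code' into its components."""
    # Pad with empty strings so a bare value still yields three parts.
    number, system, code = (search_value.split("|") + ["", ""])[:3]
    prefix = "eq"
    if number[:2] in PREFIXES:
        prefix, number = number[:2], number[2:]
    return QuantitySearch(prefix, Decimal(number), system, code)


print(parse_quantity("ge80|http://unitsofmeasure.org|mm[Hg]"))
print(parse_quantity("5.4|http://unitsofmeasure.org|mmol/L"))
```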

String parameters

String parameters perform case- and accent-insensitive matching by default. As defined in the FHIR specification, the search value matches if the target string equals or starts with it, so family=smith matches "Smith" but not "Goldsmith".

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Filter patients by family name starting with "smith" (case-insensitive).
name_filter = pc.search_to_column("Patient", "family=smith")
patients.filter(name_filter).select("id", "name.family").show()

Results in:

id                                    family
93ee0b14-4f22-4c1a-93e2-b4e5c0d7f0d6  Smith

Reference parameters

Reference parameters filter resources based on references to other resources. The value can be a resource ID or a full reference.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
observations = data_source.read("Observation")

# Filter observations by patient reference.
ref_filter = pc.search_to_column("Observation", "subject=Patient/8ee183e2-b3c0-4151-be94-b945d6aa8c6d")
observations.filter(ref_filter).select("id", "subject.reference").show()

Results in:

id                                    reference
5e6f7g8h-9i0j-1k2l-3m4n-5o6p7q8r9s0t  Patient/8ee183e2-b3c0-4151-be94-b945d6aa8c6d
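Because the value can be a bare ID, a Type/id reference or an absolute URL, both sides have to be normalised before they are compared. A simplified sketch under stated assumptions: the hypothetical reference_matches function treats a bare ID as referring to an assumed target type (Patient here), compares absolute URLs on their trailing Type/id only (ignoring the server base), and does not handle versioned references.

```python
from __future__ import annotations


def reference_matches(stored: str, search_value: str, target_type: str = "Patient") -> bool:
    """Compare a stored reference with a reference search value."""

    def type_and_id(reference: str) -> tuple[str, str]:
        # Keep only the last two path segments, e.g. ('Patient', '123').
        parts = reference.rstrip("/").split("/")
        if len(parts) >= 2:
            return parts[-2], parts[-1]
        # A bare ID: assume it points at the target resource type.
        return target_type, parts[0]

    return type_and_id(stored) == type_and_id(search_value)


pid = "8ee183e2-b3c0-4151-be94-b945d6aa8c6d"
print(reference_matches(f"Patient/{pid}", pid))                                      # True
print(reference_matches(f"Patient/{pid}", f"http://example.org/fhir/Patient/{pid}"))  # True
```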

Number parameters

Number parameters match numeric values without units. Prefixes can be used for range comparisons.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
risk_assessments = data_source.read("RiskAssessment")

# Filter risk assessments by probability.
number_filter = pc.search_to_column("RiskAssessment", "probability=gt0.5")
risk_assessments.filter(number_filter).select("id", "prediction.probabilityDecimal").show()

Results in:

id                                    probabilityDecimal
6f7g8h9i-0j1k-2l3m-4n5o-6p7q8r9s0t1u  0.75
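The FHIR specification adds one subtlety for numbers: without a prefix (or with eq), a value is matched against the range implied by its precision, so 100 matches values from 99.5 up to, but not including, 100.5. Whether search_to_column applies exactly this rule is not stated here, so treat the following hypothetical implicit_range helper as an illustration of the specification rather than of Pathling.

```python
from __future__ import annotations

from decimal import Decimal


def implicit_range(value: str) -> tuple[Decimal, Decimal]:
    """Return the [low, high) range implied by the significant digits of value."""
    number = Decimal(value)
    exponent = number.as_tuple().exponent     # -1 for '0.5', 0 for '100'
    half_step = Decimal(5).scaleb(exponent - 1)  # half a unit in the last digit
    return number - half_step, number + half_step


low, high = implicit_range("0.5")
print(low, high)                      # 0.45 0.55
print(low <= Decimal("0.52") < high)  # True
```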

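Token parameters

Token parameters match codes and identifiers, optionally qualified by a system URI. The value can be written as code, system|code, |code (a code with no system) or system| (any code within that system). The example below uses the last form to match patients with any identifier issued under a given system.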

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Filter patients by identifier system.
uri_filter = pc.search_to_column("Patient", "identifier=http://example.org/fhir/identifier|")
patients.filter(uri_filter).select("id", "identifier.system", "identifier.value").show()

Results in:

id                                    system                              value
8ee183e2-b3c0-4151-be94-b945d6aa8c6d  http://example.org/fhir/identifier  MRN123456
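The identifier parameter used in this example is a FHIR token parameter, whose value can take four forms: code, system|code, |code and system|. The hypothetical token_matches function below sketches how each form constrains a match against a single (system, code) pair such as an Identifier; it follows the FHIR rules and is not Pathling's implementation.

```python
from __future__ import annotations


def token_matches(system: str | None, code: str | None, search_value: str) -> bool:
    """Match a (system, code) pair against a token search value."""
    if "|" not in search_value:
        # 'code': the code may belong to any system.
        return code == search_value
    search_system, search_code = search_value.split("|", 1)
    if search_system == "":
        # '|code': only codes that have no system.
        return system is None and code == search_code
    if search_code == "":
        # 'system|': any code within the system.
        return system == search_system
    # 'system|code': both parts must match.
    return system == search_system and code == search_code


system = "http://example.org/fhir/identifier"
print(token_matches(system, "MRN123456", "http://example.org/fhir/identifier|"))  # True
print(token_matches(system, "MRN123456", "|MRN123456"))                           # False
```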

Search modifiers

Modifiers alter the behaviour of search parameters. They are appended to the parameter name using a colon.

:not modifier

The :not modifier negates the search condition, matching resources where the parameter does NOT have the specified value.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Filter patients where gender is NOT male.
not_filter = pc.search_to_column("Patient", "gender:not=male")
patients.filter(not_filter).select("id", "gender").show()

Results in:

id                                    gender
7b4d8c2f-9a3e-4d5b-8c1f-2e3d4c5b6a7d  female
9c0d1e2f-3a4b-5c6d-7e8f-9g0h1i2j3k4l  unknown
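A modifier is separated from the parameter name by a colon, so gender:not=male names the parameter gender with the modifier not. The sketch below (split_modifier and token_clause_matches are hypothetical helpers) splits the name and applies :not as the negation of a plain match. Following the FHIR specification, a resource with no value for the parameter also satisfies :not; that Pathling behaves the same way is an assumption.

```python
from __future__ import annotations


def split_modifier(parameter: str) -> tuple[str, str | None]:
    """Split 'name:modifier' into ('name', 'modifier'); modifier may be None."""
    name, sep, modifier = parameter.partition(":")
    return name, modifier if sep else None


def token_clause_matches(value: str | None, parameter: str, search_values: list[str]) -> bool:
    """Evaluate one token clause, honouring the ':not' modifier."""
    _, modifier = split_modifier(parameter)
    hit = value is not None and value in search_values
    return not hit if modifier == "not" else hit


print(split_modifier("gender:not"))                          # ('gender', 'not')
print(token_clause_matches("female", "gender:not", ["male"]))  # True
print(token_clause_matches(None, "gender:not", ["male"]))      # True
```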

:exact modifier

The :exact modifier changes string matching from the default case-insensitive matching to case-sensitive exact matching.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Filter patients with exact family name "Smith".
exact_filter = pc.search_to_column("Patient", "family:exact=Smith")
patients.filter(exact_filter).select("id", "name.family").show()

Results in:

id                                    family
93ee0b14-4f22-4c1a-93e2-b4e5c0d7f0d6  Smith

Note that "smith" (lowercase) and "Goldsmith" would not match with the :exact modifier.
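The difference between the two string modes can be sketched in plain Python. Per the FHIR specification, the default match compares case- and accent-normalised values and succeeds when the field equals or starts with the search value, while :exact compares the raw strings for equality. The normalisation used here (Unicode NFKD decomposition, dropping combining marks, then casefold) is one reasonable choice, not necessarily Pathling's, and string_matches is a hypothetical helper.

```python
import unicodedata


def normalise(text: str) -> str:
    """Remove accents and case differences for default string matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()


def string_matches(field: str, search_value: str, exact: bool = False) -> bool:
    """Default: normalised starts-with match; exact: case-sensitive equality."""
    if exact:
        return field == search_value
    return normalise(field).startswith(normalise(search_value))


for family in ["Smith", "smith", "Smithson", "Goldsmith"]:
    print(family, string_matches(family, "smith"), string_matches(family, "Smith", exact=True))
```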

FHIRPath expressions

For more complex filtering requirements beyond what search parameters support, you can use FHIRPath expressions directly.

Using fhirpath_to_column

The fhirpath_to_column method provides direct access to the FHIRPath engine, allowing you to evaluate arbitrary FHIRPath expressions against resources.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Match patients with a family name equal to 'Smith'. Used as an infix
# operator, FHIRPath "contains" tests collection membership, not substrings.
fhirpath_filter = pc.fhirpath_to_column("Patient", "name.family contains 'Smith'")
patients.filter(fhirpath_filter).select("id", "name.family").show()

Results in:

id                                    family
93ee0b14-4f22-4c1a-93e2-b4e5c0d7f0d6  Smith
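In FHIRPath, contains written between two operands is the collection membership operator: name.family contains 'Smith' is true when any family name equals 'Smith', not when one merely includes it. A substring test uses the contains() string function instead, for example name.family.exists($this.contains('Smith')), which is case-sensitive. The pure-Python sketch below mimics both semantics on a list of family names; it illustrates the FHIRPath specification and makes no claim about which forms fhirpath_to_column supports.

```python
from __future__ import annotations


def membership(collection: list[str], item: str) -> bool:
    """FHIRPath 'collection contains item': equality against any element."""
    return item in collection


def any_substring(collection: list[str], fragment: str) -> bool:
    """FHIRPath 'collection.exists($this.contains(fragment))': case-sensitive."""
    return any(fragment in element for element in collection)


print(membership(["Smith"], "Smith"))         # True
print(membership(["Goldsmith"], "Smith"))     # False
print(any_substring(["Goldsmith"], "Smith"))  # False: 'smith' is lowercase
print(any_substring(["Goldsmith"], "smith"))  # True
```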

Combining filters

The Columns returned by search_to_column are ordinary Spark Columns, so they can be combined with PySpark's boolean operators: & (AND), | (OR) and ~ (NOT).

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Create separate filters.
male_filter = pc.search_to_column("Patient", "gender=male")
female_filter = pc.search_to_column("Patient", "gender=female")

# Combine with OR logic using | operator.
gender_filter = male_filter | female_filter

patients.filter(gender_filter).select("id", "gender").show()

Results in:

id                                    gender
8ee183e2-b3c0-4151-be94-b945d6aa8c6d  male
7b4d8c2f-9a3e-4d5b-8c1f-2e3d4c5b6a7d  female

Empty query

An empty search expression matches all resources, which is useful for dynamic filtering scenarios where the filter may be conditionally applied.

from pathling import PathlingContext

pc = PathlingContext.create()
data_source = pc.read.ndjson("data/ndjson")
patients = data_source.read("Patient")

# Empty search expression matches all resources.
all_filter = pc.search_to_column("Patient", "")
patients.filter(all_filter).select("id", "gender").show()

Results in:

id                                    gender
8ee183e2-b3c0-4151-be94-b945d6aa8c6d  male
7b4d8c2f-9a3e-4d5b-8c1f-2e3d4c5b6a7d  female
93ee0b14-4f22-4c1a-93e2-b4e5c0d7f0d6  male
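Because an empty expression matches everything, optional criteria can be assembled into one expression and passed to search_to_column unconditionally. A minimal sketch of such a builder follows; build_search is hypothetical, and it performs no escaping, so values containing ",", "|", "$" or "\" would need FHIR's backslash escaping added.

```python
from __future__ import annotations


def build_search(criteria: dict[str, str | list[str] | None]) -> str:
    """Join the criteria that are set into a FHIR search expression.

    Lists become comma-separated (OR) values; criteria set to None or an
    empty list are skipped, so no criteria yields the empty expression.
    """
    clauses = []
    for name, value in criteria.items():
        if value is None or value == []:
            continue
        values = value if isinstance(value, list) else [value]
        clauses.append(f"{name}={','.join(values)}")
    return "&".join(clauses)


print(build_search({"gender": ["male", "female"], "birthdate": "ge1990-01-01"}))
print(repr(build_search({"gender": None})))  # '' matches every resource
```

The returned string can then be passed as the second argument to pc.search_to_column("Patient", ...).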