pathling package

Submodules

pathling.coding module

class pathling.coding.Coding(system: str, code: str, version: Optional[str] = None, display: Optional[str] = None, user_selected: Optional[bool] = None)[source]

Bases: object

A Coding represents a code in a code system. See: https://hl7.org/fhir/R4/datatypes.html#Coding

to_literal()[source]

Converts a Coding into a Column that contains a Coding struct. The Coding struct Column can be used as an input to terminology functions such as member_of and translate.

Returns:

a Column containing a Coding struct
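
Example use (a minimal sketch; the SNOMED CT code shown is illustrative):

from pathling.coding import Coding

# Build a Coding for SNOMED CT concept 73211009 (Diabetes mellitus) and
# convert it to a literal Column for use with member_of or translate.
diabetes = Coding(system='http://snomed.info/sct', code='73211009')
coding_col = diabetes.to_literal()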

pathling.context module

class pathling.context.PathlingContext(spark: SparkSession, jpc: JavaObject)[source]

Bases: object

Main entry point for Pathling API functionality. Should be instantiated with the PathlingContext.create() class method.

Example use:

from pathling import PathlingContext

ptl = PathlingContext.create(spark)
patient_df = ptl.encode(spark.read.text('ndjson_resources'), 'Patient')
patient_df.show()

classmethod create(spark: Optional[SparkSession] = None, fhir_version: Optional[str] = None, max_nesting_level: Optional[int] = None, enable_extensions: Optional[bool] = None, enabled_open_types: Optional[Sequence[str]] = None, terminology_server_url: Optional[str] = None, token_endpoint: Optional[str] = None, client_id: Optional[str] = None, client_secret: Optional[str] = None, scope: Optional[str] = None, token_expiry_tolerance: Optional[int] = None) PathlingContext[source]

Creates a PathlingContext with the given configuration options.

If no SparkSession is provided and none is already present in this process, a new SparkSession will be created.

If no SparkSession is provided but one is already running within the current process, it will be reused, and it is assumed that the Pathling library API JAR is already on the classpath. If you are running your own cluster, make sure the JAR is on its list of packages.

If a SparkSession is provided, it needs to include the Pathling library API JAR on its classpath. You can get the path for the JAR (which is bundled with the Python package) using the pathling.etc.find_jar method.

Parameters:
  • spark – the SparkSession instance.

  • fhir_version – the FHIR version to use. Must be a valid FHIR version string. Defaults to R4.

  • max_nesting_level – the maximum nesting level for recursive data types. Zero (0) indicates that all direct or indirect fields of type T in elements of type T should be skipped.

  • enable_extensions – switches support for FHIR extensions on or off

  • enabled_open_types – list of types that are encoded within open types, such as extensions

  • terminology_server_url – the URL of the FHIR terminology server used to resolve terminology queries

  • token_endpoint – an OAuth2 token endpoint for use with the client credentials grant

  • client_id – a client ID for use with the client credentials grant

  • client_secret – a client secret for use with the client credentials grant

  • scope – a scope value for use with the client credentials grant

  • token_expiry_tolerance – the minimum number of seconds that a token should have before expiry when deciding whether to send it with a terminology request

Returns:

a PathlingContext instance initialised with the given configuration options
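
For example, a context that resolves terminology queries against a server might be created like this (the server URL is illustrative):

from pathling import PathlingContext

ptl = PathlingContext.create(
    fhir_version='R4',
    terminology_server_url='https://tx.ontoserver.csiro.au/fhir'
)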

encode(df: DataFrame, resource_name: str, input_type: Optional[str] = None, column: Optional[str] = None) DataFrame[source]

Takes a dataframe with string representations of FHIR resources in the given column and encodes the resources of the given type as a Spark dataframe.

Parameters:
  • df – a DataFrame containing the resources to encode.

  • resource_name – the name of the FHIR resource to extract (Condition, Observation, etc).

  • input_type – the mime type of input string encoding. Defaults to application/fhir+json.

  • column – the column in which the resources to encode are stored. If None then the input dataframe is assumed to have one column of type string.

Returns:

a DataFrame containing the given type of resources encoded into Spark columns
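
For example, XML resources can be encoded by supplying one of the MimeType constants documented below (the input path is illustrative, and ptl is a context created as above):

from pathling.fhir import MimeType

xml_df = spark.read.text('xml_resources')
patient_df = ptl.encode(xml_df, 'Patient', input_type=MimeType.FHIR_XML)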

encode_bundle(df: DataFrame, resource_name: str, input_type: Optional[str] = None, column: Optional[str] = None) DataFrame[source]

Takes a dataframe with string representations of FHIR bundles in the given column and encodes the resources of the given type as a Spark dataframe.

Parameters:
  • df – a DataFrame containing the bundles with the resources to encode.

  • resource_name – the name of the FHIR resource to extract (Condition, Observation, etc).

  • input_type – the mime type of input string encoding. Defaults to application/fhir+json.

  • column – the column in which the resources to encode are stored. If None then the input dataframe is assumed to have one column of type string.

Returns:

a DataFrame containing the given type of resources encoded into Spark columns
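
A minimal sketch, assuming ptl as above and a hypothetical directory of JSON bundle files:

# Each file contains a single bundle, so read whole files as single rows.
bundles_df = spark.read.text('bundles', wholetext=True)
observation_df = ptl.encode_bundle(bundles_df, 'Observation')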

member_of(df: DataFrame, coding_column: Column, value_set_uri: str, output_column_name: str)[source]

Takes a dataframe with a Coding column as input. A new column is created which contains a Boolean value, indicating whether the input Coding is a member of the specified FHIR ValueSet.

Parameters:
  • df – a DataFrame containing the input data

  • coding_column – a Column containing a struct representation of a Coding

  • value_set_uri – an identifier for a FHIR ValueSet

  • output_column_name – the name of the result column

Returns:

A new dataframe with an additional column containing the result of the operation.
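
A sketch of testing membership against a ValueSet derived from a SNOMED CT ECL expression, using the to_coding and to_ecl_value_set helpers documented below (the input dataframe and its CODE column are hypothetical; 404684003 is the SNOMED CT 'Clinical finding' concept):

from pathling.functions import to_coding, to_ecl_value_set

result = ptl.member_of(
    csv_df,
    to_coding(csv_df.CODE, 'http://snomed.info/sct'),
    to_ecl_value_set('<< 404684003'),
    'IS_CLINICAL_FINDING'
)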

property spark: SparkSession

Returns the SparkSession associated with this context.

subsumes(df: DataFrame, output_column_name: str, left_coding_column: Optional[Column] = None, right_coding_column: Optional[Column] = None, left_coding: Optional[Coding] = None, right_coding: Optional[Coding] = None)[source]

Takes a dataframe as input, along with a left and right Coding to compare; each side of the subsumption test can be supplied either as a Coding column or as a fixed Coding object. A new column is created which contains a Boolean value, indicating whether the left Coding subsumes the right Coding.

Parameters:
  • df – a DataFrame containing the input data

  • left_coding_column – a Column containing a struct representation of a Coding, for the left-hand side of the subsumption test

  • right_coding_column – a Column containing a struct representation of a Coding, for the right-hand side of the subsumption test

  • left_coding – a Coding object for the left-hand side of the subsumption test

  • right_coding – a Coding object for the right-hand side of the subsumption test

  • output_column_name – the name of the result column

Returns:

A new dataframe with an additional column containing the result of the operation.
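
A sketch of testing whether a fixed ancestor concept subsumes the Codings in a column (the conditions_df dataframe and its CODING column are hypothetical; 64572001 is the SNOMED CT 'Disease' concept):

from pathling.coding import Coding

result = ptl.subsumes(
    conditions_df,
    'IS_DISEASE',
    left_coding=Coding('http://snomed.info/sct', '64572001'),
    right_coding_column=conditions_df.CODING
)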

translate(df: DataFrame, coding_column: Column, concept_map_uri: str, reverse: Optional[bool] = False, equivalence: Optional[str] = 'equivalent', output_column_name: Optional[str] = 'result')[source]

Takes a dataframe with a Coding column as input. A new column is created which contains the Coding values that are the translation targets from the specified FHIR ConceptMap. There may be more than one target concept for each input concept.

Parameters:
  • df – a DataFrame containing the input data

  • coding_column – a Column containing a struct representation of a Coding

  • concept_map_uri – an identifier for a FHIR ConceptMap

  • reverse – the direction to traverse the map; false results in “source to target” mappings, while true results in “target to source”

  • equivalence – a comma-delimited set of values from the ConceptMapEquivalence ValueSet

  • output_column_name – the name of the result column

Returns:

A new dataframe with an additional column containing the result of the operation.
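
A minimal sketch (the ConceptMap URI and the input CODE column are hypothetical):

from pathling.functions import to_coding

result = ptl.translate(
    df,
    to_coding(df.CODE, 'http://snomed.info/sct'),
    'http://example.org/fhir/ConceptMap/snomed-to-local',
    output_column_name='TRANSLATED'
)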

pathling.etc module

pathling.etc.find_jar(verbose: bool = False) str[source]

Gets the path to the Pathling encoders JAR bundled with the Python distribution.
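
This can be used to put the bundled JAR on the classpath when building your own SparkSession, as described under PathlingContext.create:

from pyspark.sql import SparkSession
from pathling.etc import find_jar

spark = SparkSession.builder.config('spark.jars', find_jar()).getOrCreate()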

pathling.fhir module

class pathling.fhir.MimeType[source]

Bases: object

Constants for FHIR encoding mime types.

FHIR_JSON: str = 'application/fhir+json'
FHIR_XML: str = 'application/fhir+xml'

class pathling.fhir.Version[source]

Bases: object

Constants for FHIR versions.

R4: str = 'R4'
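
A sketch of selecting the FHIR version explicitly when creating a context:

from pathling import PathlingContext
from pathling.fhir import Version

ptl = PathlingContext.create(fhir_version=Version.R4)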

pathling.functions module

pathling.functions.to_coding(coding_column: Column, system: str, version: Optional[str] = None)[source]

Converts a Column containing codes into a Column that contains a Coding struct. The Coding struct Column can be used as an input to terminology functions such as member_of and translate.

Parameters:
  • coding_column – the Column containing the codes

  • system – the URI of the system the codes belong to

  • version – the version of the code system

Returns:

a Column containing a Coding struct
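
A minimal sketch (the dataframe and its CODE column are hypothetical):

from pathling.functions import to_coding

coded_df = df.withColumn('CODING', to_coding(df.CODE, 'http://snomed.info/sct'))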

pathling.functions.to_ecl_value_set(ecl: str) str[source]

Converts a SNOMED CT ECL expression into a FHIR ValueSet URI. Can be used with the member_of function.

Parameters:
  • ecl – the ECL expression

Returns:

the ValueSet URI
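
A minimal sketch ('<< 404684003' selects the SNOMED CT 'Clinical finding' concept and all of its descendants; the code is illustrative):

from pathling.functions import to_ecl_value_set

value_set_uri = to_ecl_value_set('<< 404684003')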

Module contents

class pathling.PathlingContext(spark: SparkSession, jpc: JavaObject)[source]

Re-exported from pathling.context. See the pathling.context module above for the full documentation of this class, including create, encode, encode_bundle, member_of, subsumes, and translate.