pathling.cli package

Submodules

pathling.cli.config module

Configuration resolution for the Pathling command line interface.

A single config file is selected by the precedence explicit --config > project-local pathling.toml (in the current working directory) > user-level ${XDG_CONFIG_HOME:-~/.config}/pathling/config.toml > none. Files are never merged: keys absent from the chosen file fall back to built-in defaults, never to another file. The chosen file’s values then resolve with the precedence flag > config file > built-in default. When a project-local file is discovered, a one-line notice naming it is emitted via the on_notice callback. Secret values for authentication may be supplied as a literal, a @/path/to/file reference, or via an environment variable.
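The value-level precedence (flag > config file > built-in default) can be pictured with a small Python sketch. The function name resolve_value and the flat key names are hypothetical and do not reflect the real config file layout. The two default values are the ones documented for CliConfig below.

```python
from typing import Any, Mapping, Optional

# Hypothetical built-in defaults, mirroring the documented CliConfig defaults.
DEFAULTS = {
    "tx_server": "https://tx.ontoserver.csiro.au/fhir",
    "fhir_version": "R4",
}


def resolve_value(key: str, flag: Optional[Any], file_values: Mapping[str, Any]) -> Any:
    """Applies flag > config file > built-in default for a single key."""
    if flag is not None:
        return flag
    if key in file_values:
        return file_values[key]
    # A key missing from the chosen file falls back to the built-in default,
    # never to a value from another config file.
    return DEFAULTS[key]
```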

Author: John Grimes.

class pathling.cli.config.BulkAuth(client_id: str, token_endpoint: str | None = None, private_key_jwk: str | None = None, client_secret: str | None = None, scope: str | None = None)

Bases: object

SMART backend services authentication settings for bulk export.

Parameters:
  • client_id – the OAuth2 client identifier.

  • private_key_jwk – the resolved private key JWK, or None.

  • client_secret – the resolved client secret, or None.

  • token_endpoint – the OAuth2 token endpoint.

  • scope – an optional OAuth2 scope.

client_id: str
client_secret: str | None = None
property mechanism: str

A human-readable name for the credential mechanism in use.

Returns:

a description of the credential type for error messages.

private_key_jwk: str | None = None
scope: str | None = None
token_endpoint: str | None = None
class pathling.cli.config.CliConfig(tx_server: str = 'https://tx.ontoserver.csiro.au/fhir', tx_auth: TxAuth | None = None, fhir_version: str = 'R4', verbose: bool = False, config_path: Path | None = None, spark_conf: dict = <factory>, bulk_auth_table: dict | None = None)

Bases: object

Resolved global configuration for a single invocation.

Parameters:
  • tx_server – the terminology server URL.

  • tx_auth – terminology authentication settings, or None.

  • fhir_version – the FHIR version code.

  • verbose – whether verbose logging and stack traces are enabled.

  • config_path – the path the config file was read from, or None.

  • spark_conf – the resolved, validated, and merged Spark configuration map to apply when a session is built; empty when nothing is set.

  • bulk_auth_table – the parsed [bulk-auth] table from the chosen config file, or None when absent. It is carried on the config so that export resolves bulk credentials from the already-loaded file rather than reading the file again.

bulk_auth_table: dict | None = None
config_path: Path | None = None
fhir_version: str = 'R4'
spark_conf: dict
tx_auth: TxAuth | None = None
tx_server: str = 'https://tx.ontoserver.csiro.au/fhir'
verbose: bool = False
class pathling.cli.config.TxAuth(client_id: str | None = None, client_secret: str | None = None, token_endpoint: str | None = None, scope: str | None = None)

Bases: object

Terminology server authentication settings.

Parameters:
  • client_id – the OAuth2 client identifier.

  • client_secret – the resolved client secret, or None.

  • token_endpoint – the OAuth2 token endpoint.

  • scope – an optional OAuth2 scope.

client_id: str | None = None
client_secret: str | None = None
property enabled: bool

Whether enough has been supplied to attempt authentication.

Returns:

True when a client identifier and token endpoint are present.

scope: str | None = None
token_endpoint: str | None = None
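A minimal sketch of the documented enabled rule, written against a hypothetical stand-in dataclass rather than the real TxAuth. Treating “present” as “not None” is an assumption; the real property may also reject empty strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TxAuthSketch:
    """A hypothetical stand-in mirroring the documented TxAuth fields."""

    client_id: Optional[str] = None
    client_secret: Optional[str] = None
    token_endpoint: Optional[str] = None
    scope: Optional[str] = None

    @property
    def enabled(self) -> bool:
        # Only the client identifier and token endpoint are checked here.
        return self.client_id is not None and self.token_endpoint is not None
```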
pathling.cli.config.default_config_path() → Path

Computes the default config file path, honouring XDG_CONFIG_HOME.

Returns:

the path to the default config file location.
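The documented location ${XDG_CONFIG_HOME:-~/.config}/pathling/config.toml can be computed as in the following sketch. The function name and the injectable environment mapping (added for testing) are hypothetical; this is not the library’s implementation.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_config_path_sketch(env: Optional[Mapping[str, str]] = None) -> Path:
    """Returns ${XDG_CONFIG_HOME:-~/.config}/pathling/config.toml."""
    env = os.environ if env is None else env
    # As with the shell expansion ${VAR:-default}, an unset or empty
    # XDG_CONFIG_HOME falls back to ~/.config.
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "pathling" / "config.toml"
```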

pathling.cli.config.load_config_file(path: Path, on_warning: Callable[[str], None] | None = None) → dict

Loads a config file, warning about unknown keys.

Parameters:
  • path – the config file path.

  • on_warning – an optional callback invoked with each warning message; defaults to writing to stderr.

Returns:

the parsed config as a dict, or an empty dict when the file is absent.

pathling.cli.config.resolve_bulk_auth(file_bulk_auth: dict | None, client_id: str | None = None, client_secret: str | None = None, private_key_jwk: str | None = None, token_endpoint: str | None = None, scope: str | None = None, env: dict | None = None) → BulkAuth | None

Resolves bulk export authentication from flags and the config file.

Authentication is considered configured only when a client identifier is present. Secret values may be given as a literal or a @file reference, or read from the PATHLING_CLIENT_SECRET / PATHLING_PRIVATE_KEY_JWK environment variables.

Parameters:
  • file_bulk_auth – the parsed [bulk-auth] table, or None.

  • client_id – the --client-id flag value, or None.

  • client_secret – the --client-secret flag value, or None.

  • private_key_jwk – the --private-key-jwk flag value, or None.

  • token_endpoint – the --token-endpoint flag value, or None.

  • scope – the --scope flag value, or None.

  • env – the environment mapping for secret resolution.

Returns:

a populated BulkAuth, or None when no auth input is given.

Raises:

CliError – when the auth configuration is incomplete or ambiguous.

pathling.cli.config.resolve_config(tx_server: str | None = None, tx_client_id: str | None = None, tx_client_secret: str | None = None, tx_token_endpoint: str | None = None, tx_scope: str | None = None, fhir_version: str | None = None, spark_conf_flags: dict | None = None, verbose: bool = False, config_path: Path | None = None, cwd: Path | None = None, env: dict | None = None, on_warning: Callable[[str], None] | None = None, on_notice: Callable[[str], None] | None = None) → CliConfig

Resolves global configuration from flags, the config file, and defaults.

A single config file is selected by the precedence explicit --config > project-local pathling.toml (in cwd) > user-level config file > none, then its values flow through the precedence flag > config file > built-in default. Files are never merged: keys absent from the chosen file fall back to built-in defaults, never to another file.

Parameters:
  • tx_server – the --tx-server flag value, or None.

  • tx_client_id – the --tx-client-id flag value, or None.

  • tx_client_secret – the --tx-client-secret flag value, or None.

  • tx_token_endpoint – the --tx-token-endpoint flag value, or None.

  • tx_scope – the --tx-scope flag value, or None.

  • fhir_version – the --fhir-version flag value, or None.

  • spark_conf_flags – the parsed --spark-conf flag map, or None; flag values override the [spark] table for the same key.

  • verbose – the --verbose flag value.

  • config_path – an explicit config file path, or None for discovery.

  • cwd – the directory searched for a project-local pathling.toml; defaults to the current working directory.

  • env – the environment mapping for secret resolution.

  • on_warning – an optional warning callback passed to the file loader and used to surface the managed Spark-package version-override warning; defaults to writing to stderr so warnings appear even in quiet mode.

  • on_notice – an optional callback invoked with a one-line notice when a project-local pathling.toml is discovered and used.

Returns:

the resolved CliConfig.

Raises:

CliError – if the resolved FHIR version is unsupported.

pathling.cli.config.resolve_config_source(config_path: Path | None, cwd: Path) → tuple[Optional[pathlib.Path], str]

Selects the single config file to load and reports where it came from.

Exactly one file is chosen, by the precedence explicit --config > project-local pathling.toml in cwd > user-level config file > none. Values are never merged across files.

Parameters:
  • config_path – an explicit --config path, or None.

  • cwd – the directory searched for a project-local pathling.toml.

Returns:

a tuple of the chosen path and an origin tag, one of "explicit", "project", "user", or "none". The "none" case returns the (non-existent) user-level path so the existing “missing file yields defaults” behaviour is preserved.
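The selection order can be sketched as below. The function name is hypothetical, the user-level path is passed in rather than computed, and using Path.is_file() for discovery is an assumption. In particular, this sketch does not check that an explicit --config path exists.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_config_source_sketch(
    config_path: Optional[Path], cwd: Path, user_path: Path
) -> Tuple[Path, str]:
    """Picks exactly one config file; values are never merged across files."""
    if config_path is not None:
        return config_path, "explicit"
    project = cwd / "pathling.toml"
    if project.is_file():
        return project, "project"
    if user_path.is_file():
        return user_path, "user"
    # No file exists: return the missing user-level path so that loading it
    # yields an empty dict and every key falls back to its default.
    return user_path, "none"
```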

pathling.cli.config.resolve_secret(value: str | None, env_var: str | None = None, env: dict | None = None) → str | None

Resolves a secret value from a literal, a @file reference, or an environment variable.

A value beginning with @ is treated as a path to a file whose stripped contents are returned. When value is None and env_var is given, the environment variable is consulted.

Parameters:
  • value – the literal value, @path reference, or None.

  • env_var – the name of a fallback environment variable, or None.

  • env – the environment mapping to read from; defaults to os.environ.

Returns:

the resolved secret, or None when nothing is available.

Raises:

CliError – if a @file reference cannot be read.
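A sketch of the three secret forms, assuming UTF-8 secret files and using a plain Exception subclass as a stand-in for CliError. The function name and error text are hypothetical. As documented, the environment variable is consulted only when no value is supplied at all.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


class CliError(Exception):
    """Hypothetical stand-in for pathling.cli.errors.CliError."""


def resolve_secret_sketch(
    value: Optional[str],
    env_var: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    env = os.environ if env is None else env
    if value is None:
        # Fall back to the environment only when nothing was supplied.
        return env.get(env_var) if env_var else None
    if value.startswith("@"):
        path = Path(value[1:])
        try:
            # The file contents are returned with surrounding whitespace stripped.
            return path.read_text(encoding="utf-8").strip()
        except OSError as exc:
            raise CliError(f"Cannot read secret file {path}: {exc}") from exc
    # Anything else is a literal secret.
    return value
```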

pathling.cli.console module

The pathling console command.

Opens an interactive IPython session with spark (the Spark session) and pathling (the configured Pathling context) bound in the user namespace, after a banner identifying the version and the variables in scope. IPython is imported inside the command body so that --help stays fast.

Author: John Grimes.

pathling.cli.console.build_banner() → str

Builds the banner shown before the console’s first prompt.

The banner identifies the Pathling and Python versions, lists the variables in scope, and explains how to exit.

Returns:

the banner text.

pathling.cli.convert module

The pathling convert command.

Reads any supported FHIR data source and writes ndjson, Parquet, or Delta to an output path, with save-mode control and a summary of the resource types and row counts written.

Author: John Grimes.

pathling.cli.departition module

Departitioning of Spark directory output into a single file.

Spark’s distributed writers produce a directory of part files rather than a single file. To restore the single-file experience expected from the command line, this module moves the single data part file out of that directory to the exact path the user requested and removes the directory. The operation runs over the Hadoop FileSystem associated with the target path, so it works uniformly for local, S3, HDFS, and other Hadoop-compatible destinations, keeping the move on a single filesystem.

Author: John Grimes.

pathling.cli.departition.departition(spark, source_dir: str | Path, target_path: str | Path, part_extension: str) → None

Moves the single data part file from a Spark output directory to a path.

Lists source_dir over the Hadoop FileSystem of target_path, selects the data part files (those ending in part_extension, ignoring Spark markers such as _SUCCESS and .crc checksums), and:

  • with exactly one data file, moves it to target_path;

  • with no data files (an empty result), creates an empty target_path;

  • with more than one data file, raises an error rather than choosing one.

The source directory is always removed afterwards.

Parameters:
  • spark – the Spark session, used to reach the JVM gateway and Hadoop configuration.

  • source_dir – the Spark output directory to departition.

  • target_path – the destination path for the single data file.

  • part_extension – the part-file extension to select, without a leading dot (e.g. "csv", "json", "parquet").

Raises:

CliError – when the source directory contains more than one data file.
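The selection rules above can be illustrated with a local-filesystem analogue that uses pathlib and shutil in place of the Hadoop FileSystem, so it does not cover S3, HDFS, or other remote destinations. The function name is hypothetical, and removing the source directory even when an error is raised is one reading of “always removed”.

```python
import shutil
from pathlib import Path


class CliError(Exception):
    """Hypothetical stand-in for pathling.cli.errors.CliError."""


def departition_local(source_dir: Path, target_path: Path, part_extension: str) -> None:
    """Local-filesystem analogue of departition(); no Hadoop involved."""
    try:
        # Data files end in the part extension. Spark markers such as _SUCCESS
        # and .crc checksums (e.g. ".part-0.csv.crc") do not, so they are skipped.
        parts = [
            p for p in source_dir.iterdir()
            if p.is_file() and p.name.endswith("." + part_extension)
        ]
        if len(parts) > 1:
            raise CliError(f"Expected one data file in {source_dir}, found {len(parts)}")
        if parts:
            shutil.move(str(parts[0]), str(target_path))
        else:
            # An empty result still produces the requested (empty) file.
            target_path.touch()
    finally:
        # Remove the Spark output directory whether or not the move succeeded.
        shutil.rmtree(source_dir, ignore_errors=True)
```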

pathling.cli.departition.remove_path(spark, path: str | Path) → None

Removes a file or directory over its Hadoop FileSystem, if it exists.

Used to clear an existing target before an overwrite and to clean up the temporary departition directory. The operation is idempotent: a path that does not exist is left untouched.

Parameters:
  • spark – the Spark session, used to reach the JVM gateway and Hadoop configuration.

  • path – the path to remove.

pathling.cli.errors module

Error handling for the Pathling command line interface.

JVM exceptions raised through Py4J carry verbose Java stack traces that are unhelpful at the command line. This module unwraps them to their root message and maps recognised categories onto concise, actionable guidance. Stack traces are shown only when the user passes --verbose.

Author: John Grimes.

exception pathling.cli.errors.CliError(message: str, exit_code: int = 1)

Bases: Exception

An error with a message that is safe to show the user directly.

Parameters:
  • message – the human-readable error message.

  • exit_code – the process exit code to use; defaults to 1, which signals a runtime failure.

pathling.cli.errors.friendly_message(exc: BaseException, verbose: bool = False, server_url: str | None = None) → str

Builds a friendly, actionable message for an unexpected exception.

Parameters:
  • exc – the exception to describe.

  • verbose – when True, append the full traceback.

  • server_url – a server URL to name in connection errors, or None.

Returns:

the message to present to the user.

pathling.cli.errors.is_auth_error(exc: BaseException) → bool

Determines whether an exception represents an authentication failure.

Used to decide whether an export failure should be reported as an authentication problem; connection, timeout, and server-side errors that happen to occur while authentication is configured must not be misdiagnosed as bad credentials.

Parameters:

exc – the exception to classify.

Returns:

True when the unwrapped message looks like an authentication failure.

pathling.cli.errors.is_connection_error(exc: BaseException) → bool

Determines whether an exception represents a server connection failure.

Parameters:

exc – the exception to classify.

Returns:

True when the unwrapped message looks like a connection failure.

pathling.cli.errors.unwrap_java_exception(exc: BaseException) → str

Extracts the most useful single-line message from an exception.

Py4J wraps Java exceptions, whose str representation includes the full stack trace. This returns the leading message line, stripping the Java exception class prefix where present.

Parameters:

exc – the exception to unwrap.

Returns:

a concise message describing the underlying problem.
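A sketch of the leading-line and prefix-stripping steps, assuming the exception text begins with the Java exception line (for example java.lang.IllegalArgumentException followed by a colon and the message). The regular expression and function name are assumptions; real Py4J error text can begin with a Py4J preamble line, which this sketch does not handle.

```python
import re

# Matches a leading fully qualified Java exception class and its separator,
# such as "java.lang.IllegalArgumentException: " (an assumed pattern).
_JAVA_CLASS_PREFIX = re.compile(r"^(?:[a-z_$][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error):\s*")


def unwrap_message_sketch(exc: BaseException) -> str:
    text = str(exc).strip()
    if not text:
        return type(exc).__name__
    # Keep only the leading line; the remaining lines are the Java stack trace.
    first_line = text.splitlines()[0]
    return _JAVA_CLASS_PREFIX.sub("", first_line).strip()
```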

pathling.cli.export module

The pathling export command.

Performs a FHIR Bulk Data export (system, group, or patient level) with the library’s filters and SMART backend services authentication, downloading ndjson to the output directory and reporting a summary of files and resource counts.

Author: John Grimes.

pathling.cli.fhirpath module

The pathling fhirpath command.

Evaluates a FHIRPath expression against either a whole data source, producing one row per resource with id and result columns, or a single FHIR resource JSON file, producing one row per result item with type and result columns. Supports context expressions, named variables, and, in data source mode, a FHIR search --filter.

Author: John Grimes.

pathling.cli.io module

Data source detection and reading for the Pathling command line interface.

Format is auto-detected from the contents of the input path, with an explicit --from override. Detection and validation run before any Spark session is started so that obvious mistakes (a missing or empty path, an ambiguous directory) fail quickly with an actionable message.

Author: John Grimes.

class pathling.cli.io.SourceFormat

Bases: object

The recognised data source formats.

BUNDLES = 'bundles'
DELTA = 'delta'
NDJSON = 'ndjson'
PARQUET = 'parquet'
RESOURCE = 'resource'
class pathling.cli.io.SourceSpec(path: Path, format: str)

Bases: object

A resolved data source input.

Parameters:
  • path – the input path; verified to exist.

  • format – one of the SourceFormat values.

format: str
path: Path
pathling.cli.io.detect_format(path: Path, allow_resource: bool = False) → str

Auto-detects the format of a data source path.

Parameters:
  • path – the input path.

  • allow_resource – when True, a single JSON file is treated as a single FHIR resource rather than rejected.

Returns:

the detected SourceFormat value.

Raises:

CliError – when the path is missing or empty, or the format cannot be determined unambiguously.

pathling.cli.io.discover_bundle_resource_types(path: Path) → List[str]

Discovers the distinct resource types contained in a directory of FHIR Bundles.

Parameters:

path – the directory of bundle JSON files.

Returns:

a sorted list of distinct resource type codes found in the bundle entries.

Raises:

CliError – when no resource types can be found.

pathling.cli.io.read_single_resource(path: Path) → tuple

Reads a single FHIR resource JSON file.

Parameters:

path – the path to the resource file.

Returns:

a tuple of (resource_type, resource_json_string).

Raises:

CliError – when the file cannot be read or has no resource type.

pathling.cli.io.read_source(pc, spec: SourceSpec, types: List[str] | None = None)

Reads a SourceSpec into a Pathling DataSource.

Parameters:
  • pc – the PathlingContext to read with.

  • spec – the resolved source specification.

  • types – the resource types to read from a Bundles source. When provided (non-empty), these are used directly and the driver-side discovery pass is skipped; when None or empty, discovery enumerates the types. Ignored for non-Bundles formats (FR-015).

Returns:

a Pathling DataSource for the input.

Raises:

CliError – when the format is not a data source format.

pathling.cli.io.resolve_source(path_str: str, from_format: str | None = None, allow_resource: bool = False) → SourceSpec

Resolves a source path and format, validating before Spark startup.

Parameters:
  • path_str – the positional source path.

  • from_format – an explicit --from override, or None to auto-detect.

  • allow_resource – whether a single resource JSON file is permitted.

Returns:

the resolved SourceSpec.

Raises:

CliError – when the path is invalid or the format cannot be resolved.

pathling.cli.main module

The root command group for the Pathling command line interface.

This module defines the global options, resolves configuration, registers every subcommand, and installs a single central error handler that turns exceptions (including unwrapped JVM exceptions) into concise messages with appropriate exit codes. Heavy imports (PySpark, the JVM-backed library) are deferred to command execution so that --help and --version stay fast.

Author: John Grimes.

class pathling.cli.main.CliContext(config: CliConfig, console: Console)

Bases: object

The object carried on the Click context for every command.

Parameters:
  • config – the resolved global configuration.

  • console – the stderr console for progress and error output.

config: CliConfig
console: Console
class pathling.cli.main.PathlingCli(name: str | None = None, commands: MutableMapping[str, Command] | Sequence[Command] | None = None, **attrs: Any)

Bases: Group

The root group with centralised error handling and exit codes.

invoke(ctx: Context)

Invokes a command, mapping errors to friendly messages and codes.

Parameters:

ctx – the Click context.

Returns:

the command’s return value on success.

pathling.cli.render module

Output rendering and progress reporting for the command line interface.

Tabular results render to a human-readable table by default, with CSV and NDJSON alternatives for piping, and file output (CSV, NDJSON, Parquet, and Delta) via -o. File output is produced by Spark’s native writers and, by default, departitioned to a single file at the requested path. Data is written to stdout; progress, status, and the write confirmation go to stderr so that piped output stays clean.

Author: John Grimes.

class pathling.cli.render.OutputFormat

Bases: object

The recognised output formats.

CSV = 'csv'
DELTA = 'delta'
NDJSON = 'ndjson'
PARQUET = 'parquet'
TABLE = 'table'
class pathling.cli.render.OutputSpec(path: Path | None, format: str, limit: int = 50, overwrite: bool = False, departition: bool = True)

Bases: object

Describes where and how results leave the CLI.

Parameters:
  • path – the output path, or None to write to stdout.

  • format – one of the OutputFormat values.

  • limit – the row cap for stdout table rendering.

  • overwrite – whether an existing output path may be replaced.

  • departition – whether file output is departitioned to a single file (the default) rather than left as a Spark directory of part files.

departition: bool = True
format: str
limit: int = 50
overwrite: bool = False
path: Path | None
pathling.cli.render.check_overwrite(path: Path, overwrite: bool) → None

Fails when an output path already exists and overwrite was not requested.

Parameters:
  • path – the output path to check.

  • overwrite – whether overwriting is permitted.

Raises:

CliError – when the path exists and overwrite is False.

pathling.cli.render.infer_format_from_extension(path: Path) → str | None

Infers an output format from a file extension.

Parameters:

path – the output path.

Returns:

the inferred format, or None when the extension is unrecognised.
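A sketch of extension-based inference. The suffix table is an assumption covering only the three output formats with conventional file extensions; Delta output is a directory, so it has no extension to infer from.

```python
from pathlib import Path
from typing import Optional

# Assumed suffix-to-format table; the real mapping may accept other suffixes.
_EXTENSIONS = {".csv": "csv", ".ndjson": "ndjson", ".parquet": "parquet"}


def infer_format_sketch(path: Path) -> Optional[str]:
    # Case-insensitive match on the final suffix; unknown suffixes yield None.
    return _EXTENSIONS.get(path.suffix.lower())
```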

pathling.cli.render.output_options(func)

Applies the shared output options to a command callback.

These options form the common output surface of every command that emits a result DataFrame: --format, -o/--output, --limit, --overwrite, and --departition/--no-departition. They are resolved together by resolve_output() and consumed by write_output().

Parameters:

func – the command callback to decorate.

Returns:

the decorated callback.

pathling.cli.render.progress_status(console: Console, message: str, verbose: bool = False)

Shows a status spinner on stderr for a long-running stage.

Parameters:
  • console – the stderr console.

  • message – the status message to display.

  • verbose – when True, print a plain line instead of a spinner so that it does not interfere with verbose log output.

pathling.cli.render.render_csv(columns: Sequence[str], rows: Sequence[Sequence]) → str

Renders rows as CSV with a header line.

Parameters:
  • columns – the column names.

  • rows – the row values.

Returns:

the CSV text.
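One way to produce this output with Python’s standard csv module is sketched below. The function name is hypothetical, and the Unix line terminator is an assumption (the csv module defaults to "\r\n").

```python
import csv
import io
from typing import Sequence


def render_csv_sketch(columns: Sequence[str], rows: Sequence[Sequence]) -> str:
    buffer = io.StringIO()
    # The csv writer quotes fields containing commas and writes None as "".
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)  # header line
    writer.writerows(rows)
    return buffer.getvalue()
```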

pathling.cli.render.render_ndjson(columns: Sequence[str], rows: Sequence[Sequence]) → str

Renders rows as newline-delimited JSON objects.

Parameters:
  • columns – the column names.

  • rows – the row values.

Returns:

the NDJSON text.
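A minimal NDJSON sketch using the standard json module. The function name is hypothetical. Passing default=str, which renders values such as dates as strings, and ending every record (including the last) with a newline are assumptions.

```python
import json
from typing import Sequence


def render_ndjson_sketch(columns: Sequence[str], rows: Sequence[Sequence]) -> str:
    # One JSON object per row, keyed by column name, each on its own line.
    return "".join(json.dumps(dict(zip(columns, row)), default=str) + "\n" for row in rows)
```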

pathling.cli.render.render_rows(columns: Sequence[str], rows: Sequence[Sequence], fmt: str) → str

Renders rows in the requested stdout format.

Parameters:
  • columns – the column names.

  • rows – the row values.

  • fmt – one of table, csv, or ndjson.

Returns:

the rendered text.

Raises:

CliError – when the format cannot render to stdout.

pathling.cli.render.render_table(columns: Sequence[str], rows: Sequence[Sequence]) → str

Renders rows as a human-readable table with a row-count caption.

Parameters:
  • columns – the column names.

  • rows – the row values.

Returns:

the rendered table as a string.

pathling.cli.render.resolve_output(output_path: str | None, format_flag: str | None, limit: int = 50, overwrite: bool = False, departition: bool = True) → OutputSpec

Resolves and validates output options.

Parameters:
  • output_path – the -o path, or None for stdout.

  • format_flag – the --format value, or None to default/infer.

  • limit – the stdout table row cap.

  • overwrite – whether replacing an existing output path is allowed.

  • departition – whether file output is departitioned to a single file.

Returns:

the resolved OutputSpec.

Raises:

CliError – when the combination of options is invalid.

pathling.cli.render.stderr_console() → Console

Creates a Rich console that writes to stderr.

Markup and highlighting are disabled so that arbitrary message text (error messages, file paths, and JVM error codes that may contain square brackets) is printed verbatim and never misinterpreted as Rich markup, which would otherwise raise a MarkupError.

Returns:

a console bound to stderr for progress and status messages.

pathling.cli.render.write_output(df, spec: OutputSpec, console: Console) → None

Writes a result DataFrame to stdout or a file per the output spec.

For stdout, the table format is capped at spec.limit rows; other formats stream the full result. File output is produced by Spark’s native writers (see _write_file()) and confirmed with a single stderr line naming the format and path.

Parameters:
  • df – the result Spark DataFrame.

  • spec – the resolved output specification.

  • console – the stderr console for the confirmation message.

Raises:

CliError – when the output path exists without --overwrite.

pathling.cli.run module

The pathling run command.

Executes user-supplied Python code (from a script file, standard input, or an inline -c option) with spark (the Spark session) and pathling (the configured Pathling context) bound in the code’s global scope, reproducing Python interpreter script semantics: sys.argv, __main__, __file__, sys.path, traceback fidelity, and SystemExit propagation.

Author: John Grimes.

class pathling.cli.run.CodeSource(text: str, filename: str, argv0: str, path_entry: str, file_attr: str | None)

Bases: object

A resolved source of program text for execution.

Parameters:
  • text – the program source code.

  • filename – the filename to compile under, which appears in tracebacks and syntax errors (the script path, <stdin>, or <string>).

  • argv0 – the value for sys.argv[0], following Python interpreter conventions (the script path, -, or -c).

  • path_entry – the entry to prepend to sys.path (the script’s directory for files, "" for stdin and inline code).

  • file_attr – the value for __file__ in the program’s globals, or None to leave it unset (stdin and inline code).

argv0: str
file_attr: str | None
filename: str
path_entry: str
text: str
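The interpreter semantics listed for the run command can be approximated with compile() and exec(), as in this sketch. CodeSourceSketch mirrors the documented CodeSource fields, while run_code and the string stand-ins for spark and pathling are hypothetical. Restoring sys.argv and sys.path afterwards is a choice of this sketch; the real command may not do so.

```python
import sys
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class CodeSourceSketch:
    """Mirrors the documented CodeSource fields."""

    text: str
    filename: str
    argv0: str
    path_entry: str
    file_attr: Optional[str]


def run_code(source: CodeSourceSketch, args: List[str], bindings: Dict[str, Any]) -> Dict[str, Any]:
    # Run as the main module, with the caller's objects (e.g. spark) as globals.
    namespace: Dict[str, Any] = {"__name__": "__main__", **bindings}
    if source.file_attr is not None:
        namespace["__file__"] = source.file_attr
    # Compiling under the script's filename keeps tracebacks pointing at it.
    code = compile(source.text, source.filename, "exec")
    saved_argv, saved_path = sys.argv, list(sys.path)
    sys.argv = [source.argv0, *args]
    sys.path.insert(0, source.path_entry)
    try:
        exec(code, namespace)  # SystemExit propagates to the caller unchanged.
    finally:
        sys.argv = saved_argv
        sys.path[:] = saved_path
    return namespace
```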
class pathling.cli.run.RunCommand(name: str | None, context_settings: MutableMapping[str, Any] | None = None, callback: Callable[[...], Any] | None = None, params: List[Parameter] | None = None, help: str | None = None, epilog: str | None = None, short_help: str | None = None, options_metavar: str | None = '[OPTIONS]', add_help_option: bool = True, no_args_is_help: bool = False, hidden: bool = False, deprecated: bool = False)

Bases: Command

A command that records its raw argument list before parsing.

The raw arguments are needed to distinguish run script.py -c CODE (a usage error: two code sources) from run -c CODE a b (inline code with trailing arguments), which parse to the same option values.

parse_args(ctx, args)

Stores the raw arguments on the context, then parses as normal.

Parameters:
  • ctx – the Click context.

  • args – the raw argument list for this command.

Returns:

the remaining arguments after parsing.

pathling.cli.session module

Lazy Spark session and Pathling context creation for the CLI.

The Spark session is created only when a command actually needs it, behind a status spinner on stderr. Spark and JVM logging is suppressed by default by pointing the driver JVM at a packaged log4j2 configuration before launch, and lowering the log level once the context exists; --verbose leaves logging at its defaults.

Author: John Grimes.

pathling.cli.session.create_context(config: CliConfig, console: Console | None = None)

Creates a PathlingContext configured from the CLI settings.

In the default (non-verbose) mode, the JVM launcher’s startup banner and Ivy dependency-resolution report, which Spark prints to file descriptor 2 before any log configuration can take effect in local mode, are swallowed by redirecting that descriptor for the duration of session creation. Meanwhile, the status spinner is routed to a preserved copy of the real stderr so that progress remains visible.

Parameters:
  • config – the resolved CLI configuration.

  • console – the stderr console for the status spinner; created when None.

Returns:

a configured PathlingContext.
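The descriptor-level redirection described above can be sketched with os.dup and os.dup2, assuming a POSIX platform. The context manager name is hypothetical. It yields a text stream on a saved copy of the original stderr, which a console could write to while fd 2, and any child process started meanwhile (such as a JVM launcher), points at the null device.

```python
import os
import sys
from contextlib import contextmanager
from typing import Iterator, TextIO


@contextmanager
def swallow_fd2() -> Iterator[TextIO]:
    """Points fd 2 at the null device; yields a stream on the real stderr."""
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keeps the original stderr open
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    real_stderr = os.fdopen(os.dup(saved_fd), "w")
    try:
        # Child processes launched now inherit the redirected fd 2.
        os.dup2(devnull_fd, 2)
        yield real_stderr
    finally:
        real_stderr.flush()
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(saved_fd)
        os.close(devnull_fd)
        real_stderr.close()
```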

pathling.cli.session.quiet_log4j2_path() → str

Resolves a filesystem path to the packaged quiet log4j2 configuration.

The configuration ships as static package data, so no per-run temporary file is created. The materialised path is kept valid for the process lifetime so the driver JVM can read it during and after session start.

Returns:

the filesystem path to the quiet log4j2 properties file.

pathling.cli.sparkconf module

Parsing, validation, coercion, and merge for user-supplied Spark settings.

This module holds the pure logic that turns [spark] config-table entries and --spark-conf KEY=VALUE flags into the effective Spark configuration applied when a session is built. Keys must begin with the spark. prefix, scalar values are coerced to the string form Spark expects, and values support the existing @file/environment secret resolution. The three managed keys (spark.jars.packages, spark.sql.extensions, spark.sql.catalog.spark_catalog) are merged with item-level protection against Pathling’s managed defaults so the library keeps working while user additions take effect. None of this requires PySpark, so it runs before any Spark session starts.

Author: John Grimes.

pathling.cli.sparkconf.merge_spark_conf(user_map: dict, on_warning: Callable[[str], None] | None = None) → dict

Merges the user Spark map with Pathling’s managed defaults.

Plain keys pass through unchanged. The managed list keys (spark.jars.packages, spark.sql.extensions) are unioned with the managed defaults and deduplicated. spark.sql.catalog.spark_catalog is dropped when it equals the managed Delta catalog and is an error otherwise. Only keys the user actually set appear in the result; keys they did not touch are left to the session builder’s managed defaults.

Parameters:
  • user_map – the validated, coerced, and resolved user Spark map.

  • on_warning – a callback for the managed-version-override warning, or None to suppress it.

Returns:

the effective Spark configuration to apply on top of the defaults.

Raises:

CliError – if the session catalog is set to a non-Delta value.
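A sketch of the documented merge rules. The managed package coordinate is a placeholder, not Pathling’s real value, and the extension and catalog class names are assumed to be the standard Delta Lake ones. Placing managed items first and joining with commas are assumptions, and the managed-version-override warning is not modelled.

```python
from typing import Dict, List

# Hypothetical managed defaults; "org.example:managed-lib:1.0.0" is a placeholder.
MANAGED_LISTS: Dict[str, List[str]] = {
    "spark.jars.packages": ["org.example:managed-lib:1.0.0"],
    "spark.sql.extensions": ["io.delta.sql.DeltaSparkSessionExtension"],
}
DELTA_CATALOG = "org.apache.spark.sql.delta.catalog.DeltaCatalog"


class CliError(Exception):
    """Hypothetical stand-in for pathling.cli.errors.CliError."""


def _split(value: str) -> List[str]:
    return [item.strip() for item in value.split(",") if item.strip()]


def merge_spark_conf_sketch(user_map: Dict[str, str]) -> Dict[str, str]:
    merged: Dict[str, str] = {}
    for key, value in user_map.items():
        if key in MANAGED_LISTS:
            # Union of managed and user items, deduplicated, managed items first.
            items: List[str] = []
            for item in MANAGED_LISTS[key] + _split(value):
                if item not in items:
                    items.append(item)
            merged[key] = ",".join(items)
        elif key == "spark.sql.catalog.spark_catalog":
            if value != DELTA_CATALOG:
                raise CliError(f"{key} must be {DELTA_CATALOG}, not {value}")
            # Equal to the managed value: drop it and leave it to the builder.
        else:
            merged[key] = value  # plain keys pass through unchanged
    return merged
```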

pathling.cli.sparkconf.parse_spark_conf_flags(flags) → dict

Parses repeatable --spark-conf KEY=VALUE flags into a mapping.

Each flag is split on the first = only, so a value may itself contain = (for example a JVM option). When the same key appears more than once, the last occurrence wins.

Parameters:

flags – an iterable of raw KEY=VALUE flag strings.

Returns:

a mapping of key to value, with later duplicates overriding earlier.

Raises:

CliError – if a flag is not of the form KEY=VALUE (exit code 2).
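A sketch of the flag parsing using str.partition, which splits on the first = only. The stand-in CliError, the function name, and the rejection of an empty key are assumptions.

```python
from typing import Dict, Iterable


class CliError(Exception):
    """Hypothetical stand-in for pathling.cli.errors.CliError."""

    def __init__(self, message: str, exit_code: int = 1) -> None:
        super().__init__(message)
        self.exit_code = exit_code


def parse_spark_conf_flags_sketch(flags: Iterable[str]) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for flag in flags:
        key, sep, value = flag.partition("=")  # split on the first "=" only
        if not sep or not key:
            raise CliError(f"Invalid --spark-conf {flag!r}: expected KEY=VALUE", exit_code=2)
        result[key] = value  # a later duplicate replaces an earlier one
    return result
```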

pathling.cli.sparkconf.resolve_spark_conf(file_table: dict | None, flag_map: dict | None, env: dict | None = None) → dict

Combines, validates, coerces, and secret-resolves the user Spark map.

The flag map overrides the file table per key. Each entry’s key and value are then validated and coerced, and string values are passed through the existing secret resolver so a @file reference is read from disk.

Parameters:
  • file_table – the parsed [spark] table, or None.

  • flag_map – the parsed --spark-conf flag map, or None.

  • env – the environment mapping for secret resolution.

Returns:

the validated, coerced, and resolved user Spark map.

Raises:

CliError – if a key or value is invalid, or a @file reference cannot be read.

pathling.cli.sparkconf.validate_and_coerce(key: str, value) → str

Validates a Spark configuration key and coerces its value to a string.

The key must begin with the spark. prefix. Scalar values (string, integer, float, boolean) are coerced to the string form Spark expects, with booleans rendered as true/false. Any other value type (a TOML array or table) is rejected.

Parameters:
  • key – the Spark configuration key.

  • value – the raw value from the config table or a flag.

Returns:

the value coerced to a string.

Raises:

CliError – if the key is not prefixed spark. or the value is not a scalar (exit code 2).
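A sketch of the validation and coercion rules, with a hypothetical stand-in CliError and function name. The bool check must come before the int check: bool is a subclass of int in Python, so checking int first would render True as "True" rather than "true".

```python
class CliError(Exception):
    """Hypothetical stand-in for pathling.cli.errors.CliError."""

    def __init__(self, message: str, exit_code: int = 1) -> None:
        super().__init__(message)
        self.exit_code = exit_code


def validate_and_coerce_sketch(key: str, value: object) -> str:
    if not key.startswith("spark."):
        raise CliError(f"Spark setting {key!r} must start with 'spark.'", exit_code=2)
    # bool first, because isinstance(True, int) is also True.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (str, int, float)):
        return str(value)
    # Lists and dicts (TOML arrays and tables) are rejected.
    raise CliError(f"Spark setting {key!r} must be a scalar, not {type(value).__name__}", exit_code=2)
```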

pathling.cli.terminology module

The Pathling terminology commands.

Each command reads a tabular dataset (CSV or Parquet), builds codings from a named code column plus either a fixed system URI or a per-row system column, calls the corresponding library terminology function, appends the result column(s), and emits the augmented dataset per the shared output options.

Author: John Grimes.

pathling.cli.view module

The pathling view command.

Executes a SQL on FHIR ViewDefinition (a JSON file or inline string) against a data source and emits the tabular result in the requested format, optionally restricting the resources processed with a FHIR search --filter.

Author: John Grimes.

Module contents

Command line interface for Pathling.

This subpackage exposes the Pathling Python library through a flat, verb-based command tree installed as the pathling console script. Modules are kept free of eager PySpark imports so that --help and --version remain fast; the Spark session is created lazily by pathling.cli.session only when a command needs it.

Author: John Grimes.