pathling.cli package
Submodules
pathling.cli.config module
Configuration resolution for the Pathling command line interface.
A single config file is selected by the precedence explicit --config >
project-local pathling.toml (in the current working directory) > user-level
${XDG_CONFIG_HOME:-~/.config}/pathling/config.toml > none. Files are never
merged: keys absent from the chosen file fall back to built-in defaults, never
to another file. The chosen file’s values then resolve with the precedence flag
> config file > built-in default. When a project-local file is discovered, a
one-line notice naming it is emitted via the on_notice callback. Secret
values for authentication may be supplied as a literal, a @/path/to/file
reference, or via an environment variable.
Author: John Grimes.
- class pathling.cli.config.BulkAuth(client_id: str, token_endpoint: str | None = None, private_key_jwk: str | None = None, client_secret: str | None = None, scope: str | None = None)
Bases: object
SMART backend services authentication settings for bulk export.
- Parameters:
client_id – the OAuth2 client identifier.
private_key_jwk – the resolved private key JWK, or None.
client_secret – the resolved client secret, or None.
token_endpoint – the OAuth2 token endpoint.
scope – an optional OAuth2 scope.
- client_id: str
- client_secret: str | None = None
- property mechanism: str
A human-readable name for the credential mechanism in use.
- Returns:
a description of the credential type for error messages.
- private_key_jwk: str | None = None
- scope: str | None = None
- token_endpoint: str | None = None
- class pathling.cli.config.CliConfig(tx_server: str = 'https://tx.ontoserver.csiro.au/fhir', tx_auth: TxAuth | None = None, fhir_version: str = 'R4', verbose: bool = False, config_path: Path | None = None, spark_conf: dict = <factory>, bulk_auth_table: dict | None = None)
Bases: object
Resolved global configuration for a single invocation.
- Parameters:
tx_server – the terminology server URL.
tx_auth – terminology authentication settings, or None.
fhir_version – the FHIR version code.
verbose – whether verbose logging and stack traces are enabled.
config_path – the path the config file was read from, or None.
spark_conf – the resolved, validated, and merged Spark configuration map to apply when a session is built; empty when nothing is set.
bulk_auth_table – the parsed [bulk-auth] table from the chosen config file, or None when absent. Carried so that export resolves bulk credentials from the already-loaded config rather than re-reading a file.
- bulk_auth_table: dict | None = None
- config_path: Path | None = None
- fhir_version: str = 'R4'
- spark_conf: dict
- tx_server: str = 'https://tx.ontoserver.csiro.au/fhir'
- verbose: bool = False
- class pathling.cli.config.TxAuth(client_id: str | None = None, client_secret: str | None = None, token_endpoint: str | None = None, scope: str | None = None)
Bases: object
Terminology server authentication settings.
- Parameters:
client_id – the OAuth2 client identifier.
client_secret – the resolved client secret, or None.
token_endpoint – the OAuth2 token endpoint.
scope – an optional OAuth2 scope.
- client_id: str | None = None
- client_secret: str | None = None
- property enabled: bool
Whether enough has been supplied to attempt authentication.
- Returns:
True when a client identifier and token endpoint are present.
- scope: str | None = None
- token_endpoint: str | None = None
- pathling.cli.config.default_config_path() -> Path
Computes the default config file path, honouring XDG_CONFIG_HOME.
- Returns:
the path to the default config file location.
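The lookup described above can be illustrated with a minimal sketch. It is not the module's source: the function name and the env parameter are hypothetical (the documented function takes no arguments), and treating an empty XDG_CONFIG_HOME the same as an unset one is an assumption.

```python
import os
from pathlib import Path


def sketch_default_config_path(env=None):
    """Return ${XDG_CONFIG_HOME:-~/.config}/pathling/config.toml (illustrative only)."""
    # Hypothetical parameter: lets the environment be injected for testing.
    env = os.environ if env is None else env
    # Assumption: an unset or empty XDG_CONFIG_HOME falls back to ~/.config.
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "pathling" / "config.toml"
```

Calling it with {"XDG_CONFIG_HOME": "/tmp/xdg"} yields /tmp/xdg/pathling/config.toml; with an empty mapping it yields the path under ~/.config.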
- pathling.cli.config.load_config_file(path: Path, on_warning: Callable[[str], None] | None = None) -> dict
Loads a config file, warning about unknown keys.
- Parameters:
path – the config file path.
on_warning – an optional callback invoked with each warning message; defaults to writing to stderr.
- Returns:
the parsed config as a dict, or an empty dict when the file is absent.
- pathling.cli.config.resolve_bulk_auth(file_bulk_auth: dict | None, client_id: str | None = None, client_secret: str | None = None, private_key_jwk: str | None = None, token_endpoint: str | None = None, scope: str | None = None, env: dict | None = None) -> BulkAuth | None
Resolves bulk export authentication from flags and the config file.
Authentication is considered configured only when a client identifier is present. Secret values are resolved as a literal, a @file reference, or the PATHLING_CLIENT_SECRET / PATHLING_PRIVATE_KEY_JWK environment variables.
- Parameters:
file_bulk_auth – the parsed [bulk-auth] table, or None.
client_id – the --client-id flag value, or None.
client_secret – the --client-secret flag value, or None.
private_key_jwk – the --private-key-jwk flag value, or None.
token_endpoint – the --token-endpoint flag value, or None.
scope – the --scope flag value, or None.
env – the environment mapping for secret resolution.
- Returns:
a populated BulkAuth, or None when no auth input is given.
- Raises:
CliError – when the auth configuration is incomplete or ambiguous.
- pathling.cli.config.resolve_config(tx_server: str | None = None, tx_client_id: str | None = None, tx_client_secret: str | None = None, tx_token_endpoint: str | None = None, tx_scope: str | None = None, fhir_version: str | None = None, spark_conf_flags: dict | None = None, verbose: bool = False, config_path: Path | None = None, cwd: Path | None = None, env: dict | None = None, on_warning: Callable[[str], None] | None = None, on_notice: Callable[[str], None] | None = None) -> CliConfig
Resolves global configuration from flags, the config file, and defaults.
A single config file is selected by the precedence explicit --config > project-local pathling.toml (in cwd) > user-level config file > none, then its values flow through the precedence flag > config file > built-in default. Files are never merged: keys absent from the chosen file fall back to built-in defaults, never to another file.
- Parameters:
tx_server – the --tx-server flag value, or None.
tx_client_id – the --tx-client-id flag value, or None.
tx_client_secret – the --tx-client-secret flag value, or None.
tx_token_endpoint – the --tx-token-endpoint flag value, or None.
tx_scope – the --tx-scope flag value, or None.
fhir_version – the --fhir-version flag value, or None.
spark_conf_flags – the parsed --spark-conf flag map, or None; flag values override the [spark] table for the same key.
verbose – the --verbose flag value.
config_path – an explicit config file path, or None for discovery.
cwd – the directory searched for a project-local pathling.toml; defaults to the current working directory.
env – the environment mapping for secret resolution.
on_warning – an optional warning callback passed to the file loader and used to surface the managed Spark-package version-override warning; defaults to writing to stderr so warnings appear even in quiet mode.
on_notice – an optional callback invoked with a one-line notice when a project-local pathling.toml is discovered and used.
- Returns:
the resolved CliConfig.
- Raises:
CliError – if the resolved FHIR version is unsupported.
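The per-value precedence (flag > config file > built-in default) can be sketched as a small helper. This is an illustration under stated assumptions rather than the module's code: first_set is a hypothetical name, and the config-file key fhir_version used here is invented for the example. The default 'R4' is taken from the CliConfig signature.

```python
def first_set(flag_value, file_value, default):
    """Pick a value by the precedence flag > config file > built-in default."""
    if flag_value is not None:
        return flag_value
    if file_value is not None:
        return file_value
    return default


# Hypothetical parsed config file; the key name is an assumption.
file_table = {"fhir_version": "R4B"}

# No flag given: the file value wins over the built-in default.
from_file = first_set(None, file_table.get("fhir_version"), "R4")
# Flag given: it wins over the file value.
from_flag = first_set("R5", file_table.get("fhir_version"), "R4")
# Key absent from the chosen file: fall back to the default, never to another file.
from_default = first_set(None, {}.get("fhir_version"), "R4")
```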
- pathling.cli.config.resolve_config_source(config_path: Path | None, cwd: Path) -> tuple[Optional[pathlib.Path], str]
Selects the single config file to load and reports where it came from.
Exactly one file is chosen, by the precedence explicit --config > project-local pathling.toml in cwd > user-level config file > none. Values are never merged across files.
- Parameters:
config_path – an explicit --config path, or None.
cwd – the directory searched for a project-local pathling.toml.
- Returns:
a tuple of the chosen path and an origin tag, one of "explicit", "project", "user", or "none". The "none" case returns the (non-existent) user-level path so the existing “missing file yields defaults” behaviour is preserved.
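A minimal sketch of this selection order follows. It is not the documented implementation: the function name and the user_path parameter are hypothetical (the documented function takes only config_path and cwd), and using is_file() as the existence check is an assumption.

```python
from pathlib import Path


def sketch_resolve_config_source(config_path, cwd, user_path):
    """Choose exactly one config file and tag where it came from."""
    # 1. An explicit --config path always wins.
    if config_path is not None:
        return config_path, "explicit"
    # 2. A project-local pathling.toml in cwd comes next.
    project = cwd / "pathling.toml"
    if project.is_file():
        return project, "project"
    # 3. Then the user-level file, if present.
    if user_path.is_file():
        return user_path, "user"
    # 4. Nothing found: return the (non-existent) user-level path with "none",
    #    so a later load of the missing file simply yields defaults.
    return user_path, "none"
```

Because the function returns as soon as one source matches, values from lower-precedence files are never consulted, which is what keeps files from being merged.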
- pathling.cli.config.resolve_secret(value: str | None, env_var: str | None = None, env: dict | None = None) -> str | None
Resolves a secret value from a literal, a @file reference, or an environment variable.
A value beginning with @ is treated as a path to a file whose stripped contents are returned. When value is None and env_var is given, the environment variable is consulted.
- Parameters:
value – the literal value, @path reference, or None.
env_var – the name of a fallback environment variable, or None.
env – the environment mapping to read from; defaults to os.environ.
- Returns:
the resolved secret, or None when nothing is available.
- Raises:
CliError – if a @file reference cannot be read.
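The three resolution routes can be sketched as follows. This is an illustrative approximation only: the function name is hypothetical, ValueError stands in for CliError so the example stays self-contained, and the UTF-8 encoding is an assumption.

```python
import os
from pathlib import Path


def sketch_resolve_secret(value, env_var=None, env=None):
    """Resolve a secret from a literal, an @file reference, or an environment variable."""
    env = os.environ if env is None else env
    if value is None:
        # No explicit value: consult the fallback environment variable, if named.
        return env.get(env_var) if env_var else None
    if value.startswith("@"):
        # "@/path/to/file": return the stripped contents of the file.
        try:
            return Path(value[1:]).read_text(encoding="utf-8").strip()
        except OSError as exc:
            # Stand-in for CliError, which the documented function raises.
            raise ValueError(f"cannot read secret file {value[1:]!r}") from exc
    # Anything else is taken as the literal secret.
    return value
```

A literal such as "s3cret" is returned unchanged, while None with env_var="PATHLING_CLIENT_SECRET" reads that variable from the supplied mapping.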
pathling.cli.console module
The pathling console command.
Opens an interactive IPython session with spark (the Spark session) and
pathling (the configured Pathling context) bound in the user namespace,
after a banner identifying the version and the variables in scope. IPython is
imported inside the command body so that --help stays fast.
Author: John Grimes.
pathling.cli.convert module
The pathling convert command.
Reads any supported FHIR data source and writes NDJSON, Parquet, or Delta to an output path, with save-mode control and a summary of the resource types and row counts written.
Author: John Grimes.
pathling.cli.departition module
Departitioning of Spark directory output into a single file.
Spark’s distributed writers produce a directory of part files rather than a
single file. To restore the single-file experience expected from the command
line, this module moves the single data part file out of that directory to the
exact path the user requested and removes the directory. The operation runs
over the Hadoop FileSystem associated with the target path, so it works
uniformly for local, S3, HDFS, and other Hadoop-compatible destinations,
keeping the move on a single filesystem.
Author: John Grimes.
- pathling.cli.departition.departition(spark, source_dir: str | Path, target_path: str | Path, part_extension: str) -> None
Moves the single data part file from a Spark output directory to a path.
Lists source_dir over the Hadoop FileSystem of target_path, selects the data part files (those ending in part_extension, ignoring Spark markers such as _SUCCESS and .crc checksums), and:
- with exactly one data file, moves it to target_path;
- with no data files (an empty result), creates an empty target_path;
- with more than one data file, raises an error rather than choosing one.
The source directory is always removed afterwards.
- Parameters:
spark – the Spark session, used to reach the JVM gateway and Hadoop configuration.
source_dir – the Spark output directory to departition.
target_path – the destination path for the single data file.
part_extension – the part-file extension to select, without a leading dot (e.g. "csv", "json", "parquet").
- Raises:
CliError – when the source directory contains more than one data file.
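The part-file selection rule can be shown without Spark or Hadoop by operating on a plain list of file names. The sketch below is an assumption-laden illustration: select_part_file is a hypothetical helper, matching by a "." + part_extension suffix is an assumed reading of "ending in part_extension", and ValueError stands in for CliError.

```python
def select_part_file(names, part_extension):
    """Return the single data part file name, or None when the result is empty."""
    suffix = "." + part_extension
    # Markers such as _SUCCESS and checksum files ending in .crc do not end
    # with the data suffix, so the suffix test alone filters them out.
    data_files = [name for name in names if name.endswith(suffix)]
    if len(data_files) > 1:
        # Stand-in for CliError: refuse to pick one of several part files.
        raise ValueError(f"expected one data file, found {len(data_files)}")
    return data_files[0] if data_files else None


listing = ["_SUCCESS", "part-00000-abc.csv", ".part-00000-abc.csv.crc", "._SUCCESS.crc"]
chosen = select_part_file(listing, "csv")
```

In the documented function the chosen file would then be moved to target_path (or an empty target created when nothing is returned) and the source directory removed.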
- pathling.cli.departition.remove_path(spark, path: str | Path) -> None
Removes a file or directory over its Hadoop FileSystem, if it exists.
Used to clear an existing target before an overwrite and to clean up the temporary departition directory. The operation is idempotent: a path that does not exist is left untouched.
- Parameters:
spark – the Spark session, used to reach the JVM gateway and Hadoop configuration.
path – the path to remove.
pathling.cli.errors module
Error handling for the Pathling command line interface.
JVM exceptions raised through Py4J carry verbose Java stack traces that are
unhelpful at the command line. This module unwraps them to their root message
and maps recognised categories onto concise, actionable guidance. Stack traces
are shown only when the user passes --verbose.
Author: John Grimes.
- exception pathling.cli.errors.CliError(message: str, exit_code: int = 1)
Bases: Exception
An error with a message that is safe to show the user directly.
- Parameters:
message – the human-readable error message.
exit_code – the process exit code to use; defaults to a runtime failure.
- pathling.cli.errors.friendly_message(exc: BaseException, verbose: bool = False, server_url: str | None = None) -> str
Builds a friendly, actionable message for an unexpected exception.
- Parameters:
exc – the exception to describe.
verbose – when True, append the full traceback.
server_url – a server URL to name in connection errors, or None.
- Returns:
the message to present to the user.
- pathling.cli.errors.is_auth_error(exc: BaseException) -> bool
Determines whether an exception represents an authentication failure.
Used to decide whether an export failure should be reported as an authentication problem; connection, timeout, and server-side errors that happen to occur while authentication is configured must not be misdiagnosed as bad credentials.
- Parameters:
exc – the exception to classify.
- Returns:
True when the unwrapped message looks like an authentication failure.
- pathling.cli.errors.is_connection_error(exc: BaseException) -> bool
Determines whether an exception represents a server connection failure.
- Parameters:
exc – the exception to classify.
- Returns:
True when the unwrapped message looks like a connection failure.
- pathling.cli.errors.unwrap_java_exception(exc: BaseException) -> str
Extracts the most useful single-line message from an exception.
Py4J wraps Java exceptions, whose str representation includes the full stack trace. This returns the leading message line, stripping the Java exception class prefix where present.
exc – the exception to unwrap.
- Returns:
a concise message describing the underlying problem.
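As a rough illustration of "leading message line, minus the Java class prefix", the sketch below works on the string form of any exception. It is a simplification, not the module's logic: the function name and regular expression are hypothetical, and real Py4J messages may carry extra framing that this does not handle.

```python
import re

# Matches a dotted Java class name followed by a colon, e.g.
# "java.lang.IllegalArgumentException: ".
_JAVA_CLASS_PREFIX = re.compile(r"^(?:[A-Za-z_$][\w$]*\.)+[A-Za-z_$][\w$]*:\s*")


def sketch_unwrap(exc):
    """Return a single-line message, dropping a leading Java class prefix."""
    text = str(exc).strip()
    # Fall back to the exception type name when there is no message at all.
    first_line = text.splitlines()[0] if text else type(exc).__name__
    stripped = _JAVA_CLASS_PREFIX.sub("", first_line).strip()
    # Keep the original line if stripping the prefix would leave nothing.
    return stripped or first_line


wrapped = Exception(
    "java.lang.IllegalArgumentException: Bad thing\n\tat com.example.Foo.bar(Foo.java:1)"
)
message = sketch_unwrap(wrapped)
```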
pathling.cli.export module
The pathling export command.
Performs a FHIR Bulk Data export (system, group, or patient level) with the library’s filters and SMART backend services authentication, downloading NDJSON to the output directory and reporting a summary of files and resource counts.
Author: John Grimes.
pathling.cli.fhirpath module
The pathling fhirpath command.
Evaluates a FHIRPath expression against either a whole data source (one row of
id and result per resource) or a single FHIR resource JSON file (one
type and result row per result item). Supports context expressions,
named variables, and a FHIR search --filter in data source mode.
Author: John Grimes.
pathling.cli.io module
Data source detection and reading for the Pathling command line interface.
Format is auto-detected from the contents of the input path, with an explicit
--from override. Detection and validation run before any Spark session is
started so that obvious mistakes (a missing or empty path, an ambiguous
directory) fail quickly with an actionable message.
Author: John Grimes.
- class pathling.cli.io.SourceFormat
Bases: object
The recognised data source formats.
- BUNDLES = 'bundles'
- DELTA = 'delta'
- NDJSON = 'ndjson'
- PARQUET = 'parquet'
- RESOURCE = 'resource'
- class pathling.cli.io.SourceSpec(path: Path, format: str)
Bases: object
A resolved data source input.
- Parameters:
path – the input path; verified to exist.
format – one of the SourceFormat values.
- format: str
- path: Path
- pathling.cli.io.detect_format(path: Path, allow_resource: bool = False) -> str
Auto-detects the format of a data source path.
- Parameters:
path – the input path.
allow_resource – when True, a single JSON file is treated as a single FHIR resource rather than rejected.
- Returns:
the detected SourceFormat value.
- Raises:
CliError – when the path is missing or empty, or the format cannot be determined unambiguously.
- pathling.cli.io.discover_bundle_resource_types(path: Path) -> List[str]
Discovers the distinct resource types contained in a directory of FHIR Bundles.
- Parameters:
path – the directory of bundle JSON files.
- Returns:
a sorted list of distinct resource type codes found in the bundle entries.
- Raises:
CliError – when no resource types can be found.
- pathling.cli.io.read_single_resource(path: Path) -> tuple
Reads a single FHIR resource JSON file.
- Parameters:
path – the path to the resource file.
- Returns:
a tuple of (resource_type, resource_json_string).
- Raises:
CliError – when the file cannot be read or has no resource type.
- pathling.cli.io.read_source(pc, spec: SourceSpec, types: List[str] | None = None)
Reads a SourceSpec into a Pathling DataSource.
- Parameters:
pc – the PathlingContext to read with.
spec – the resolved source specification.
types – the resource types to read from a Bundles source. When provided (non-empty), these are used directly and the driver-side discovery pass is skipped; when None or empty, discovery enumerates the types. Ignored for non-Bundles formats (FR-015).
- Returns:
a Pathling DataSource for the input.
- Raises:
CliError – when the format is not a data source format.
- pathling.cli.io.resolve_source(path_str: str, from_format: str | None = None, allow_resource: bool = False) -> SourceSpec
Resolves a source path and format, validating before Spark startup.
- Parameters:
path_str – the positional source path.
from_format – an explicit --from override, or None to auto-detect.
allow_resource – whether a single resource JSON file is permitted.
- Returns:
the resolved SourceSpec.
- Raises:
CliError – when the path is invalid or the format cannot be resolved.
pathling.cli.main module
The root command group for the Pathling command line interface.
This module defines the global options, resolves configuration, registers every
subcommand, and installs a single central error handler that turns exceptions
(including unwrapped JVM exceptions) into concise messages with appropriate exit
codes. Heavy imports (PySpark, the JVM-backed library) are deferred to command
execution so that --help and --version stay fast.
Author: John Grimes.
- class pathling.cli.main.CliContext(config: CliConfig, console: Console)
Bases: object
The object carried on the Click context for every command.
- Parameters:
config – the resolved global configuration.
console – the stderr console for progress and error output.
- console: Console
pathling.cli.render module
Output rendering and progress reporting for the command line interface.
Tabular results render to a human-readable table by default, with CSV and
NDJSON alternatives for piping, and file output (CSV, NDJSON, Parquet, and
Delta) via -o. File output is produced by Spark’s native writers and, by
default, departitioned to a single file at the requested path. Data is written
to stdout; progress, status, and the write confirmation go to stderr so that
piped output stays clean.
Author: John Grimes.
- class pathling.cli.render.OutputFormat
Bases: object
The recognised output formats.
- CSV = 'csv'
- DELTA = 'delta'
- NDJSON = 'ndjson'
- PARQUET = 'parquet'
- TABLE = 'table'
- class pathling.cli.render.OutputSpec(path: Path | None, format: str, limit: int = 50, overwrite: bool = False, departition: bool = True)
Bases: object
Describes where and how results leave the CLI.
- Parameters:
path – the output path, or None to write to stdout.
format – one of the OutputFormat values.
limit – the row cap for stdout table rendering.
overwrite – whether an existing output path may be replaced.
departition – whether file output is departitioned to a single file (the default) rather than left as a Spark directory of part files.
- departition: bool = True
- format: str
- limit: int = 50
- overwrite: bool = False
- path: Path | None
- pathling.cli.render.check_overwrite(path: Path, overwrite: bool) -> None
Fails when an output path already exists and overwrite was not requested.
- Parameters:
path – the output path to check.
overwrite – whether overwriting is permitted.
- Raises:
CliError – when the path exists and overwrite is False.
- pathling.cli.render.infer_format_from_extension(path: Path) -> str | None
Infers an output format from a file extension.
- Parameters:
path – the output path.
- Returns:
the inferred format, or None when the extension is unrecognised.
- pathling.cli.render.output_options(func)
Applies the shared output options to a command callback.
These options form the common output surface of every command that emits a result DataFrame (--format, -o/--output, --limit, --overwrite, and --departition/--no-departition) and are resolved together by resolve_output() and consumed by write_output().
- Parameters:
func – the command callback to decorate.
- Returns:
the decorated callback.
- pathling.cli.render.progress_status(console: Console, message: str, verbose: bool = False)
Shows a status spinner on stderr for a long-running stage.
- Parameters:
console – the stderr console.
message – the status message to display.
verbose – when True, print a plain line instead of a spinner so that it does not interfere with verbose log output.
- pathling.cli.render.render_csv(columns: Sequence[str], rows: Sequence[Sequence]) -> str
Renders rows as CSV with a header line.
- Parameters:
columns – the column names.
rows – the row values.
- Returns:
the CSV text.
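One plausible way to produce CSV text with a header line uses the standard library csv module, as sketched below. The function name is hypothetical, and the line terminator and quoting policy are assumptions; the documented function may differ in both.

```python
import csv
import io


def sketch_render_csv(columns, rows):
    """Render a header line followed by one CSV line per row."""
    buffer = io.StringIO()
    # Assumption: "\n" line endings and the csv module's minimal quoting.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = sketch_render_csv(["id", "name"], [["1", "Smith, J"]])
```

Fields containing the delimiter are quoted automatically, so the value Smith, J survives a round trip through a CSV reader.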
- pathling.cli.render.render_ndjson(columns: Sequence[str], rows: Sequence[Sequence]) -> str
Renders rows as newline-delimited JSON objects.
- Parameters:
columns – the column names.
rows – the row values.
- Returns:
the NDJSON text.
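Newline-delimited JSON pairs each row with the column names and serialises one object per line. The sketch below illustrates that idea with the standard library; the function name is hypothetical, and the absence of a trailing newline and the default json.dumps separators are assumptions.

```python
import json


def sketch_render_ndjson(columns, rows):
    """Render each row as a JSON object keyed by column name, one per line."""
    return "\n".join(json.dumps(dict(zip(columns, row))) for row in rows)


ndjson_text = sketch_render_ndjson(["id", "result"], [["p1", True], ["p2", None]])
```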
- pathling.cli.render.render_rows(columns: Sequence[str], rows: Sequence[Sequence], fmt: str) -> str
Renders rows in the requested stdout format.
- Parameters:
columns – the column names.
rows – the row values.
fmt – one of table, csv, or ndjson.
- Returns:
the rendered text.
- Raises:
CliError – when the format cannot render to stdout.
- pathling.cli.render.render_table(columns: Sequence[str], rows: Sequence[Sequence]) -> str
Renders rows as a human-readable table with a row-count caption.
- Parameters:
columns – the column names.
rows – the row values.
- Returns:
the rendered table as a string.
- pathling.cli.render.resolve_output(output_path: str | None, format_flag: str | None, limit: int = 50, overwrite: bool = False, departition: bool = True) -> OutputSpec
Resolves and validates output options.
- Parameters:
output_path – the -o path, or None for stdout.
format_flag – the --format value, or None to default/infer.
limit – the stdout table row cap.
overwrite – whether replacing an existing output path is allowed.
departition – whether file output is departitioned to a single file.
- Returns:
the resolved OutputSpec.
- Raises:
CliError – when the combination of options is invalid.
- pathling.cli.render.stderr_console() -> Console
Creates a Rich console that writes to stderr.
Markup and highlighting are disabled so that arbitrary message text (error messages, file paths, and JVM error codes that may contain square brackets) is printed verbatim and never misinterpreted as Rich markup (which would otherwise raise a MarkupError).
- Returns:
a console bound to stderr for progress and status messages.
- pathling.cli.render.write_output(df, spec: OutputSpec, console: Console) -> None
Writes a result DataFrame to stdout or a file per the output spec.
For stdout, the table format is capped at spec.limit rows; other formats stream the full result. File output is produced by Spark’s native writers (see _write_file()) and confirmed with a single stderr line naming the format and path.
- Parameters:
df – the result Spark DataFrame.
spec – the resolved output specification.
console – the stderr console for the confirmation message.
- Raises:
CliError – when the output path exists without --overwrite.
pathling.cli.run module
The pathling run command.
Executes user-supplied Python code - from a script file, standard input, or an
inline -c option - with spark (the Spark session) and pathling (the
configured Pathling context) bound in the code’s global scope, reproducing
Python interpreter script semantics (sys.argv, __main__, __file__,
sys.path, traceback fidelity, and SystemExit propagation).
Author: John Grimes.
- class pathling.cli.run.CodeSource(text: str, filename: str, argv0: str, path_entry: str, file_attr: str | None)
Bases: object
A resolved source of program text for execution.
- Parameters:
text – the program source code.
filename – the filename to compile under, which appears in tracebacks and syntax errors (the script path, <stdin>, or <string>).
argv0 – the value for sys.argv[0], following Python interpreter conventions (the script path, -, or -c).
path_entry – the entry to prepend to sys.path (the script’s directory for files, "" for stdin and inline code).
file_attr – the value for __file__ in the program’s globals, or None to leave it unset (stdin and inline code).
- argv0: str
- file_attr: str | None
- filename: str
- path_entry: str
- text: str
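The way these fields could drive execution with script-like semantics is sketched below using the built-in compile and exec. This is not the command's implementation: the function name and the extra_globals parameter are hypothetical (it stands in for the spark and pathling bindings), and restoring sys.argv and sys.path afterwards is an assumption made so the sketch leaves no side effects.

```python
import sys


def sketch_run_source(text, filename, argv, path_entry, file_attr, extra_globals=None):
    """Execute program text roughly as the Python interpreter would run a script."""
    # Compiling under `filename` makes tracebacks and syntax errors name it.
    code = compile(text, filename, "exec")
    scope = {"__name__": "__main__"}
    if file_attr is not None:
        scope["__file__"] = file_attr
    # Hypothetical hook for names bound in the program's globals.
    scope.update(extra_globals or {})
    saved_argv, saved_path = sys.argv, sys.path
    sys.argv = list(argv)
    sys.path = [path_entry] + list(saved_path)
    try:
        exec(code, scope)
    finally:
        # Assumption: restore interpreter state once the program finishes.
        sys.argv, sys.path = saved_argv, saved_path
    return scope


program = (
    "import sys\n"
    "name = __name__\n"
    "first = sys.argv[0]\n"
    "has_file = '__file__' in globals()\n"
    "bound = pathling\n"
)
result = sketch_run_source(program, "<string>", ["-c", "a"], "", None, {"pathling": "ctx"})
```

Here the inline-code conventions apply: sys.argv[0] is -c, __file__ is left unset, and the injected name is visible to the program.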
- class pathling.cli.run.RunCommand(name: str | None, context_settings: MutableMapping[str, Any] | None = None, callback: Callable[[...], Any] | None = None, params: List[Parameter] | None = None, help: str | None = None, epilog: str | None = None, short_help: str | None = None, options_metavar: str | None = '[OPTIONS]', add_help_option: bool = True, no_args_is_help: bool = False, hidden: bool = False, deprecated: bool = False)
Bases: Command
A command that records its raw argument list before parsing.
The raw arguments are needed to distinguish run script.py -c CODE (a usage error: two code sources) from run -c CODE a b (inline code with trailing arguments), which parse to the same option values.
pathling.cli.session module
Lazy Spark session and Pathling context creation for the CLI.
The Spark session is created only when a command actually needs it, behind a
status spinner on stderr. Spark and JVM logging is suppressed by default by
pointing the driver JVM at a packaged log4j2 configuration before launch, and
lowering the log level once the context exists; --verbose leaves logging at
its defaults.
Author: John Grimes.
- pathling.cli.session.create_context(config: CliConfig, console: Console | None = None)
Creates a PathlingContext configured from the CLI settings.
In the default (non-verbose) mode the JVM launcher’s startup banner and Ivy dependency-resolution report, which Spark prints to file descriptor 2 before any log configuration can take effect in local mode, are swallowed by redirecting that descriptor for the duration of session creation, while the status spinner is routed to a preserved copy of the real stderr so that progress remains visible.
- Parameters:
config – the resolved CLI configuration.
console – the stderr console for the status spinner; created when None.
- Returns:
a configured PathlingContext.
- pathling.cli.session.quiet_log4j2_path() -> str
Resolves a filesystem path to the packaged quiet log4j2 configuration.
The configuration ships as static package data, so no per-run temporary file is created. The materialised path is kept valid for the process lifetime so the driver JVM can read it during and after session start.
- Returns:
the filesystem path to the quiet log4j2 properties file.
pathling.cli.sparkconf module
Parsing, validation, coercion, and merge for user-supplied Spark settings.
This module holds the pure logic that turns [spark] config-table entries and
--spark-conf KEY=VALUE flags into the effective Spark configuration applied
when a session is built. Keys must begin with spark.; scalar values are
coerced to the string form Spark expects; values support the existing
@file/environment secret resolution. The three managed keys
(spark.jars.packages, spark.sql.extensions,
spark.sql.catalog.spark_catalog) are merged with item-level protection
against Pathling’s managed defaults so the library keeps working while user
additions take effect. None of this requires PySpark, so it runs before any
Spark session starts.
Author: John Grimes.
- pathling.cli.sparkconf.merge_spark_conf(user_map: dict, on_warning: Callable[[str], None] | None = None) -> dict
Merges the user Spark map with Pathling’s managed defaults.
Plain keys pass through unchanged. The managed list keys (spark.jars.packages, spark.sql.extensions) are unioned with the managed defaults and deduplicated. spark.sql.catalog.spark_catalog is dropped when it equals the managed Delta catalog and is an error otherwise. Only keys the user actually set appear in the result; keys they did not touch are left to the session builder’s managed defaults.
- Parameters:
user_map – the validated, coerced, and resolved user Spark map.
on_warning – a callback for the managed-version-override warning, or None to suppress it.
- Returns:
the effective Spark configuration to apply on top of the defaults.
- Raises:
CliError – if the session catalog is set to a non-Delta value.
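The merge rules can be sketched with placeholder managed defaults. Everything named org.example below is invented for illustration and does not reflect Pathling's actual managed values; the function name is hypothetical, placing managed items ahead of user items is an assumption, ValueError stands in for CliError, and the version-override warning is not modelled.

```python
# Placeholder managed defaults (assumptions, not Pathling's real values).
MANAGED_LISTS = {
    "spark.jars.packages": ["org.example:managed-lib:1.0"],
    "spark.sql.extensions": ["org.example.ManagedExtension"],
}
MANAGED_CATALOG = "org.example.ManagedCatalog"
CATALOG_KEY = "spark.sql.catalog.spark_catalog"


def sketch_merge_spark_conf(user_map):
    """Merge user settings with managed defaults, touching only keys the user set."""
    merged = {}
    for key, value in user_map.items():
        if key in MANAGED_LISTS:
            # Union the comma-separated user items with the managed ones,
            # deduplicating while preserving first-seen order.
            user_items = [item.strip() for item in value.split(",") if item.strip()]
            merged[key] = ",".join(dict.fromkeys(MANAGED_LISTS[key] + user_items))
        elif key == CATALOG_KEY:
            # Equal to the managed catalog: drop it. Anything else is an error.
            if value != MANAGED_CATALOG:
                raise ValueError(f"{CATALOG_KEY} must remain {MANAGED_CATALOG}")
        else:
            # Plain keys pass through unchanged.
            merged[key] = value
    return merged


effective = sketch_merge_spark_conf({
    "spark.jars.packages": "org.example:extra:2.0,org.example:managed-lib:1.0",
    "spark.driver.memory": "4g",
    CATALOG_KEY: MANAGED_CATALOG,
})
```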
- pathling.cli.sparkconf.parse_spark_conf_flags(flags) -> dict
Parses repeatable --spark-conf KEY=VALUE flags into a mapping.
Each flag is split on the first = only, so a value may itself contain = (for example a JVM option). When the same key appears more than once, the last occurrence wins.
- Parameters:
flags – an iterable of raw KEY=VALUE flag strings.
- Returns:
a mapping of key to value, with later duplicates overriding earlier.
- Raises:
CliError – if a flag is not of the form KEY=VALUE (exit code 2).
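Splitting on the first = only is what str.partition does, as the following sketch shows. The function name is hypothetical, ValueError stands in for CliError, and rejecting an empty key is an assumption about what "not of the form KEY=VALUE" covers.

```python
def sketch_parse_spark_conf_flags(flags):
    """Parse KEY=VALUE strings into a dict; later duplicates override earlier ones."""
    result = {}
    for raw in flags:
        # partition splits on the first "=" only, so the value may contain "=".
        key, separator, value = raw.partition("=")
        if not separator or not key:
            # Stand-in for CliError (exit code 2 in the documented function).
            raise ValueError(f"expected KEY=VALUE, got {raw!r}")
        result[key] = value
    return result


parsed = sketch_parse_spark_conf_flags([
    "spark.driver.memory=4g",
    "spark.driver.extraJavaOptions=-Dfoo=bar",
    "spark.driver.memory=8g",
])
```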
- pathling.cli.sparkconf.resolve_spark_conf(file_table: dict | None, flag_map: dict | None, env: dict | None = None) -> dict
Combines, validates, coerces, and secret-resolves the user Spark map.
The flag map overrides the file table per key. Each entry’s key and value are then validated and coerced, and string values are passed through the existing secret resolver so a @file reference is read from disk.
- Parameters:
file_table – the parsed [spark] table, or None.
flag_map – the parsed --spark-conf flag map, or None.
env – the environment mapping for secret resolution.
- Returns:
the validated, coerced, and resolved user Spark map.
- Raises:
CliError – if a key or value is invalid, or a @file reference cannot be read.
- pathling.cli.sparkconf.validate_and_coerce(key: str, value) -> str
Validates a Spark configuration key and coerces its value to a string.
The key must begin with spark.. Scalar values (string, integer, float, boolean) are coerced to the string form Spark expects, with booleans rendered as true/false. Any other value type (a TOML array or table) is rejected.
- Parameters:
key – the Spark configuration key.
value – the raw value from the config table or a flag.
- Returns:
the value coerced to a string.
- Raises:
CliError – if the key is not prefixed spark. or the value is not a scalar (exit code 2).
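The validation and coercion rules can be sketched as below. The function name is hypothetical and ValueError stands in for CliError. The one subtlety worth showing is that bool must be tested before int, because in Python bool is a subclass of int and would otherwise render as True/False.

```python
def sketch_validate_and_coerce(key, value):
    """Check the spark. prefix and coerce a scalar value to Spark's string form."""
    if not key.startswith("spark."):
        # Stand-in for CliError (exit code 2 in the documented function).
        raise ValueError(f"Spark configuration key must begin with 'spark.': {key!r}")
    # Test bool first: isinstance(True, int) is also true.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (str, int, float)):
        return str(value)
    # Arrays, tables, and any other type are rejected.
    raise ValueError(f"value for {key!r} must be a scalar")
```

For example, the pair ("spark.executor.cores", 4) coerces to "4", and a boolean True coerces to "true".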
pathling.cli.terminology module
The Pathling terminology commands.
Each command reads a tabular dataset (CSV or Parquet), builds codings from a named code column plus either a fixed system URI or a per-row system column, calls the corresponding library terminology function, appends the result column(s), and emits the augmented dataset per the shared output options.
Author: John Grimes.
pathling.cli.view module
The pathling view command.
Executes a SQL on FHIR ViewDefinition (a JSON file or inline string) against a
data source and emits the tabular result in the requested format, optionally
restricting the resources processed with a FHIR search --filter.
Author: John Grimes.
Module contents
Command line interface for Pathling.
This subpackage exposes the Pathling Python library through a flat, verb-based
command tree installed as the pathling console script. Modules are kept
free of eager PySpark imports so that --help and --version remain fast;
the Spark session is created lazily by pathling.cli.session only when a
command needs it.
Author: John Grimes.