Kubernetes

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. Support for deploying Pathling on Kubernetes is provided via a Helm chart.

The Helm chart includes the following features:

Support for startup, liveness and readiness probes powered by the Spring Boot Actuator endpoint
Services for the FHIR API, Actuator management API, Spark UI, driver endpoint and block manager endpoint
Support for the Spark Kubernetes cluster manager, including a service account, role and role binding to allow it to manage executor pods
Customisation of resource requests and limits
Configuration of volumes and volume mounts
Image pull secrets for private Docker registries
Tolerations and affinity for control over pod scheduling
Secret config for sensitive values

Installation

To install the chart, run the following commands:

# Add the Pathling Helm repository.
helm repo add pathling https://pathling.csiro.au/helm

# Get the latest information about charts from the repository.
helm repo update

# Install the Pathling server chart as a release named `pathling`, with the 
# default values.
helm install pathling pathling/pathling
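To change only a few of the defaults, one approach is to put the overrides in a values file and pass it to `helm install` with the `-f` (`--values`) flag, for example `helm install pathling pathling/pathling -f my-values.yaml`. The sketch below assumes a file with the hypothetical name `my-values.yaml` that pins the image to a major version and raises the memory settings. Helm merges these values over the chart defaults, so any key the file does not mention keeps its default.

```yaml
# my-values.yaml (hypothetical file name): merged over the chart defaults
pathling:
  # Pin to a major version rather than the default `latest` tag
  image: aehrc/pathling:7
  resources:
    requests:
      memory: 8G
    limits:
      memory: 8G
    # Roughly 75% of the memory limit, following the guidance in the Values table
    maxHeapSize: 6g
```

The same file can be reapplied to an existing release with `helm upgrade pathling pathling/pathling -f my-values.yaml`.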

Values

The following table lists the configuration values that the chart supports, along with their defaults.

Key	Default	Description
`pathling.image`	`aehrc/pathling:latest`	The Pathling Docker image to use
`pathling.resources.requests.cpu`	`2`	The CPU request for the Pathling pod
`pathling.resources.requests.memory`	`4G`	The memory request for the Pathling pod
`pathling.resources.limits.memory`	`4G`	The memory limit for the Pathling pod
`pathling.resources.maxHeapSize`	`2800m`	The maximum heap size for the JVM; this should usually be about 75% of the memory available to the pod
`pathling.additionalJavaOptions`	`-Duser.timezone=UTC`	Additional Java options to pass to the JVM
`pathling.deployment.strategy`	`Recreate`	The deployment strategy to use
`pathling.deployment.imagePullPolicy`	`Always`	The image pull policy to use
`pathling.volumes`	`[ ]`	A list of volumes to make available to the pod
`pathling.volumeMounts`	`[ ]`	A list of volume mounts for the Pathling container
`pathling.serviceAccount`	`~`	The service account to assign to the pod
`pathling.imagePullSecrets`	`[ ]`	A list of image pull secrets to use
`pathling.tolerations`	`[ ]`	A list of tolerations to apply to the pod
`pathling.affinity`	`~`	Affinity to apply to the pod
`pathling.config`	`{ }`	A map of configuration values to pass to Pathling
`pathling.secretConfig`	`{ }`	A map of sensitive configuration values to pass to Pathling; these values are stored using Kubernetes secrets
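The scenarios below do not use `pathling.imagePullSecrets`, `pathling.tolerations`, `pathling.affinity` or `pathling.secretConfig`, so here is a sketch of how they might be set. It assumes that the chart passes the first three through to the pod spec unchanged, so they follow the standard Kubernetes pod spec syntax. The secret `my-registry-credentials`, the node label and taint `dedicated=pathling` and the value `change-me` are hypothetical. The key `pathling.terminology.authentication.clientSecret` is assumed to be Pathling's setting for the terminology server client secret; check the Pathling configuration reference before relying on it.

```yaml
pathling:
  # Hypothetical pre-existing secret holding private registry credentials
  imagePullSecrets:
    - name: my-registry-credentials
  # Allow scheduling onto nodes tainted with a hypothetical dedicated=pathling:NoSchedule taint
  tolerations:
    - key: dedicated
      operator: Equal
      value: pathling
      effect: NoSchedule
  # Only schedule onto nodes carrying the hypothetical label dedicated=pathling
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: dedicated
                operator: In
                values:
                  - pathling
  # Sensitive settings, which the chart stores in a Kubernetes secret
  secretConfig:
    pathling.terminology.authentication.clientSecret: change-me
```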

Example configuration

Here are a few examples of how to configure the Pathling Helm chart for different deployment scenarios.

Single node

This configuration is suitable for a single-node deployment of Pathling. In this scenario, all processing is performed on a single pod.

pathling:
    image: aehrc/pathling:7
    resources:
        requests:
            cpu: 2
            memory: 4G
        limits:
            memory: 4G
        maxHeapSize: 3g
    volumes:
        -   name: warehouse
            hostPath:
                path: /home/user/data/pathling
    volumeMounts:
        -   name: warehouse
            mountPath: /usr/share/warehouse
            readOnly: false
    config:
        pathling.implementationDescription: My Pathling Server
        pathling.terminology.cache.maxEntries: 500000
        pathling.terminology.cache.overrideExpiry: "2592000"
        pathling.encoding.openTypes: string,code,decimal,Coding,Address
        logging.level.au.csiro.pathling: debug

Cluster

This configuration is suitable for a cluster deployment of Pathling, using the Spark Kubernetes cluster manager. In this scenario, the driver pod hosts the API, while processing is performed on executor pods that the driver spawns through calls to the Kubernetes API.

This configuration is suitable for processing larger datasets, or for scenarios where it is desirable to run a small driver pod and spawn executor pods on demand (at the cost of some added latency).

pathling:
    image: aehrc/pathling:7
    resources:
        requests:
            cpu: 1
            memory: 2G
        limits:
            memory: 2G
        maxHeapSize: 1500m
    volumes:
        -   name: warehouse
            hostPath:
                path: /home/user/data/pathling
    volumeMounts:
        -   name: warehouse
            mountPath: /usr/share/warehouse
            readOnly: false
    serviceAccount: spark-service-account
    config:
        pathling.implementationDescription: My Pathling Server
        pathling.terminology.cache.maxEntries: 500000
        pathling.terminology.cache.overrideExpiry: "2592000"
        pathling.encoding.openTypes: string,code,decimal,Coding,Address
        logging.level.au.csiro.pathling: debug
        spark.master: k8s://https://kubernetes.default.svc
        spark.kubernetes.namespace: pathling
        spark.kubernetes.executor.container.image: aehrc/pathling:7
        spark.kubernetes.executor.volumes.hostPath.warehouse.options.path: /home/user/data/pathling
        spark.kubernetes.executor.volumes.hostPath.warehouse.mount.path: /usr/share/warehouse
        spark.kubernetes.executor.volumes.hostPath.warehouse.mount.readOnly: false
        spark.executor.instances: 3
        spark.executor.memory: 3G
        spark.kubernetes.executor.request.cores: 2
        spark.kubernetes.executor.limit.cores: 2
        spark.kubernetes.executor.request.memory: 4G
        spark.kubernetes.executor.limit.memory: 4G
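The cluster example above requests a fixed number of executors through `spark.executor.instances`. To have the driver spawn executors on demand and release them when idle, Spark's dynamic allocation can be enabled instead. This is a sketch that assumes Pathling forwards these `spark.*` keys to Spark in the same way as the ones above. Because Kubernetes has no external shuffle service, it enables Spark's shuffle tracking, which lets dynamic allocation work without one. The executor bounds are illustrative.

```yaml
pathling:
  config:
    # Let Spark add and remove executor pods based on the workload
    spark.dynamicAllocation.enabled: true
    # Track shuffle files on executors, as there is no external shuffle service on Kubernetes
    spark.dynamicAllocation.shuffleTracking.enabled: true
    # Illustrative bounds on the number of executor pods
    spark.dynamicAllocation.minExecutors: 0
    spark.dynamicAllocation.maxExecutors: 3
```

When adopting this, remove `spark.executor.instances` from the config, since Spark would otherwise use it as the initial number of executors.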
