Kubernetes

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. Support for deploying Pathling on Kubernetes is provided via a Helm chart.

The Helm chart includes the following features:

Installation

To install the chart, run the following commands:

# Add the Pathling Helm repository.
helm repo add pathling https://pathling.csiro.au/helm

# Get the latest information about charts from the repository.
helm repo update

# Install the Pathling server chart as a release named `pathling`, with the
# default values.
helm install pathling pathling/pathling
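
To install with overrides rather than the defaults, values can be supplied from a file. The following is a minimal sketch, not the chart's prescribed procedure: the file name `pathling-values.yaml`, the `pathling` namespace and the helper name `install_pathling` are assumptions chosen for illustration, while the overridden keys come from the Values section of this page. `helm upgrade --install`, `--namespace`, `--create-namespace` and `--values` are standard Helm 3 options.

```shell
# Write a small override file. Keys not listed here keep the chart defaults.
cat > pathling-values.yaml <<'EOF'
pathling:
  image: aehrc/pathling:7
  resources:
    maxHeapSize: 3g
EOF

# Hypothetical helper: install the release, or upgrade it if it already exists.
# The namespace name "pathling" is an assumption; adjust it to your cluster.
install_pathling() {
  helm upgrade --install pathling pathling/pathling \
    --namespace pathling --create-namespace \
    --values pathling-values.yaml
}

# Call install_pathling once the repository has been added and kubectl
# points at the target cluster.
```

Using `helm upgrade --install` means the same command can be re-run after editing the values file.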

Values

The chart supports the following configuration values, listed with their defaults.

| Key | Default | Description |
| --- | --- | --- |
| `pathling.image` | `aehrc/pathling:latest` | The Pathling Docker image to use |
| `pathling.resources.requests.cpu` | `2` | The CPU request for the Pathling pod |
| `pathling.resources.requests.memory` | `4G` | The memory request for the Pathling pod |
| `pathling.resources.limits.memory` | `4G` | The memory limit for the Pathling pod |
| `pathling.resources.maxHeapSize` | `2800m` | The maximum heap size for the JVM; this should usually be about 75% of the available memory |
| `pathling.additionalJavaOptions` | `-Duser.timezone=UTC` | Additional Java options to pass to the JVM |
| `pathling.deployment.strategy` | `Recreate` | The deployment strategy to use |
| `pathling.deployment.imagePullPolicy` | `Always` | The image pull policy to use |
| `pathling.volumes` | `[ ]` | A list of volumes to make available to the pod |
| `pathling.volumeMounts` | `[ ]` | A list of volume mounts to apply to the Pathling container |
| `pathling.serviceAccount` | `~` | The service account to assign to the pod |
| `pathling.imagePullSecrets` | `[ ]` | A list of image pull secrets to use |
| `pathling.tolerations` | `[ ]` | A list of tolerations to apply to the pod |
| `pathling.affinity` | `~` | Affinity to apply to the pod |
| `pathling.config` | `{ }` | A map of configuration values to pass to Pathling |
| `pathling.secretConfig` | `{ }` | A map of secret configuration values to pass to Pathling; these values are stored as Kubernetes secrets |
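
The 75% guideline for `pathling.resources.maxHeapSize` can be worked through with a little arithmetic. The sketch below assumes the memory limit is given in whole decimal gigabytes (the Kubernetes `G` suffix, 10^9 bytes) and that the JVM `m` suffix means mebibytes (2^20 bytes). The helper name `heap_for_limit_gb` is hypothetical and is not part of the chart.

```shell
# Hypothetical helper: suggest a JVM heap size of 75% of a memory limit.
# $1 is the limit in whole decimal gigabytes, as in the Kubernetes quantity "4G".
heap_for_limit_gb() {
  limit_bytes=$(( $1 * 1000000000 ))      # Kubernetes "G" means 10^9 bytes
  limit_mib=$(( limit_bytes / 1048576 ))  # JVM "m" means 2^20 bytes
  echo "$(( limit_mib * 75 / 100 ))m"     # integer arithmetic, rounds down
}

heap_for_limit_gb 4   # prints 2860m
heap_for_limit_gb 2   # prints 1430m
```

The result for a `4G` limit is close to the chart default of `2800m`. Leaving the remaining memory unallocated gives the JVM room for non-heap usage such as thread stacks and metaspace.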

Example configuration

Here are a few examples of how to configure the Pathling Helm chart for different deployment scenarios.

Single node

This configuration is suitable for a single node deployment of Pathling. In this scenario, all processing is performed on a single pod.

pathling:
  image: aehrc/pathling:7
  resources:
    requests:
      cpu: 2
      memory: 4G
    limits:
      memory: 4G
    maxHeapSize: 3g
  volumes:
    - name: warehouse
      hostPath:
        path: /home/user/data/pathling
  volumeMounts:
    - name: warehouse
      mountPath: /usr/share/warehouse
      readOnly: false
  config:
    pathling.implementationDescription: My Pathling Server
    pathling.terminology.cache.maxEntries: 500000
    pathling.terminology.cache.overrideExpiry: "2592000"
    pathling.encoding.openTypes: string,code,decimal,Coding,Address
    logging.level.au.csiro.pathling: debug

Cluster

This configuration is suitable for a cluster deployment of Pathling, using the Spark Kubernetes cluster manager. In this scenario, the driver pod hosts an API but processing is performed on executor pods, which are spawned by the driver pod through calls to the Kubernetes API.

This configuration is suitable for the processing of larger datasets, or scenarios where it may be desirable to run a small driver pod and spawn executor pods on demand (at the cost of some latency).

pathling:
  image: aehrc/pathling:7
  resources:
    requests:
      cpu: 1
      memory: 2G
    limits:
      memory: 2G
    maxHeapSize: 1500m
  volumes:
    - name: warehouse
      hostPath:
        path: /home/user/data/pathling
  volumeMounts:
    - name: warehouse
      mountPath: /usr/share/warehouse
      readOnly: false
  serviceAccount: spark-service-account
  config:
    pathling.implementationDescription: My Pathling Server
    pathling.terminology.cache.maxEntries: 500000
    pathling.terminology.cache.overrideExpiry: "2592000"
    pathling.encoding.openTypes: string,code,decimal,Coding,Address
    logging.level.au.csiro.pathling: debug
    spark.master: k8s://https://kubernetes.default.svc
    spark.kubernetes.namespace: pathling
    spark.kubernetes.executor.container.image: aehrc/pathling:7
    spark.kubernetes.executor.volumes.hostPath.warehouse.options.path: /home/user/data/pathling
    spark.kubernetes.executor.volumes.hostPath.warehouse.mount.path: /usr/share/warehouse
    spark.kubernetes.executor.volumes.hostPath.warehouse.mount.readOnly: false
    spark.executor.instances: 3
    spark.executor.memory: 3G
    spark.kubernetes.executor.request.cores: 2
    spark.kubernetes.executor.limit.cores: 2
    spark.kubernetes.executor.request.memory: 4G
    spark.kubernetes.executor.limit.memory: 4G
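
Because the driver pod creates executor pods through the Kubernetes API, the service account named in `pathling.serviceAccount` needs permission to manage pods in its namespace. The following sketch assumes the chart does not create that account for you and that the `pathling` namespace already exists. It writes a manifest that creates `spark-service-account` and binds it to the built-in `edit` ClusterRole within that namespace, similar to the approach described in the Spark on Kubernetes documentation. The file name `spark-rbac.yaml` and the binding name `spark-edit` are hypothetical, and a narrower role may be preferable in production.

```shell
# Write a manifest for the service account and a namespace-scoped binding.
# The names spark-rbac.yaml and spark-edit are illustrative assumptions.
cat > spark-rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-service-account
  namespace: pathling
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-edit
  namespace: pathling
subjects:
  - kind: ServiceAccount
    name: spark-service-account
    namespace: pathling
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
EOF

# Apply it before installing the chart:
#   kubectl apply -f spark-rbac.yaml
```

A RoleBinding, rather than a ClusterRoleBinding, keeps the granted permissions confined to the one namespace that `spark.kubernetes.namespace` points at.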