Parquet specification

This specification describes a scheme for representing FHIR resources within a Parquet schema.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Version compatibility

The following versions of FHIR are supported by this specification:

Resource type support

All R4 resource types are supported by this specification, except the following:

  • Parameters
  • Task
  • StructureDefinition
  • StructureMap
  • Bundle

Configuration options

There are several options that determine the structure of the schema. Any two schemas that conform to this specification with the same configuration values SHALL be identical and compatible.

| Option | Description |
| --- | --- |
| Maximum nesting level | The maximum supported depth of nested element data. |
| Extensions enabled | Whether extension content is included within the schema. |
| Supported open types | The list of types that are supported for open choice types. |
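
As a rough illustration of the three options above, the following sketch models them as an immutable Python value. The class name `SchemaConfig` and its attribute names are hypothetical and are not defined by this specification; the sample values are taken from the Example section.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SchemaConfig:
    """Hypothetical container for the three configuration options."""

    max_nesting_level: int                 # "Maximum nesting level"
    extensions_enabled: bool               # "Extensions enabled"
    supported_open_types: Tuple[str, ...]  # "Supported open types"


# Two configurations with the same values compare equal, which mirrors the
# requirement that equal configurations yield identical schemas.
config_a = SchemaConfig(3, True, ("boolean", "code", "string"))
config_b = SchemaConfig(3, True, ("boolean", "code", "string"))
print(config_a == config_b)
```

A tuple is used for the open types so that the value is hashable. Whether the order of the listed types is significant is not stated here, so the sketch treats it as significant.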

Out of scope

The following features are not currently supported by this specification:

  • Primitive extensions
  • Profiled resources
  • Contained resources

Schema structure

All fields SHALL be encoded as OPTIONAL, unless otherwise specified. All groups SHALL be encoded as REQUIRED.

Primitive elements

Each primitive element within a resource definition SHALL be represented as a field within the schema with the same name (except where otherwise specified). The data type SHALL be determined by the element type according to the following table:

| FHIR type | Parquet type | Additional requirements |
| --- | --- | --- |
| boolean | BOOLEAN | |
| canonical | BINARY (UTF8) | Compliant with the FHIR canonical format |
| code | BINARY (UTF8) | Compliant with the FHIR code format |
| dateTime | BINARY (UTF8) | Compliant with the FHIR dateTime format |
| date | BINARY (UTF8) | Compliant with the FHIR date format |
| decimal | DECIMAL(32,6) | See Decimal type |
| id | BINARY (UTF8) | See ID type |
| instant | INT96 | |
| integer | INT32 | Compliant with the FHIR integer format |
| markdown | BINARY (UTF8) | Compliant with the FHIR markdown format |
| oid | BINARY (UTF8) | Compliant with the FHIR oid format |
| positiveInt | INT32 | Compliant with the FHIR positiveInt format |
| string | BINARY (UTF8) | Compliant with the FHIR string format |
| time | BINARY (UTF8) | Compliant with the FHIR time format |
| unsignedInt | INT32 | Compliant with the FHIR unsignedInt format |
| uri | BINARY (UTF8) | Compliant with the FHIR uri format |
| url | BINARY (UTF8) | Compliant with the FHIR url format |
| uuid | BINARY (UTF8) | Compliant with the FHIR uuid format |
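
The mapping above can be transcribed into a simple lookup, sketched here in Python. The names `PRIMITIVE_TYPE_MAP` and `parquet_type_for` are hypothetical helpers for illustration only; the type strings are copied from the table.

```python
# Hypothetical lookup table transcribing the FHIR-to-Parquet mapping above.
PRIMITIVE_TYPE_MAP = {
    "boolean": "BOOLEAN",
    "canonical": "BINARY (UTF8)",
    "code": "BINARY (UTF8)",
    "dateTime": "BINARY (UTF8)",
    "date": "BINARY (UTF8)",
    "decimal": "DECIMAL(32,6)",
    "id": "BINARY (UTF8)",
    "instant": "INT96",
    "integer": "INT32",
    "markdown": "BINARY (UTF8)",
    "oid": "BINARY (UTF8)",
    "positiveInt": "INT32",
    "string": "BINARY (UTF8)",
    "time": "BINARY (UTF8)",
    "unsignedInt": "INT32",
    "uri": "BINARY (UTF8)",
    "url": "BINARY (UTF8)",
    "uuid": "BINARY (UTF8)",
}


def parquet_type_for(fhir_type: str) -> str:
    """Return the Parquet type listed for a FHIR primitive type."""
    try:
        return PRIMITIVE_TYPE_MAP[fhir_type]
    except KeyError:
        # Types absent from the table are not covered by this sketch.
        raise ValueError(f"No mapping listed for FHIR type: {fhir_type}")


print(parquet_type_for("instant"))
```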

Complex and backbone elements

Each complex and backbone element within a resource definition SHALL be represented as a group within the schema with the same name. The group SHALL contain a field for each of the child elements.

Each field SHALL be represented as described in the Primitive elements and Choice types sections (or this section in the case of a nested complex or backbone element).

If the "extensions enabled" option is set to true, an additional field SHALL be included with the name _fid. This field SHALL have a type of INT32. This field SHALL contain a value that uniquely identifies the complex or backbone element within the resource.

Choice types

If the choice type is open (i.e. a type of *), the group SHALL contain a field for each type listed in the configuration value "supported open types".

If the choice type is not open, it SHALL be represented as a group with a field for each of its valid types.

The name of each field SHALL follow the format value[type], where [type] is the name of the type in upper camel case.
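
The naming rule can be sketched as a small Python helper. The function name `choice_field_name` and its `prefix` parameter are assumptions made for illustration: the rule above uses the literal prefix value, while the Patient example in this document shows the element name used in the same position (deceasedBoolean, multipleBirthInteger).

```python
def choice_field_name(prefix: str, type_name: str) -> str:
    """Append the type name in upper camel case to the element prefix."""
    # Only the first character is upper-cased, so "dateTime" becomes
    # "DateTime" and "CodeableConcept" is left unchanged.
    return prefix + type_name[0].upper() + type_name[1:]


print(choice_field_name("value", "dateTime"))
print(choice_field_name("deceased", "boolean"))
```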

Decimal type

An element of type decimal SHALL be represented as a DECIMAL field with a precision of 32 and a scale of 6.

In addition, an INT32 field SHALL be included with the suffix _scale. This field SHALL be used to store the scale of the decimal value from the original FHIR data.
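
To illustrate why the _scale field is useful, the sketch below splits a FHIR decimal literal into a value held at scale 6 and the scale of the original literal. The helper name `encode_decimal` is hypothetical, and rounding literals with more than 6 fractional digits is an assumption, since this section does not say how such values are handled.

```python
from decimal import Decimal
from typing import Tuple

# One unit at scale 6, matching DECIMAL(32,6).
SIX_PLACES = Decimal("0.000001")


def encode_decimal(literal: str) -> Tuple[Decimal, int]:
    """Return (value at scale 6, scale of the original literal)."""
    original = Decimal(literal)
    # The exponent is negative for fractional digits, e.g. -2 for "1.50".
    # FHIR decimals are finite, so NaN and Infinity are not handled here.
    original_scale = max(-original.as_tuple().exponent, 0)
    # Assumption: values with more than 6 fractional digits are rounded.
    stored = original.quantize(SIX_PLACES)
    return stored, original_scale


stored, scale = encode_decimal("1.50")
print(stored, scale)
```

Keeping the original scale allows a reader of the data to recover "1.50" rather than "1.500000", which matters because trailing zeros are significant in FHIR decimal values.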

ID type

An element of type id SHALL be represented as a BINARY (UTF8) field. This field SHALL be used to store the FHIR resource logical ID.

In addition to this field, a BINARY (UTF8) field SHALL be included with the suffix _versioned. This field SHALL be used to store a fully qualified logical ID that includes the resource type and the technical version. The data in this field SHALL follow the format [resource type]/[logical ID]/_history/[version].
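
The format of the _versioned value can be sketched as follows. The function name `versioned_id` and the sample identifiers are hypothetical.

```python
def versioned_id(resource_type: str, logical_id: str, version: str) -> str:
    """Build [resource type]/[logical ID]/_history/[version]."""
    return f"{resource_type}/{logical_id}/_history/{version}"


# Hypothetical resource: Patient with logical ID "example" at version "2".
print(versioned_id("Patient", "example", "2"))
```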

Quantity type

If a complex element is of type Quantity, two additional fields SHALL be included in the group. These fields SHALL be named _value_canonicalized and _code_canonicalized.

The _value_canonicalized field SHALL be encoded as a group with the following fields:

  • value with a type of decimal, precision 38 and scale 0.
  • scale with a type of integer.

The _code_canonicalized field SHALL be encoded as a string field.

These fields MAY be used to store canonicalized versions of the value and code fields from the original FHIR data, for easier comparison and querying.

Extensions

If the "extensions enabled" option is true, an additional field SHALL be included at the root of the schema named _extension. This field SHALL have a type of MAP, with an INT32 key and a repeated group value.

The group used for each value in the map SHALL be represented using the Extension type, as described in Complex and backbone elements.

If the "extensions enabled" option is true, this field SHALL be used to store extension data from the original FHIR data (excluding primitive extensions). Each key within the map SHALL contain a reference to the _fid of the element that the extension is attached to.
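
The relationship between _fid and the _extension map can be sketched in Python using plain dictionaries. The helper name `build_extension_map`, the _fid values, and the extension URL are hypothetical; this section does not define how _fid values are assigned, only that they are unique within the resource.

```python
from typing import Dict, Iterable, List, Tuple

Extension = Dict[str, object]


def build_extension_map(
    attached: Iterable[Tuple[int, List[Extension]]]
) -> Dict[int, List[Extension]]:
    """Group extensions by the _fid of the element they are attached to."""
    ext_map: Dict[int, List[Extension]] = {}
    for fid, extensions in attached:
        if extensions:  # elements without extensions get no map entry
            ext_map.setdefault(fid, []).extend(extensions)
    return ext_map


# Hypothetical input: the element with _fid 1 carries one extension,
# the element with _fid 2 carries none.
example = build_extension_map([
    (1, [{"url": "http://example.org/fhir/ext", "valueString": "abc"}]),
    (2, []),
])
print(example)
```

A reader can then look up the extensions for any complex or backbone element by using that element's _fid as the key.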

Example

The schema below is an example of how to represent the Patient resource type using these configuration values:

  • Maximum nesting level: 3
  • Extensions enabled: true
  • Supported open types: boolean, code, date, dateTime, decimal, integer, string, Coding, CodeableConcept, Address, Identifier, Reference

message spark_schema {
  optional binary id (STRING);
  optional binary id_versioned (STRING);
  optional group meta {
    optional binary id (STRING);
    optional binary versionId (STRING);
    optional binary versionId_versioned (STRING);
    optional int96 lastUpdated;
    optional binary source (STRING);
    optional group profile (LIST) {
      repeated group list {
        optional binary element (STRING);
      }
    }
    optional group security (LIST) {
      repeated group list {
        optional group element {
          optional binary id (STRING);
          optional binary system (STRING);
          optional binary version (STRING);
          optional binary code (STRING);
          optional binary display (STRING);
          optional boolean userSelected;
          optional int32 _fid;
        }
      }
    }
    optional group tag (LIST) {
      repeated group list {
        optional group element {
          optional binary id (STRING);
          optional binary system (STRING);
          optional binary version (STRING);
          optional binary code (STRING);
          optional binary display (STRING);
          optional boolean userSelected;
          optional int32 _fid;
        }
      }
    }
    optional int32 _fid;
  }
  optional binary implicitRules (STRING);
  optional binary language (STRING);
  optional group text {
    optional binary id (STRING);
    optional binary status (STRING);
    optional binary div (STRING);
    optional int32 _fid;
  }
  optional group identifier (LIST) {
    repeated group list {
      optional group element {
        optional binary id (STRING);
        optional binary use (STRING);
        optional group type {
          optional binary id (STRING);
          optional group coding (LIST) {
            repeated group list {
              optional group element {
                optional binary id (STRING);
                optional binary system (STRING);
                optional binary version (STRING);
                optional binary code (STRING);
                optional binary display (STRING);
                optional boolean userSelected;
                optional int32 _fid;
              }
            }
          }
          optional binary text (STRING);
          optional int32 _fid;
        }
        optional binary system (STRING);
        optional binary value (STRING);
        optional group period {
          optional binary id (STRING);
          optional binary start (STRING);
          optional binary end (STRING);
          optional int32 _fid;
        }
        optional group assigner {
          optional binary reference (STRING);
          optional binary display (STRING);
          optional int32 _fid;
        }
        optional int32 _fid;
      }
    }
  }
  optional boolean active;
  optional group name (LIST) {
    repeated group list {
      optional group element {
        optional binary id (STRING);
        optional binary use (STRING);
        optional binary text (STRING);
        optional binary family (STRING);
        optional group given (LIST) {
          repeated group list {
            optional binary element (STRING);
          }
        }
        optional group prefix (LIST) {
          repeated group list {
            optional binary element (STRING);
          }
        }
        optional group suffix (LIST) {
          repeated group list {
            optional binary element (STRING);
          }
        }
        optional group period {
          optional binary id (STRING);
          optional binary start (STRING);
          optional binary end (STRING);
          optional int32 _fid;
        }
        optional int32 _fid;
      }
    }
  }
  optional group telecom (LIST) {
    repeated group list {
      optional group element {
        optional binary id (STRING);
        optional binary system (STRING);
        optional binary value (STRING);
        optional binary use (STRING);
        optional int32 rank;
        optional group period {
          optional binary id (STRING);
          optional binary start (STRING);
          optional binary end (STRING);
          optional int32 _fid;
        }
        optional int32 _fid;
      }
    }
  }
  optional binary gender (STRING);
  optional binary birthDate (STRING);
  optional boolean deceasedBoolean;
  optional binary deceasedDateTime (STRING);
  optional group address (LIST) {
    repeated group list {
      optional group element {
        optional binary id (STRING);
        optional binary use (STRING);
        optional binary type (STRING);
        optional binary text (STRING);
        optional group line (LIST) {
          repeated group list {
            optional binary element (STRING);
          }
        }
        optional binary city (STRING);
        optional binary district (STRING);
        optional binary state (STRING);
        optional binary postalCode (STRING);
        optional binary country (STRING);
        optional group period {
          optional binary id (STRING);
          optional binary start (STRING);
          optional binary end (STRING);
          optional int32 _fid;
        }
        optional int32 _fid;
      }
    }
  }
  optional group maritalStatus {
    optional binary id (STRING);
    optional group coding (LIST) {
      repeated group list {
        optional group element {
          optional binary id (STRING);
          optional binary system (STRING);
          optional binary version (STRING);
          optional binary code (STRING);
          optional binary display (STRING);
          optional boolean userSelected;
          optional int32 _fid;
        }
      }
    }
    optional binary text (STRING);
    optional int32 _fid;
  }
  optional boolean multipleBirthBoolean;
  optional int32 multipleBirthInteger;
  optional group photo (LIST) {
    repeated group list {
      optional group element {
        optional binary id (STRING);
        optional binary contentType (STRING);
        optional binary language (STRING);
        optional binary data;
        optional binary url (STRING);
        optional int32 size;
        optional binary hash;
        optional binary title (STRING);
        optional binary creation (STRING);
        optional int32 _fid;
      }
    }
  }
  optional group contact (LIST) {
    repeated group list {
      optional group element {
        optional binary id (STRING);
        optional group relationship (LIST) {
          repeated group list {
            optional group element {
              optional binary id (STRING);
              optional group coding (LIST) {
                repeated group list {
                  optional group element {
                    optional binary id (STRING);
                    optional binary system (STRING);
                    optional binary version (STRING);
                    optional binary code (STRING);
                    optional binary display (STRING);
                    optional boolean userSelected;
                    optional int32 _fid;
                  }
                }
              }
              optional binary text (STRING);
              optional int32 _fid;
            }
          }
        }
        optional group name {
          optional binary id (STRING);
          optional binary use (STRING);
          optional binary text (STRING);
          optional binary family (STRING);
          optional group given (LIST) {
            repeated group list {
              optional binary element (STRING);
            }
          }
          optional group prefix (LIST) {
            repeated group list {
              optional binary element (STRING);
            }
          }
          optional group suffix (LIST) {
            repeated group list {
              optional binary element (STRING);
            }
          }
          optional group period {
            optional binary id (STRING);
            optional binary start (STRING);
            optional binary end (STRING);
            optional int32 _fid;
          }
          optional int32 _fid;
        }
        optional group telecom (LIST) {
          repeated group list {
            optional group element {
              optional binary id (STRING);
              optional binary system (STRING);
              optional binary value (STRING);
              optional binary use (STRING);
              optional int32 rank;
              optional group period {
                optional binary id (STRING);
                optional binary start (STRING);
                optional binary end (STRING);
                optional int32 _fid;
              }
              optional int32 _fid;
            }
          }
        }
        optional group address {
          optional binary id (STRING);
          optional binary use (STRING);
          optional binary type (STRING);
          optional binary text (STRING);
          optional group line (LIST) {
            repeated group list {
              optional binary element (STRING);
            }
          }
          optional binary city (STRING);
          optional binary district (STRING);
          optional binary state (STRING);
          optional binary postalCode (STRING);
          optional binary country (STRING);
          optional group period {
            optional binary id (STRING);
            optional binary start (STRING);
            optional binary end (STRING);
            optional int32 _fid;
          }
          optional int32 _fid;
        }
        optional binary gender (STRING);
        optional group organization {
          optional binary reference (STRING);
          optional binary display (STRING);
          optional int32 _fid;
        }
        optional group period {
          optional binary id (STRING);
          optional binary start (STRING);
          optional binary end (STRING);
          optional int32 _fid;
        }
        optional int32 _fid;
      }
    }
  }
  optional group communication (LIST) {
    repeated group list {
      optional group element {
        optional binary id (STRING);
        optional group language {
          optional binary id (STRING);
          optional group coding (LIST) {
            repeated group list {
              optional group element {
                optional binary id (STRING);
                optional binary system (STRING);
                optional binary version (STRING);
                optional binary code (STRING);
                optional binary display (STRING);
                optional boolean userSelected;
                optional int32 _fid;
              }
            }
          }
          optional binary text (STRING);
          optional int32 _fid;
        }
        optional boolean preferred;
        optional int32 _fid;
      }
    }
  }
  optional group generalPractitioner (LIST) {
    repeated group list {
      optional group element {
        optional binary reference (STRING);
        optional binary display (STRING);
        optional int32 _fid;
      }
    }
  }
  optional group managingOrganization {
    optional binary reference (STRING);
    optional binary display (STRING);
    optional int32 _fid;
  }
  optional group link (LIST) {
    repeated group list {
      optional group element {
        optional binary id (STRING);
        optional group other {
          optional binary reference (STRING);
          optional binary display (STRING);
          optional int32 _fid;
        }
        optional binary type (STRING);
        optional int32 _fid;
      }
    }
  }
  optional int32 _fid;
  optional group _extension (MAP) {
    repeated group key_value {
      required int32 key;
      required group value (LIST) {
        repeated group list {
          optional group element {
            optional binary id (STRING);
            optional binary url (STRING);
            optional group valueAddress {
              optional binary id (STRING);
              optional binary use (STRING);
              optional binary type (STRING);
              optional binary text (STRING);
              optional group line (LIST) {
                repeated group list {
                  optional binary element (STRING);
                }
              }
              optional binary city (STRING);
              optional binary district (STRING);
              optional binary state (STRING);
              optional binary postalCode (STRING);
              optional binary country (STRING);
              optional group period {
                optional binary id (STRING);
                optional binary start (STRING);
                optional binary end (STRING);
                optional int32 _fid;
              }
              optional int32 _fid;
            }
            optional boolean valueBoolean;
            optional binary valueCode (STRING);
            optional group valueCodeableConcept {
              optional binary id (STRING);
              optional group coding (LIST) {
                repeated group list {
                  optional group element {
                    optional binary id (STRING);
                    optional binary system (STRING);
                    optional binary version (STRING);
                    optional binary code (STRING);
                    optional binary display (STRING);
                    optional boolean userSelected;
                    optional int32 _fid;
                  }
                }
              }
              optional binary text (STRING);
              optional int32 _fid;
            }
            optional group valueCoding {
              optional binary id (STRING);
              optional binary system (STRING);
              optional binary version (STRING);
              optional binary code (STRING);
              optional binary display (STRING);
              optional boolean userSelected;
              optional int32 _fid;
            }
            optional binary valueDateTime (STRING);
            optional binary valueDate (STRING);
            optional fixed_len_byte_array(14) valueDecimal (DECIMAL(32,6));
            optional int32 valueDecimal_scale;
            optional group valueIdentifier {
              optional binary id (STRING);
              optional binary use (STRING);
              optional group type {
                optional binary id (STRING);
                optional group coding (LIST) {
                  repeated group list {
                    optional group element {
                      optional binary id (STRING);
                      optional binary system (STRING);
                      optional binary version (STRING);
                      optional binary code (STRING);
                      optional binary display (STRING);
                      optional boolean userSelected;
                      optional int32 _fid;
                    }
                  }
                }
                optional binary text (STRING);
                optional int32 _fid;
              }
              optional binary system (STRING);
              optional binary value (STRING);
              optional group period {
                optional binary id (STRING);
                optional binary start (STRING);
                optional binary end (STRING);
                optional int32 _fid;
              }
              optional group assigner {
                optional binary reference (STRING);
                optional binary display (STRING);
                optional int32 _fid;
              }
              optional int32 _fid;
            }
            optional int32 valueInteger;
            optional group valueReference {
              optional binary reference (STRING);
              optional binary display (STRING);
              optional int32 _fid;
            }
            optional binary valueString (STRING);
            optional int32 _fid;
          }
        }
      }
    }
  }
}