Writes the data from a data source to a set of tables in the Spark catalog.
ds_write_tables(ds, schema = NULL, import_mode = ImportMode$OVERWRITE)
ds: The DataSource object.
schema: The name of the schema to write the tables to.
import_mode: The import mode to use when writing the data: "overwrite" replaces any existing data, while "merge" merges the new data into the existing data, matching on resource ID.
No return value, called for side effects only.
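For illustration, a minimal sketch of the two import modes, assuming a DataSource named data_source and a Delta-enabled Spark session as set up in the example below:

# Default mode: overwrite any existing tables in the 'default' schema.
data_source %>% ds_write_tables("default")

# Merge the new data into the existing tables, matching on resource ID.
data_source %>% ds_write_tables("default", import_mode = ImportMode$MERGE)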
Pathling documentation - Writing managed tables
Other data sink functions: ds_write_delta(), ds_write_ndjson(), ds_write_parquet()
library(pathling)

# Create a temporary warehouse location, which will be used when we call ds_write_tables().
temp_dir_path <- tempfile()
dir.create(temp_dir_path)
# Connect to a local Spark session, enabling the Delta Lake extensions and pointing
# the warehouse at the temporary directory.
sc <- sparklyr::spark_connect(master = "local[*]", config = list(
  "sparklyr.shell.conf" = c(
    paste0("spark.sql.warehouse.dir=", temp_dir_path),
    "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
  )
), version = pathling_spark_info()$spark_version)
pc <- pathling_connect(sc)
data_source <- pc %>% pathling_read_ndjson(pathling_examples('ndjson'))
# Write the data to a set of Spark tables in the 'default' database.
data_source %>% ds_write_tables("default", import_mode = ImportMode$MERGE)
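# Optional check (a sketch, not part of the original example): list the managed tables
# that the write created. Pathling is expected to produce one table per resource type
# (e.g. 'patient'); this reuses the sparklyr connection 'sc' from above.
sparklyr::sdf_sql(sc, "SHOW TABLES IN default")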
pathling_disconnect(pc)
unlink(temp_dir_path, recursive = TRUE)