Data Automation

New Features

ELT Snap Pack

  • The ELT Load Snap can infer the schema of source files located in S3 buckets and use it to create, overwrite, or append to the target table in your Redshift instance with the source data. Schema inference works with AVRO, CSV, JSON, ORC, and PARQUET files. Learn more at Automatic Schema Inference with ELT Load Snap.
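
    For illustration, the sketch below shows what the inference amounts to for a hypothetical CSV source whose header row is id,name,signup_date; all table, column, and type names here are assumptions, not output from the Snap.

      -- Illustrative only: a target table the Snap might infer and create
      -- in Redshift for a CSV file with the header row "id,name,signup_date".
      CREATE TABLE public.customers (
          id          BIGINT,
          name        VARCHAR(256),
          signup_date DATE
      );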

Known Issues

ELT Snap Pack

  • In the case of Azure Synapse and Redshift, if the source value is NULL, the ELT Pivot Snap fails to return valid results to either the target table or the downstream Snaps.
  • When loading data from a CSV file to a target DLP table, the header names in the file must exactly match the column names in the target table. Otherwise, the ELT Load Snap returns the error "Column names should be the same in the target table and CSV file" and aborts the load operation.
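
    For example, assuming a hypothetical DLP target table named customers, the CSV header row must match the table's column names exactly:

      -- Hypothetical DLP target table; all names below are assumptions.
      CREATE TABLE customers (cust_id INT, cust_name STRING, cust_region STRING);
      -- The CSV file must then start with exactly this header row:
      -- cust_id,cust_name,cust_region
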
  • You cannot use the ELT Load Snap to add a column to your BigQuery target table that reuses a deleted column's name, because BigQuery reserves deleted column names and their data for the pre-configured time travel duration (from 2 through 7 days).
  • Due to an issue with BigQuery table schema management (the time travel feature), an ALTER TABLE action (Add or Update column) that you attempt after deleting a column (DROP action) in your BigQuery target table causes the table to break and the Snap (ELT Load, ELT SCD2, or ELT Execute) to fail.

    As a workaround, you can either avoid ALTER TABLE actions on your BigQuery instance, or create (CREATE) a temporary copy of your table and delete (DROP) it after you are done with it.
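
    A minimal sketch of the temporary-copy approach, assuming a hypothetical table my_dataset.orders:

      -- Work on a temporary copy instead of altering the original table.
      CREATE TABLE my_dataset.orders_copy AS
      SELECT * FROM my_dataset.orders;
      -- ...run the load against orders_copy as needed...
      DROP TABLE my_dataset.orders_copy;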

  • Due to an issue with the Simba Spark JDBC driver for DLP, you cannot insert a NULL value into nullable columns of the Boolean data type in your DLP instance using any of the write-type Snaps (ELT Load, ELT SCD2, ELT Insert-Select, ELT Merge Into, and ELT Execute) when the target table schema is available.

    The only workaround currently available for this issue is to upgrade your JDBC driver to databricks-jdbc-2.6.25-1.jar and use the corresponding JDBC driver class (com.databricks.client.jdbc.Driver) and JDBC URL in your Snap account.

    Note: This latest JDBC driver for DLP uses a JDBC URL structure and driver class that differ from those of the Simba Spark JDBC driver.
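
    For reference, URLs for the newer driver follow a pattern similar to the sketch below; the hostname, HTTP path, and token are placeholders, and you should confirm the exact format against the Databricks JDBC driver documentation:

      Driver class: com.databricks.client.jdbc.Driver
      JDBC URL:     jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>
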
  • The ELT Load Snap does not insert NULL values into new columns added to the target table through the Alter table Load Action.
  • The ELT Merge Into Snap fails when you perform an UPDATE action on the (hash) distribution key column of an Azure Synapse table, because Azure Synapse does not support modifying values in a table's (hash) distribution key column.
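
    For example, on a hypothetical Synapse table hash-distributed on region_id (all names below are assumptions), any update that targets the distribution column fails:

      -- Hypothetical Azure Synapse table, hash-distributed on region_id.
      CREATE TABLE sales (sale_id INT, region_id INT, amount DECIMAL(10,2))
      WITH (DISTRIBUTION = HASH(region_id));
      -- Per the limitation above, an UPDATE that modifies the distribution
      -- key column (directly or through ELT Merge Into) is rejected:
      -- UPDATE sales SET region_id = 42 WHERE sale_id = 1;
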
  • Due to an issue with DLP, aborting an ELT Pipeline validation (with preview data enabled) aborts only those SQL statements that retrieve data using bind parameters; all other static statements (those that use literal values instead of bind parameters) continue to run.

    For example, select * from a_table where id = 10 (a static statement) is not aborted, while select * from a_table where id = ? (a statement with bind parameters) is aborted.

    To avoid this issue, ensure that you always configure your Snap settings to use bind parameters in their SQL queries.

  • The ELT Math Function Snap fails during Pipeline execution, even after successful validation against the Redshift CDW, because the target table columns created during Pipeline validation can have incompatible or incorrect data types. To prevent this failure, we recommend that you manually delete the table created during validation before running the Pipeline.
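
    A minimal cleanup sketch, assuming the validation created a hypothetical table named elt_math_validation_target:

      -- Drop the table left behind by Pipeline validation before executing.
      DROP TABLE IF EXISTS elt_math_validation_target;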