File Operation

Overview

You can use this Snap to move, copy, or rename a file from a source to a target location on the same server. The supported file protocols are: local file (file:///), FTP, SFTP, S3, WASB, and WASBS. The File Operation Snap performs extract and load operations on the input file or folder.

Note:

If you use the ABFS (Azure Blob File System) file protocol with Azure Data Lake Storage Gen 2 for bulk operations, you must install the AzCopy utility. The utility must be installed on the Snaplex so that the Snap can fetch its path. If the path is null, the native Azure Storage SDK is used for all operations. Learn more about the AzCopy command. If AzCopy is not installed, ABS file transfers are slower because a REST call is invoked for each file instead of a single bulk operation.

The SnapLogic Platform does not support the installation of utilities or processes on Cloudplexes. Learn more.

Important: We plan to introduce additional S3 features exclusively in the Amazon S3 Snaps; Binary Snaps with S3 support will not receive these updates. We therefore recommend that you use the Amazon S3 Snap Pack for all S3 operations in your pipelines. Binary Snaps are retained as is for backward compatibility, but we will no longer provide S3 support for them.

Learn more: Migrate from Binary to S3 Snaps.



Prerequisites

  • The provided account must have 'write' access to the specified directory and file in order to perform the file operation successfully.
  • IAM Roles for Amazon EC2

IAM Roles for Amazon EC2

The 'IAM_CREDENTIAL_FOR_S3' feature is used to access S3 files from an EC2 Groundplex without an Access-key ID and Secret key in the AWS S3 account in the Snap. The IAM credential stored in the EC2 metadata is used to gain access rights to the S3 buckets. To enable this feature, set the following Global property (Key-Value parameter) and restart the JCC:

jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE

This feature is supported in the EC2-type Groundplex only. Learn more.

Limitations

  • The Snap can move, rename, or copy a file within the same file server, but not across file servers.

  • The Snap can move or copy S3 files across buckets within the same region, but not across regions.
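The same-server limitation can be checked before a pipeline runs. The following is a minimal sketch, assuming that "same file server" means the source and target URLs share a scheme and host; `same_file_server` is a hypothetical helper, not part of the Snap, and it does not cover the S3 region rule.

```python
from urllib.parse import urlsplit


def same_file_server(source: str, target: str) -> bool:
    """Return True when both URLs share a scheme and host.

    Hypothetical pre-check; the Snap's own comparison logic is not documented here.
    """
    src, tgt = urlsplit(source), urlsplit(target)
    return (src.scheme, src.netloc.lower()) == (tgt.scheme, tgt.netloc.lower())


print(same_file_server(
    "ftp://ftp.snaplogic.com/home/mrtest/new1/sample.csv",
    "ftp://ftp.snaplogic.com/home/mrtest/new2/new.csv",
))
print(same_file_server(
    "ftp://ftp.snaplogic.com/home/mrtest/sample.csv",
    "sftp://ftp.snaplogic.com/home/mrtest/sample.csv",
))
```

The first call compares two paths on one FTP server; the second mixes FTP and SFTP, which this sketch treats as different servers.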

Known issues

  • This Snap does not support using the ABFS protocol with a Windows-based Snaplex.
  • When you use special characters, such as ðø©¢¾A²½µ®÷¶þ~, in Source and Target directory names and file names, this Snap fails with the following error because these special characters are not supported.
    Error: Illegal character in fragment at index 71: bfs://bigdataqa@bigdataqassl.dfs.core.windows.net/simplechar/owner!@#$^&()_¢äâêîôûñç¡¿ÉÙËǨ°¸ðø©¢¾A²½µ®§÷¶þ~.json
  • The format abfs(s)://filesystem@accountname.endpoint/<path> does not work for the Source and Target fields because this URL syntax is not supported. The file path must begin with abfs(s):///; otherwise, the container, account name, and endpoint are interpolated into the URL at runtime, which results in the following error.

    Error: Unsupported protocol or URL syntax error in abfs(s)://filesystem@accountname.endpoint/<path>

    Workaround: Use the supported file protocol and correct URL syntax.

  • This Snap Pack no longer natively supports RSA-SHA1 authentication with the Secure File Transfer Protocol (SFTP). To enable support for RSA-SHA1 authentication, set the following property from the Node Properties section of Configuration Options:

    -Djsch.server_host_key=ssh-rsa -Djsch.client_pubkey=ssh-rsa

    With the 4.33 GA release of the Binary Snap Pack, support for some SFTP connection negotiation algorithms was removed to improve security and because the library used to connect to SFTP sources was updated. To revert to the previous settings, set the following jcc.jvm_options from the Node Properties section of Configuration Options. To update Cloudplexes, contact SnapLogic Support.

    -Djsch.kex=ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1
    -Djsch.server_host_key=ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
    -Djsch.client_pubkey=ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
    -Djsch.cipher=aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
    -Djsch.check_ciphers=aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
    -Djsch.check_kexes=diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521
    -Djsch.check_signatures=ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
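Long option strings like these are easy to mistype. The following minimal sketch assembles such a value from a mapping, assuming that each property is passed as its own -D flag and that flags are separated by a single space; `build_jvm_options` is a hypothetical helper, not a SnapLogic API.

```python
from typing import Mapping, Sequence


def build_jvm_options(properties: Mapping[str, Sequence[str]]) -> str:
    """Join JVM system properties into one space-separated option string.

    Hypothetical helper: each entry becomes -D<name>=<comma-separated values>.
    """
    return " ".join(
        f"-D{name}={','.join(values)}" for name, values in properties.items()
    )


options = build_jvm_options({
    "jsch.server_host_key": ["ssh-rsa"],
    "jsch.client_pubkey": ["ssh-rsa"],
})
print(options)  # -Djsch.server_host_key=ssh-rsa -Djsch.client_pubkey=ssh-rsa
```

Keeping the algorithm names in lists makes it easier to review which ones are being re-enabled before pasting the result into the node properties.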

Snap views

View Description Examples of upstream and downstream Snaps
Input Any document with key-value pairs that can be used to evaluate expression properties in the File Operation Snap. Each input document causes one complete execution of the Snap.
Output A document with fields such as Source, Target, and Status (Moved or Copied). For example:
{
	"Source": "ftp://ftp.snaplogic.com/home/mrtest/new1/sample.csv",
	"Target":  "ftp://ftp.snaplogic.com/home/mrtest/new2/new.csv",
	"Status": "Moved"
}
Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when an error occurs.
  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap settings

Legend:
  • Expression icon: Allows using pipeline parameters to set field values dynamically (if enabled). SnapLogic Expressions are not supported. If disabled, you can provide a static value.
  • SnapGPT: Generates SnapLogic Expressions based on natural language using SnapGPT. Learn more.
  • Suggestion icon: Populates a list of values dynamically based on your Snap configuration. You can select only one attribute at a time using the icon. Type into the field if it supports a comma-separated list of values.
  • Upload: Uploads files. Learn more.
Learn more about the icons in the Snap settings dialog.
Field / Field set Type Description
Label String

Required. Specify a unique name for the Snap. Modify this to be more appropriate, especially if the pipeline contains more than one Snap of the same type.

Default value: File Operation

Example: File Operation
Source String/Expression Required. Specify the URL for the source, where the binary data is read from. This Snap also supports S3 Virtual Private Cloud (VPC) endpoint. For example, s3://my-bucket@bucket.vpce-028b7814794578709-vu0vvauy.s3.us-west-2.vpce.amazonaws.com

Example of Source as an expression: "s3:///mybucket/out_" + Date.now() + ".csv". The evaluated filename will be: s3:///mybucket/out_2013-11-13T00:22:31.880Z.csv

This property should have the syntax: scheme://[hostname:port]/[path to source]
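To see how a URL breaks down into these parts, here is a minimal sketch using Python's standard `urllib.parse` module. It only illustrates the generic URL structure; how the Snap itself parses URLs is not described here, and the `describe` helper is hypothetical.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Split a URL into the scheme, host, port, and path parts of the syntax."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None when the URL has an empty authority (scheme:///...)
        "port": parts.port,      # None when no port is given
        "path": parts.path,
    }


for url in [
    "file:///tmp/test.csv",
    "ftp://ftp.snaplogic.com/home/mrtest/source.csv",
    "s3:///test_bucket/folder1/test.json",
]:
    print(describe(url))
```

URLs written with three slashes, such as file:/// and s3:///, have no hostname part, so everything after the scheme is treated as the path.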

You can also copy or move the file from your local system to Azure blob container for better performance.

Warning: When using expressions to build a file name, ensure that the resulting file name does not contain characters that are not supported by the target platform.
Warning: Amazon AWS S3 SDK Limitation - File operations
  • When performing operations on S3 folders ensure that a trailing forward slash ( / ) is added; otherwise, the Snap considers it a file instead of a folder and an error message will be displayed.
  • Ensure the source does not contain '?' character because it is not fully supported and when present, the Snap might fail.
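Both S3 warnings can be checked on a URL before it reaches the Snap. The following is a minimal sketch; `s3_path_problems` is a hypothetical helper, and the caller must state whether the URL is meant to be a folder, because that cannot be inferred from the URL alone.

```python
def s3_path_problems(url: str, is_folder: bool) -> list:
    """Return a list of warnings for an S3 URL (empty when none apply).

    Hypothetical pre-check based only on the two warnings in this section.
    """
    problems = []
    if is_folder and not url.endswith("/"):
        problems.append("folder URL should end with a trailing '/'")
    if "?" in url:
        problems.append("URL contains '?', which is not fully supported")
    return problems


print(s3_path_problems("s3:///test_bucket/folder1", is_folder=True))
print(s3_path_problems("s3:///test_bucket/folder1/", is_folder=True))
print(s3_path_problems("s3:///test_bucket/folder1/what?.json", is_folder=False))
```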

Default value: None.

Example:
  • file:///tmp/test.csv (if you're using Linux)
  • ftp://ftp.snaplogic.com/home/mrtest/source.csv (if your file is on an FTP server)
  • sftp://ftp.snaplogic.com/home/mrtest/source.csv (if you are using Secure FTP)
  • s3:///test_bucket/folder1/test.json (if you are using AWS without specifying any region)
  • s3:///mybucket@eu-west-1/folder1/test.json (if you're using an AWS account that is region-specific)
  • wasb:///test_container/testFolder/sample.json (if you are using Windows Azure Storage Blob)
  • wasbs:///test_container/testFolder/sample.json (if you are using Secure Windows Azure Storage Blob)
  • _source (A key/value pair with "source" key should be defined as a Pipeline parameter.)
  • $source (A key/value pair with "source" key should be defined in the input document.)
  • file:///Z:/somedirectory/somefile.csv (if you have a Groundplex on Windows)
  • file:////somedirectory/somefile.csv (if you're using the Universal Naming Convention)
  • abfs(s):///filesystem/<path>/
  • abfs(s)://filesystem@accountname.endpoint/<path>
Target String/Expression

Required. This property specifies the URL of the destination where the selected file operation must be performed. This Snap also supports S3 Virtual Private Cloud (VPC) endpoint. For example, s3://my-bucket@bucket.vpce-028b7814794578709-vu0vvauy.s3.us-west-2.vpce.amazonaws.com

Warning: Amazon AWS S3 SDK Limitation - File operations
  • When performing operations on S3 folders ensure that a trailing forward slash ( / ) is added; otherwise, the Snap considers it a file instead of a folder and an error message will be displayed.
  • Ensure the file name, folder name, or the file path does not contain '?' character because it is not supported. The Snap fails with an error if you include the '?' character.

Default value: None.

Example:
  • file:///tmp/test.csv
  • ftp://ftp.snaplogic.com/home/mrtest/source.csv
  • ftp://ftp.snaplogic.com/home/mrtest/
  • sftp://ftp.snaplogic.com/home/mrtest/source.csv
  • sftp://ftp.snaplogic.com/home/mrtest/
  • s3:///test_bucket/folder1/test.json
  • s3://test_bucket/folder1/
  • wasb:///test_container/testFolder/sample.json
  • wasbs:///test_container/testFolder/sample.json
  • _target (A key/value pair with "target" key should be defined as a Pipeline parameter.)
  • $target (A key/value pair with "target" key should be defined in the input document.)
  • abfs(s):///filesystem/<path>/
  • abfs(s)://filesystem@accountname.endpoint/<path>
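The Known issues section states that an ABFS path must begin with abfs(s):/// and that the abfs(s)://filesystem@accountname.endpoint/<path> form results in an error. The following minimal sketch checks a Source or Target value against that rule; `is_supported_abfs_url` is a hypothetical helper, and the pattern is an assumption derived only from that statement.

```python
import re

# Matches abfs:/// or abfss:/// at the start of the URL (empty authority part).
_ABFS_TRIPLE_SLASH = re.compile(r"^abfss?:///")


def is_supported_abfs_url(url: str) -> bool:
    """Return True when an ABFS URL uses the abfs(s):/// form."""
    return bool(_ABFS_TRIPLE_SLASH.match(url))


print(is_supported_abfs_url("abfs:///filesystem/path/"))
print(is_supported_abfs_url("abfss://filesystem@accountname.endpoint/path"))
```

The first URL follows the triple-slash form; the second embeds the file system and account in the authority part, which is the form reported as unsupported.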
File Operation String/Expression/ Suggestion
Required. Enter or select the operation you want the Snap to perform on the file. Available options are:
  • Move - Moves the source file to the target location.

  • Copy - Creates a copy of the source file in the target location.

  • Rename - Renames a file in the source location.

Tip: This Snap supports the Azure Data Lake Storage (ADLS) Gen 2 (ABFS) protocol for moving and copying files in the Azure Blob File System (ABFS).
Warning: Rename is handled the same as Move. The source URL and target URL are the same, with the exception of the file name.
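A rename can therefore be recognized as a move whose source and target share everything except the final path segment. The following is a minimal sketch of that comparison; `differs_only_by_file_name` is a hypothetical helper, not the Snap's implementation.

```python
import posixpath
from urllib.parse import urlsplit


def differs_only_by_file_name(source: str, target: str) -> bool:
    """Return True when two URLs share scheme, host, and directory but not file name."""
    src, tgt = urlsplit(source), urlsplit(target)
    same_location = (
        (src.scheme, src.netloc, posixpath.dirname(src.path))
        == (tgt.scheme, tgt.netloc, posixpath.dirname(tgt.path))
    )
    different_name = posixpath.basename(src.path) != posixpath.basename(tgt.path)
    return same_location and different_name


print(differs_only_by_file_name(
    "ftp://ftp.snaplogic.com/home/mrtest/source.csv",
    "ftp://ftp.snaplogic.com/home/mrtest/renamed.csv",
))
print(differs_only_by_file_name(
    "ftp://ftp.snaplogic.com/home/mrtest/new1/sample.csv",
    "ftp://ftp.snaplogic.com/home/mrtest/new2/new.csv",
))
```

The first pair is a rename in place; the second pair changes the directory as well, so it is a move to a different location.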

Default value: Move

Example:
  • Copy
  • _operation (A key/value pair with "operation" key should be defined as a Pipeline parameter.)
  • $operation (A key/value pair with "operation" key should be defined in the input document.)
Error if exists Checkbox If enabled, the Snap displays an error when the target exists. If disabled, the Snap replaces or overwrites the target with the source.

Default status: Selected
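The effect of this checkbox can be illustrated with a local-file analogy. The following is a minimal sketch using Python's standard library; `copy_file` and its `error_if_exists` parameter are hypothetical names, and the sketch does not reflect how the Snap is implemented for remote protocols.

```python
import shutil
import tempfile
from pathlib import Path


def copy_file(source: Path, target: Path, error_if_exists: bool = True) -> str:
    """Copy a local file, failing first if the target exists and the flag is set."""
    if error_if_exists and target.exists():
        raise FileExistsError(f"Target already exists: {target}")
    shutil.copyfile(source, target)  # replaces the target's contents if it exists
    return "Copied"


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "source.csv")
    dst = Path(tmp, "target.csv")
    src.write_text("a,b\n")
    dst.write_text("old\n")
    try:
        copy_file(src, dst)  # default mirrors the selected checkbox
    except FileExistsError as err:
        print("error:", err)
    print(copy_file(src, dst, error_if_exists=False))  # target is overwritten
```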

Advanced properties Use this field set to customize or control the Snap's validation and execution mechanism.
Properties Dropdown list The available options are:
  • SAS URI:
The URI of the Shared Access Signature (SAS) to be accessed. Supported SAS types are:
  • Service SAS on container
  • Service SAS on blob
  • Account SAS
Tip: If the SAS URI value is provided in the Snap settings, then the settings provided in the account (if any account is attached) are ignored.
  • Simple file operation: Select this property and set it to true to allow the Snap to carry out the move or copy operation without checking whether the source and target are directories or regular files, or whether they exist. This feature is useful when the account does not have permission to list the source or target directories.

    Note: This feature is currently supported only for FTP file operations.

Note: If Simple file operation is set to true,
  • The selection for Error if exists property is ignored.

  • None of the logic described in Snap Behavior for Key Operations applies.

  • The exact behavior of the Snap varies depending on the File Server implementation and its administration settings.
  • Az copy absolute path: Specify the directory and wildcard for Azure blob storage for Copy, Move, and Rename operations. For example, "\t" + blob_demo

You can also copy or move the file from your local system to Azure blob container for better performance.

Warning: This field is applicable only if you are using Azure Storage account for this Snap.
Tip:
  • You must install the AzCopy utility on the Snaplex so that the Snap can fetch its path. If the path is null, the native Azure Storage SDK is used for all operations. If you specify '/' in this field, the Snap searches for AzCopy in the path or in the specified path and runs the AzCopy command. You can use https as a valid URL because AzCopy supports the https file system. Learn more about the AzCopy command.

  • The Azure Storage SDK does not support the https file system. Therefore, if you do not provide the AzCopy absolute path and the Source or Target URL is https, the Snap displays an error. For https to work with the container URL of the Azure blob, you must provide the SAS URI under the Advanced properties field set for proper authentication.

Default value: SAS URI

Example: Simple file operation

Values String/Expression Specify a value for the above property.

Default value: None.

Snap execution Dropdown list
Choose one of the three modes in which the Snap executes. Available options are:
  • Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only: Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only

Example: Validate & Execute

Examples