Remove duplicate records

The following example pipeline demonstrates how to use the Unique Snap to process employee data from a CSV file and remove duplicate records. The data includes employee information such as ID, name, department, location, hire date, and email address.


Unique Snap example pipeline overview

  1. Configure the CSV Generator Snap

    Configure the CSV Generator Snap to generate a CSV dataset of employee records with the fields employee_id, first_name, last_name, department, location, hire_date, and email. The dataset includes several duplicate records.


    CSV with duplicate employee records

  2. Configure the Unique Snap

    Configure the Unique Snap to remove duplicate employee records from the dataset, ensuring each record appears only once.

  3. Validate and view unique output

    On validation, you can view all unique records in the output preview as shown below.


    Unique Snap output preview with deduplicated records

    1. Validate the pipeline to generate the output preview.
    2. Review the output to confirm only unique records remain.
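
To sanity-check the result outside the pipeline, the same deduplication logic can be approximated in a few lines of Python. This is only an illustrative sketch, not how the Unique Snap is implemented: the sample rows, the `SAMPLE_CSV` constant, and the `unique_records` helper are hypothetical, and the sketch assumes two records are duplicates only when every field matches.

```python
import csv
import io

# Hypothetical sample data using the field names from this example.
# Rows 3 and 5 are exact duplicates of rows 1 and 2.
SAMPLE_CSV = """employee_id,first_name,last_name,department,location,hire_date,email
101,Ana,Lopez,Engineering,Austin,2021-03-15,ana.lopez@example.com
102,Ben,Cho,Sales,Denver,2020-07-01,ben.cho@example.com
101,Ana,Lopez,Engineering,Austin,2021-03-15,ana.lopez@example.com
103,Cara,Singh,Finance,Boston,2019-11-20,cara.singh@example.com
102,Ben,Cho,Sales,Denver,2020-07-01,ben.cho@example.com
"""


def unique_records(rows):
    """Yield each distinct record once, keeping first-seen order.

    Assumption: a record is a duplicate only if all of its fields match
    a record that was already seen.
    """
    seen = set()
    for row in rows:
        key = tuple(row.items())  # hashable snapshot of the whole record
        if key not in seen:
            seen.add(key)
            yield row


records = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
deduplicated = list(unique_records(records))

print(f"input records:  {len(records)}")
print(f"unique records: {len(deduplicated)}")
for row in deduplicated:
    print(row["employee_id"], row["first_name"], row["last_name"])
```

Comparing the unique record count from a script like this with the count in the output preview is a quick way to confirm that the pipeline removed the duplicates you expected.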