Matching Country Records Using the Match Snap

This example demonstrates how to match country records across two datasets using the Match Snap by comparing country names and capitals.

When datasets come from different sources, entities such as countries often appear with slight variations in naming or formatting. The Match Snap identifies records that refer to the same country in both datasets by comparing the country name and capital.


Match Pipeline

Download this pipeline.

  1. Create the input datasets.

    Use two CSV Generator Snaps to create input datasets:

    • Dataset 1: Contains fields such as country URL, name, capital, and area.
      Country Dataset 1 Configuration

    • Dataset 2: Contains fields such as country ID, name, capital, and area.
      Country Dataset 2 Configuration

  2. Configure the Match Snap to compare records.

    Connect the two input datasets to the Match Snap. Configure it to match records on the $country and $capital fields.
    Match Snap Configuration

    Enable all three output views of the Snap:

    • First output: Matched records
      Match Output preview

    • Second output: Unmatched records from the first input
      Match Output preview

    • Third output: Unmatched records from the second input
      Match Output 2 preview
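
The three output views can be pictured with a minimal Python sketch. It is not the Match Snap's implementation: the similarity measure (the standard library's difflib), the greedy pairing, the `>=` comparison, and the sample records are all assumptions made for illustration. Only the field names country and capital and the idea of a confidence threshold come from this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive, whitespace-trimmed similarity in [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_records(left, right, fields=("country", "capital"), threshold=0.8):
    """Greedily pair each left record with its best unused right record.

    Returns (matched, unmatched_left, unmatched_right), mirroring the
    three output views described above.
    """
    matched, unmatched_left, used_right = [], [], set()
    for l_rec in left:
        best_j, best_score = None, 0.0
        for j, r_rec in enumerate(right):
            if j in used_right:
                continue
            # Confidence here is the mean similarity over the compared fields.
            score = sum(similarity(l_rec[f], r_rec[f]) for f in fields) / len(fields)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            used_right.add(best_j)
            matched.append({"left": l_rec, "right": right[best_j],
                            "confidence": round(best_score, 3)})
        else:
            unmatched_left.append(l_rec)
    unmatched_right = [r for j, r in enumerate(right) if j not in used_right]
    return matched, unmatched_left, unmatched_right


# Hypothetical sample records, not taken from the example pipeline.
dataset_1 = [
    {"country": "France", "capital": "Paris"},
    {"country": "Ivory Coast", "capital": "Yamoussoukro"},
    {"country": "Japan", "capital": "Tokyo"},
]
dataset_2 = [
    {"country": "france", "capital": "Paris "},
    {"country": "Brazil", "capital": "Brasilia"},
]

matched, only_in_1, only_in_2 = match_records(dataset_1, dataset_2)
print(len(matched), len(only_in_1), len(only_in_2))
```

With these sample records, only the France pair clears the threshold; the remaining records fall through to the two unmatched lists.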

  3. Copy and process matched records.

    Use a Copy Snap to duplicate the matched records stream.

    From the first copy:

    • Use a Mapper Snap to retain the country fields and the confidence score.
      Mapper Snap

    • Sort the results by confidence using a Sort Snap.
      Sort Snap

    • Write the results to a file using the File Writer Snap.
      Mapper Snap

    From the second copy:

    • Use a Mapper Snap to retain only the matched country and capital fields.
      Remove Confidence Mapper Snap

    • Write the cleaned matched results to a file.
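
The two branches of this step can be sketched in Python as follows. This is a rough analogy rather than what the Mapper, Sort, and File Writer Snaps do internally: the record layout, the `id` field, the confidence values, and the descending sort order are assumptions, and the results are serialized to JSON strings instead of being written to files.

```python
import json

# Hypothetical matched records; field names and scores are illustrative only.
matched = [
    {"id": 1, "country": "Japan", "capital": "Tokyo", "confidence": 0.91},
    {"id": 2, "country": "France", "capital": "Paris", "confidence": 1.0},
    {"id": 3, "country": "Brazil", "capital": "Brasilia", "confidence": 0.84},
]


def with_confidence(records):
    """First copy: keep country fields plus confidence, highest score first."""
    kept = [
        {"country": r["country"], "capital": r["capital"],
         "confidence": r["confidence"]}
        for r in records
    ]
    return sorted(kept, key=lambda r: r["confidence"], reverse=True)


def without_confidence(records):
    """Second copy: keep only the matched country and capital fields."""
    return [{"country": r["country"], "capital": r["capital"]} for r in records]


scored = with_confidence(matched)
clean = without_confidence(matched)

# Stand-ins for the two File Writer Snaps: serialize each branch separately.
scored_json = json.dumps(scored, indent=2)
clean_json = json.dumps(clean, indent=2)
print(scored_json)
```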

  4. Write unmatched records to separate files.

    Use File Writer Snaps to write the unmatched records from both datasets (second and third output views) to separate files.

  5. Adjust the threshold to optimize matches.

    Tune the Threshold property in the Match Snap to control sensitivity:

    • Decrease the threshold to increase matches, but be cautious of false positives.
    • Increase the threshold for higher confidence but fewer matches.

    For example, lowering the threshold from 0.8 to 0.5 may admit unreliable matches. By reviewing the results at several values, you might find that 0.51 is the lowest threshold that still yields acceptable matches.
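
One way to reason about the trade-off is to count accepted pairs and known false positives at several thresholds. The Python sketch below does this on hypothetical candidate pairs: the confidence values and the `correct` labels are invented for illustration, and treating a pair as accepted when its confidence is greater than or equal to the threshold is an assumption about the comparison, not documented Match Snap behavior.

```python
# Hypothetical candidate pairs with invented confidence scores and
# hand-assigned correctness labels (not output from the Match Snap).
candidates = [
    {"left": "France", "right": "France", "confidence": 1.00, "correct": True},
    {"left": "United States", "right": "United States of America",
     "confidence": 0.72, "correct": True},
    {"left": "Czechia", "right": "Czech Republic", "confidence": 0.55, "correct": True},
    {"left": "Niger", "right": "Nigeria", "confidence": 0.50, "correct": False},
]


def evaluate(threshold):
    """Return (accepted_count, false_positive_count) at the given threshold."""
    accepted = [c for c in candidates if c["confidence"] >= threshold]
    false_positives = sum(1 for c in accepted if not c["correct"])
    return len(accepted), false_positives


for t in (0.8, 0.51, 0.5):
    accepted, false_pos = evaluate(t)
    print(f"threshold={t}: accepted={accepted}, false positives={false_pos}")
```

On this sample, 0.8 accepts one pair, 0.51 accepts three with no false positives, and 0.5 admits the incorrect Niger/Nigeria pair.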

To successfully reuse pipelines:
  1. Download and import the pipeline into the SnapLogic Platform.
  2. Configure Snap accounts, as applicable.
  3. Provide pipeline parameters, as applicable.