Matching Country Records Using the Match Snap

This example demonstrates how to match country records across two datasets using the Match Snap by comparing country names and capitals.

Datasets from different sources often record the same entity, such as a country, with slight variations in naming or formatting. The Match Snap identifies the records in the two datasets that refer to the same country by comparing the country name and capital.

Download this pipeline.

Create the input datasets.
Use two CSV Generator Snaps to create input datasets:
- Dataset 1: Contains fields such as country URL, name, capital, and area.
- Dataset 2: Contains fields such as country ID, name, capital, and area.
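For readers who want to experiment outside a pipeline, the two inputs can be approximated in Python. This is a minimal sketch: the column names (`url`, `id`, `country`, `capital`, `area`) and all row values are illustrative assumptions, not the contents of the CSV Generator Snaps in the downloadable pipeline.

```python
import csv
import io

# Hypothetical sample data. Column names and values are assumptions
# made for illustration; they are not taken from the actual pipeline.
DATASET_1_CSV = """url,country,capital,area
https://example.com/france,France,Paris,551695
https://example.com/japan,Japan,Tokyo,377975
https://example.com/peru,Peru,Lima,1285216
"""

DATASET_2_CSV = """id,country,capital,area
1,france,Paris,551695
2,Japan,Tokio,377975
3,Kenya,Nairobi,580367
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


dataset_1 = load_rows(DATASET_1_CSV)
dataset_2 = load_rows(DATASET_2_CSV)
print(dataset_1[0]["country"], dataset_2[0]["country"])
```

The second dataset deliberately contains a casing difference ("france") and a spelling variant ("Tokio") so that exact comparison would miss records that a fuzzy match can still pair.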
Configure the Match Snap to compare records.
Connect the two input datasets to the Match Snap. Configure the Snap to match records based on the $country and $capital fields.

Enable all three output views of the Snap:
- First output: Matched records
- Second output: Unmatched records from the first input
- Third output: Unmatched records from the second input
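The three output views can be pictured with a small Python sketch. It does not reproduce the Match Snap's internal algorithm, which this example does not describe; it substitutes the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure and averages the country and capital scores. The function names, the averaging rule, and the sample records are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_records(left, right, threshold=0.8):
    """Greedily pair each left record with its best unused right record.

    Returns (matched, unmatched_left, unmatched_right), mirroring the
    three output views. Averaging the two field scores is an assumption.
    """
    matched, unmatched_left = [], []
    used_right = set()
    for l in left:
        best_idx, best_score = None, 0.0
        for i, r in enumerate(right):
            if i in used_right:
                continue
            score = (similarity(l["country"], r["country"])
                     + similarity(l["capital"], r["capital"])) / 2
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is not None and best_score >= threshold:
            used_right.add(best_idx)
            matched.append({"left": l, "right": right[best_idx],
                            "confidence": round(best_score, 3)})
        else:
            unmatched_left.append(l)
    unmatched_right = [r for i, r in enumerate(right) if i not in used_right]
    return matched, unmatched_left, unmatched_right


# Hypothetical records for illustration.
left = [{"country": "France", "capital": "Paris"},
        {"country": "Japan", "capital": "Tokyo"},
        {"country": "Peru", "capital": "Lima"}]
right = [{"country": "france", "capital": "Paris"},
         {"country": "Japan", "capital": "Tokio"},
         {"country": "Kenya", "capital": "Nairobi"}]

matched, unmatched_left, unmatched_right = match_records(left, right)
print(len(matched), len(unmatched_left), len(unmatched_right))
```

With these sample records, France and Japan pair up despite the casing and spelling differences, while Peru and Kenya fall through to the second and third outputs respectively.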
Copy and process matched records.
Use a Copy Snap to duplicate the matched records stream.

From the first copy:
- Use a Mapper Snap to retain the country fields and the confidence score.
- Sort the results by confidence using a Sort Snap.
- Write the results to a file using the File Writer Snap.
From the second copy:
- Use a Mapper Snap to retain only the matched country and capital fields.
- Write the cleaned matched results to a file.
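The two branches after the Copy Snap amount to two projections of the same matched stream. The sketch below imitates them in plain Python, assuming a `confidence` field on each matched record and a descending sort; the source does not state the sort direction or the output format, so both are assumptions, and in-memory buffers stand in for the files.

```python
import io
import json

# Hypothetical matched records; the field names are assumptions.
matched = [
    {"country": "Japan", "capital": "Tokyo", "area": "377975", "confidence": 0.9},
    {"country": "France", "capital": "Paris", "area": "551695", "confidence": 1.0},
]


def project(records, fields):
    """Keep only the named fields of each record (the Mapper step)."""
    return [{f: r[f] for f in fields} for r in records]


def write_json_lines(records, stream):
    """Write one JSON object per line (the File Writer step)."""
    for r in records:
        stream.write(json.dumps(r) + "\n")


# First copy: keep country fields plus confidence, then sort by confidence.
scored = project(matched, ["country", "capital", "area", "confidence"])
scored.sort(key=lambda r: r["confidence"], reverse=True)  # assumed: highest first
scored_out = io.StringIO()
write_json_lines(scored, scored_out)

# Second copy: keep only the matched country and capital.
cleaned = project(matched, ["country", "capital"])
cleaned_out = io.StringIO()
write_json_lines(cleaned, cleaned_out)

print(scored_out.getvalue())
print(cleaned_out.getvalue())
```

Replacing the `io.StringIO` buffers with files opened for writing would produce the two output files.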
Write unmatched records to separate files.
Use File Writer Snaps to write the unmatched records from both datasets (second and third output views) to separate files.
Adjust the threshold to optimize matches.
Tune the Threshold property in the Match Snap to control sensitivity:
- Decrease the threshold to get more matches, at the risk of more false positives.
- Increase the threshold for higher confidence but fewer matches.
For example, lowering the threshold from 0.8 to 0.5 may admit unreliable matches. After reviewing the results at intermediate values, you might determine that 0.51 is the lowest acceptable threshold for your data.
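The trade-off can be made concrete by counting how many candidate pairs pass at each threshold. The scores below are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption about how the comparison works rather than documented Match Snap behavior.

```python
# Hypothetical confidence scores for five candidate pairs.
scores = [0.95, 0.90, 0.72, 0.55, 0.50]


def count_matches(scores, threshold):
    """Count pairs whose score meets the threshold (inclusive, by assumption)."""
    return sum(1 for s in scores if s >= threshold)


for threshold in (0.8, 0.51, 0.5):
    print(threshold, count_matches(scores, threshold))
```

In this made-up sample, moving from 0.51 to 0.5 admits one more pair, the borderline 0.50 score. Inspecting exactly those newly admitted pairs is one way to decide where the lowest acceptable threshold lies.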

To successfully reuse pipelines:

Download and import the pipeline into the SnapLogic Platform.
Configure Snap accounts, as applicable.
Provide pipeline parameters, as applicable.