This example demonstrates how to match country records across two datasets using the Match Snap by comparing country names and capitals.
Datasets from different sources often record the same entity, such as a country, with slight variations in naming or formatting. The Match Snap helps identify the records in the two datasets that refer to the same country despite those variations.
Download this pipeline.
- Create the input datasets.
  Use two CSV Generator Snaps to create the input datasets:
  - Dataset 1: Contains fields such as country URL, name, capital, and area.
  - Dataset 2: Contains fields such as country ID, name, capital, and area.
- Configure the Match Snap to compare records.
  Connect the two input datasets to the Match Snap. Configure it to match records based on the $country and $capital fields.
  Enable all three output views of the Snap:
  - First output: Matched records
  - Second output: Unmatched records from the first input
  - Third output: Unmatched records from the second input
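The three output views can be illustrated outside SnapLogic with a minimal Python sketch. It is not the Match Snap's implementation: the similarity measure (`difflib.SequenceMatcher`), the equal weighting of country and capital, the greedy one-to-one pairing, the 0.7 threshold, and the sample records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_records(left, right, threshold=0.8):
    """Pair each left record with its best-scoring unused right record.

    Returns (matched, unmatched_left, unmatched_right), mirroring the
    three output views listed above. The scoring rule is an assumption.
    """
    matched, unmatched_left = [], []
    used_right = set()
    for left_rec in left:
        best_idx, best_score = None, 0.0
        for idx, right_rec in enumerate(right):
            if idx in used_right:
                continue
            # Assumed scoring: plain average of the two field similarities.
            score = (similarity(left_rec["country"], right_rec["country"])
                     + similarity(left_rec["capital"], right_rec["capital"])) / 2
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= threshold:
            used_right.add(best_idx)
            matched.append({
                "left": left_rec,
                "right": right[best_idx],
                "confidence": round(best_score, 3),
            })
        else:
            unmatched_left.append(left_rec)
    unmatched_right = [r for i, r in enumerate(right) if i not in used_right]
    return matched, unmatched_left, unmatched_right


# Hypothetical sample records; "Atlantis" is deliberately fictional.
dataset_1 = [
    {"country": "United States", "capital": "Washington, D.C."},
    {"country": "France", "capital": "Paris"},
    {"country": "Atlantis", "capital": "Poseidonis"},
]
dataset_2 = [
    {"country": "United States of America", "capital": "Washington"},
    {"country": "France", "capital": "Paris"},
    {"country": "Japan", "capital": "Tokyo"},
]

matched, only_in_1, only_in_2 = match_records(dataset_1, dataset_2, threshold=0.7)
print("matched:", matched)
print("unmatched from first input:", only_in_1)
print("unmatched from second input:", only_in_2)
```

In this sketch every input record ends up in exactly one of the three lists, which is the property that makes the three output views useful for auditing what was and was not matched.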
- Copy and process matched records.
  Use a Copy Snap to duplicate the matched records stream.
  From the first copy:
  - Use a Mapper Snap to retain the country fields and the confidence score.
  - Sort the results by confidence using a Sort Snap.
  - Write the results to a file using the File Writer Snap.
  From the second copy:
  - Use a Mapper Snap to retain only the matched country and capital fields.
  - Write the cleaned matched results to a file.
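As a rough analogy for the two branches above, the following Python sketch projects fields, sorts by confidence, and serializes each branch to CSV text. The field names (`id`, `country`, `capital`, `confidence`), the sample values, and the descending sort order are assumptions; the actual Match Snap output schema and the Sort Snap settings in the pipeline may differ.

```python
import csv
import io

# Hypothetical matched records; real field names in the pipeline may differ.
matched = [
    {"id": 1, "country": "France", "capital": "Paris", "confidence": 0.97},
    {"id": 2, "country": "Japan", "capital": "Tokyo", "confidence": 1.0},
    {"id": 3, "country": "United States", "capital": "Washington, D.C.", "confidence": 0.74},
]


def project(records, fields):
    """Keep only the named fields, similar to passing selected fields through a Mapper."""
    return [{f: rec[f] for f in fields} for rec in records]


def to_csv(records, fields):
    """Serialize records to CSV text (a stand-in for writing a file)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# First copy: keep the confidence score and sort by it (descending is an assumption).
scored_fields = ["country", "capital", "confidence"]
scored = project(matched, scored_fields)
scored.sort(key=lambda rec: rec["confidence"], reverse=True)
scored_csv = to_csv(scored, scored_fields)

# Second copy: keep only the country and capital fields.
clean_fields = ["country", "capital"]
clean = project(matched, clean_fields)
clean_csv = to_csv(clean, clean_fields)

print(scored_csv)
print(clean_csv)
```

Keeping a scored branch alongside a cleaned branch lets you review low-confidence matches in one file while handing the other file to downstream consumers that do not need the score.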
- Write unmatched records to separate files.
  Use File Writer Snaps to write the unmatched records from both datasets (second and third output views) to separate files.
- Adjust the threshold to optimize matches.
  Tune the Threshold property in the Match Snap to control sensitivity:
  - Decrease the threshold to increase the number of matches, but watch for false positives.
  - Increase the threshold for higher-confidence but fewer matches.
  For example, lowering the threshold from 0.8 to 0.5 may include unreliable matches. After reviewing the matched records at intermediate values, you may determine that 0.51 is the lowest acceptable threshold.
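The trade-off can be explored with a small threshold sweep. The sketch below is only an illustration under stated assumptions: it scores hypothetical record pairs with `difflib.SequenceMatcher` and averages the country and capital similarities, which is not necessarily how the Match Snap computes its confidence score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_score(left, right) -> float:
    """Assumed confidence: average of country and capital similarity."""
    return (similarity(left["country"], right["country"])
            + similarity(left["capital"], right["capital"])) / 2


# Hypothetical candidate pairs. The last pair names two different countries.
candidate_pairs = [
    ({"country": "France", "capital": "Paris"},
     {"country": "France", "capital": "Paris"}),
    ({"country": "United States", "capital": "Washington, D.C."},
     {"country": "United States of America", "capital": "Washington"}),
    ({"country": "Niger", "capital": "Niamey"},
     {"country": "Nigeria", "capital": "Abuja"}),
]


def matches_at(threshold: float):
    """Return the pairs whose score meets or exceeds the threshold."""
    return [(l["country"], r["country"])
            for l, r in candidate_pairs
            if pair_score(l, r) >= threshold]


for threshold in (0.8, 0.7, 0.5):
    accepted = matches_at(threshold)
    print(f"threshold={threshold}: {len(accepted)} accepted -> {accepted}")
```

With this sample data, each lower threshold admits one more pair, and the pair admitted last is the Niger/Nigeria false positive. Reviewing what each step down lets in is one way to decide where to stop.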
To successfully reuse pipelines:
- Download and import the pipeline into the SnapLogic Platform.
- Configure Snap accounts, as applicable.
- Provide pipeline parameters, as applicable.