Splink Fake 1000: Clerical Pairwise Labels — fake_1000

Pairwise clerical labels for the fake_1000 dataset. Each row records whether a pair of records from fake_1000 is a true match (clerical_match_score = 1) or a non-match (clerical_match_score = 0). The labels serve as ground truth for evaluating a linkage model, for example when computing accuracy, plotting ROC curves, or calculating precision and recall.
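As a minimal sketch of such an evaluation, the base R snippet below joins labels to model scores and computes precision and recall at one threshold. It uses a small made-up data frame with the same columns as fake_1000_labels rather than the real data. The predictions data frame, its match_probability column, and the 0.5 threshold are assumptions for illustration, not part of this dataset.

```r
# Toy labels with the same five columns as fake_1000_labels (values are made up).
labels <- data.frame(
  unique_id_l = c(0L, 0L, 1L, 4L),
  source_dataset_l = "fake_1000",
  unique_id_r = c(1L, 2L, 2L, 5L),
  source_dataset_r = "fake_1000",
  clerical_match_score = c(1, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical model output: one score per record pair.
# The column name match_probability is an assumption.
predictions <- data.frame(
  unique_id_l = c(0L, 0L, 1L, 4L),
  unique_id_r = c(1L, 2L, 2L, 5L),
  match_probability = c(0.95, 0.40, 0.70, 0.05)
)

# Attach the clerical label to each scored pair.
scored <- merge(labels, predictions, by = c("unique_id_l", "unique_id_r"))

# Classify pairs at an illustrative threshold and compare with the labels.
threshold <- 0.5
predicted <- scored$match_probability >= threshold
actual <- scored$clerical_match_score == 1

tp <- sum(predicted & actual)   # predicted match, labelled match
fp <- sum(predicted & !actual)  # predicted match, labelled non-match
fn <- sum(!predicted & actual)  # predicted non-match, labelled match

precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
```

Repeating the calculation over a range of thresholds gives the points needed for a precision-recall or ROC curve.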

Usage

fake_1000_labels

Format

A tibble with 3,176 rows and 5 columns:

unique_id_l: Integer. unique_id of the left record.
source_dataset_l: Character. Source dataset name of the left record ("fake_1000").
unique_id_r: Integer. unique_id of the right record.
source_dataset_r: Character. Source dataset name of the right record ("fake_1000").
clerical_match_score: Numeric. 1 for a match, 0 for a non-match.
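
The column layout above can be checked with a few lines of base R. This is a sketch only: check_labels is a hypothetical helper, not a function provided with the dataset, and the data frame it is applied to here is a made-up stand-in with the documented columns.

```r
# Hypothetical helper: verify that a data frame follows the documented layout.
check_labels <- function(df) {
  expected_cols <- c(
    "unique_id_l", "source_dataset_l",
    "unique_id_r", "source_dataset_r",
    "clerical_match_score"
  )
  stopifnot(identical(names(df), expected_cols))
  stopifnot(is.integer(df$unique_id_l), is.integer(df$unique_id_r))
  stopifnot(is.character(df$source_dataset_l), is.character(df$source_dataset_r))
  # Scores are documented as 1 (match) or 0 (non-match).
  stopifnot(is.numeric(df$clerical_match_score))
  stopifnot(all(df$clerical_match_score %in% c(0, 1)))
  invisible(TRUE)
}

# Made-up stand-in for fake_1000_labels.
toy_labels <- data.frame(
  unique_id_l = c(0L, 0L, 1L),
  source_dataset_l = "fake_1000",
  unique_id_r = c(1L, 2L, 2L),
  source_dataset_r = "fake_1000",
  clerical_match_score = c(1, 1, 0),
  stringsAsFactors = FALSE
)

check_labels(toy_labels)

# Count labelled matches and non-matches.
counts <- table(toy_labels$clerical_match_score)
```

With the package loaded, the same helper could be applied to fake_1000_labels directly.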

Source

From the splink datasets repository maintained by the UK Ministry of Justice Analytical Services: https://github.com/moj-analytical-services/splink_datasets. Original data generated by the splink team (Linacre et al.) under the MIT license.