Pairwise clerical labels for the fake_1000 dataset.
Each row records whether a pair of records from fake_1000 is a true
match (clerical_match_score = 1) or a non-match
(clerical_match_score = 0).
These labels serve as ground truth for evaluating a linkage model,
for example its accuracy, ROC curve, and precision-recall metrics.
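As a rough illustration of how labels like these feed a precision-recall calculation, the sketch below compares clerical labels against model scores at a single threshold. The label rows, the `predicted` scores, the function name `precision_recall`, and the 0.5 threshold are all hypothetical values made up for this example; none are taken from the actual dataset or from any library API.

```python
# Hypothetical toy labels in the shape (unique_id_l, unique_id_r, clerical_match_score).
# These rows are invented for illustration and are NOT taken from the real dataset.
labels = [
    (0, 1, 1.0),
    (0, 2, 1.0),
    (1, 2, 1.0),
    (3, 4, 0.0),
    (5, 6, 0.0),
]

# Hypothetical model output: a match probability for each (left id, right id) pair.
predicted = {
    (0, 1): 0.95,
    (0, 2): 0.40,
    (1, 2): 0.88,
    (3, 4): 0.70,
    (5, 6): 0.05,
}


def precision_recall(labels, predicted, threshold=0.5):
    """Return (precision, recall) of thresholded scores against clerical labels."""
    tp = fp = fn = 0
    for id_l, id_r, truth in labels:
        # Assumption: a pair the model never scored counts as a predicted non-match.
        is_predicted_match = predicted.get((id_l, id_r), 0.0) >= threshold
        is_true_match = truth == 1
        if is_predicted_match and is_true_match:
            tp += 1
        elif is_predicted_match:
            fp += 1
        elif is_true_match:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


precision, recall = precision_recall(labels, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Repeating the calculation over a range of thresholds gives the points of a precision-recall curve; the same true and false positive counts can also be used to derive an ROC curve.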
Format
A tibble with 3,176 rows and 5 columns:
- unique_id_l
Integer. unique_id of the left record.
- source_dataset_l
Character. Source dataset name ("fake_1000").
- unique_id_r
Integer. unique_id of the right record.
- source_dataset_r
Character. Source dataset name ("fake_1000").
- clerical_match_score
Numeric. 1 for a match, 0 for a non-match.
Source
From the splink datasets repository maintained by the UK Ministry of Justice Analytical Services: https://github.com/moj-analytical-services/splink_datasets. Original data generated by the splink team (Linacre et al.) under the MIT license.
