
Pairwise clerical labels for the fake_1000 dataset. Each row records whether a pair of records from fake_1000 is a true match (clerical_match_score = 1) or a non-match (clerical_match_score = 0). Comparing a linkage model's predictions against these labels lets you assess its accuracy, for example with ROC curves and precision-recall metrics.
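As a rough illustration of that kind of evaluation, the base R sketch below computes precision and recall at a single threshold, plus the ROC AUC. It assumes the labels have already been joined to a model's scores in a hypothetical match_probability column. The toy values are invented for illustration and are not taken from fake_1000_labels.

```r
# Toy stand-in for labels joined to model output.
# `match_probability` is a hypothetical model score, not a column of fake_1000_labels.
labelled <- data.frame(
  clerical_match_score = c(1, 1, 1, 0, 0, 0, 0),
  match_probability    = c(0.95, 0.80, 0.40, 0.60, 0.20, 0.10, 0.05)
)

# Precision and recall when pairs scoring at or above `threshold` are called matches.
precision_recall <- function(truth, score, threshold) {
  predicted <- score >= threshold
  tp <- sum(predicted & truth == 1)   # true matches called matches
  fp <- sum(predicted & truth == 0)   # non-matches called matches
  fn <- sum(!predicted & truth == 1)  # true matches missed
  c(precision = tp / (tp + fp), recall = tp / (tp + fn))
}

# ROC AUC as the probability that a random true match outscores a random
# non-match, with ties counted as one half.
roc_auc <- function(truth, score) {
  pos <- score[truth == 1]
  neg <- score[truth == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

pr  <- precision_recall(labelled$clerical_match_score, labelled$match_probability, 0.5)
auc <- roc_auc(labelled$clerical_match_score, labelled$match_probability)
```

Evaluating precision_recall() at each distinct score and collecting the results gives the points of a precision-recall curve.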

Usage

fake_1000_labels

Format

A tibble with 3,176 rows and 5 columns:

unique_id_l

Integer. unique_id of the left record.

source_dataset_l

Character. Source dataset name ("fake_1000").

unique_id_r

Integer. unique_id of the right record.

source_dataset_r

Character. Source dataset name ("fake_1000").

clerical_match_score

Numeric. 1 for a match, 0 for a non-match.
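Because each row identifies a pair by its two unique_id values, labels are usually matched to model output pair by pair. The base R sketch below assumes (this is not stated in the documentation above) that a model may list a pair with its left and right ids swapped, so it builds an order-independent key before merging. The predictions data frame and its match_probability column are hypothetical, and the toy ids are illustrative only. The source_dataset columns are ignored because every row here is "fake_1000".

```r
# Toy rows shaped like fake_1000_labels (values are illustrative, not from the dataset).
toy_labels <- data.frame(
  unique_id_l          = c(0L, 0L, 5L),
  source_dataset_l     = "fake_1000",
  unique_id_r          = c(1L, 7L, 2L),
  source_dataset_r     = "fake_1000",
  clerical_match_score = c(1, 0, 1)
)

# Hypothetical model output; the pair (5, 2) is stored here as (2, 5).
predictions <- data.frame(
  unique_id_l       = c(0L, 2L, 0L),
  unique_id_r       = c(1L, 5L, 7L),
  match_probability = c(0.9, 0.7, 0.1)
)

# Order-independent key: smaller id first, so (5, 2) and (2, 5) produce the same key.
pair_key <- function(l, r) paste(pmin(l, r), pmax(l, r), sep = "-")

toy_labels$pair  <- pair_key(toy_labels$unique_id_l, toy_labels$unique_id_r)
predictions$pair <- pair_key(predictions$unique_id_l, predictions$unique_id_r)

# Inner join on the pair key: one row per labelled pair that the model scored.
joined <- merge(toy_labels[, c("pair", "clerical_match_score")],
                predictions[, c("pair", "match_probability")],
                by = "pair")
```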

Source

From the splink datasets repository maintained by the UK Ministry of Justice Analytical Services: https://github.com/moj-analytical-services/splink_datasets. Original data generated by the splink team (Linacre et al.) and released under the MIT license.