Linkage Specification

Define the comparisons and blocking rules that drive the model.

il_spec()
Create an Empty Linkage Specification
il_compare()
Add a Comparison Layer to a Specification
il_block_on()
Add a Prediction Blocking Rule
block_on()
Create a Training-Time Blocking Rule
il_transform()
Compose Multiple Transforms into a Chain
il_substr()
Extract a Substring Column Transform
il_regex_extract()
Regex Extraction Column Transform
il_nullif()
Replace a Value with NA Column Transform
il_cast_to_string()
Cast to String Column Transform
il_try_parse_date()
Try-Parse Date Column Transform
il_try_parse_timestamp()
Try-Parse Timestamp Column Transform
il_array_element()
Array Element Column Transform
is_il_spec()
Test if an Object is an irelink Specification

Comparison Levels

Building blocks for scoring how similar two records are on a given field.

cl_exact()
Exact Equality Comparison
cl_levenshtein()
Levenshtein Edit-Distance Comparison
cl_damerau_levenshtein()
Damerau-Levenshtein Edit-Distance Comparison
cl_jaro()
Jaro String Similarity Comparison
cl_jaro_winkler()
Jaro-Winkler String Similarity Comparison
cl_jaccard()
Jaccard Set Similarity Comparison
cl_cosine()
Cosine Similarity Comparison
cl_numeric_diff()
Numeric Absolute Difference Comparison
cl_pct_diff()
Numeric Percentage Difference Comparison
cl_date_diff()
Date Difference Comparison
cl_time_diff()
Time Difference Comparison
cl_geo_distance()
Geographic Distance Comparison
cl_array_intersect()
Array Intersection Comparison
cl_array_subset()
Array Subset Comparison
cl_array_min_distance()
Pairwise Array Minimum Distance Comparison
cl_columns_reversed()
Swap Detection for Two Columns
cl_custom()
Custom SQL Comparison
cl_literal()
Literal Value Comparison
cl_null()
Null / Missing Value Level
cl_else()
Catch-All Else Level
cl_levels()
Compose Custom Comparison Levels
cl_and()
Combine Comparison Conditions with AND
cl_or()
Combine Comparison Conditions with OR
cl_not()
Negate a Comparison Condition
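
To make the idea of a comparison level concrete, here is a minimal, language-neutral sketch in Python. It does not call irelink: the function names, level labels, and distance thresholds (1 and 2) are illustrative assumptions, since this index does not show the signatures or defaults of cl_levenshtein(), cl_null(), or cl_else(). It maps a pair of field values to the first level whose condition holds, ordered from most to least similar.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def comparison_level(left: Optional[str], right: Optional[str]) -> str:
    """Return the first matching level (hypothetical labels and thresholds)."""
    if left is None or right is None:
        return "null"              # missing value on either side
    if left == right:
        return "exact"
    distance = levenshtein(left, right)
    if distance <= 1:
        return "levenshtein<=1"
    if distance <= 2:
        return "levenshtein<=2"
    return "else"                  # catch-all for everything remaining


print(comparison_level("smith", "smyth"))
print(comparison_level(None, "smith"))
```

Evaluating levels in order matters: each pair lands in exactly one level, so a later, looser condition only sees pairs that every earlier condition rejected.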

Phonetic Transforms

Phonetic encoding functions for blocking and comparison on name fields.

il_soundex() il_metaphone() il_dmetaphone()
Phonetic Transform Functions
cl_soundex()
Soundex Phonetic Comparison

Domain-Specific Comparisons

Pre-configured multi-level comparisons for common field types.

cl_name()
Personal Name Comparison
cl_first_last_name()
First Name and Last Name Comparison with Swap Detection
cl_forename_surname()
Forename and Surname Comparison with Swap Detection
cl_dob()
Date of Birth Comparison
cl_email()
Email Address Comparison
cl_zip_code()
ZIP Code Comparison
cl_postcode()
Postcode Comparison

Model Fitting

Create and train a probabilistic linkage model.

il_model()
Create a Linkage Model
il_estimate_u()
Estimate Non-Match (u) Parameters
il_estimate_em()
Train Parameters via Expectation-Maximization
il_estimate_prior()
Estimate the Prior Match Probability
il_prior_prevalence()
Add a Prevalence Prior
il_prior_m()
Add a Matched-Class Comparison Prior
il_constrain_m()
Add a Fixed Matched-Class Constraint
il_estimate_m_from_column()
Estimate Match (m) Parameters from a Label Column
il_estimate_m_from_labels()
Estimate Match (m) Parameters from Labeled Data
is_il_model()
Test if an Object is an irelink Model
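
The m and u terminology above matches the standard Fellegi-Sunter formulation of probabilistic linkage, although this index does not confirm that irelink uses exactly this form. Under that assumption, and assuming comparisons are conditionally independent, the sketch below shows how m and u probabilities become match weights and how weights combine with a prior into a match probability. It is plain Python, not irelink code, and the function names are hypothetical.

```python
import math


def match_weight(m: float, u: float) -> float:
    """log2 Bayes factor for one comparison level.

    m: P(level observed | records match)
    u: P(level observed | records do not match)
    """
    return math.log2(m / u)


def match_probability(prior: float, weights: list) -> float:
    """Combine the prior match probability with per-comparison weights."""
    prior_weight = math.log2(prior / (1.0 - prior))   # prior odds, log2 scale
    bayes_factor = 2.0 ** (prior_weight + sum(weights))
    return bayes_factor / (1.0 + bayes_factor)


# Illustrative values only: agreement on a field seen in 80% of true
# matches but 10% of non-matches.
w = match_weight(0.8, 0.1)
print(round(w, 3))
print(round(match_probability(0.5, [w]), 3))
```

A positive weight pushes a pair towards a match and a negative weight pushes it away; a level with m equal to u carries no evidence and has weight zero.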

Prediction and Clustering

Score record pairs and resolve them into linked entities.

predict(<il_model>)
Score Record Pairs from a Trained Model
il_cluster()
Cluster Scored Pairs into Entities
il_deterministic_link()
Deterministic Record Linkage
il_find_matches()
Find Matches for New Records
il_score_patterns()
Score Comparison Patterns
il_score_missing_edges()
Score Missing Edges Within Clusters
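
One common way to resolve scored pairs into entities is to keep pairs at or above a probability threshold and take connected components of the resulting graph. This index does not say which algorithm il_cluster() uses, so the sketch below is an assumption about the general technique rather than a description of irelink. It is plain Python with a hypothetical function name and made-up scores.

```python
def cluster_pairs(record_ids, scored_pairs, threshold):
    """Assign each record a cluster id via union-find connected components.

    scored_pairs: iterable of (left_id, right_id, match_probability).
    The cluster id is the smallest record id in the component.
    """
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for left, right, prob in scored_pairs:
        if prob >= threshold:
            root_l, root_r = find(left), find(right)
            if root_l != root_r:
                # Keep the smaller id as root so cluster ids are stable.
                parent[max(root_l, root_r)] = min(root_l, root_r)

    return {r: find(r) for r in record_ids}


pairs = [(1, 2, 0.95), (2, 3, 0.90), (3, 4, 0.20)]
print(cluster_pairs([1, 2, 3, 4], pairs, threshold=0.8))
```

Connected components are transitive: records 1 and 3 end up together through record 2 even though they were never scored directly, which is why the choice of threshold has a large effect on cluster size.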

Model Inspection

Examine parameters, weights, and training diagnostics.

il_parameters()
Extract Model Parameters
il_priors()
Inspect Model Priors
il_constraints()
Inspect Model Constraints
il_weights()
Extract Match Weights by Comparison Level
il_training_history()
Extract EM Training History
autoplot(<il_training_history>)
Plot EM Training History
il_waterfall()
Extract Waterfall Data for a Single Pair
il_compare_records()
Compare Two Individual Records
il_string_similarity()
Compute String Similarity Scores
autoplot(<il_string_similarity>)
Comparator Score Bar Chart
il_tf_chart()
Term Frequency Adjustment Chart
il_comparison_vectors()
Comparison Vector Distribution
autoplot(<il_comparison_vectors>)
Plot Comparison Vector Distribution
print(<il_model>)
Print an irelink Model
print(<il_spec>)
Print an irelink Specification
summary(<il_model>)
Summarize an irelink Model
autoplot(<il_model>)
Quick Match-Weights Plot for a Model
autoplot(<il_compared>)
Quick Plot for Scored Pairs

Evaluation

Assess model quality against labeled data.

il_accuracy()
Accuracy Metrics Across Thresholds
il_confusion_matrix()
Confusion Matrix at a Threshold
il_cluster_confusion_matrix()
Cluster-Level Confusion Matrix for Deduplication
labels_from_column()
Derive Pairwise Labels from a Ground-Truth Column
il_precision_recall()
Compute Precision-Recall Curve Data
il_roc()
Compute ROC Curve Data
il_errors()
Identify Prediction Errors
il_unlinkables()
Compute Unlinkable Records
il_graph_metrics()
Compute Graph Metrics for Clusters
autoplot(<il_accuracy>)
Plot Accuracy Metrics Across Thresholds
autoplot(<il_roc>)
Plot ROC Curve
autoplot(<il_precision_recall>)
Plot Precision-Recall Curve
autoplot(<il_unlinkables>)
Plot Unlinkables Curve

Blocking

Suggest and evaluate blocking rules to reduce candidate pairs.

il_suggest_blocking()
Suggest Blocking Rules
il_find_blocking_below()
Find Blocking Rules Below a Pair-Count Threshold
block_from_labels()
Derive Blocking Rules from Labeled Pairs
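
To illustrate why blocking reduces candidate pairs, the sketch below counts deduplication pairs with and without a blocking key. It is plain Python on made-up records, not irelink code; the helper names and the choice of surname as the key are illustrative assumptions.

```python
from collections import Counter


def count_pairs_unblocked(n: int) -> int:
    """Every record compared with every other record once."""
    return n * (n - 1) // 2


def count_pairs_blocked(records, key) -> int:
    """Only records sharing the same non-missing key value are compared."""
    keys = (key(r) for r in records)
    block_sizes = Counter(k for k in keys if k is not None)
    return sum(size * (size - 1) // 2 for size in block_sizes.values())


records = [
    {"id": 1, "surname": "smith"},
    {"id": 2, "surname": "smith"},
    {"id": 3, "surname": "jones"},
    {"id": 4, "surname": "jones"},
    {"id": 5, "surname": "jones"},
]

print(count_pairs_unblocked(len(records)))
print(count_pairs_blocked(records, key=lambda r: r["surname"]))
```

Because pair counts grow quadratically within a block, one very common key value can dominate the total, which is the motivation for inspecting the largest blocks before settling on a rule.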

Data Profiling

Explore and summarize input data before linkage.

il_completeness()
Column Completeness Across Datasets
autoplot(<il_completeness>)
Plot Column Completeness
il_count_pairs()
Count Candidate Pairs Under Blocking Rules
autoplot(<il_count_pairs>)
Plot Blocking Rule Pair Counts
il_largest_blocks()
Identify the Largest Blocking Bins
il_profile()
Profile Column Value Distributions
autoplot(<il_profile>)
Plot Column Value Profiles
il_comparator_score()
Batch String Similarity Scores
autoplot(<il_comparator_score>)
Plot Batch Comparator Scores
il_comparator_threshold_chart()
Comparator Score Threshold Chart
il_phonetic_chart()
Phonetic Match Chart
il_register_tf()
Register Pre-Computed Term Frequency Tables

Persistence and Utilities

Save, load, and manage linkage models and resources.

il_save()
Save a Model to Disk
il_load()
Load a Saved Model
il_attach()
Attach a Saved Model to Fresh Data
il_cleanup()
Remove Model-Owned Temporary Tables from Database
il_cleanup_all()
Remove All irelink Temporary Tables from a Database
irelink irelink-package
irelink: Fast Probabilistic Record Linkage

Datasets

Bundled benchmark datasets from the splink ecosystem.

fake_20
Fake 20: Minimal Deduplication Example
fake_1000
Splink Fake 1000: Deduplication Benchmark
fake_1000_labels
Splink Fake 1000: Clerical Pairwise Labels
febrl4a
FEBRL 4a: Record Linkage Original Records
febrl4b
FEBRL 4b: Record Linkage Duplicate Records

Unit Helpers

Lightweight constructors for physical and temporal units.

days()
Create a Duration in Days
months()
Create a Duration in Months
years()
Create a Duration in Years
hours()
Create a Duration in Hours
minutes()
Create a Duration in Minutes
seconds()
Create a Duration in Seconds
km()
Create a Distance in Kilometres
mi()
Create a Distance in Miles