
Performs fast, scalable probabilistic record linkage and deduplication using the Fellegi-Sunter model. Records that lack a shared unique identifier are compared across configurable dimensions using exact, fuzzy, and distance-based comparisons, and model parameters are estimated with unsupervised Expectation-Maximization. Multiple SQL backends are supported through 'DBI', so linkage can run anywhere from a laptop ('DuckDB') to distributed engines. This package is a translation of the Python 'splink' library by Linacre et al. into idiomatic R.

Author

Maintainer: Christopher T. Kenny <ctkenny@proton.me> (ORCID) [copyright holder]

Other contributors:

  • Robin Linacre (Lead author of splink, the Python package this is derived from) [copyright holder]

  • Sam Lindsay (Author of splink) [copyright holder]

  • Theodore Manassis (Author of splink) [copyright holder]

  • Tom Hepworth (Author of splink) [copyright holder]

  • Andy Bond (Author of splink) [copyright holder]

  • Ross Kennedy (Author of splink) [copyright holder]

  • UK Ministry of Justice (Copyright holder of splink) [copyright holder]