
Estimate Match (m) Parameters from a Label Column
Source: R/il_estimate_m_from_column.R
Learns the m probabilities from a ground-truth identifier column
(e.g., Social Security Number) present in the input data. Records
sharing the same label value are treated as true matches. This is an
alternative to il_estimate_m_from_labels(), which requires a
separate table of pairwise labels.
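The idea can be illustrated with a minimal base-R sketch. This is not the package's implementation (which presumably runs inside the database and handles every comparison level in the spec); it only shows how an m probability for an exact-match level could be derived from a label column. The toy data frame `records` and its `ssn` column are hypothetical.

```r
# Toy records; `ssn` plays the role of the ground-truth label column
# (hypothetical data, for illustration only).
records <- data.frame(
  first_name = c("ann", "ann", "bob", "rob", "cara"),
  surname    = c("lee", "lee", "ray", "ray", "diaz"),
  ssn        = c("111", "111", "222", "222", "333"),
  stringsAsFactors = FALSE
)

# Enumerate every unordered pair of records (one row of indices per pair).
idx   <- t(combn(nrow(records), 2))
left  <- records[idx[, 1], ]
right <- records[idx[, 2], ]

# Pairs that share the same label value are treated as true matches.
is_match <- left$ssn == right$ssn

# m probability for the exact-match level of a field:
# the share of true-match pairs that agree on that field.
m_first_name <- mean(left$first_name[is_match] == right$first_name[is_match])
m_surname    <- mean(left$surname[is_match] == right$surname[is_match])

print(c(first_name = m_first_name, surname = m_surname))
```

Here two pairs share an `ssn`. Both agree on `surname`, but only one agrees on `first_name`, so the sketch yields an m of 1 for surname and 0.5 for first name.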
Examples
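# Open an in-memory DuckDB connection
con <- DBI::dbConnect(duckdb::duckdb())

# Define the comparisons and the blocking rule
spec <- il_spec() |>
  il_compare(first_name, cl_jaro_winkler(0.9, 0.7)) |>
  il_compare(surname, cl_exact()) |>
  il_compare(dob, cl_exact()) |>
  il_block_on(surname)

model <- il_model(fake_20, spec = spec, con = con)
model <- il_estimate_u(model)

# Estimate m from a label column: records sharing the same value are
# treated as true matches, so in practice pass a ground-truth identifier
model <- il_estimate_m_from_column(model, city)

# Close the connection and shut down the database
DBI::dbDisconnect(con, shutdown = TRUE)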