Binds one or more datasets to a specification and a database connection, producing an untrained model. Accepts in-memory data frames, dbplyr::tbl_lazy table references, or character table names for data that already lives in a database.
Usage
il_model(
  .data,
  ...,
  spec,
  con = NULL,
  link_type = c("dedupe", "link", "link_and_dedupe")
)
Arguments
- .data
A data frame, tibble::tibble(), dbplyr::tbl_lazy, or character table name. The first (or only) input dataset. If no unique_id column is present, one is generated automatically.
- ...
Additional datasets for multi-table linkage (same types as .data).
- spec
An il_spec object built with il_spec(), il_compare(), and il_block_on().
- con
A DBI connection object from DBI::dbConnect() (e.g., DBI::dbConnect(duckdb::duckdb())). Optional when .data is a dbplyr::tbl_lazy; in that case the connection is extracted from the table reference.
- link_type
One of "dedupe" (default), "link", or "link_and_dedupe".
Details
When .data is a dbplyr::tbl_lazy (from dplyr::tbl()), the connection
is extracted automatically and data stays in-database with zero
copying. A unique_id column is injected automatically if not
already present.
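The two automatic steps described here can be approximated with standard dbplyr and dplyr calls. The sketch below is an assumption about how such behaviour could be implemented, not the package's actual internals: ensure_unique_id() is a hypothetical helper, the two-row table is made-up sample data, and dbplyr::remote_con() is used to recover the connection from the lazy table.

```r
# Sketch only: ensure_unique_id() is a hypothetical helper, not part of the package.
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(con, "my_data", data.frame(surname = c("Smith", "Jones")))
tbl_ref <- dplyr::tbl(con, "my_data")

# A lazy table carries its connection; remote_con() returns it.
extracted_con <- dbplyr::remote_con(tbl_ref)

# Add a unique_id column only when the input does not already have one.
ensure_unique_id <- function(data) {
  if ("unique_id" %in% colnames(data)) {
    return(data)
  }
  dplyr::mutate(data, unique_id = dplyr::row_number())
}

# The result is still a lazy query; no rows are collected into R.
with_id <- ensure_unique_id(tbl_ref)
cols <- colnames(with_id)

DBI::dbDisconnect(con, shutdown = TRUE)
```

Because the helper only extends the lazy query, the data stays in the database until something explicitly collects it.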
Examples
# In-memory data frame: supply the connection explicitly
con <- DBI::dbConnect(duckdb::duckdb())
spec <- il_spec() |>
  il_compare(first_name, cl_jaro_winkler(0.9, 0.7)) |>
  il_block_on(surname)
model <- il_model(fake_20, spec = spec, con = con)

# Database-backed: pass a dbplyr reference directly
DBI::dbWriteTable(con, 'my_data', fake_20, overwrite = TRUE)
tbl_ref <- dplyr::tbl(con, 'my_data')
model2 <- il_model(tbl_ref, spec = spec)  # connection is taken from tbl_ref

# Close the connection and shut down the DuckDB instance
DBI::dbDisconnect(con, shutdown = TRUE)
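
To link two datasets rather than deduplicate one, the additional dataset goes in ... and link_type is set to "link". The sketch below follows the usage signature documented on this page. Splitting fake_20 into two halves with head() and tail() is purely illustrative, and left, right, and model_link are hypothetical names.

```r
# Illustrative only: split the example data into two inputs to link.
con <- DBI::dbConnect(duckdb::duckdb())
left  <- head(fake_20, 10)
right <- tail(fake_20, 10)

spec <- il_spec() |>
  il_compare(first_name, cl_jaro_winkler(0.9, 0.7)) |>
  il_block_on(surname)

# Additional datasets are passed through `...`; link_type selects linkage.
model_link <- il_model(left, right, spec = spec, con = con, link_type = "link")

DBI::dbDisconnect(con, shutdown = TRUE)
```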
