
Iterative proportional fitting (raking)
Adjusts survey weights so that weighted marginal distributions match known population targets. Supports automatic variable selection, iterative re-raking, and weight bounding.
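As a rough illustration of what raking does, the base-R sketch below implements plain iterative proportional fitting. Each pass rescales the weights so that one variable's weighted shares hit their targets, and passes repeat until the largest relative gap falls below tol. rake_sketch() and weighted_shares() are hypothetical helpers, not part of this package. They omit variable selection, trimming and NA handling, and they assume that every observed level appears in targets with a positive value. Defining proportional error as |achieved - target| / target is also an assumption.

```r
# Hypothetical base-R raking sketch: all targets are used, with no trimming
# and no NA handling. Every data level is assumed to be a positive target.
weighted_shares <- function(x, w, lev) {
  # weighted share of each level in `lev`, returned as a named vector
  vapply(lev, function(l) sum(w[x == l]), numeric(1)) / sum(w)
}

rake_sketch <- function(data, targets, w = rep(1, nrow(data)),
                        max_iter = 1000L, tol = 1e-6) {
  w <- w / mean(w)                          # rescale base weights to mean 1
  err <- Inf
  for (iter in seq_len(max_iter)) {
    for (v in names(targets)) {
      x <- as.character(data[[v]])
      shares <- weighted_shares(x, w, names(targets[[v]]))
      # scale each case by (target share / current share) of its category
      w <- w * unname(targets[[v]][x] / shares[x])
    }
    # after a full pass: largest relative gap between achieved and target
    err <- max(vapply(names(targets), function(v) {
      shares <- weighted_shares(as.character(data[[v]]), w,
                                names(targets[[v]]))
      max(abs(shares - targets[[v]]) / targets[[v]])
    }, numeric(1)))
    if (err < tol) break
  }
  list(weights = w, converged = err < tol, iterations = iter,
       max_prop_err = err)
}
```

Within one margin, every level is rescaled by target / current share, and the targets sum to 1. Each step therefore preserves the total weight, so the weights keep a mean of 1.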
Usage
rake(
  data,
  targets,
  base_weights = NULL,
  cap = 5,
  bounds = NULL,
  type = c("nolim", "pctlim", "nlim"),
  pctlim = 0.05,
  nlim = 5L,
  choosemethod = c("total", "max", "average", "totalsquared", "maxsquared",
    "averagesquared"),
  na_method = c("exclude", "bucket"),
  iterate = TRUE,
  max_iter = 1000L,
  tol = 1e-06,
  verbose = FALSE,
  diagnostics_every = 0L
)
Arguments
- data
A data frame or tibble containing the survey data.
- targets
A named list of named numeric vectors giving the target proportions for each raking variable. The list names must match column names in data, and each vector's names must match the levels of the corresponding variable. Values should sum to 1 (proportions); if they do not, they are normalized with a warning.
- base_weights
Optional numeric vector of base (design) weights. If NULL (the default), uniform weights of 1 are used. Base weights are rescaled to mean 1 before raking.
- cap
Maximum weight value (ratio cap). Weights exceeding this value are trimmed, and all weights are then renormalized. Default 5. Ignored if bounds is specified.
- bounds
Optional numeric vector of length 2, c(lo, hi), giving the minimum and maximum weight bounds. Overrides cap.
- type
Variable selection method:
  "nolim" (default): use all variables in targets.
  "pctlim": use only variables whose discrepancy is >= pctlim.
  "nlim": use the nlim most discrepant variables.
- pctlim
Discrepancy threshold for type = "pctlim". Default 0.05 (5 percentage points).
- nlim
Number of variables to select for type = "nlim". Default 5.
- choosemethod
Method for aggregating per-category discrepancies into a single score per variable. One of "total", "max", "average", "totalsquared", "maxsquared", "averagesquared".
- na_method
How to handle NA values in raking variables:
  "exclude" (default): targets are proportions among non-NA cases only, and NA cases are invisible to that margin. This matches anesrake.
  "bucket": NAs become a frozen extra category. Their total weight is preserved, and the named targets are rescaled to the remaining non-NA mass.
- iterate
Logical. If TRUE and type = "pctlim", discrepancies are re-checked after raking and newly discrepant variables are added, repeating up to 10 times. Default TRUE.
- max_iter
Maximum number of raking iterations. Default 1000.
- tol
Convergence tolerance (maximum proportional error). Default 1e-6.
- verbose
Logical. If TRUE, print iteration progress. Default FALSE.
- diagnostics_every
Record per-margin diagnostics every k iterations, where k is the value of this argument. 0 records only the baseline. Default 0.
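The type, pctlim, nlim and choosemethod arguments plausibly interact as in the sketch below. The score formulas are an assumption inferred from the choosemethod names and from anesrake. Each formula takes the absolute or squared difference between the weighted and target share of each category, then sums, maximizes or averages those differences across categories. discrepancy_score() and select_vars() are hypothetical helpers, not exported functions.

```r
# Hypothetical sketch of discrepancy-based variable selection.
# Score definitions are assumptions modelled on the choosemethod names.
discrepancy_score <- function(x, w, target,
                              choosemethod = c("total", "max", "average",
                                               "totalsquared", "maxsquared",
                                               "averagesquared")) {
  choosemethod <- match.arg(choosemethod)
  lev <- names(target)
  keep <- !is.na(x)                     # "exclude"-style: drop NA cases
  cur <- vapply(lev, function(l) sum(w[keep & x == l]), numeric(1)) /
    sum(w[keep])
  d <- abs(cur - target)                # per-category discrepancy
  switch(choosemethod,
         total = sum(d), max = max(d), average = mean(d),
         totalsquared = sum(d^2), maxsquared = max(d^2),
         averagesquared = mean(d^2))
}

select_vars <- function(data, targets, w = rep(1, nrow(data)),
                        type = c("nolim", "pctlim", "nlim"),
                        pctlim = 0.05, nlim = 5L, choosemethod = "total") {
  type <- match.arg(type)
  scores <- vapply(names(targets), function(v)
    discrepancy_score(data[[v]], w, targets[[v]], choosemethod), numeric(1))
  switch(type,
         nolim  = names(targets),
         pctlim = names(scores)[scores >= pctlim],
         nlim   = names(sort(scores, decreasing = TRUE))[
           seq_len(min(nlim, length(scores)))])
}
```

With iterate = TRUE, a "pctlim" selection like this would be recomputed on the raked weights, and any variable that newly crosses the threshold would be added.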
Value
An ipf_rake object (S3 class) containing:
- weights: final raked weight vector
- data: the input data frame
- converged: logical
- iterations: number of iterations
- max_prop_err: final maximum proportional error
- targets: normalized targets used
- vars_used: character vector of variables raked on
- base_weights: original base weights
- type, choosemethod, na_method, cap: settings used
- deff, n_eff: design effect and effective sample size
- diagnostics: tibble of per-iteration diagnostics
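The deff and n_eff fields are commonly computed with Kish's approximation. This page does not state the formula, so the sketch below assumes that convention. kish_deff() is a hypothetical helper.

```r
# Kish approximation (assumed, not confirmed for rake()):
#   deff  = n * sum(w^2) / sum(w)^2   (equals 1 + CV^2 using population CV)
#   n_eff = n / deff = sum(w)^2 / sum(w^2)
kish_deff <- function(w) {
  n <- length(w)
  deff <- n * sum(w^2) / sum(w)^2
  list(deff = deff, n_eff = n / deff)
}

# Usage: half the cases weighted 1, half weighted 2
kish_deff(c(rep(1, 50), rep(2, 50)))   # deff = 10/9, n_eff = 90
```

This is roughly consistent with the printed example below. An SD of 0.272 around a mean of 1 gives deff of about 1 + 0.272^2, which is about 1.07, and 100 / 1.074 is about 93.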
Examples
data <- data.frame(
gender = sample(c('M', 'F'), 100, replace = TRUE, prob = c(0.6, 0.4)),
age = sample(c('young', 'old'), 100, replace = TRUE, prob = c(0.7, 0.3))
)
targets <- list(
gender = c(M = 0.5, F = 0.5),
age = c(young = 0.6, old = 0.4)
)
result <- rake(data, targets)
print(result)
#>
#> ── Raking result (ipf)
#> Converged: Yes (2 iterations, max prop err = 3.13e-08)
#> Variables raked: "gender" and "age"
#> Missing handling: "exclude"
#> Design effect: 1.074 | Effective n: 93 / 100
#> Weight range: [0.735, 1.55] | Mean: 1 | SD: 0.272
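The weight range reported above is the quantity that cap and bounds constrain. The sketch below shows a minimal version of the trim-and-renormalize step described under cap, assuming weights are kept at mean 1. It clips the weights to the limits, rescales them to mean 1, and repeats, because rescaling can push weights back past the limits. trim_weights() is hypothetical. This page does not say whether rake() trims inside each raking pass or once at the end. Trimming after raking will perturb the margins slightly.

```r
# Hypothetical trim-and-renormalize sketch for cap / bounds.
# With only `cap`, the lower limit is 0; `bounds` = c(lo, hi) overrides cap.
trim_weights <- function(w, cap = 5, bounds = NULL, max_iter = 100L) {
  lim <- if (is.null(bounds)) c(0, cap) else bounds
  w <- w / mean(w)                          # work on the mean-1 scale
  for (i in seq_len(max_iter)) {
    clipped <- pmin(pmax(w, lim[1]), lim[2])
    clipped <- clipped / mean(clipped)      # renormalize to mean 1
    if (max(abs(clipped - w)) < 1e-12) break  # fixed point reached
    w <- clipped
  }
  clipped
}

trim_weights(c(rep(1, 9), 20), cap = 3)
```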