Research

Research in American Politics and Political Methodology

Published

April 5, 2024

Publications

Evaluating Bias and Noise Induced by the U.S. Census Bureau’s Privacy Protection Methods

(with Shiro Kuriwaki, Cory McCartan, Tyler Simko, and Kosuke Imai). Forthcoming. Science Advances.

BibTeX
@misc{kenn:etal:24b,
      title={Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods}, 
      author={Christopher T. Kenny and Shiro Kuriwaki and Cory McCartan and Tyler Simko and Kosuke Imai},
      year={2023},
      eprint={2306.07521},
      archivePrefix={arXiv},
      primaryClass={cs.CY}
}
Abstract The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau’s two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the 1990, 2000, and 2010 Censuses. Our evaluation leverages the recent release of the Noisy Measure File (NMF) as well as the availability of two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful alone, especially for Hispanic and multiracial populations. TopDown’s post-processing dramatically reduces the NMF noise and produces similarly accurate data to swapping in terms of bias and noise. These patterns hold across census geographies with varying population sizes and racial diversity. While the estimated errors for both TopDown and swapping are generally no larger than other sources of Census error, they can be relatively substantial for geographies with small total populations.

Census Officials Must Constructively Engage with Independent Evaluations

(with Cory McCartan, Tyler Simko, and Kosuke Imai). 2024. PNAS.

BibTeX
@article{kenn:etal:24a,
  title={Census officials must constructively engage with independent evaluations},
  author={Kenny, Christopher T. and McCartan, Cory and Simko, Tyler and Imai, Kosuke},
  journal={Proceedings of the National Academy of Sciences},
  volume={121},
  number={11},
  pages={e2321196121},
  year={2024},
  publisher={National Acad Sciences}
}
First paragraph Current and former Census Bureau officials Jarmin et al. argue that differential privacy, which underlies the 2020 Census’s Disclosure Avoidance System (DAS), satisfies more desirable theoretical criteria than alternatives. They provide detailed criticisms of many published evaluations of the 2020 DAS, including our work. In this letter, we show that their criticisms are unfounded, grossly mischaracterize our research, and ignore critical issues that merit public discussion.

Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition

(with Cory McCartan, Tyler Simko, Shiro Kuriwaki, and Kosuke Imai). 2023. PNAS.

BibTeX
@article{kenn:etal:23b,
author = {Christopher T. Kenny and Cory McCartan and Tyler Simko and Shiro Kuriwaki and Kosuke Imai},
title = {Widespread partisan gerrymandering mostly cancels nationally, but reduces electoral competition},
journal = {Proceedings of the National Academy of Sciences},
volume = {120},
number = {25},
pages = {e2217322120},
year = {2023},
doi = {10.1073/pnas.2217322120},
URL = {https://www.pnas.org/doi/abs/10.1073/pnas.2217322120},
eprint = {https://www.pnas.org/doi/pdf/10.1073/pnas.2217322120},
}
Abstract Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To isolate the electoral impact of gerrymandering from the effects of other factors including geography and redistricting rules, we compare predicted election outcomes under the enacted plan with those under a large sample of non-partisan, simulated alternative plans for all states. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the bias it creates cancels at the national level, giving Republicans two additional seats, on average. In contrast, moderate pro-Republican bias due to geography and redistricting rules remains. Finally, we find that partisan gerrymandering reduces electoral competition and makes the House’s partisan composition less responsive to shifts in the national vote.

Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System

(with Shiro Kuriwaki, Cory McCartan, Evan T. R. Rosenman, Tyler Simko, and Kosuke Imai). 2023. Harvard Data Science Review.

BibTeX
@article{kenn:etal:23,
    author = {Kenny, Christopher T. and Kuriwaki, Shiro and McCartan, Cory and Rosenman, Evan T. R. and Simko, Tyler and Imai, Kosuke},
    journal = {Harvard Data Science Review},
    number = {Special Issue 2},
    year = {2023},
    month = {jan 31},
    note = {https://hdsr.mitpress.mit.edu/pub/6ffzuq19},
    title = {Comment: The {Essential} {Role} of {Policy} {Evaluation} for the 2020 {Census} {Disclosure} {Avoidance} {System}},
}
Abstract In “Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau’s Use of Differential Privacy,” boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary, we explain why policy evaluation, which was the main goal of our analysis, is still meaningful without access to a perfect ground truth. We also point out that our evaluation leveraged features specific to the decennial Census and redistricting data, such as block-level population invariance under swapping and voter file racial identification, better approximating a comparison with the ground truth. Lastly, we show that accurate statistical predictions of individual race based on the Bayesian Improved Surname Geocoding, while not a violation of differential privacy, substantially increase the disclosure risk of private information the Census Bureau sought to protect. We conclude by arguing that policy makers must confront a key trade-off between data utility and privacy protection, and an epistemic disconnect alone is insufficient to explain disagreements between policy choices.

Simulated redistricting plans for the analysis and evaluation of redistricting in the United States

(with Cory McCartan, Tyler Simko, George Garcia III, Kevin Wang, Melissa Wu, Shiro Kuriwaki, and Kosuke Imai). 2022. Scientific Data.

BibTeX
@article{50statesSimulations,
  title = {Simulated Redistricting Plans for the Analysis and Evaluation of Redistricting in the {{United States}}},
  author = {McCartan, Cory and Kenny, Christopher T. and Simko, Tyler and Garcia, George and Wang, Kevin and Wu, Melissa and Kuriwaki, Shiro and Imai, Kosuke},
  year = {2022},
  month = nov,
  journal = {Scientific Data},
  volume = {9},
  number = {1},
  pages = {689},
  issn = {2052-4463},
  doi = {10.1038/s41597-022-01808-2},
  abstract = {This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.}
}
Abstract This article introduces the 50stateSimulations, a collection of simulated congressional districting plans and underlying code developed by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. The 50stateSimulations allow for the evaluation of enacted and other congressional redistricting plans in the United States. While the use of redistricting simulation algorithms has become standard in academic research and court cases, any simulation analysis requires non-trivial efforts to combine multiple data sets, identify state-specific redistricting criteria, implement complex simulation algorithms, and summarize and visualize simulation outputs. We have developed a complete workflow that facilitates this entire process of simulation-based redistricting analysis for the congressional districts of all 50 states. The resulting 50stateSimulations include ensembles of simulated 2020 congressional redistricting plans and necessary replication data. We also provide the underlying code, which serves as a template for customized analyses. All data and code are free and publicly available. This article details the design, creation, and validation of the data.

The use of differential privacy for census data and its impact on redistricting: The case of the 2020 U.S. Census

(with Shiro Kuriwaki, Cory McCartan, Evan T. R. Rosenman, Tyler Simko, and Kosuke Imai). 2021. Science Advances.

Covered by The Washington Post, Associated Press, NC Policy Watch, and The Harvard Crimson.

BibTeX
@article{kenn:etal:21,
author = {Christopher T. Kenny and Shiro Kuriwaki and Cory McCartan and Evan T. R. Rosenman and Tyler Simko and Kosuke Imai},
title = {The Use of Differential Privacy for Census Data and its Impact on Redistricting: The Case of the 2020 U.S. Census},
journal = {Science Advances},
volume = {7},
number = {41},
pages = {eabk3283},
year = {2021},
doi = {10.1126/sciadv.abk3283},
URL = {https://www.science.org/doi/abs/10.1126/sciadv.abk3283},
eprint = {https://www.science.org/doi/pdf/10.1126/sciadv.abk3283},
}
Abstract The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS), which attempts to achieve differential privacy guarantees by adding noise to the Census microdata. By applying redistricting simulation and analysis methods to DAS-protected 2010 Census data, we find that the protected data are not of sufficient quality for redistricting purposes. We demonstrate that the injected noise makes it impossible for states to accurately comply with the One Person, One Vote principle. Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition, and that these biases lead to large and unpredictable errors in the analysis of partisan and racial gerrymanders. Finally, we show that the DAS algorithm does not universally protect respondent privacy. Based on the names and addresses of registered voters, we are able to predict their race as accurately using the DAS-protected data as when using the 2010 Census data. Despite this, the DAS-protected data can still inaccurately estimate the number of majority-minority districts. We conclude with recommendations for how the Census Bureau should proceed with privacy protection for the 2020 Census.

The Essential Role of Empirical Validation in Legislative Redistricting Simulation

(with Benjamin Fifield, Kosuke Imai, and Jun Kawahara). 2020. Statistics and Public Policy.

BibTeX
@article{fife:etal:20,
  author = {Benjamin Fifield and Kosuke Imai and Jun Kawahara and Christopher T. Kenny},
  title = {The Essential Role of Empirical Validation in Legislative Redistricting Simulation},
  journal = {Statistics and Public Policy},
  volume = {7},
  number = {1},
  pages = {52--68},
  year  = {2020},
  publisher = {Taylor \& Francis},
  doi = {10.1080/2330443X.2020.1791773},
  URL = {https://doi.org/10.1080/2330443X.2020.1791773},
  eprint = {https://doi.org/10.1080/2330443X.2020.1791773},
}
Abstract As granular data about elections and voters become available, redistricting simulation methods are playing an increasingly important role when legislatures adopt redistricting plans and courts determine their legality. These simulation methods are designed to yield a representative sample of all redistricting plans that satisfy statutory guidelines and requirements such as contiguity, population parity, and compactness. A proposed redistricting plan can be considered gerrymandered if it constitutes an outlier relative to this sample according to partisan fairness metrics. Despite their growing use, an insufficient effort has been made to empirically validate the accuracy of the simulation methods. We apply a recently developed computational method that can efficiently enumerate all possible redistricting plans and yield an independent sample from this population. We show that this algorithm scales to a state with a couple of hundred geographical units. Finally, we empirically examine how existing simulation methods perform on realistic validation datasets.

Working Papers

Individual and Differential Harm in Redistricting

(with Cory McCartan). Current version: 2022-06-24.

BibTeX
@misc{mcca:kenn:22,
  doi = {10.31235/osf.io/nc2x7},
  url = {https://osf.io/preprints/socarxiv/nc2x7/},
  author = {McCartan, Cory and Kenny, Christopher T.},
  keywords = {representation, redistricting, voting rights, individual harm},
  title = {Individual and Differential Harm in Redistricting},
  publisher = {SocArXiv},
  year = {2022}
}
Abstract Social scientists have developed dozens of measures for assessing partisan bias in redistricting. But these measures cannot be easily adapted to other groups, including those defined by race, class, or geography. Nor are they applicable to single- or no-party contexts such as local redistricting. To overcome these limitations, we propose a unified framework of harm for evaluating the impacts of a districting plan on individual voters and the groups to which they belong. We consider a voter harmed if their chosen candidate is not elected under the current plan, but would be under a different plan. Harm improves on existing measures by both focusing on the choices of individual voters and directly incorporating counterfactual plans. We discuss strategies for estimating harm, and demonstrate the utility of our framework through analyses of partisan gerrymandering in New Jersey, voting rights litigation in Alabama, and racial dynamics of Boston City Council elections.

Works-in-Progress

Inequality in Administrative Democracy: Large-Sample Evidence from American Financial Regulation

(with Daniel P. Carpenter, Angelo Dagonel, Devin Judge-Lord, Brian Libgober, Steven Rashin, Jacob Waggoner, and Susan Webb Yackee)

Awarded the 2021 Herbert Kaufman Award.

Algorithm-Assisted Redistricting Methodology

(with Kosuke Imai, Cory McCartan, and Tyler Simko). Book project.