Software
Packages on CRAN
Packages for redistricting
redist: Simulation Methods for Legislative Redistricting
(with Cory McCartan, Ben Fifield, and Kosuke Imai)
Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023), the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020), the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020), the Merge-split/Recombination algorithms of Carter et al. (2019) and DeFord et al. (2021), and the Short-burst optimization algorithm of Cannon et al. (2020).
redistmetrics: Redistricting metrics
(with Cory McCartan, Ben Fifield, and Kosuke Imai)
Reliable and flexible tools for scoring redistricting plans using common measures and metrics. These functions provide direct access to tools useful for non-simulation analyses of redistricting plans, such as measuring compactness or partisan fairness. Tools are designed to work seamlessly with the 'redist' package.
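To illustrate the kind of compactness scoring mentioned above, here is a minimal Python sketch of the Polsby-Popper score, 4πA/P², for a single polygon. It is not the package's code or interface: the function name `polsby_popper` and the plain list-of-vertices input are assumptions made for this example.

```python
from math import hypot, pi


def polsby_popper(vertices):
    """Polsby-Popper score 4*pi*area / perimeter**2 for a simple polygon.

    `vertices` is a list of (x, y) pairs in order around the boundary.
    Hypothetical helper for illustration only.
    """
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1  # shoelace formula term
        perimeter += hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2
    return 4 * pi * area / perimeter ** 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sliver = [(0, 0), (4, 0), (4, 1), (0, 1)]
print(polsby_popper(square), polsby_popper(sliver))
```

Scores lie between 0 and 1, with a circle scoring 1, so the elongated rectangle scores lower than the square.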
geomander: Geographic Tools for Studying Gerrymandering
A compilation of tools to complete common tasks for studying gerrymandering. This focuses on the geographic tool side of common problems, such as linking different levels of spatial units or estimating how to break up units. Functions exist for creating redistricting-focused data for the US.
alarmdata: Download, Merge, and Process Redistricting Data
(with Cory McCartan, Tyler Simko, Michael Zhao, and Kosuke Imai)
Utility functions to download and process data produced by the ALARM Project, including the 2020 redistricting files of Kenny and McCartan (2021) (https://alarm-redist.org/posts/2021-08-10-census-2020/) and the 50-State Redistricting Simulations of McCartan, Kenny, Simko, Garcia, Wang, Wu, Kuriwaki, and Imai (2022). The package extends the data introduced in McCartan, Kenny, Simko, Garcia, Wang, Wu, Kuriwaki, and Imai (2022) to also include states with only a single district.
redistverse: Easily Install and Load Redistricting Software
(with Cory McCartan)
Easy installation, loading, and control of packages for redistricting data downloading, spatial data processing, simulation, analysis, and visualization. This package makes it easy to install and load multiple 'redistverse' packages at once. The 'redistverse' is developed and maintained by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. For more details see https://alarm-redist.org.
baf: Block Assignment Files
Download and read US Census Bureau data relationship files. Provides support for cleaning and using block assignment files since 2010, as described in https://www.census.gov/geographies/reference-files/time-series/geo/block-assignment-files.html. Also includes support for working with block equivalency files, used for years outside of decennial census years.
Packages for working with Census Bureau data
PL94171: Tabulate P.L. 94-171 Redistricting Data Summary Files
(with Cory McCartan)
Tools to process legacy format summary redistricting data files produced by the United States Census Bureau pursuant to P.L. 94-171. These files are generally available earlier but are difficult to work with as-is.
censable: Making Census Data More Usable
Creates a common framework for organizing, naming, and gathering population, age, race, and ethnicity data from the Census Bureau. Accesses the Census Bureau API (https://www.census.gov/data/developers/data-sets.html). Provides tools for adding information to existing data to line up with Census data.
tinytiger: Lightweight Interface to TIGER/Line Shapefiles
(with Cory McCartan)
Download geographic shapes from the United States Census Bureau TIGER/Line Shapefiles (https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html). Functions support downloading and reading in geographic boundary data. All downloads can be set up with a cache to avoid multiple downloads. Data is available back to 2000 for most geographies.
cvap: Citizen Voting Age Population
Works with the Citizen Voting Age Population special tabulation from the US Census Bureau (https://www.census.gov/programs-surveys/decennial-census/about/voting-rights/cvap.html). Provides tools to download and process raw data. Also provides a downloading interface to processed data. Implements a very basic approach to estimate block-level citizen voting age population from block group data.
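One simple way to push a block group total down to blocks is proportional allocation by each block's voting age population. The Python sketch below shows that idea only; whether the package uses exactly this rule is an assumption, and the function name `distribute_cvap` is hypothetical.

```python
def distribute_cvap(block_vap, block_group_cvap):
    """Split a block group's CVAP across its blocks in proportion to block VAP.

    Assumed approach for illustration; not taken from the package source.
    """
    total_vap = sum(block_vap)
    if total_vap == 0:
        # No voting age population anywhere: nothing to allocate.
        return [0.0 for _ in block_vap]
    return [block_group_cvap * vap / total_vap for vap in block_vap]


# A block group with 20 citizens of voting age and two blocks.
print(distribute_cvap([10, 30], 20))
```

The block estimates are fractional in general and sum back to the block group total.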
ppmf: Read Census Privacy Protected Microdata Files
Implements data processing to align modern differentially private data with the formatting of older US Census data releases. The primary goal is to read in Census Privacy Protected Microdata Files data in a reproducible way. This includes tools for aggregating to relevant levels of geography by creating geographic identifiers which match the US Census Bureau's numbering. Additionally, there are tools for grouping race numeric identifiers into categories, consistent with OMB (Office of Management and Budget) classifications. Functions exist for downloading and linking to existing sources of privacy protected microdata.
apportion: Apportion Seats
Convert populations into an integer number of seats for legislative bodies. Implements apportionment methods used historically and currently in the United States for reapportionment after the Census, as described in https://www.census.gov/history/www/reference/apportionment/methods_of_apportionment.html.
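As an example of such a method, the Huntington-Hill method (the method of equal proportions, currently used for the US House) can be sketched in a few lines of Python. This is an illustration of the algorithm, not the package's interface: the function name `huntington_hill` and the dictionary input are assumptions.

```python
import heapq
from math import sqrt


def huntington_hill(populations, seats):
    """Apportion `seats` among units by the method of equal proportions.

    `populations` maps unit name -> population. Hypothetical helper.
    """
    if seats < len(populations):
        raise ValueError("need at least one seat per unit")
    # Every unit starts with one seat.
    alloc = {name: 1 for name in populations}
    # Priority for a unit's next seat is pop / sqrt(n * (n + 1)), where n is
    # its current seat count. heapq is a min-heap, so store negatives.
    heap = [(-pop / sqrt(1 * 2), name) for name, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, name = heapq.heappop(heap)
        alloc[name] += 1
        n = alloc[name]
        heapq.heappush(heap, (-populations[name] / sqrt(n * (n + 1)), name))
    return alloc


print(huntington_hill({"A": 100, "B": 50, "C": 10}, 5))
```

Each remaining seat goes to the unit with the highest priority value, which is recomputed after every award.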
Packages for plotting data
dots: Dot Density Maps
Generate point data for representing people within spatial data. This collects a suite of tools for creating simple dot density maps. Several functions from different spatial packages are standardized to take the same arguments so that they can be easily substituted for each other.
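The core of a dot density map is placing random points inside each polygon. A minimal Python sketch using rejection sampling from the bounding box follows; it is one common approach, not necessarily what the package does, and the names `point_in_polygon` and `dot_density` are hypothetical.

```python
import random


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the simple polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dot_density(vertices, n_dots, seed=None):
    """Draw n_dots uniform random points inside a polygon.

    Samples from the bounding box and keeps points that fall inside.
    Assumes the polygon has positive area, otherwise this never finishes.
    """
    rng = random.Random(seed)
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    dots = []
    while len(dots) < n_dots:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, vertices):
            dots.append((x, y))
    return dots


triangle = [(0, 0), (4, 0), (0, 4)]
dots = dot_density(triangle, 50, seed=1)
print(len(dots))
```

In practice one dot usually stands for a fixed number of people, so `n_dots` would be the polygon's population divided by that ratio.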
ggredist: Scales, Palettes, and Extensions of ggplot2 for Redistricting
(with Cory McCartan)
Provides 'ggplot2' extensions for political map making. Implements new geometries for groups of simple feature geometries. Adds palettes and scales for red to blue color mapping and for discrete maps. Implements tools for easy label generation and placement, automatic map coloring, and themes.
crayons: Color Palettes from Crayon Boxes
Provides color palettes based on crayon colors since the early 1900s. Colors are based on various crayon colors, sets, and promotional palettes, most of which can be found at https://en.wikipedia.org/wiki/List_of_Crayola_crayon_colors. All palettes are discrete palettes and are not necessarily color-blind friendly. Provides scales for 'ggplot2' for discrete coloring.
palette: Color Scheme Helpers
Hexadecimal codes are typically used to represent colors in R. Connecting these codes to their colors requires practice or memorization. 'palette' provides a 'vctrs' class for working with color palettes, including printing and plotting functions. The goal of the class is to place visual representations of color palettes directly on or, at least, next to their corresponding character representations. Palette extensions also are provided for data frames using 'pillar'.
Packages interfacing with API services
congress: Access the Congress.gov API
Download and read data on United States congressional proceedings. Data is read from the Library of Congress's Congress.gov Application Programming Interface (https://github.com/LibraryOfCongress/api.congress.gov/). Functions exist for all version 3 endpoints, including for bills, amendments, congresses, summaries, members, reports, communications, nominations, and treaties.
feltr: Access the Felt API
Upload, download, and edit internet maps with the Felt API (https://feltmaps.notion.site/Felt-Public-API-reference-c01e0e6b0d954a678c608131b894e8e1). Allows users to create new maps, edit existing maps, and extract data. Provides tools for working with layers, which represent geographic data, and elements, which are interactive annotations. Spatial data accessed from the API is transformed to work with 'sf'.
planscorer: Score Redistricting Plans with PlanScore
Provides access to the 'PlanScore' Application Programming Interface (https://github.com/PlanScore/PlanScore/blob/main/API.md) for scoring redistricting plans. Allows for upload of plans from block assignment files and shape files. For shapes in memory, such as from 'sf' or 'redist', it processes them to save and upload. Includes tools for tidying responses and saving output from the website.
gptzeror: Identify Text Written by Large Language Models using GPTZero
An R interface to the 'GPTZero' API (https://gptzero.me/docs). Allows users to classify text into human and computer written with probabilities. Formats the data into data frames where each sentence is an observation. Paragraph-level and document-level predictions are organized to align with the sentences.
bskyr: Interact with Bluesky Social
Collect data from and make posts on 'Bluesky' Social via the Hypertext Transfer Protocol (HTTP) Application Programming Interface (API), as documented at https://atproto.com/specs/xrpc. This further supports broader queries to the Authenticated Transfer (AT) Protocol (https://atproto.com/), which 'Bluesky' Social relies on. Data is returned in a tidy format and posts can be made using a simple interface.
Other R packages
divseg: Compute Diversity and Segregation Indices
Implements common measures of diversity and spatial segregation. This package has tools to compute the majority of measures reviewed in Massey and Denton (1988). Multiple common measures of within-geography diversity are implemented as well. All functions operate on data frames with a 'tidyselect' based workflow.
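For a sense of what such a measure computes, here is a minimal Python sketch of the dissimilarity index, a standard evenness measure: D = 0.5 × Σ |a_i/A − b_i/B| over areal units i, where a_i and b_i are the two groups' counts in unit i and A and B are their totals. The function name `dissimilarity` and the list inputs are assumptions for this example, not the package's interface.

```python
def dissimilarity(group_a, group_b):
    """Dissimilarity index between two groups across areal units.

    `group_a[i]` and `group_b[i]` are the counts of each group in unit i.
    Returns a value in [0, 1]: 0 is perfectly even, 1 is fully segregated.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per unit")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs a positive total")
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


print(dissimilarity([30, 10], [10, 30]))
```

The value can be read as the share of either group that would need to move between units to make the two distributions identical.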
name: Tools for Working with Names
A system for organizing column names in data. Aimed at supporting a prefix-based and suffix-based column naming scheme. Extends 'dplyr' functionality to add ordering by function and more explicit renaming.
jot: Jot Down Notes for Later
Reproducible work requires a record of where every statistic originated. When writing reports, some data is too big to load in the same environment and some statistics take a while to compute. This package offers a way to keep notes on statistics, simple functions, and small objects. Notepads can be locked to avoid accidental updates. Notepads keep track of who added the notes and when the notes were added. A simple text representation is used to allow for clear version histories.
opengraph: Process Metadata from the Open Graph Protocol
Social media sites often embed cards when links are shared, based on metadata in the Open Graph Protocol (https://ogp.me/). This supports extracting that metadata from a website. It further allows for the creation of tags to add to a website to support the Open Graph Protocol and provides a list of the standard tags and their required properties.
Packages on GitHub
ei: Ecological Inference
(with Shusei Eshima, Gary King, and Molly Roberts)
Software accompanying Gary King's book, A Solution to the Ecological Inference Problem (1997), Princeton University Press. ISBN 978-0691012407.
redistio: Interactive Redistricting
A point and click editor for districts built on 'shiny' and 'Leaflet'. Users can draw districts and calculate standard redistricting metrics, like compactness or the number of administrative splits. Maps can be exported as assignment files or shapefiles, readable by most other redistricting software.
ThemePark: Themes for 'ggplot2' from Popular Culture
(with Matthew B. Jané and Luke C. Pilling)
Provides 'ggplot2' themes that mirror works from popular culture, such as Barbie, Star Wars, Game of Thrones, and others. The package currently holds 14 themes and a number of corresponding discrete color scales, palettes, and fonts. Each theme (e.g., 'theme_barbie') generates a unique color scheme and font for a 'ggplot2' object that matches the color scheme and font found in the movie, TV show, or video game.
causaltbl: Tidy Causal Data Frames and Tools
(with Cory McCartan)
Provides a 'causal_tbl' class for causal inference. A 'causal_tbl' keeps track of information on the roles of variables like treatment and outcome, and provides functionality to store models and their fitted values as columns in a data frame.