
Downloads the IANA Language Subtag Registry and parses it into a tidy data frame. Each row represents one registry entry (a language, extlang, script, region, or variant subtag, or a grandfathered or redundant tag). Columns correspond to registry fields such as type, subtag, description, added, preferred_value, and suppress_script. When an entry has several values for the same field (e.g., multiple Description lines), the values are joined into a single string separated by ";".
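The registry is a plain-text file of "Field: value" lines, with records separated by "%%" lines and a File-Date record at the top. The following is a minimal base R sketch of the parsing idea described above, not the function's actual implementation. The sample lines are a hand-written illustration with a placeholder File-Date, the helper parse_record() is hypothetical, and the sketch ignores folded continuation lines that the real file may contain.

```r
# Hand-written sample in the registry's record format (placeholder File-Date).
sample_lines <- c(
  "File-Date: 2024-01-01",
  "%%",
  "Type: language",
  "Subtag: ro",
  "Description: Romanian",
  "Description: Moldavian",
  "Description: Moldovan",
  "Suppress-Script: Latn",
  "%%",
  "Type: script",
  "Subtag: Cyrl",
  "Description: Cyrillic"
)

# Assign each line to a record: the counter increases at every "%%" separator.
record_id <- cumsum(sample_lines == "%%")
keep <- sample_lines != "%%"
records <- split(sample_lines[keep], record_id[keep])

# Hypothetical helper: turn one record into a named character vector,
# joining repeated fields (e.g. several Description lines) with ";".
parse_record <- function(lines) {
  field <- sub(":.*$", "", lines)
  value <- trimws(sub("^[^:]*:", "", lines))
  vapply(split(value, field), paste, character(1), collapse = ";")
}

header  <- parse_record(records[[1]])            # the File-Date record
entries <- unname(lapply(records[-1], parse_record))

# Align all entries on the union of field names; missing fields become NA.
fields <- unique(unlist(lapply(entries, names)))
rows   <- lapply(entries, function(e) unname(e[fields]))
reg_sketch <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(reg_sketch) <- tolower(gsub("-", "_", fields, fixed = TRUE))

# Keep the header date alongside the data, mirroring the last_update attribute.
attr(reg_sketch, "last_update") <- header[["File-Date"]]

print(reg_sketch)
```

In this sketch the three Description lines of the first entry collapse into one ";"-separated string, and the script entry gets NA for suppress_script because that field is absent from its record.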

Usage

bcp_process_registry(
  url =
    "https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry"
)

Arguments

url

A character scalar giving the URL of the registry plain-text file. Defaults to the official IANA location.

Value

A tibble with one row per registry entry and one column per field. The File-Date from the registry header is stored in the tibble's last_update attribute.
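As a rough illustration of working with a value of this shape, the sketch below reads the last_update attribute and splits the ";"-joined descriptions back into separate values. To stay runnable offline it uses a small hand-built stand-in data frame rather than a real call to bcp_process_registry(). The rows and the date are placeholders, and only three of the columns are included.

```r
# Stand-in for the returned tibble (placeholder rows, subset of columns).
reg_demo <- data.frame(
  type        = c("language", "script"),
  subtag      = c("ro", "Cyrl"),
  description = c("Romanian;Moldavian;Moldovan", "Cyrillic"),
  stringsAsFactors = FALSE
)
attr(reg_demo, "last_update") <- "2024-01-01"  # placeholder File-Date

# The registry date travels with the data as an attribute.
last_update <- attr(reg_demo, "last_update")

# Recover the individual values of a multi-valued field.
descriptions <- strsplit(reg_demo$description, ";", fixed = TRUE)
names(descriptions) <- reg_demo$subtag

# Number of descriptions per subtag.
n_descriptions <- lengths(descriptions)

print(last_update)
print(descriptions)
```

Splitting on ";" assumes that no individual value contains a semicolon itself; that assumption may be worth checking before relying on the split for a given field.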

Examples

reg <- bcp_process_registry()
head(reg)
#> # A tibble: 6 × 12
#>   type     subtag description added suppress_script scope macrolanguage comments
#>   <chr>    <chr>  <chr>       <chr> <chr>           <chr> <chr>         <chr>   
#> 1 language aa     Afar        2005… NA              NA    NA            NA      
#> 2 language ab     Abkhazian   2005… Cyrl            NA    NA            NA      
#> 3 language ae     Avestan     2005… NA              NA    NA            NA      
#> 4 language af     Afrikaans   2005… Latn            NA    NA            NA      
#> 5 language ak     Akan        2005… NA              macr… NA            NA      
#> 6 language am     Amharic     2005… Ethi            NA    NA            NA      
#> # ℹ 4 more variables: deprecated <chr>, preferred_value <chr>, prefix <chr>,
#> #   tag <chr>