
Decomposes a BCP 47 language tag into its constituent subtags following the syntax defined in RFC 5646. Both hyphen (-) and underscore (_) are accepted as subtag separators.
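As a rough illustration of the separator handling described above, the following sketch splits a tag on either separator and recognises script and region subtags by shape alone. It is not the implementation behind bcp_parse(): the helper names split_subtags, is_script and is_region are hypothetical, and a real RFC 5646 parser also has to take subtag position into account.

```r
# Illustrative only; these helpers are hypothetical and not part of the package.

# Lower-case the tag and split it on either "-" or "_".
split_subtags <- function(tag) {
  strsplit(tolower(tag), "[-_]")[[1]]
}

# Shape checks: a script is four letters; a region is two letters or three digits.
is_script <- function(x) grepl("^[a-z]{4}$", x)
is_region <- function(x) grepl("^([a-z]{2}|[0-9]{3})$", x)

subtags <- split_subtags("zh_Hans-CN")
rest <- subtags[-1]        # everything after the primary language subtag

rest[is_script(rest)]      # the script candidate, "hans"
rest[is_region(rest)]      # the region candidate, "cn"
```

Mixed separators such as "zh_Hans-CN" split the same way as "zh-Hans-CN" in this sketch, which mirrors the behaviour documented for bcp_parse().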

Usage

bcp_parse(tag)

Arguments

tag

A character scalar containing a BCP 47 language tag.

Value

A named list with the following elements:

language

The primary language subtag (e.g., "en", "zh"), or NA for a pure private-use tag.

extlang

A character vector of extended language subtags (three-letter codes following the primary language), or NULL if there are none.

script

The four-letter script subtag (e.g., "latn", "hans"), or NA if absent.

region

The two-letter or three-digit region subtag (e.g., "us", "419"), or NA if absent.

variants

A character vector of variant subtags, or NULL if there are none.

extensions

A named list of extension subtag sequences, keyed by the single-letter extension singleton, or an empty list if the tag has no extensions.

private

A character vector of private-use subtags (following x-), or NULL if there are none.

All subtags are returned in lowercase, regardless of the case used in the input.
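Because the subtags come back in a single case, a caller who wants a conventionally cased tag string has to rebuild it. The sketch below shows one way this might be done from a list shaped like the value described above, using the casing recommended by RFC 5646 (title-case script, upper-case region). The helper format_tag is hypothetical and not part of the documented API; the sketch also assumes each element of extensions is a character vector of subtags, and it uses a hand-written list rather than a real call to bcp_parse().

```r
# Hypothetical helper, not part of the package: rebuild a tag string from a
# list with the documented elements, applying RFC 5646 casing conventions.
format_tag <- function(p) {
  title_case <- function(s) {
    paste0(toupper(substr(s, 1, 1)), substr(s, 2, nchar(s)))
  }
  parts <- character(0)
  if (!is.na(p$language)) parts <- c(parts, p$language)
  parts <- c(parts, p$extlang)                       # NULL contributes nothing
  if (!is.na(p$script)) parts <- c(parts, title_case(p$script))
  if (!is.na(p$region)) parts <- c(parts, toupper(p$region))
  parts <- c(parts, p$variants)
  # Assumption: each extension is a character vector keyed by its singleton.
  for (singleton in names(p$extensions)) {
    parts <- c(parts, singleton, p$extensions[[singleton]])
  }
  if (!is.null(p$private)) parts <- c(parts, "x", p$private)
  paste(parts, collapse = "-")
}

# Hand-written list mirroring the documented output for 'zh-Hans-CN'.
parsed <- list(
  language = "zh", extlang = NULL, script = "hans", region = "cn",
  variants = NULL, extensions = list(), private = NULL
)

format_tag(parsed)
#> [1] "zh-Hans-CN"
```

Testing with is.na() for the scalar elements and relying on NULL for the vector elements follows the conventions listed above, so the same helper also handles a pure private-use tag, where language is NA.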

Examples

bcp_parse('en-US')
#> $language
#> [1] "en"
#> 
#> $extlang
#> NULL
#> 
#> $script
#> [1] NA
#> 
#> $region
#> [1] "us"
#> 
#> $variants
#> NULL
#> 
#> $extensions
#> list()
#> 
#> $private
#> NULL
#> 
bcp_parse('zh-Hans-CN')
#> $language
#> [1] "zh"
#> 
#> $extlang
#> NULL
#> 
#> $script
#> [1] "hans"
#> 
#> $region
#> [1] "cn"
#> 
#> $variants
#> NULL
#> 
#> $extensions
#> list()
#> 
#> $private
#> NULL
#> 
bcp_parse('de-1901')
#> $language
#> [1] "de"
#> 
#> $extlang
#> NULL
#> 
#> $script
#> [1] NA
#> 
#> $region
#> [1] NA
#> 
#> $variants
#> [1] "1901"
#> 
#> $extensions
#> list()
#> 
#> $private
#> NULL
#> 
bcp_parse('x-private')
#> $language
#> [1] NA
#> 
#> $extlang
#> NULL
#> 
#> $script
#> [1] NA
#> 
#> $region
#> [1] NA
#> 
#> $variants
#> NULL
#> 
#> $extensions
#> list()
#> 
#> $private
#> [1] "private"
#>