# `Text.Language`
[🔗](https://github.com/kipcole9/text/blob/v0.6.1/lib/language.ex#L1)

Language tag utilities used across the package.

Every function in `text` that takes a "language" option accepts:

* an atom (`:fr`, `:zh`),

* a string (`"fr"`, `"fr-CA"`, `"zh-Hans-CN"`),

* or a `Localize.LanguageTag` struct, when the optional
  [`localize`](https://hex.pm/packages/localize) dependency is
  available.

This module provides the normalisation helpers that unify those
shapes so the call sites remain simple.

### `normalize/1` — to a language-subtag atom

Most internal lookups (sentiment lexicons, classifier outputs, …)
key on the bare ISO 639-1 language subtag. `normalize/1` extracts
that subtag from any of the accepted shapes:

    iex> Text.Language.normalize(:fr)
    :fr

    iex> Text.Language.normalize("fr-CA")
    :fr

    iex> Text.Language.normalize("ZH-Hans-CN")
    :zh

### `to_locale_string/1` — to a BCP-47 string

Some downstream APIs (CLDR-aware tokenisation, locale-aware
formatting) want the full BCP-47 form. `to_locale_string/1` produces
a normalised string suitable for passing to `unicode_string`,
`localize`, etc.

    iex> Text.Language.to_locale_string(:fr)
    "fr"

    iex> Text.Language.to_locale_string("fr_CA")
    "fr-CA"

# `input`

```elixir
@type input() :: atom() | String.t() | struct()
```

Any value that `normalize/1` and `to_locale_string/1` accept.

The `struct()` member covers `Localize.LanguageTag` structs, which
are accepted when `:localize` is available.
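
One way to recognise the optional struct without a compile-time
dependency on `localize` is to match on the struct name alone and confirm
at runtime that the module is loaded. The sketch below is an assumption
about how such a check could look, not the package's own code;
`OptionalTagSketch` and `language_tag?/1` are hypothetical names.

```elixir
defmodule OptionalTagSketch do
  # Hypothetical helper for illustration; not part of `text`.
  @tag_module Localize.LanguageTag

  # Matching on the struct name only means this compiles even when
  # `localize` is absent. The runtime check then confirms availability.
  def language_tag?(%{__struct__: @tag_module}), do: Code.ensure_loaded?(@tag_module)

  # Atoms, strings and any other value are not language tag structs.
  def language_tag?(_other), do: false
end
```

Atoms and strings always fall through to the second clause, so they take
the same path whether or not `localize` is installed.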

# `normalize`

```elixir
@spec normalize(input()) :: atom()
```

Returns the language subtag of `input` as a lowercase atom.

### Arguments

* `input` is one of the accepted shapes — atom, string, or (when
  `:localize` is loaded) a `Localize.LanguageTag` struct.

### Returns

* An atom — the language subtag of the input (e.g. `:fr` for
  `"fr-CA"` or a `LanguageTag` whose language is `:fr`).

### Examples

    iex> Text.Language.normalize(:fr)
    :fr

    iex> Text.Language.normalize("fr-CA")
    :fr

    iex> Text.Language.normalize("FR")
    :fr
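
The behaviour shown above can be approximated for atoms and strings with
a few lines of Elixir. This is a minimal sketch, not the package's
implementation: `NormalizeSketch` is a hypothetical module, treating `_`
as a separator here is an assumption, and the `Localize.LanguageTag`
clause is left out.

```elixir
defmodule NormalizeSketch do
  # Hypothetical stand-in for illustration; not the package's source.

  # Atoms are routed through the string clause so both shapes share one path.
  def normalize(input) when is_atom(input) do
    input |> Atom.to_string() |> normalize()
  end

  def normalize(input) when is_binary(input) do
    input
    # Keep only the text before the first separator (assumed: "-" or "_").
    |> String.split(["-", "_"], parts: 2)
    |> hd()
    |> String.downcase()
    # String.to_atom/1 creates atoms dynamically; for untrusted input,
    # String.to_existing_atom/1 would be the more cautious choice.
    |> String.to_atom()
  end
end
```

With this sketch, `NormalizeSketch.normalize("ZH-Hans-CN")` yields `:zh`,
in line with the documented examples.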

# `to_locale_string`

```elixir
@spec to_locale_string(input()) :: String.t()
```

Returns a normalised BCP-47 locale string for `input`.

Splits on `_` (Java-style separator) as well as `-` and joins the
subtags with `-`. The language subtag is lowercased; subsequent
subtags are passed through unchanged. For a `Localize.LanguageTag`
the canonical id is used when present, otherwise the
language/script/territory triple is composed.
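
The split-and-join rule described above can be sketched as follows for
atom and string inputs. `LocaleStringSketch` is a hypothetical module
written for illustration and is not the package's source. The
`Localize.LanguageTag` clause is omitted because the struct's field names
are not shown here.

```elixir
defmodule LocaleStringSketch do
  # Hypothetical stand-in for illustration; not the package's source.

  # Atoms are converted to strings and handled by the binary clause.
  def to_locale_string(input) when is_atom(input) do
    input |> Atom.to_string() |> to_locale_string()
  end

  def to_locale_string(input) when is_binary(input) do
    # Accept both "fr-CA" and the Java-style "fr_CA".
    [language | rest] = String.split(input, ["-", "_"])

    # Only the language subtag is lowercased; the rest passes through as-is.
    Enum.join([String.downcase(language) | rest], "-")
  end
end
```

Because later subtags are not re-cased, an input such as `"fr_ca"` would
come back as `"fr-ca"` from this sketch rather than `"fr-CA"`.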

### Arguments

* `input` is one of the accepted shapes: an atom, a string, or (when
  `:localize` is loaded) a `Localize.LanguageTag` struct.

### Returns

* A `t:String.t/0` holding the normalised locale, with subtags joined
  by `-`.

### Examples

    iex> Text.Language.to_locale_string(:fr)
    "fr"

    iex> Text.Language.to_locale_string("fr_CA")
    "fr-CA"

    iex> Text.Language.to_locale_string("ZH-Hans-CN")
    "zh-Hans-CN"

