Text.Language (Text v0.6.1)


Language tag utilities used across the package.

Every function in text that takes a "language" option accepts:

  • an atom (:fr, :zh),

  • a string ("fr", "fr-CA", "zh-Hans-CN"),

  • or a Localize.LanguageTag struct, when the optional localize dependency is available.

This module provides the normalisation helpers that unify those shapes so the call sites remain simple.

normalize/1 — to a language-subtag atom

Most internal lookups (sentiment lexicons, classifier outputs, …) key on the bare ISO 639-1 language subtag. normalize/1 extracts that subtag from any of the accepted shapes:

iex> Text.Language.normalize(:fr)
:fr

iex> Text.Language.normalize("fr-CA")
:fr

iex> Text.Language.normalize("ZH-Hans-CN")
:zh
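As a rough illustration of why a bare subtag is convenient for lookups, the sketch below re-creates the string and atom cases in a stand-alone module. LanguageSketch, subtag/1 and the lexicon data are hypothetical names made up for this example; this is not the library's implementation, it omits the Localize.LanguageTag case, and accepting _ as a separator here is an assumption.

```elixir
defmodule LanguageSketch do
  # Hypothetical stand-in for illustration only, not Text.Language itself.

  # Atoms are converted to strings and follow the same path.
  def subtag(input) when is_atom(input), do: input |> Atom.to_string() |> subtag()

  def subtag(input) when is_binary(input) do
    input
    # Keep only the part before the first separator.
    # Treating "_" as a separator is an assumption of this sketch.
    |> String.split(["-", "_"], parts: 2)
    |> hd()
    |> String.downcase()
    # String.to_atom/1 creates atoms dynamically; avoid it on untrusted input.
    |> String.to_atom()
  end
end

# A made-up lexicon keyed on bare language subtags.
lexicon = %{fr: ["bon", "mauvais"], zh: ["好", "坏"]}

# "fr-CA" and :fr resolve to the same key.
Map.fetch(lexicon, LanguageSketch.subtag("fr-CA"))
# => {:ok, ["bon", "mauvais"]}
```

Keying on the subtag means a single entry serves every regional variant of a language, at the cost of ignoring script and territory.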

to_locale_string/1 — to a BCP-47 string

Some downstream APIs (CLDR-aware tokenisation, locale-aware formatting) want the full BCP-47 form. to_locale_string/1 produces a normalised string suitable for passing to unicode_string, localize, etc.

iex> Text.Language.to_locale_string(:fr)
"fr"

iex> Text.Language.to_locale_string("fr_CA")
"fr-CA"

Summary

Functions

normalize(atom)

Returns the language subtag of input as a lowercase atom.

to_locale_string(atom)

Returns a normalised BCP-47 locale string for input.

Types

input()

@type input() :: atom() | String.t() | struct()

Anything normalize/1 and to_locale_string/1 accept.

When :localize is available, also includes Localize.LanguageTag structs.

Functions

normalize(atom)

@spec normalize(input()) :: atom()

Returns the language subtag of input as a lowercase atom.

Arguments

  • input is one of the accepted shapes — atom, string, or (when :localize is loaded) a Localize.LanguageTag struct.

Returns

  • An atom — the language subtag of the input (e.g. :fr for "fr-CA" or a LanguageTag whose language is :fr).

Examples

iex> Text.Language.normalize(:fr)
:fr

iex> Text.Language.normalize("fr-CA")
:fr

iex> Text.Language.normalize("FR")
:fr

to_locale_string(atom)

@spec to_locale_string(input()) :: String.t()

Returns a normalised BCP-47 locale string for input.

Splits on _ (Java-style separator) as well as - and joins the subtags with -. The language subtag is lowercased; subsequent subtags are passed through unchanged. For a Localize.LanguageTag the canonical id is used when present, otherwise the language/script/territory triple is composed.
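The split-and-join rule described above can be sketched in a few lines. LocaleStringSketch is a hypothetical module written for this example, not the library's code, and it covers only atoms and strings; the Localize.LanguageTag branch is left out because it depends on that struct's fields.

```elixir
defmodule LocaleStringSketch do
  # Hypothetical stand-in for illustration only, not Text.Language itself.

  # Atoms are converted to strings and follow the same path.
  def to_locale_string(input) when is_atom(input),
    do: input |> Atom.to_string() |> to_locale_string()

  def to_locale_string(input) when is_binary(input) do
    # Split on both "-" and "_", lowercase only the first subtag,
    # and leave the remaining subtags exactly as given.
    [language | rest] = String.split(input, ["-", "_"])
    Enum.join([String.downcase(language) | rest], "-")
  end
end

LocaleStringSketch.to_locale_string("fr_CA")
# => "fr-CA"

LocaleStringSketch.to_locale_string("EN_us")
# => "en-us" (the sketch does not re-case the territory)
```

Under this rule the casing of script and territory subtags is the caller's responsibility, which is why "EN_us" comes back as "en-us" in the sketch rather than "en-US".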

Arguments

  • input is one of the accepted shapes.

Returns

  • A string containing the normalised BCP-47 form of the input (e.g. "fr-CA" for "fr_CA").

Examples

iex> Text.Language.to_locale_string(:fr)
"fr"

iex> Text.Language.to_locale_string("fr_CA")
"fr-CA"

iex> Text.Language.to_locale_string("ZH-Hans-CN")
"zh-Hans-CN"