mix text.download_lemma_data (Text v0.6.1)


Downloads lemmatization dictionaries from the upstream michmech/lemmatization-lists repository and places them in the configured Text.Data cache, so that Text.Lemma can load them without further network access.

Use this task when you would rather not set config :text, auto_download_lemma_data: true (which lets the package fetch a dictionary on first lookup) but still want the data available locally, for example as a build step in CI or to ship the dictionaries alongside a release.

Source data is licensed under the Open Database License (ODbL) by Michal Boleslav Měchura. The download includes the upstream license file alongside the per-language .txt files.

Usage

# Download a single language pack.
mix text.download_lemma_data de

# Download several at once.
mix text.download_lemma_data de fr es it pt

# List the languages this task knows about.
mix text.download_lemma_data --list

Options

  • --force — re-downloads even when the file is already cached.

  • --list — prints the language codes available upstream and exits without downloading anything.
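For a CI build step, the invocation and the --force option above can be wrapped in a small shell function. This is only a sketch: the function name prefetch_lemma_data and the FORCE variable are hypothetical helpers introduced here, not part of the Text package. The task itself is called only with the documented language codes and --force.

```shell
#!/bin/sh
# Sketch of a CI prefetch step. prefetch_lemma_data and FORCE are
# hypothetical names used by this script only; the mix task is invoked
# with nothing beyond the documented language codes and --force.

prefetch_lemma_data() {
  # Refuse to run without at least one language code.
  if [ "$#" -eq 0 ]; then
    echo "usage: prefetch_lemma_data LANG [LANG...]" >&2
    return 2
  fi

  # FORCE=1 asks the task to re-download files that are already cached.
  if [ "${FORCE:-0}" = "1" ]; then
    mix text.download_lemma_data --force "$@"
  else
    mix text.download_lemma_data "$@"
  fi
}
```

Calling prefetch_lemma_data de fr runs the task for German and French; setting FORCE=1 for the call adds --force.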

Where files land

Each file is written to Text.Data.data_dir()/lemma/lemmatization-<lang>.txt. With the default data directory, the files land under ~/.cache/text/lemma/. To override the cache location, set:

config :text, data_dir: "/path/to/cache"
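To confirm after a download that the expected files are in place, the path layout described above can be checked from the shell. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and TEXT_DATA_DIR is a variable read only by this script (not by the Text package). Set it to the same path as data_dir if you override the default.

```shell
#!/bin/sh
# Sketch: report which lemma dictionaries are missing from the cache.
# TEXT_DATA_DIR is a hypothetical variable used only by this script; it
# falls back to ~/.cache/text, the default location named in this page.

lemma_file() {
  # Prints the expected path for the language code given in $1.
  printf '%s/lemma/lemmatization-%s.txt\n' "${TEXT_DATA_DIR:-$HOME/.cache/text}" "$1"
}

missing_lemma_langs() {
  # Prints each language code whose dictionary file is not present.
  for lang in "$@"; do
    [ -f "$(lemma_file "$lang")" ] || printf '%s\n' "$lang"
  done
}
```

Running missing_lemma_langs de fr es prints one line per language that still needs downloading, and nothing when every file is cached.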