Feature Flags

Lindera uses Cargo feature flags to control optional functionality and dictionary embedding.

Core Features

FeatureDescriptionDefault
mmapMemory-mapped file supportYes
trainCRF-based dictionary training (depends on lindera-crf)CLI only
  • mmap is enabled by default in the main lindera crate.
  • train is enabled by default only in lindera-cli. For library usage, enable it explicitly with --features train.

The recommended approach is to use pre-built dictionaries as external files. Download a dictionary from GitHub Releases and specify its path at runtime:

#![allow(unused)]
fn main() {
let dictionary = load_dictionary("/path/to/ipadic")?;
}

No additional feature flags are required for this usage.

Dictionary Embedding Features (Advanced)

These features embed pre-built dictionaries directly into the binary, eliminating the need for external dictionary files at runtime. This is intended for advanced users who need self-contained binaries.

FeatureDictionaryLanguage
embed-ipadicIPADICJapanese
embed-ipadic-neologdIPADIC NEologdJapanese
embed-unidicUniDicJapanese
embed-ko-dicko-dicKorean
embed-cc-cedictCC-CEDICTChinese
embed-jiebaJiebaChinese

None of these are enabled by default. Enable them as needed:

[dependencies]
lindera = { version = "2.3.2", features = ["embed-ipadic"] }

When embedding is enabled, you can load the dictionary with:

#![allow(unused)]
fn main() {
let dictionary = load_dictionary("embedded://ipadic")?;
}

Combination Features

These meta-features enable multiple dictionaries at once for multilingual applications.

FeatureIncluded Dictionaries
embed-cjkIPADIC + ko-dic + Jieba
embed-cjk2UniDic + ko-dic + Jieba
embed-cjk3IPADIC NEologd + ko-dic + Jieba

Combining Feature Flags

Multiple feature flags can be combined. For example, to embed both Japanese and Korean dictionaries:

[dependencies]
lindera = { version = "2.3.2", features = ["embed-ipadic", "embed-ko-dic"] }

Or from the command line:

cargo build --features embed-ipadic,embed-ko-dic

Notes

  • Embedding dictionaries increases binary size significantly. Only embed dictionaries you actually need.
  • The train feature adds a dependency on lindera-crf and increases compile time. It is not needed for tokenization-only use cases.
  • The mmap feature enables memory-mapped dictionary loading, which reduces memory usage for large dictionaries loaded from disk. It has no effect on embedded dictionaries.