Lindera Node.js

Lindera Node.js provides Node.js bindings for the Lindera morphological analysis engine, built with NAPI-RS. It brings Lindera's high-performance tokenization capabilities to the Node.js ecosystem with support for Node.js 18 and later.

Features

Multi-language support: Tokenize Japanese (IPADIC, IPADIC NEologd, UniDic), Korean (ko-dic), and Chinese (CC-CEDICT, Jieba) text
Text processing pipeline: Compose character filters and token filters for flexible preprocessing and postprocessing
CRF-based dictionary training: Train custom morphological analysis models from annotated corpora (requires the train feature flag)
Multiple tokenization modes: Normal and decompose modes for different analysis granularity
N-best tokenization: Retrieve multiple tokenization candidates ranked by cost
User dictionaries: Extend system dictionaries with custom vocabulary
TypeScript support: Full type definitions included out of the box
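
To make the text processing pipeline idea concrete, here is a minimal, library-independent sketch of the order of operations: character filters run on the raw string, the tokenizer segments the result, and token filters run on the token list. This is not the Lindera Node.js API. The names CharFilter, TokenFilter, runPipeline, and the whitespace tokenizer are hypothetical stand-ins chosen for illustration; see the Tokenizer API and Text Processing Pipeline pages for the actual classes.

```typescript
// Hypothetical types for illustration only; these are NOT Lindera's API.
type CharFilter = (text: string) => string;
type Tokenize = (text: string) => string[];
type TokenFilter = (tokens: string[]) => string[];

// Character filter: Unicode NFKC normalization (folds full-width forms to ASCII).
const nfkc: CharFilter = (text) => text.normalize("NFKC");

// Stand-in tokenizer: splits on whitespace. A morphological analyzer would
// segment with a dictionary instead; this only marks where that step sits.
const whitespaceTokenize: Tokenize = (text) =>
  text.split(/\s+/).filter((t) => t.length > 0);

// Token filters: lowercase every token, then drop stop words.
const lowercase: TokenFilter = (tokens) => tokens.map((t) => t.toLowerCase());
const dropStopWords =
  (stop: Set<string>): TokenFilter =>
  (tokens) =>
    tokens.filter((t) => !stop.has(t));

// Apply character filters in order, tokenize, then apply token filters in order.
function runPipeline(
  text: string,
  charFilters: CharFilter[],
  tokenize: Tokenize,
  tokenFilters: TokenFilter[],
): string[] {
  const filtered = charFilters.reduce((acc, f) => f(acc), text);
  const tokens = tokenize(filtered);
  return tokenFilters.reduce((acc, f) => f(acc), tokens);
}

const result = runPipeline(
  "Ｌｉｎｄｅｒａ is a Tokenizer",
  [nfkc],
  whitespaceTokenize,
  [lowercase, dropStopWords(new Set(["is", "a"]))],
);
console.log(result); // [ 'lindera', 'tokenizer' ]
```

Keeping each stage a plain function means filters can be reordered or swapped without touching the tokenizer, which is the composition property the feature list describes.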

Documentation

Installation -- Prerequisites, build instructions, and feature flags
Quick Start -- A minimal example to get started
Tokenizer API -- TokenizerBuilder, Tokenizer, and Token class reference
Dictionary Management -- Loading, building, and managing dictionaries
Text Processing Pipeline -- Character filters and token filters
Training -- Training custom CRF models and exporting dictionaries
