Dictionary Management

Lindera Node.js provides functions for loading, building, and managing dictionaries used in morphological analysis.

Loading Dictionaries

System Dictionaries

Use loadDictionary(uri) to load a system dictionary. Download a pre-built dictionary from GitHub Releases and specify the path to the extracted directory:

const { loadDictionary } = require("lindera-nodejs");

const dictionary = loadDictionary("/path/to/ipadic");

Embedded dictionaries (advanced): if the package was built with an embed-* feature flag, load the embedded dictionary by URI instead of a path:

const dictionary = loadDictionary("embedded://ipadic");
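
Because loadDictionary accepts either a directory path or an embedded:// URI, a small guard can catch a mistyped path before the loader runs. The sketch below is illustrative only: resolveDictionaryUri is a hypothetical helper, not part of lindera-nodejs, and it checks nothing beyond whether a filesystem path points at an existing directory.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not part of lindera-nodejs): returns a value suitable
// for loadDictionary(), or throws if a filesystem path is not a directory.
function resolveDictionaryUri(uriOrPath) {
  // Embedded dictionaries are addressed by URI and have no directory to check.
  if (uriOrPath.startsWith("embedded://")) {
    return uriOrPath;
  }
  const resolved = path.resolve(uriOrPath);
  if (!fs.existsSync(resolved) || !fs.statSync(resolved).isDirectory()) {
    throw new Error(`Dictionary directory not found: ${resolved}`);
  }
  return resolved;
}
```

The result can be passed straight to the loader, for example loadDictionary(resolveDictionaryUri("/path/to/ipadic")).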

User Dictionaries

User dictionaries add custom vocabulary on top of a system dictionary.

const { loadUserDictionary, Metadata } = require("lindera-nodejs");

const metadata = new Metadata();
const userDict = loadUserDictionary("/path/to/user_dictionary", metadata);

Pass the user dictionary when building a tokenizer:

const { Tokenizer, loadDictionary, loadUserDictionary, Metadata } = require("lindera-nodejs");

const dictionary = loadDictionary("/path/to/ipadic");
const metadata = new Metadata();
const userDict = loadUserDictionary("/path/to/user_dictionary", metadata);

const tokenizer = new Tokenizer(dictionary, "normal", userDict);

Or via the builder:

const { TokenizerBuilder } = require("lindera-nodejs");

const tokenizer = new TokenizerBuilder()
  .setDictionary("/path/to/ipadic")
  .setUserDictionary("/path/to/user_dictionary")
  .build();
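
When the user dictionary is optional (for example, configured per deployment), the builder calls can be wrapped in a small function. This is a sketch rather than library code: createTokenizer is a hypothetical helper, and it relies only on the TokenizerBuilder methods shown above. The module is passed in as a parameter so the function can be exercised with a stub.

```javascript
// Hypothetical helper: builds a tokenizer, adding a user dictionary only
// when a path is given. `lindera` is the object returned by
// require("lindera-nodejs"); only the builder methods shown above are used.
function createTokenizer(lindera, dictionaryUri, userDictionaryPath) {
  // Reassign after each call so the code works whether the builder
  // returns itself or a new builder (both support the chaining style).
  let builder = new lindera.TokenizerBuilder();
  builder = builder.setDictionary(dictionaryUri);
  if (userDictionaryPath) {
    builder = builder.setUserDictionary(userDictionaryPath);
  }
  return builder.build();
}
```

Called as createTokenizer(require("lindera-nodejs"), "/path/to/ipadic"), this would build a tokenizer without a user dictionary.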

Building Dictionaries

System Dictionary

Build a system dictionary from source files:

const { buildDictionary, Metadata } = require("lindera-nodejs");

const metadata = new Metadata({ name: "custom", encoding: "UTF-8" });
buildDictionary("/path/to/input_dir", "/path/to/output_dir", metadata);

The input directory should contain the dictionary source files (CSV lexicon, matrix.def, etc.).
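
A missing source file is easier to diagnose before the build starts. The sketch below lists what appears to be absent from an input directory. It assumes a MeCab-style source layout (matrix.def, char.def, unk.def, and at least one .csv lexicon); that file list is an assumption based on MeCab dictionary sources, not something this page specifies, and findMissingSources is a hypothetical helper.

```javascript
const fs = require("fs");

// Assumed MeCab-style source files; adjust the list for your dictionary.
const REQUIRED_FILES = ["matrix.def", "char.def", "unk.def"];

// Hypothetical helper: returns the names of expected files that are absent.
function findMissingSources(inputDir) {
  const entries = fs.readdirSync(inputDir);
  const missing = REQUIRED_FILES.filter((name) => !entries.includes(name));
  // At least one CSV lexicon is expected alongside the .def files.
  if (!entries.some((name) => name.endsWith(".csv"))) {
    missing.push("*.csv");
  }
  return missing;
}
```

An empty result suggests the directory is at least structurally complete; it says nothing about whether the file contents are valid.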

User Dictionary

Build a user dictionary from a CSV file:

const { buildUserDictionary, Metadata } = require("lindera-nodejs");

const metadata = new Metadata();
buildUserDictionary("ipadic", "user_words.csv", "/path/to/output_dir", metadata);

The metadata parameter is optional. When omitted, default metadata values are used:

buildUserDictionary("ipadic", "user_words.csv", "/path/to/output_dir");
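
The CSV passed to buildUserDictionary is a plain text file with one entry per line. The sketch below writes such a file with Node's fs module. It assumes the three-column simple format (surface, part of speech, reading) that appears in Lindera's IPADIC user dictionary examples; confirm the format your dictionary expects before relying on it. writeUserWords is a hypothetical helper.

```javascript
const fs = require("fs");

// Hypothetical helper: writes entries as "surface,pos,reading" lines.
// The three-column layout is an assumption, not taken from this page.
function writeUserWords(csvPath, entries) {
  const lines = entries.map(({ surface, pos, reading }) =>
    [surface, pos, reading].join(",")
  );
  fs.writeFileSync(csvPath, lines.join("\n") + "\n", "utf8");
  return lines.length;
}
```

For example, writeUserWords("user_words.csv", [{ surface: "東京スカイツリー", pos: "カスタム名詞", reading: "トウキョウスカイツリー" }]) writes a single line. The helper does not quote fields, so it is only suitable for values that contain no commas.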

Metadata

The Metadata class configures dictionary parameters.

Creating Metadata

const { Metadata } = require("lindera-nodejs");

// Default metadata
const metadata = new Metadata();

// Custom metadata (a separate name avoids redeclaring `metadata`)
const customMetadata = new Metadata({
  name: "my_dictionary",
  encoding: "UTF-8",
  defaultWordCost: -10000,
});

Loading from JSON

const metadata = Metadata.fromJsonFile("metadata.json");

Properties

Property	Type	Default	Description
`name`	`string`	`"default"`	Dictionary name
`encoding`	`string`	`"UTF-8"`	Character encoding
`defaultWordCost`	`number`	`-10000`	Default cost for unknown words
`defaultLeftContextId`	`number`	`1288`	Default left context ID
`defaultRightContextId`	`number`	`1288`	Default right context ID
`defaultFieldValue`	`string`	`"*"`	Default value for missing fields
`flexibleCsv`	`boolean`	`false`	Allow flexible CSV parsing
`skipInvalidCostOrId`	`boolean`	`false`	Skip entries with invalid cost or ID
`normalizeDetails`	`boolean`	`false`	Normalize morphological details
`dictionarySchema`	`Schema`	IPADIC schema	Schema for the main dictionary
`userDictionarySchema`	`Schema`	Minimal schema	Schema for user dictionaries

All properties support both getting and setting:

const metadata = new Metadata();
metadata.name = "custom_dict";
metadata.encoding = "EUC-JP";
console.log(metadata.name); // "custom_dict"

`toObject()`

Returns a plain object representation of the metadata:

const metadata = new Metadata({ name: "test" });
console.log(metadata.toObject());

Schema

The Schema class defines the field structure of dictionary entries.

Creating a Schema

const { Schema } = require("lindera-nodejs");

// Default IPADIC-compatible schema
const schema = Schema.createDefault();

// Custom schema
const custom = new Schema(["surface", "left_id", "right_id", "cost", "pos", "reading"]);

Schema Methods

Method	Returns	Description
`getFieldIndex(name)`	`number | null`	Get field index by name
`fieldCount()`	`number`	Total number of fields
`getFieldName(index)`	`string | null`	Get field name by index
`getCustomFields()`	`string[]`	Fields from index 4 onward (morphological features)
`getAllFields()`	`string[]`	All field names
`getFieldByName(name)`	`FieldDefinition | null`	Get full field definition
`validateRecord(record)`	`void`	Validate a CSV record against the schema

const schema = Schema.createDefault();

console.log(schema.fieldCount());           // 13 (IPADIC format)
console.log(schema.getFieldIndex("pos1"));  // e.g., 4
console.log(schema.getAllFields());         // ["surface", "left_id", ...]
console.log(schema.getCustomFields());      // Fields from index 4 onward
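
To make the lookup semantics concrete, here is a plain-JavaScript model of a schema built from the custom field list shown earlier. It is not the lindera-nodejs implementation; it only mirrors the documented return types (null for a miss) and assumes custom fields start at index 4, after surface, left_id, right_id, and cost.

```javascript
// Plain-JavaScript model of the documented lookups; not library code.
const fields = ["surface", "left_id", "right_id", "cost", "pos", "reading"];

// Index of a field, or null when the name is unknown.
function getFieldIndex(name) {
  const index = fields.indexOf(name);
  return index === -1 ? null : index;
}

// Name at an index, or null when the index is out of range.
function getFieldName(index) {
  return index >= 0 && index < fields.length ? fields[index] : null;
}

// Assumption: the first four fields are fixed, so custom fields start at index 4.
function getCustomFields() {
  return fields.slice(4);
}

console.log(getFieldIndex("pos"));      // 4
console.log(getFieldIndex("missing"));  // null
console.log(getFieldName(5));           // reading
console.log(getCustomFields());         // [ 'pos', 'reading' ]
```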

FieldDefinition

Property	Type	Description
`index`	`number`	Field position index
`name`	`string`	Field name
`fieldType`	`FieldType`	Field type enum
`description`	`string | undefined`	Optional description

FieldType

Value	Description
`FieldType.Surface`	Word surface text
`FieldType.LeftContextId`	Left context ID
`FieldType.RightContextId`	Right context ID
`FieldType.Cost`	Word cost
`FieldType.Custom`	Morphological feature field
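
The relationship between a field's position and its FieldType can be sketched in plain JavaScript. This is an illustrative model, not the library's code: it assumes the first four positions map to Surface, LeftContextId, RightContextId, and Cost, and that every later position is Custom. Type names are represented as strings here, and describeFields is a hypothetical helper.

```javascript
// Assumed mapping from the first four positions to field types.
const FIXED_TYPES = ["Surface", "LeftContextId", "RightContextId", "Cost"];

// Hypothetical helper: builds FieldDefinition-like plain objects.
function describeFields(names) {
  return names.map((name, index) => ({
    index,
    name,
    // Positions past the fixed four are treated as morphological features.
    fieldType: index < FIXED_TYPES.length ? FIXED_TYPES[index] : "Custom",
  }));
}

const definitions = describeFields(["surface", "left_id", "right_id", "cost", "pos"]);
console.log(definitions[4]); // { index: 4, name: 'pos', fieldType: 'Custom' }
```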
