Dictionary Management

Lindera Node.js provides functions for loading, building, and managing dictionaries used in morphological analysis.

Loading Dictionaries

System Dictionaries

Use loadDictionary(uri) to load a system dictionary. Download a pre-built dictionary from GitHub Releases and specify the path to the extracted directory:

const { loadDictionary } = require("lindera-nodejs");

const dictionary = loadDictionary("/path/to/ipadic");

Embedded dictionaries (advanced): if the package was built with an embed-* feature flag, load the embedded dictionary through the embedded:// scheme:

const dictionary = loadDictionary("embedded://ipadic");
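When it is not known in advance whether a dictionary is on disk or embedded, the two URI forms above can be tried in order. The following is a minimal sketch rather than part of the Lindera API: loadFirstAvailable is a hypothetical helper, and it assumes that loadDictionary throws when a URI cannot be loaded. The loader is passed in as an argument so the helper can be exercised without the native module.

```javascript
// Hypothetical helper: try each dictionary URI in order and return the first
// one that loads. Assumes the loader throws when a URI cannot be loaded.
function loadFirstAvailable(loadDictionary, uris) {
  const errors = [];
  for (const uri of uris) {
    try {
      return { uri, dictionary: loadDictionary(uri) };
    } catch (err) {
      // Remember why this candidate failed and move on to the next one.
      errors.push(`${uri}: ${err.message}`);
    }
  }
  throw new Error(`No dictionary could be loaded:\n${errors.join("\n")}`);
}

// Intended use with the real module (not executed here):
//   const { loadDictionary } = require("lindera-nodejs");
//   const { dictionary } = loadFirstAvailable(loadDictionary, [
//     "/path/to/ipadic",
//     "embedded://ipadic",
//   ]);
```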

User Dictionaries

User dictionaries add custom vocabulary on top of a system dictionary. Load one with loadUserDictionary, passing its path and a Metadata instance:

const { loadUserDictionary, Metadata } = require("lindera-nodejs");

const metadata = new Metadata();
const userDict = loadUserDictionary("/path/to/user_dictionary", metadata);

Pass the user dictionary when building a tokenizer:

const { Tokenizer, loadDictionary, loadUserDictionary, Metadata } = require("lindera-nodejs");

const dictionary = loadDictionary("/path/to/ipadic");
const metadata = new Metadata();
const userDict = loadUserDictionary("/path/to/user_dictionary", metadata);

const tokenizer = new Tokenizer(dictionary, "normal", userDict);

Or via the builder:

const { TokenizerBuilder } = require("lindera-nodejs");

const tokenizer = new TokenizerBuilder()
  .setDictionary("/path/to/ipadic")
  .setUserDictionary("/path/to/user_dictionary")
  .build();
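If the user dictionary is optional in an application, the builder calls above can be wrapped so that setUserDictionary is only invoked when a path is given. This is a sketch under assumptions: buildTokenizer is a hypothetical helper, and it relies on the setters returning the builder, as the chained example above suggests. The builder class is passed in so the sketch does not depend on the native module.

```javascript
// Hypothetical helper around the builder calls shown above.
// With the real package, TokenizerBuilder would come from
// require("lindera-nodejs").
function buildTokenizer(TokenizerBuilder, dictionaryUri, userDictionaryUri) {
  let builder = new TokenizerBuilder().setDictionary(dictionaryUri);
  if (userDictionaryUri) {
    // Attach a user dictionary only when one was provided.
    builder = builder.setUserDictionary(userDictionaryUri);
  }
  return builder.build();
}
```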

Building Dictionaries

System Dictionary

Build a system dictionary from source files:

const { buildDictionary, Metadata } = require("lindera-nodejs");

const metadata = new Metadata({ name: "custom", encoding: "UTF-8" });
buildDictionary("/path/to/input_dir", "/path/to/output_dir", metadata);

The input directory should contain the dictionary source files (CSV lexicon, matrix.def, etc.).
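A quick pre-flight check on the input directory can give a clearer message than a failed build. The sketch below is an illustration, not part of Lindera: findMissingSources is a hypothetical helper, and it checks only the two items named above (at least one CSV lexicon file and matrix.def). A real source directory contains further files that are not checked here.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical pre-flight check: report which of the expected source files
// are missing from a dictionary source directory.
function findMissingSources(inputDir) {
  const entries = fs.readdirSync(inputDir);
  const missing = [];
  if (!entries.some((name) => name.toLowerCase().endsWith(".csv"))) {
    missing.push("*.csv");
  }
  if (!entries.includes("matrix.def")) {
    missing.push("matrix.def");
  }
  return missing;
}

// Intended use before calling buildDictionary (not executed here):
//   const missing = findMissingSources(path.resolve("/path/to/input_dir"));
//   if (missing.length > 0) throw new Error(`Missing: ${missing.join(", ")}`);
```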

User Dictionary

Build a user dictionary from a CSV file:

const { buildUserDictionary, Metadata } = require("lindera-nodejs");

const metadata = new Metadata();
buildUserDictionary("ipadic", "user_words.csv", "/path/to/output_dir", metadata);

The metadata parameter is optional. When omitted, default metadata values are used:

buildUserDictionary("ipadic", "user_words.csv", "/path/to/output_dir");

Metadata

The Metadata class configures dictionary parameters.

Creating Metadata

const { Metadata } = require("lindera-nodejs");

// Default metadata
const metadata = new Metadata();

// Custom metadata (a separate name, since const cannot be redeclared in the same scope)
const customMetadata = new Metadata({
  name: "my_dictionary",
  encoding: "UTF-8",
  defaultWordCost: -10000,
});

Loading from JSON

const metadata = Metadata.fromJsonFile("metadata.json");

Properties

Property               Type     Default         Description
---------------------  -------  --------------  -----------------------------------
name                   string   "default"       Dictionary name
encoding               string   "UTF-8"         Character encoding
defaultWordCost        number   -10000          Default cost for unknown words
defaultLeftContextId   number   1288            Default left context ID
defaultRightContextId  number   1288            Default right context ID
defaultFieldValue      string   "*"             Default value for missing fields
flexibleCsv            boolean  false           Allow flexible CSV parsing
skipInvalidCostOrId    boolean  false           Skip entries with invalid cost or ID
normalizeDetails       boolean  false           Normalize morphological details
dictionarySchema       Schema   IPADIC schema   Schema for the main dictionary
userDictionarySchema   Schema   Minimal schema  Schema for user dictionaries

All properties support both getting and setting:

const metadata = new Metadata();
metadata.name = "custom_dict";
metadata.encoding = "EUC-JP";
console.log(metadata.name); // "custom_dict"
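Because the properties are writable, a set of overrides can be applied in one pass. The helper below is a hypothetical sketch, not a Lindera function: applyMetadataOverrides assigns each override through the property setter and rejects keys that are not among the property names documented above, which catches typos early. It works on any object, so it can be tried without the native module.

```javascript
// Property names as documented in the Properties table.
const METADATA_PROPERTIES = [
  "name",
  "encoding",
  "defaultWordCost",
  "defaultLeftContextId",
  "defaultRightContextId",
  "defaultFieldValue",
  "flexibleCsv",
  "skipInvalidCostOrId",
  "normalizeDetails",
  "dictionarySchema",
  "userDictionarySchema",
];

// Hypothetical helper: copy overrides onto a metadata object via its setters,
// throwing on any key that is not a documented property.
function applyMetadataOverrides(metadata, overrides) {
  for (const [key, value] of Object.entries(overrides)) {
    if (!METADATA_PROPERTIES.includes(key)) {
      throw new TypeError(`Unknown metadata property: ${key}`);
    }
    metadata[key] = value;
  }
  return metadata;
}
```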

toObject()

Returns a plain object representation of the metadata:

const metadata = new Metadata({ name: "test" });
console.log(metadata.toObject());
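One use for the plain object is comparing two metadata instances, for example to see which settings differ from the defaults. The sketch below is hypothetical and not part of Lindera: changedMetadataKeys compares top-level values by their JSON form, which assumes that toObject() returns JSON-serializable data.

```javascript
// Hypothetical helper: list the top-level keys whose values differ between
// two metadata objects, based on the plain objects returned by toObject().
function changedMetadataKeys(a, b) {
  const left = a.toObject();
  const right = b.toObject();
  const keys = new Set([...Object.keys(left), ...Object.keys(right)]);
  return [...keys]
    .filter((key) => JSON.stringify(left[key]) !== JSON.stringify(right[key]))
    .sort();
}

// Intended use with the real module (not executed here):
//   const { Metadata } = require("lindera-nodejs");
//   changedMetadataKeys(new Metadata(), new Metadata({ name: "test" }));
```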

Schema

The Schema class defines the field structure of dictionary entries.

Creating a Schema

const { Schema } = require("lindera-nodejs");

// Default IPADIC-compatible schema
const schema = Schema.createDefault();

// Custom schema
const custom = new Schema(["surface", "left_id", "right_id", "cost", "pos", "reading"]);

Schema Methods

Method                  Returns                 Description
----------------------  ----------------------  ----------------------------------------------
getFieldIndex(name)     number | null           Get field index by name
fieldCount()            number                  Total number of fields
getFieldName(index)     string | null           Get field name by index
getCustomFields()       string[]                Fields beyond index 4 (morphological features)
getAllFields()          string[]                All field names
getFieldByName(name)    FieldDefinition | null  Get full field definition
validateRecord(record)  void                    Validate a CSV record against the schema

const schema = Schema.createDefault();

console.log(schema.fieldCount());           // 13 (IPADIC format)
console.log(schema.getFieldIndex("pos1"));  // e.g., 4
console.log(schema.getAllFields());          // ["surface", "left_id", ...]
console.log(schema.getCustomFields());      // Fields after index 4
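The field list also makes it straightforward to label the columns of a CSV record. The following sketch is an illustration only: recordToObject is a hypothetical helper that pairs each value with the name returned by getAllFields(). It is deliberately strict about the field count and does not reproduce any leniency the library may apply (for example with flexibleCsv).

```javascript
// Hypothetical helper: turn one CSV record (an array of strings) into an
// object keyed by the schema's field names.
function recordToObject(schema, record) {
  const fields = schema.getAllFields();
  if (record.length !== fields.length) {
    throw new RangeError(
      `Expected ${fields.length} fields, got ${record.length}`
    );
  }
  return Object.fromEntries(fields.map((name, i) => [name, record[i]]));
}
```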

FieldDefinition

Property     Type                Description
-----------  ------------------  --------------------
index        number              Field position index
name         string              Field name
fieldType    FieldType           Field type enum
description  string | undefined  Optional description

FieldType

Value                     Description
------------------------  ---------------------------
FieldType.Surface         Word surface text
FieldType.LeftContextId   Left context ID
FieldType.RightContextId  Right context ID
FieldType.Cost            Word cost
FieldType.Custom          Morphological feature field
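Field definitions and field types can be combined to select fields by type. The sketch below is hypothetical and not part of Lindera: fieldNamesOfType walks getAllFields(), looks up each definition with getFieldByName(), and keeps those whose fieldType matches. It assumes FieldType members are primitive values that can be compared with ===.

```javascript
// Hypothetical helper: names of all fields whose definition carries the given
// field type. Definitions that cannot be found (null or undefined) are skipped.
function fieldNamesOfType(schema, fieldType) {
  return schema
    .getAllFields()
    .map((name) => schema.getFieldByName(name))
    .filter((def) => def != null && def.fieldType === fieldType)
    .map((def) => def.name);
}

// Intended use with the real module (not executed here):
//   const { Schema, FieldType } = require("lindera-nodejs");
//   fieldNamesOfType(Schema.createDefault(), FieldType.Custom);
```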