OPFS Dictionary Storage

Lindera WASM provides OPFS (Origin Private File System) helper utilities for persistent dictionary caching in web browsers. With them, a dictionary is downloaded once and reused across sessions instead of being embedded in the WASM binary.

Overview

The OPFS helpers are distributed as a separate JavaScript module (opfs.js) alongside the WASM package. They provide functions to download, store, load, and manage dictionaries using the browser's Origin Private File System.

Dictionaries are stored under the OPFS path lindera/dictionaries/<name>/.
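
To make the storage layout concrete, the following is a minimal sketch of walking that path with the standard File System API. It is not taken from opfs.js: the function names `openDictionaryDir` and `listDictionaryFileNames` are hypothetical, and the root handle is passed in as a parameter so the code does not depend on a browser global.

```javascript
// Hypothetical helper (not part of opfs.js): walk from an OPFS root handle
// to lindera/dictionaries/<name>/. `root` is a FileSystemDirectoryHandle.
async function openDictionaryDir(root, name, { create = false } = {}) {
    let dir = root;
    for (const segment of ["lindera", "dictionaries", name]) {
        // Rejects when a segment is missing and `create` is false.
        dir = await dir.getDirectoryHandle(segment, { create });
    }
    return dir;
}

// Hypothetical helper: list the file names stored for one dictionary.
async function listDictionaryFileNames(root, name) {
    const dir = await openDictionaryDir(root, name);
    const names = [];
    for await (const [entryName, handle] of dir.entries()) {
        // Skip subdirectories; only files are reported.
        if (handle.kind === "file") names.push(entryName);
    }
    return names.sort();
}
```

In a browser, `root` would typically come from `await navigator.storage.getDirectory()`, which is handy for inspecting what a previous download left behind.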

Import

import { downloadDictionary, loadDictionaryFiles, removeDictionary,
         listDictionaries, hasDictionary } from 'lindera-wasm-web/opfs';

Functions

`downloadDictionary(url, name, options?)`

Downloads a dictionary zip archive, extracts it, and stores the files in OPFS.

The archive must be a zip file containing the 8 required dictionary files (see Required Dictionary Files), either at the archive root or nested in a subdirectory.

Parameters:
- url (string) -- URL of the dictionary zip archive
- name (string) -- Name to store the dictionary under (e.g., "ipadic")
- options (object, optional):
  - onProgress (function) -- Callback invoked with a progress object as the operation moves through its phases (see Progress Callback)
Returns: Promise<void>

await downloadDictionary(
    "https://example.com/ipadic.zip",
    "ipadic",
    {
        onProgress: (progress) => {
            switch (progress.phase) {
                case "downloading":
                    console.log(`Downloading: ${progress.loaded}/${progress.total} bytes`);
                    break;
                case "extracting":
                    console.log("Extracting archive...");
                    break;
                case "storing":
                    console.log("Storing in OPFS...");
                    break;
                case "complete":
                    console.log("Done!");
                    break;
            }
        },
    },
);

Progress Callback

The onProgress callback receives an object with the following shape:

Property	Type	Description
`phase`	`string`	`"downloading"`, `"extracting"`, `"storing"`, or `"complete"`
`loaded`	`number | undefined`	Bytes downloaded (only during `"downloading"` phase)
`total`	`number | undefined`	Total bytes if known (only during `"downloading"` phase)
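
Because `loaded` and `total` may be undefined, a callback has to guard against both before computing a percentage. The sketch below shows one way to turn a progress object into a status string. The function name `formatProgress` and the message wording are illustrative choices, not part of the library.

```javascript
// Hypothetical helper: map a progress object to a short status string.
function formatProgress({ phase, loaded, total }) {
    switch (phase) {
        case "downloading":
            if (typeof loaded !== "number") return "Downloading...";
            // `total` is undefined when the total size is not known.
            if (!total) return `Downloading: ${loaded} bytes`;
            return `Downloading: ${((loaded / total) * 100).toFixed(1)}%`;
        case "extracting":
            return "Extracting archive...";
        case "storing":
            return "Storing in OPFS...";
        case "complete":
            return "Done";
        default:
            return `Unknown phase: ${phase}`;
    }
}
```

It can be wired in as `onProgress: (p) => console.log(formatProgress(p))`.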

`loadDictionaryFiles(name)`

Loads dictionary files from OPFS as an object of Uint8Array values.

The fields of the returned object supply the eight byte-array arguments of loadDictionaryFromBytes(), as shown in Complete Workflow.

Parameters: name (string) -- The dictionary name (e.g., "ipadic")
Returns: Promise<DictionaryFiles>

const files = await loadDictionaryFiles("ipadic");

DictionaryFiles

Property	Type	Source File
`metadata`	`Uint8Array`	`metadata.json`
`dictDa`	`Uint8Array`	`dict.da` (Double-Array Trie)
`dictVals`	`Uint8Array`	`dict.vals` (word value data)
`dictWordsIdx`	`Uint8Array`	`dict.wordsidx` (word details index)
`dictWords`	`Uint8Array`	`dict.words` (word details)
`matrixMtx`	`Uint8Array`	`matrix.mtx` (connection cost matrix)
`charDef`	`Uint8Array`	`char_def.bin` (character definitions)
`unk`	`Uint8Array`	`unk.bin` (unknown word dictionary)
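
As an illustration of the table above, this sketch records the property-to-file mapping as data and checks that an object has a Uint8Array for every property. The names `DICTIONARY_FILE_FIELDS` and `findInvalidFields` are hypothetical. Whether loadDictionaryFiles() performs a similar check internally is not stated here.

```javascript
// Property name -> source file name, copied from the DictionaryFiles table.
const DICTIONARY_FILE_FIELDS = {
    metadata: "metadata.json",
    dictDa: "dict.da",
    dictVals: "dict.vals",
    dictWordsIdx: "dict.wordsidx",
    dictWords: "dict.words",
    matrixMtx: "matrix.mtx",
    charDef: "char_def.bin",
    unk: "unk.bin",
};

// Hypothetical helper: return the property names that are missing
// or are not Uint8Array instances. An empty array means the object is complete.
function findInvalidFields(files) {
    return Object.keys(DICTIONARY_FILE_FIELDS).filter(
        (key) => !(files?.[key] instanceof Uint8Array),
    );
}
```

A check like this before calling loadDictionaryFromBytes() makes it possible to report which file is missing rather than failing later inside the WASM call.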

`removeDictionary(name)`

Removes a dictionary from OPFS.

Parameters: name (string) -- The dictionary name to remove
Returns: Promise<void>

await removeDictionary("ipadic");

`listDictionaries()`

Lists all dictionaries stored in OPFS.

Returns: Promise<string[]> -- Array of dictionary names

const names = await listDictionaries();
console.log(names); // e.g., ["ipadic", "unidic"]
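
listDictionaries() and removeDictionary() can be combined to clear out dictionaries that are no longer needed. The following is a sketch only: `pruneDictionaries` is a hypothetical function, and the two helpers are passed in as an argument so the logic does not depend on a particular import.

```javascript
// Hypothetical helper: remove every stored dictionary whose name is not in `keep`.
// `helpers` must provide listDictionaries() and removeDictionary(name).
async function pruneDictionaries(keep, helpers) {
    const keepSet = new Set(keep);
    const removed = [];
    for (const name of await helpers.listDictionaries()) {
        if (!keepSet.has(name)) {
            await helpers.removeDictionary(name);
            removed.push(name);
        }
    }
    // Names of the dictionaries that were removed, in listing order.
    return removed;
}
```

With the import from the Import section in place, it could be called as `await pruneDictionaries(["ipadic"], { listDictionaries, removeDictionary })`.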

`hasDictionary(name)`

Checks if a dictionary exists in OPFS.

Parameters: name (string) -- The dictionary name to check
Returns: Promise<boolean>

if (await hasDictionary("ipadic")) {
    console.log("Dictionary is cached");
}

Complete Workflow

A typical workflow for using OPFS-based dictionaries:

import __wbg_init, { TokenizerBuilder, loadDictionaryFromBytes } from 'lindera-wasm-web';
import { downloadDictionary, loadDictionaryFiles, hasDictionary } from 'lindera-wasm-web/opfs';

async function main() {
    await __wbg_init();

    const DICT_NAME = "ipadic";
    const DICT_URL = "https://github.com/lindera/lindera/releases/download/<version>/lindera-ipadic-<version>.zip";

    // Download dictionary if not already cached
    if (!await hasDictionary(DICT_NAME)) {
        await downloadDictionary(DICT_URL, DICT_NAME, {
            onProgress: ({ phase, loaded, total }) => {
                if (phase === "downloading" && total) {
                    console.log(`${(loaded / total * 100).toFixed(1)}%`);
                }
            },
        });
    }

    // Load dictionary from OPFS
    const files = await loadDictionaryFiles(DICT_NAME);
    const dictionary = loadDictionaryFromBytes(
        files.metadata, files.dictDa, files.dictVals, files.dictWordsIdx,
        files.dictWords, files.matrixMtx, files.charDef, files.unk,
    );

    // Build tokenizer
    const builder = new TokenizerBuilder();
    builder.setDictionaryInstance(dictionary);
    builder.setMode("normal");
    const tokenizer = builder.build();

    // Tokenize
    const tokens = tokenizer.tokenize("形態素解析を行います");
    tokens.forEach(token => {
        console.log(`${token.surface}\t${token.details.join(',')}`);
    });
}

main().catch(console.error);

Required Dictionary Files

A valid dictionary archive must contain these 8 files:

File	Description
`metadata.json`	Dictionary metadata (name, encoding, schema, etc.)
`dict.da`	Double-Array Trie structure
`dict.vals`	Word value data
`dict.wordsidx`	Word details index
`dict.words`	Word details (morphological features)
`matrix.mtx`	Connection cost matrix
`char_def.bin`	Character category definitions
`unk.bin`	Unknown word dictionary
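
Since the files may sit at the archive root or inside a subdirectory, a validity check has to search for a directory that holds all 8 of them. The sketch below shows one way to do that over a list of archive entry paths. `findDictionaryRoot` is a hypothetical name, and this is not necessarily how downloadDictionary() locates the files.

```javascript
// The 8 required file names, copied from the table above.
const REQUIRED_FILES = [
    "metadata.json", "dict.da", "dict.vals", "dict.wordsidx",
    "dict.words", "matrix.mtx", "char_def.bin", "unk.bin",
];

// Hypothetical helper: given zip entry paths (with "/" separators), return the
// directory prefix that contains all required files ("" for the archive root),
// or null if no such directory exists.
function findDictionaryRoot(entryPaths) {
    const paths = new Set(entryPaths);
    // Every directory that holds a metadata.json is a candidate root.
    const candidates = entryPaths
        .filter((p) => p === "metadata.json" || p.endsWith("/metadata.json"))
        .map((p) => p.slice(0, p.length - "metadata.json".length));
    for (const prefix of candidates) {
        if (REQUIRED_FILES.every((file) => paths.has(prefix + file))) return prefix;
    }
    return null;
}
```

A null result indicates an incomplete archive, which is worth detecting before anything is written to OPFS.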

Browser Compatibility

OPFS requires a secure context (HTTPS or localhost) and is supported in:

- Chrome 86+
- Edge 86+
- Firefox 111+
- Safari 15.2+

The zip extraction uses the DecompressionStream API, which requires:

- Chrome 80+
- Edge 80+
- Firefox 113+
- Safari 16.4+
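
Rather than relying on version numbers, an application can feature-detect these requirements at runtime. The following is a minimal sketch. `checkOpfsSupport` is a hypothetical function, and it only checks that the APIs exist, not that a given compression format is supported.

```javascript
// Hypothetical helper: report whether the current environment appears to
// provide what the OPFS helpers need. `scope` defaults to the global object.
function checkOpfsSupport(scope = globalThis) {
    // OPFS is only exposed in secure contexts (HTTPS or localhost).
    const secureContext = scope.isSecureContext === true;
    // navigator.storage.getDirectory() is the entry point to OPFS.
    const opfs = typeof scope.navigator?.storage?.getDirectory === "function";
    // DecompressionStream is needed for zip extraction.
    const decompression = typeof scope.DecompressionStream === "function";
    return {
        secureContext,
        opfs,
        decompression,
        supported: secureContext && opfs && decompression,
    };
}
```

When `supported` is false, the individual flags indicate which requirement is missing, so the application can fall back to another way of loading the dictionary.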
