Lindera ko-dic

Dictionary version

This repository contains mecab-ko-dic.

Dictionary format

Information about the dictionary format and part-of-speech tags used by mecab-ko-dic is documented in this Google Spreadsheet, which is linked from the mecab-ko-dic repository README.

Note that ko-dic has one fewer feature column than NAIST JDIC and carries an altogether different set of information (for example, it does not provide the "original form" of the word).

The tags are a slight modification of the 세종 (Sejong) tag set. The mappings from Sejong tags to mecab-ko-dic's tag names are given in the 태그 v2.0 ("Tags v2.0") tab of the spreadsheet linked above.

The dictionary format is fully specified (in Korean) in the 사전 형식 v2.0 ("Dictionary format v2.0") tab of the spreadsheet. Any blank values default to *.

| Index | Name (Korean) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 표면 | Surface | |
| 1 | 왼쪽 문맥 ID | Left context ID | |
| 2 | 오른쪽 문맥 ID | Right context ID | |
| 3 | 비용 | Cost | |
| 4 | 품사 태그 | Part-of-speech tag | See the 태그 v2.0 tab of the spreadsheet |
| 5 | 의미 부류 | Meaning | (too few examples for me to be sure) |
| 6 | 종성 유무 | Presence or absence of a final consonant | T for true; F for false; else * |
| 7 | 읽기 | Reading | Usually matches the surface, but may differ for foreign words, e.g. words written in Chinese characters |
| 8 | 타입 | Type | One of: Inflect (활용); Compound (복합명사); or Preanalysis (기분석) |
| 9 | 첫번째 품사 | First part-of-speech | e.g. VV for a part-of-speech tag of "VV+EM+VX+EP" |
| 10 | 마지막 품사 | Last part-of-speech | e.g. EP for a part-of-speech tag of "VV+EM+VX+EP" |
| 11 | 표현 | Expression | Field describing how an inflection, compound noun, or preanalysis entry is composed |
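
As a rough illustration of the column layout above, the following Python sketch reads one CSV row into named fields, replaces blank values with *, and splits a composite part-of-speech tag into its first and last parts. The English field names, the helper names `parse_row` and `first_and_last_pos`, and the sample row (including its context IDs and cost) are assumptions made for this example; they are not taken from mecab-ko-dic or Lindera.

```python
import csv

# Hypothetical English field names, in the index order of the table above.
FIELDS = [
    "surface",           # 0  표면
    "left_context_id",   # 1  왼쪽 문맥 ID
    "right_context_id",  # 2  오른쪽 문맥 ID
    "cost",              # 3  비용
    "pos_tag",           # 4  품사 태그
    "meaning",           # 5  의미 부류
    "final_consonant",   # 6  종성 유무 (whether a final consonant is present)
    "reading",           # 7  읽기
    "entry_type",        # 8  타입
    "first_pos",         # 9  첫번째 품사
    "last_pos",          # 10 마지막 품사
    "expression",        # 11 표현
]


def parse_row(line):
    """Parse one CSV line into a dict, replacing blank values with '*'."""
    values = next(csv.reader([line]))
    if len(values) < len(FIELDS):
        raise ValueError(f"expected at least {len(FIELDS)} columns, got {len(values)}")
    return {name: (value or "*") for name, value in zip(FIELDS, values)}


def first_and_last_pos(pos_tag):
    """Split a composite tag such as 'VV+EM+VX+EP' into its first and last parts."""
    parts = pos_tag.split("+")
    return parts[0], parts[-1]


# Illustrative row only: the context IDs and cost are placeholders,
# not values copied from the real dictionary.
row = parse_row("서울,0,0,0,NNP,,T,서울,*,*,*,*")
print(row["surface"], row["pos_tag"], row["meaning"])
print(first_and_last_pos("VV+EM+VX+EP"))
```

The blank "meaning" column in the sample row comes back as *, matching the default described above.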

User dictionary format (CSV)

Simple version

| Index | Name (Korean) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 표면 | Surface | |
| 1 | 품사 태그 | Part-of-speech tag | See the 태그 v2.0 tab of the spreadsheet |
| 2 | 읽기 | Reading | Usually matches the surface, but may differ for foreign words, e.g. words written in Chinese characters |
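
A minimal sketch of producing simple-format rows with Python's standard `csv` module follows. The helper name `simple_user_dict` and the sample entry are assumptions made for illustration; only the three-column order (surface, part-of-speech tag, reading) comes from the table above.

```python
import csv
import io


def simple_user_dict(entries):
    """Serialize (surface, pos_tag, reading) triples as simple-format CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for surface, pos_tag, reading in entries:
        writer.writerow([surface, pos_tag, reading])
    return buffer.getvalue()


# Illustrative entry: a proper noun (NNP) whose reading matches its surface.
entries = [("하네다공항", "NNP", "하네다공항")]
text = simple_user_dict(entries)
print(text, end="")
```

Using `csv.writer` rather than joining strings by hand keeps the output valid if a surface form ever contains a comma or a quote character.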

Detailed version

| Index | Name (Korean) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 표면 | Surface | |
| 1 | 왼쪽 문맥 ID | Left context ID | |
| 2 | 오른쪽 문맥 ID | Right context ID | |
| 3 | 비용 | Cost | |
| 4 | 품사 태그 | Part-of-speech tag | See the 태그 v2.0 tab of the spreadsheet |
| 5 | 의미 부류 | Meaning | (too few examples for me to be sure) |
| 6 | 종성 유무 | Presence or absence of a final consonant | T for true; F for false; else * |
| 7 | 읽기 | Reading | Usually matches the surface, but may differ for foreign words, e.g. words written in Chinese characters |
| 8 | 타입 | Type | One of: Inflect (활용); Compound (복합명사); or Preanalysis (기분석) |
| 9 | 첫번째 품사 | First part-of-speech | e.g. VV for a part-of-speech tag of "VV+EM+VX+EP" |
| 10 | 마지막 품사 | Last part-of-speech | e.g. EP for a part-of-speech tag of "VV+EM+VX+EP" |
| 11 | 표현 | Expression | Field describing how an inflection, compound noun, or preanalysis entry is composed |
| 12 | - | - | From index 12 onward, columns can be added freely. |
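
The sketch below assembles a detailed-format row in Python, filling unknown columns with * and appending any extra columns from index 12 onward. It also derives the index 6 flag from the last character of the reading, using the standard Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3, 28 final-consonant slots per block, slot 0 meaning no final consonant). Deriving the flag from the reading rather than the surface is an assumption of this example, as are the helper names `has_final_consonant` and `detailed_row` and the placeholder context IDs and cost.

```python
import csv
import io

HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable (가)
HANGUL_LAST = 0xD7A3  # last precomposed Hangul syllable (힣)
FINALS_PER_BLOCK = 28  # slot 0 means "no final consonant"


def has_final_consonant(text):
    """Return 'T' or 'F' for the last Hangul syllable of text, else '*'."""
    if not text:
        return "*"
    code = ord(text[-1])
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return "*"
    return "T" if (code - HANGUL_BASE) % FINALS_PER_BLOCK != 0 else "F"


def detailed_row(surface, left_id, right_id, cost, pos_tag, reading=None, extra=()):
    """Build one detailed-format CSV row; unknown columns default to '*'."""
    reading = reading or surface
    fields = [
        surface, str(left_id), str(right_id), str(cost), pos_tag,
        "*",                           # 5: meaning
        has_final_consonant(reading),  # 6: final consonant flag
        reading,                       # 7: reading
        "*", "*", "*", "*",            # 8-11: type, first POS, last POS, expression
    ]
    fields.extend(extra)               # 12 onward: free-form extra columns
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue().rstrip("\n")


# Placeholder context IDs and cost; real values depend on the dictionary build.
print(detailed_row("서울", 0, 0, 0, "NNP"))
print(detailed_row("바다", 0, 0, 0, "NNG", extra=["my-note"]))
```

Callers supply the context IDs and cost explicitly because the table does not state how those values should be chosen for user entries.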

API reference

The API reference is available at the following URL: