Clue

Trillium Parser Tuner

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Quality; Trillium > Trillium Discovery
Version: Latest
Language: English
Product name: Trillium Quality and Discovery
Title: Trillium Parser Tuner
Copyright: 2024
First publish date: 2008
Last updated: 2024-10-18
Published on: 2024-10-18T14:59:24.246276

Clue stores the keywords that the Japanese Parser uses to separate input text into tokens and to classify the input as business or personal. Each Clue entry has one of the following types.

Type  Item                  Description
T     Business Type         Words that describe a business type. Example: 株式会社, (有)
N     Business Name         Parsed as a business name if this token is found at the beginning of the string (excluding the business type).
E     Business Name Suffix  Words such as 病院, 学校.
D     Branch Name           Can be a branch name by itself. Example: 人事部, 経理部
B     Branch Name Suffix    Usually merged into the previous token to form a branch name. Example: 支店, 営業所
C     Business Keyword      Words that can be part of a business name or branch name. Example: データ, 建設
H     Honorific             Honorific words. Example: 様, 殿
P     Title (position)      Words for a title. Example: 代表取締役, 公認会計士
R     Region                Words for a region. Example: 東京, 長野
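The type codes above can be kept in a small lookup so that scripts which prepare Clue entries reject unknown codes early. The following is a minimal sketch, not part of Trillium; the names `ClueType` and `describe` are hypothetical.

```python
from enum import Enum


class ClueType(Enum):
    """Clue type codes as listed in the table above (hypothetical helper)."""
    BUSINESS_TYPE = "T"
    BUSINESS_NAME = "N"
    BUSINESS_NAME_SUFFIX = "E"
    BRANCH_NAME = "D"
    BRANCH_NAME_SUFFIX = "B"
    BUSINESS_KEYWORD = "C"
    HONORIFIC = "H"
    TITLE = "P"
    REGION = "R"


def describe(code: str) -> str:
    """Return a readable item name for a type code; raises ValueError if unknown."""
    return ClueType(code).name.replace("_", " ").capitalize()
```

For example, `describe("D")` yields `Branch name`, and a code that is not in the table raises `ValueError`.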

Format

Each entry consists of the following items:

'<Clue word entry in zenkaku>' att=clue type=<type> hankaku='<Clue word entry in hankaku>'
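As an illustration of the entry format, the sketch below reads and writes one entry line. It assumes entries follow the format line exactly (single-quoted values that contain no single quote, `att=clue`, and `type=` followed by one of the nine type codes). `ClueEntry`, `parse_clue_entry`, and `format_clue_entry` are hypothetical names, not Trillium functions.

```python
import re
from typing import NamedTuple


class ClueEntry(NamedTuple):
    zenkaku: str    # Clue word entry in zenkaku
    clue_type: str  # one of T, N, E, D, B, C, H, P, R
    hankaku: str    # Clue word entry in hankaku


# Assumption: fields are separated by whitespace and values contain no "'".
ENTRY_RE = re.compile(
    r"^'(?P<zenkaku>[^']*)'\s+att=clue\s+"
    r"type=(?P<type>[TNEDBCHPR])\s+"
    r"hankaku='(?P<hankaku>[^']*)'$"
)


def parse_clue_entry(line: str) -> ClueEntry:
    """Split one entry line into its three fields, or raise ValueError."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a Clue entry: {line!r}")
    return ClueEntry(match["zenkaku"], match["type"], match["hankaku"])


def format_clue_entry(entry: ClueEntry) -> str:
    """Write an entry back out in the documented format."""
    return (f"'{entry.zenkaku}' att=clue type={entry.clue_type} "
            f"hankaku='{entry.hankaku}'")
```

Parsing a line and formatting the result returns the same line, which makes the pair usable as a quick syntax check before entries are added.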

Example

'人事部' att=clue type=D hankaku='ジンジブ'

'(株)' att=clue type=T hankaku='(カブ)'

Input data              (株)アグレックス人事部
After token separation  Business type (T)  Unknown word   Branch name (D)
                        (株)               アグレックス   人事部
Output data             Business type      Business name  Branch name
                        (株)               アグレックス   人事部

In this case, "(株)" matches a business type keyword (type T) and "人事部" matches a branch name keyword (type D), so the token type of each of these words is determined directly. In the final output, the unknown word "アグレックス" is recognized as the business name, and each word is written to the appropriate output attribute.
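The example can be reproduced with a small sketch. It is not the Japanese Parser's actual algorithm: it assumes a greedy longest-match lookup against the Clue words and treats any unmatched run of characters as a single unknown word, which is then labeled as the business name. The names `CLUES`, `tokenize`, and `classify` are hypothetical.

```python
# Clue words from the example, mapped to their type codes.
CLUES = {"(株)": "T", "人事部": "D"}

# Output labels for this example only; None marks an unknown word.
LABELS = {"T": "Business type", "D": "Branch name", None: "Business name"}


def tokenize(text, clues):
    """Greedy longest-match split; unmatched runs become (word, None)."""
    keys = sorted(clues, key=len, reverse=True)  # try longer keywords first
    tokens, unknown, i = [], "", 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                if unknown:  # flush the pending unknown word
                    tokens.append((unknown, None))
                    unknown = ""
                tokens.append((key, clues[key]))
                i += len(key)
                break
        else:  # no keyword starts here: extend the unknown word
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append((unknown, None))
    return tokens


def classify(tokens):
    """Map each (word, type) token to an (output attribute, word) pair."""
    return [(LABELS.get(ctype, f"Type {ctype}"), word) for word, ctype in tokens]


tokens = tokenize("(株)アグレックス人事部", CLUES)
result = classify(tokens)
```

Here `tokens` holds the three tokens from the "After token separation" row, and `result` pairs them with the output attributes from the "Output data" row.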

Note: When you add new entries, avoid adding the same word under more than one type.
Note: Only type N keywords can include spaces. If you add a keyword that includes spaces, delete all spaces before and after the entry and replace each run of spaces within the entry with a single hankaku space.
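Both notes can be checked before new entries are added. The sketch below is a hypothetical pre-check, not a Trillium feature. It assumes that zenkaku spaces (U+3000) count as spaces alongside hankaku spaces, and that a keyword containing spaces under any type other than N should be rejected.

```python
from collections import defaultdict


def normalize_keyword(keyword: str, clue_type: str) -> str:
    """Trim outer spaces and collapse inner runs to one hankaku space."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the zenkaku space U+3000, and drops leading/trailing runs.
    cleaned = " ".join(keyword.split())
    if " " in cleaned and clue_type != "N":
        raise ValueError(f"only type N keywords may contain spaces: {keyword!r}")
    return cleaned


def find_cross_type_duplicates(entries):
    """Return {word: [types]} for words that appear under more than one type."""
    types_by_word = defaultdict(set)
    for word, clue_type in entries:
        types_by_word[word].add(clue_type)
    return {word: sorted(types)
            for word, types in types_by_word.items() if len(types) > 1}
```

An empty result from `find_cross_type_duplicates` means no word is registered under two different types.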