Japanese Sound Marks (Dakuten and Handakuten) - 17.2

Trillium Control Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Quality; Trillium > Trillium Discovery
Version: 17.2
Language: English
Product name: Trillium Quality and Discovery
Title: Trillium Control Center
First publish date: 2008
Last updated: 2024-07-01
Published on: 2024-07-01T08:56:48.630530

Japanese has a set of sound marks that are used in combination with other characters: dakuten (voiced sound mark) and handakuten (semi-voiced sound mark). They are similar to diacritics in European languages and are used only with kana (hiragana and katakana).

Dakuten: ゛    Handakuten: ゜

When sound marks are used with base characters, they can be displayed in two different forms—combining form and spacing form. The combining forms are generally considered the standard forms, but the spacing forms are also provided for compatibility with other encodings.

  • Combining form. The sound marks can be merged with the preceding base characters to create regular dakuon characters. The sound marks in this form are nonspacing characters called combining marks.
  • Spacing form. The sound marks cannot be merged with the base characters and are shown as separate characters. The sound marks in this form are called spacing marks. Spacing marks have their own glyphs.
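As an illustration outside of Trillium itself, the two forms correspond to distinct Unicode code points. The short Python sketch below uses only the standard `unicodedata` module to list them; the `MARKS` table and the `describe` helper are names invented for this example.

```python
import unicodedata

# Combining and spacing forms of the two sound marks in Unicode.
MARKS = {
    "combining dakuten": "\u3099",
    "combining handakuten": "\u309A",
    "spacing dakuten": "\u309B",
    "spacing handakuten": "\u309C",
}

def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for label, ch in MARKS.items():
    print(label, *describe(ch))
```

The combining marks report the general category `Mn` (nonspacing mark), while the spacing marks report `Sk` (modifier symbol), which is why only the former can attach to a preceding base character.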

Example

                             Combining form (merged)    Spacing form
Characters with dakuten      ガ ギ グ ゲ ゴ             カ゛ キ゛ ク゛ ケ゛ コ゛
Characters with handakuten   パ ピ プ ペ ポ             パ ピ プ ペ ポ

The following Quality functions are provided to merge or separate the sound marks: JCOMBINE, JCOMPOSE, JDECOMPOSE, and JSMARK. JCOMBINE and JCOMPOSE are used together to merge the sound marks. JCOMBINE converts the sound marks from the spacing form to the combining form. JCOMPOSE then merges the combining marks with the preceding base characters to create dakuon characters. Running JCOMPOSE after JCOMBINE guarantees that the maximal number of combinations is performed.
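The two-step merge can be sketched in Python as a rough analogue. This is not the Trillium implementation: `combine` and `compose` are hypothetical stand-ins for JCOMBINE and JCOMPOSE, the merge step is approximated with Unicode NFC normalization, and the handling of half-width marks is an assumption.

```python
import unicodedata

# Spacing sound marks mapped to their combining counterparts.
# Including the half-width marks here is an assumption of this sketch.
SPACING_TO_COMBINING = str.maketrans({
    "\u309B": "\u3099",  # spacing dakuten -> combining dakuten
    "\u309C": "\u309A",  # spacing handakuten -> combining handakuten
    "\uFF9E": "\u3099",  # half-width dakuten
    "\uFF9F": "\u309A",  # half-width handakuten
})

def combine(text):
    """Rough analogue of JCOMBINE: spacing marks become combining marks."""
    return text.translate(SPACING_TO_COMBINING)

def compose(text):
    """Rough analogue of JCOMPOSE: merge base + combining mark via NFC."""
    return unicodedata.normalize("NFC", text)

# Input is KA + spacing dakuten, HA + spacing handakuten.
merged = compose(combine("\u30AB\u309B\u30CF\u309C"))
print(merged)
```

Calling `compose` alone on the same input would leave the spacing marks untouched, which mirrors why the combine step has to come first. Note that NFC also normalizes other characters in the string, so a production version would likely restrict itself to kana.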

JDECOMPOSE and JSMARK are used together to separate the sound marks. JDECOMPOSE breaks composite characters into their constituent parts: a base character followed by a combining mark. JSMARK then transforms the combining marks to the spacing form. Running JSMARK after JDECOMPOSE ensures that the sound marks in the combining form are properly transformed to the spacing form.
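The reverse direction can be sketched the same way. Again, this is only an approximation under stated assumptions: `decompose` and `to_spacing` are hypothetical stand-ins for JDECOMPOSE and JSMARK, and Unicode NFD normalization is used in place of whatever Trillium does internally.

```python
import unicodedata

# Combining sound marks mapped to their (full-width) spacing counterparts.
COMBINING_TO_SPACING = str.maketrans({
    "\u3099": "\u309B",  # combining dakuten -> spacing dakuten
    "\u309A": "\u309C",  # combining handakuten -> spacing handakuten
})

def decompose(text):
    """Rough analogue of JDECOMPOSE: split composites into base + combining mark."""
    return unicodedata.normalize("NFD", text)

def to_spacing(text):
    """Rough analogue of JSMARK: combining marks become spacing marks."""
    return text.translate(COMBINING_TO_SPACING)

# Input is the composite characters GA and PA.
separated = to_spacing(decompose("\u30AC\u30D1"))
print(separated)
```

Applying `to_spacing` directly to composite characters changes nothing, because there is no standalone combining mark to convert until the decomposition has run. NFD also decomposes non-kana characters, so real data may need a narrower decomposition.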

Guidelines

Note the following guidelines when working with the functions for Japanese sound marks:

  • Not all combinations of letters and sound marks have a corresponding composite character. The sound marks cannot be merged with some characters, such as 'ア'.
  • If you use Shift-JIS, CP932, or EUC-JP encoding for your data, some sound marks may not be processed correctly during an export. To avoid this issue, convert the attributes to UCS2 or UTF8 in the first Transformer and then convert them back to the original encoding in the last process in the flow.
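Both guidelines can be illustrated with a small Python sketch that is independent of Trillium. The helper names `has_composite` and `encodable` are invented for this example, and the encoding check reflects the behavior of Python's codecs, which may differ from Trillium's own conversion tables. The source does not say why exports in the legacy encodings can fail; the lack of a code for the combining marks in those encodings is offered here only as one plausible factor.

```python
import unicodedata

COMBINING_DAKUTEN = "\u3099"

def has_composite(base, mark=COMBINING_DAKUTEN):
    """True if base + combining mark merges into one code point under NFC."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

def encodable(text, encoding):
    """True if every character in text can be represented in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# KA has a composite with dakuten (GA); A does not.
print(has_composite("\u30AB"), has_composite("\u30A2"))

# A base character followed by a combining mark, checked against each encoding.
decomposed = "\u30AB" + COMBINING_DAKUTEN
for enc in ("shift_jis", "cp932", "euc_jp", "utf-8"):
    print(enc, encodable(decomposed, enc))
```

In this sketch the decomposed sequence is representable only in UTF-8, while the composite character encodes in all four, which is consistent with the advice to do the sound mark processing in a Unicode encoding and convert back at the end of the flow.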