Transliterator

Spectrum Data Quality Guide

Product type: Software
Portfolio: Verify
Product family: Spectrum
Product: Spectrum > Quality > Spectrum Quality
Version: 23.1
Language: English
Product name: Spectrum Data Quality
Title: Spectrum Data Quality Guide
Topic type: Overview, Reference, Tips, How Do I
First publish date: 2007
Last edition: 2024-03-04
Last publication: 2024-03-04T22:52:13.486265

Transliterator converts a string between Latin and other scripts. For example:

Source                      Transliteration
キャンパス                  kyanpasu
Αλφαβητικός Κατάλογος       Alphabētikós Katálogos
биологическом               biologichyeskom

It is important to note that transliteration is not translation. Rather, transliteration is the conversion of letters from one script to another without translating the underlying words.
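The letter-by-letter nature of transliteration can be sketched in a few lines of Python. The table below is a toy mapping that covers only the letters of the Russian example word; it is an illustrative assumption, not the rule set used by the Transliterator stage. Note that the output is still a Russian word written in Latin letters, not an English translation.

```python
# Toy Cyrillic-to-Latin table covering only the letters of the example word.
# Illustrative assumption: this is NOT the Transliterator stage's rule set.
CYRILLIC_TO_LATIN = {
    "б": "b", "и": "i", "о": "o", "л": "l", "г": "g",
    "ч": "ch", "е": "ye", "с": "s", "к": "k", "м": "m",
}


def transliterate(text: str) -> str:
    """Replace each known letter with its Latin counterpart; pass others through."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)


print(transliterate("биологическом"))  # biologichyeskom
```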

Note: Standard transliteration methods often do not follow the pronunciation rules of any particular language in the target script.
The Transliterator stage supports the following scripts. In general, the Transliterator stage follows the UNGEGN Working Group on Romanization Systems guidelines. For more information, see www.eki.ee/wgrs.
Arabic
The script used by several Asian and African languages, including Arabic, Persian, and Urdu.
Cyrillic
The script used by Eastern European and Asian languages, including Slavic languages such as Russian. The Transliterator stage generally follows ISO 9 for the base Cyrillic set.
Devanagari
The script used by several Indian languages, including Hindi and Sanskrit. This script is a descendant of the Brahmi script, one of the oldest writing systems, which was used in ancient India and in what are now South and Central Asia.
Greek
The script used by the Greek language, which belongs to the Hellenic branch of the Indo-European language family.
Gujarati
The script used to write Gujarati, the language of the state of Gujarat in western India. It is one of the modern scripts of India and was adapted from the Devanagari script.
Gurmukhi
The script used by the Indian language Punjabi. This script was considerably influenced by the Nagari script, an earlier form of the Devanagari script.
Hangul
The script used by the Korean language. The Transliterator stage follows the Korean Ministry of Culture and Tourism Transliteration regulations. For more information, see the website of The National Institute of the Korean Language.
Han
The script used by the Chinese language. Chinese belongs to the Sino-Tibetan language family.
Traditional/Simplified Chinese
The Transliterator stage supports both traditional and simplified Chinese. For example, this is Traditional Chinese: Traditional Chinese string. This is Simplified Chinese: Simplified Chinese string.
Kannada
The script used by several South Indian languages, such as Kannada and Konkani. This script is a descendant of the Brahmi script of ancient India.
Katakana and Hiragana
Two of several scripts that can be used to write Japanese. The Transliterator stage uses a slight variant of the Hepburn system. With the Hepburn system, both ZI (ジ) and DI (ヂ) are represented by "ji" and both ZU (ズ) and DU (ヅ) are represented by "zu". This is amended slightly for reversibility by using "dji" for DI and "dzu" for DU. The Katakana-Latin transliteration is reversible. Hiragana-Katakana transliteration is not completely reversible since there are several Katakana letters that do not have corresponding Hiragana equivalents. Also, the length mark is not used with Hiragana. The Hiragana-Latin transliteration is also not reversible since internally it is a combination of Katakana-Hiragana and Hiragana-Latin.
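The reversibility points above can be illustrated with a small Python sketch. The two tables cover only the four kana named in the text (ZI, DI, ZU, DU, shown in their Katakana forms), and the Katakana-to-Hiragana helper relies on the Unicode code-point layout of the two blocks. Both are assumptions made for illustration and do not describe how the Transliterator stage is implemented.

```python
# Only the four kana discussed in the text: ZI, DI, ZU, DU (Katakana forms).
STANDARD_HEPBURN = {"ジ": "ji", "ヂ": "ji", "ズ": "zu", "ヅ": "zu"}
REVERSIBLE_VARIANT = {"ジ": "ji", "ヂ": "dji", "ズ": "zu", "ヅ": "dzu"}


def is_reversible(table: dict) -> bool:
    """A mapping can be inverted only if no two sources share a target."""
    return len(set(table.values())) == len(table)


# In Unicode, Katakana U+30A1..U+30F6 sit exactly 0x60 above their
# Hiragana counterparts U+3041..U+3096.
KANA_OFFSET = 0x60


def katakana_to_hiragana(text: str) -> str:
    """Shift Katakana letters that have a Hiragana counterpart; keep the rest."""
    out = []
    for ch in text:
        if "\u30a1" <= ch <= "\u30f6":
            out.append(chr(ord(ch) - KANA_OFFSET))
        else:
            # Letters such as U+30F7..U+30FA have no Hiragana counterpart,
            # so they pass through unchanged and the round trip is lossy.
            out.append(ch)
    return "".join(out)


print(is_reversible(STANDARD_HEPBURN))    # False
print(is_reversible(REVERSIBLE_VARIANT))  # True
print(katakana_to_hiragana("カタカナ"))    # かたかな
```

Because "ji" and "zu" each stand for two different kana in the standard table, a Latin string cannot be mapped back unambiguously; the "dji" and "dzu" spellings remove that collision.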
Half width/Full width
The Transliterator stage can convert between narrow half-width scripts and wider full-width scripts. For example, this is half-width: half-width string. This is full-width: full-width string.
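As a rough illustration of width conversion, the Python sketch below converts printable ASCII characters to and from their full-width forms using the Unicode code-point offset between the two ranges. It is a minimal sketch under that assumption: it does not handle half-width Katakana, and it is not the Transliterator stage's own implementation.

```python
# Full-width forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at this offset.
FULLWIDTH_OFFSET = 0xFEE0
IDEOGRAPHIC_SPACE = "\u3000"  # full-width counterpart of the ASCII space


def to_fullwidth(text: str) -> str:
    """Widen ASCII printable characters; leave everything else unchanged."""
    out = []
    for ch in text:
        if ch == " ":
            out.append(IDEOGRAPHIC_SPACE)
        elif "!" <= ch <= "~":
            out.append(chr(ord(ch) + FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


def to_halfwidth(text: str) -> str:
    """Narrow full-width ASCII variants; leave everything else unchanged."""
    out = []
    for ch in text:
        if ch == IDEOGRAPHIC_SPACE:
            out.append(" ")
        elif "\uff01" <= ch <= "\uff5e":
            out.append(chr(ord(ch) - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


wide = to_fullwidth("ABC 123")
print(wide)
print(to_halfwidth(wide))  # ABC 123
```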
Latin
The script used by most languages of Europe, such as English. It was originally used by the ancient Romans to write the Latin language.
Malayalam
The script used by the Malayalam language, the official language of the Indian state of Kerala. Malayalam was first written with the Vatteluttu alphabet, whose name means 'round writing' and which developed from the Brahmi script of ancient India.
Oriya
The script used by the Oriya language, the official language of the Indian state of Odisha. The Oriya script was developed from the Kalinga script, one of the many descendants of the Brahmi script of ancient India.
Tamil
The script used by the Tamil language in several states of India, Sri Lanka, and Malaysia. Tamil was originally written with a version of the Brahmi script known as Tamil Brahmi.
Telugu
The script used by several languages of South India. This script is a descendant of the Brahmi script of ancient India.
Thai
The script used by the Thai language. This script was influenced by the Brahmi script of ancient India and by the Khmer alphabet.

Transliterator is a part of Data Normalization. For a listing of other stages, see Spectrum Data Normalization.