Defining Culture RegEx Tags - 23.1

Spectrum Data Quality Guide

This topic describes how to define culture RegEx tags when defining a culture-specific parsing grammar.

  1. In Enterprise Designer, go to Tools > Open Parser Domain Editor.
  2. Click the Cultures tab. The Cultures tab displays a list of supported cultures. For a complete list of supported cultures, see Assigning a Parsing Culture to a Record.
  3. Select a culture from the list and then click Properties. The Culture Properties dialog box opens.
  4. Click the RegEx Tags tab. The tab lists the RegEx tags defined for the selected culture, showing each tag's name, associated source culture, value, and description.
  5. Click Add to create a RegEx tag, or click Modify to change an existing one.
  6. Type a name for the RegEx tag in the Name text box.

    If you type a name that already exists in the selected culture, a warning icon flashes. Either type a different name, or close the dialog box, delete the existing RegEx tag, and then click Add again.

  7. Type a description of the RegEx tag in the Description text box.
  8. Type a value for the RegEx tag in the Value text box.

    The value can be any valid regular expression but cannot match an empty string.

    Domain Editor includes several predefined RegEx tags that you can use to define culture properties. You can also use these RegEx tags for defining tokenization characters in your parsing grammar.

    You can modify the predefined RegEx tags or copy them and create your own variants. You can also use override properties to create specialized RegEx tags for specific languages.

    • Letter: Any letter from any language. This RegEx tag includes overrides for several languages to account for differences in the scripts used, for example, Cyrillic, Asian-language, and Thai scripts.
    • Lower: A lowercase letter that has an uppercase variant.
    • Number: Any numeric character in any script.
    • Punctuation: Any punctuation character.
    • Upper: An uppercase letter that has a lowercase variant.
    • White space: Any white space or invisible separator.
  9. Click OK.
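
Before typing a value into the Value text box, it can help to check the two constraints from step 8 outside the product: the value must be a valid regular expression, and it must not match an empty string. The following is a minimal sketch only. It assumes Python's re dialect, which may differ from the dialect Domain Editor accepts, and the helper names check_regex_tag_value and classify are hypothetical. The classify function approximates the predefined RegEx tags with Unicode general categories, which is an assumption about how those tags behave rather than their actual definitions.

```python
import re
import unicodedata


def check_regex_tag_value(value: str) -> str:
    """Return "ok", or a short reason why the value would be rejected.

    Hypothetical helper: it uses Python's re dialect, which may differ
    from the regular expression dialect that Domain Editor accepts.
    """
    try:
        pattern = re.compile(value)
    except re.error as exc:
        return f"invalid regular expression: {exc}"
    # Only tests the literal empty input. Zero-width constructs that match
    # inside non-empty text (for example, lookaheads) are not covered.
    if pattern.fullmatch("") is not None:
        return "matches an empty string"
    return "ok"


def classify(ch: str) -> str:
    """Approximate which predefined tag a single character falls under.

    Assumption: the tags roughly follow Unicode general categories.
    The most specific tag is returned, so "A" yields "Upper", not "Letter".
    """
    category = unicodedata.category(ch)  # for example "Lu", "Nd", "Po", "Zs"
    if category == "Lu" and ch.lower() != ch:
        return "Upper"  # uppercase letter that has a lowercase variant
    if category == "Ll" and ch.upper() != ch:
        return "Lower"  # lowercase letter that has an uppercase variant
    if category.startswith("L"):
        return "Letter"
    if category.startswith("N"):
        return "Number"
    if category.startswith("P"):
        return "Punctuation"
    if category.startswith("Z") or ch.isspace():
        return "White space"
    return "Other"


if __name__ == "__main__":
    for candidate in ["[0-9]+", "[0-9]*", "[0-9"]:
        print(repr(candidate), "->", check_regex_tag_value(candidate))
    for ch in ["A", "ж", "ก", "٣", ",", " "]:
        print(repr(ch), "->", classify(ch))
```

In this sketch, a value such as [0-9]* is flagged because the * quantifier allows zero repetitions, while [0-9]+ passes. The Thai letter in the sample falls under Letter rather than Upper or Lower because Thai script has no case distinction, which is the kind of script difference the Letter overrides account for.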