Substring Patterns - 17.1

Inline Quality and Discovery

Version: 17.1
Language: English
Product name: Trillium Quality and Discovery
Title: Inline Quality and Discovery

You use substring patterns to identify specific data within an existing Business Data Parser (BDP) line pattern. This is helpful when parsing large, free-form text fields in which you need to define only a subset of the data. For example, you may have the line pattern "Alpha Alpha Alpha 1-Numeric Color Alpha1Special Numeric" and want only "Alpha 1-Numeric Color" to be defined and included in the output.

How Substring Patterns Work

The BDP process looks up word combinations in the Customized Definitions table and matches them to patterns based on the sequence of categories (attribute types) on a row of data. When the Parser finds no match for a line pattern, it checks for substring patterns. If the BDP finds a substring pattern, it defines a pattern based on that substring of the original line pattern. Any part of the line pattern not defined by a substring pattern is written to the BP_MISC_DATA output attribute.
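The lookup order described above can be sketched in Python as a rough mental model. This is a hypothetical illustration, not the BDP implementation: the function `parse_row`, the `ParseResult` class, modeling the Customized Definitions table as a set of pattern strings, and the left-to-right scan that tries substring patterns in list order are all assumptions made for the example. It reuses the line pattern from the introduction, with placeholder words `w1` to `w7`.

```python
from dataclasses import dataclass, field


@dataclass
class ParseResult:
    # (pattern, tokens) pairs that the parser was able to define.
    defined: list[tuple[str, list[str]]] = field(default_factory=list)
    # Leftover tokens; stands in for the BP_MISC_DATA output attribute.
    misc_data: list[str] = field(default_factory=list)


def parse_row(tokens, categories, line_patterns, substring_patterns):
    """Hypothetical model of the BDP lookup order (not the product's code).

    tokens: the words on one row of data.
    categories: one category (attribute type) per token.
    line_patterns: full line patterns assumed to be in Customized Definitions.
    substring_patterns: substring patterns, tried in list order (an assumption).
    """
    line_pattern = " ".join(categories)

    # 1. A full line-pattern match is used as-is.
    if line_pattern in line_patterns:
        return ParseResult(defined=[(line_pattern, list(tokens))])

    # 2. No full match: scan left to right for substring patterns.
    result = ParseResult()
    subs = [p.split() for p in substring_patterns if p.split()]
    i = 0
    while i < len(categories):
        for sub in subs:
            if categories[i:i + len(sub)] == sub:
                result.defined.append((" ".join(sub), list(tokens[i:i + len(sub)])))
                i += len(sub)
                break
        else:
            # 3. Not covered by any substring pattern: goes to misc data.
            result.misc_data.append(tokens[i])
            i += 1
    return result


categories = "Alpha Alpha Alpha 1-Numeric Color Alpha1Special Numeric".split()
tokens = ["w1", "w2", "w3", "w4", "w5", "w6", "w7"]  # placeholder words
result = parse_row(tokens, categories, line_patterns=set(),
                   substring_patterns=["Alpha 1-Numeric Color"])
print(result.defined)    # [('Alpha 1-Numeric Color', ['w3', 'w4', 'w5'])]
print(result.misc_data)  # ['w1', 'w2', 'w6', 'w7']
```

Because more than one substring pattern can be defined for a line pattern, the sketch accepts a list; how the real BDP resolves competing or overlapping substring patterns is not described here, so the first-match rule above is only a placeholder.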

Note the following guidelines:

  • You add substring pattern entries in the Parser Tuner by manually entering them into the Customized Definitions table or by using the Quick Build or Build Entry tools. Substring pattern entries use the syntax SUBSTRING-PATTERN or SUB-PAT.
  • You can define more than one substring pattern for a line pattern.

For information about line types and how the ATT= keyword works, see Assigning a Line Type with a Pattern for Business Data and Attributes.

Substring Pattern Process

Use the following process to add a substring pattern to the BDP Parser Tuner:

  1. In the BDP, create your categories and define words and phrases as necessary.
  2. In the Parser Tuner, open the Customized Definitions table and note the pattern with which you want to work.
  3. Insert the substring pattern.
  4. Fix bad patterns.
  5. Run Parser Customization to test the changes, then make further modifications in the Parser Tuner if necessary.
  6. Close the Parser Tuner and run the BDP process.
  7. Verify the results.
    Note:

    If you have defined only single-word/phrase categories and you use a substring pattern to extract more than one word or phrase, the first word or phrase is written to a single-word output attribute (BP_USER1 through BP_USER25) and any subsequent words are written to BP_MISC_DATA.
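The routing rule in the note above can be pictured with a small Python sketch. It is hypothetical: the function `route_extracted_words`, the idea that one specific BP_USERn attribute is passed in for the category, the sample words, and joining the remaining words with spaces in BP_MISC_DATA are assumptions for illustration only.

```python
USER_ATTRIBUTES = {f"BP_USER{n}" for n in range(1, 26)}  # BP_USER1 .. BP_USER25


def route_extracted_words(words, user_attribute):
    """Hypothetical sketch of the output routing described in the note.

    words: words/phrases extracted by one substring pattern, in order.
    user_attribute: the single-word output attribute assumed to receive the
    first word/phrase (for example "BP_USER3").
    """
    if user_attribute not in USER_ATTRIBUTES:
        raise ValueError(f"expected BP_USER1..BP_USER25, got {user_attribute!r}")
    output = {}
    if words:
        # The first word/phrase goes to the single-word attribute.
        output[user_attribute] = words[0]
    if len(words) > 1:
        # Subsequent words go to BP_MISC_DATA; joining with spaces is an assumption.
        output["BP_MISC_DATA"] = " ".join(words[1:])
    return output


print(route_extracted_words(["RED", "DARK", "NAVY"], "BP_USER3"))
# {'BP_USER3': 'RED', 'BP_MISC_DATA': 'DARK NAVY'}
```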