About Business Data Parser Patterns

Trillium Control Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery; Trillium > Trillium Quality
Version: 17.1
Language: English
Product name: Trillium Quality and Discovery
Title: Trillium Control Center
Topic type: Overview, Administration, Configuration, Installation, Reference, How Do I
First publish date: 2008

A Business Data Parser (BDP) pattern is the sequence of output categories (attribute types), intrinsic attributes, or both that the BDP identifies in your data. The pattern syntax follows the order in which your categories appear in the output attributes. Word/phrase definitions and pattern definitions are created in the same file so that entire lines of data can be identified.

Note: Patterns that the parser does not recognize are written to the exceptions file. You always view and modify patterns in the Parser Tuner.

BDP patterns:

  • Help reduce the number of words and phrases that you have to define (see Example 2 below).
  • Identify specific data within an existing BDP pattern using substring patterns.
  • Validate pattern syntax based on the order in which the different categories appear. For example, the following pattern definition tells the BDP that the unrecognized word ALPHA, when preceded by MAKE and followed by MODEL, should be treated as the category COLOR:

'MAKE ALPHA MODEL' INSERT PATTERN MISC DEFRECODE='MAKE COLOR MODEL'
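The effect of a pattern definition like the one above can be modeled with a minimal Python sketch. This is not Trillium code and does not use any Trillium API: the `PATTERN_RECODES` table and the `recode_pattern` function are hypothetical names, and the sketch only illustrates the idea of looking up an observed sequence of categories and returning its recoded form.

```python
# Hypothetical model of a pattern table (not a Trillium structure):
# maps an observed sequence of categories / intrinsic attributes
# to the recoded sequence of categories.
PATTERN_RECODES = {
    ("MAKE", "ALPHA", "MODEL"): ("MAKE", "COLOR", "MODEL"),
}


def recode_pattern(observed):
    """Return the recoded category sequence, or None if the pattern is undefined."""
    return PATTERN_RECODES.get(tuple(observed))


# A defined pattern is recoded: ALPHA between MAKE and MODEL becomes COLOR.
print(recode_pattern(["MAKE", "ALPHA", "MODEL"]))  # ('MAKE', 'COLOR', 'MODEL')

# An undefined pattern has no recode; in the BDP it would go to the exceptions file.
print(recode_pattern(["MAKE", "ALPHA"]))  # None
```

The whole sequence is the lookup key, so ALPHA is treated as COLOR only in this context and not wherever it occurs.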

Note: For more information, see Pattern Structure and Special Entries.

Bad Patterns

Patterns are considered bad when they are not defined in the Customized Definitions word and pattern table. Bad patterns are written to the exceptions file. After you run the BDP, you view and fix patterns in the Parser Tuner's Parsing Exceptions Analyzer. Patterns are displayed in descending order by frequency. To define a pattern, you replace each flagged attribute with a defined category. You may also need to modify a word or phrase definition.

It is generally worth correcting bad patterns that occur multiple times in the data, but it may not be worth correcting patterns that rarely occur. Bad patterns often contain intrinsic attributes that are flagged for correction. Fixing a bad pattern adds a pattern definition to the word and pattern table. The pattern definition tells the BDP which combinations of categories are valid.
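The frequency-based triage described above can be sketched in a few lines of Python. The Parser Tuner does this ranking for you, so the sketch only illustrates the reasoning. The function name `rank_unrecognized` and the sample patterns are invented for illustration and do not come from Trillium.

```python
from collections import Counter


def rank_unrecognized(patterns, defined):
    """Count patterns that are absent from `defined`, most frequent first."""
    counts = Counter(p for p in patterns if p not in defined)
    # Sort by descending count; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: one defined pattern and five parsed lines.
defined = {"MAKE COLOR MODEL"}
observed = [
    "MAKE ALPHA MODEL",
    "MAKE COLOR MODEL",
    "MAKE ALPHA MODEL",
    "ALPHA MODEL",
    "MAKE ALPHA MODEL",
]

for pattern, count in rank_unrecognized(observed, defined):
    print(count, pattern)
```

In this sample, 'MAKE ALPHA MODEL' occurs three times and is the obvious candidate for a new pattern definition, while 'ALPHA MODEL' occurs once and may not be worth correcting.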

Dummy Patterns

When the Customized Definitions table does not contain any entries with the syntax of a pattern, a sample (dummy) pattern is included based on categories you have defined. Dummy patterns are useful as a reference to understand the pattern syntax of your categories before pattern definitions are created.

Note: The Customized Definitions table must contain at least one pattern before the BDP will run.