A BDP pattern is the sequence of output categories (attribute types) and/or intrinsic attributes that the BDP identifies in your data. The pattern syntax is based on the sequence in which your categories display in output attributes. Word/phrase definitions and pattern definitions are created in the same file to allow entire lines of data to be identified.
BDP patterns:
- Help reduce the number of words and phrases that you have to define (see Example 2 below).
- Identify specific data within an existing BDP pattern using substring patterns.
- Validate pattern syntax based on the order in which the different categories display. For example, the following pattern tells the BDP that an unrecognized word (identified by the intrinsic attribute ALPHA) that is preceded by MAKE and followed by MODEL should be treated as the category COLOR:
'MAKE ALPHA MODEL'
INSERT PATTERN MISC DEF
RECODE='MAKE COLOR MODEL'
Bad Patterns
Patterns are considered bad when they are not defined in the Customized Definitions word and pattern table. Bad patterns are written to the exceptions file. After you run the BDP, you can view and fix bad patterns in the Parser Tuner's Parsing Exceptions Analyzer, where they are displayed in descending order by frequency. To define a pattern, replace each flagged attribute with a defined category. You may also need to modify a word or phrase definition.
It is generally worth correcting bad patterns that occur multiple times in the data, but it may not be worth correcting patterns that rarely occur. Bad patterns often contain intrinsic attributes that are flagged for correction. Fixing a bad pattern adds a pattern definition to the word and pattern table. The pattern definition tells the BDP which combinations of categories are valid.
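As a rough illustration of the triage described above, the following Python sketch tallies bad patterns and lists the recurring ones in descending order of frequency. The sample data, the function name `rank_bad_patterns`, and the `min_count` threshold are all hypothetical. The sketch does not read the real exceptions file or reproduce the Parsing Exceptions Analyzer.

```python
from collections import Counter


def rank_bad_patterns(bad_patterns, min_count=2):
    """Return (pattern, count) pairs sorted by descending frequency.

    Patterns that occur fewer than min_count times are dropped, on the
    assumption that rare patterns may not be worth correcting.
    """
    counts = Counter(bad_patterns)
    # Sort by count (highest first), then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(pattern, n) for pattern, n in ranked if n >= min_count]


# Hypothetical bad patterns, one per input line that matched no definition.
sample = [
    "BRAND ALPHA SIZE",
    "ALPHA COLOR SIZE",
    "BRAND ALPHA SIZE",
    "COLOR ALPHA",
    "BRAND ALPHA SIZE",
    "ALPHA COLOR SIZE",
]

for pattern, count in rank_bad_patterns(sample):
    print(f"{count:>3}  {pattern}")
```

Lowering `min_count` to 1 would list every bad pattern, including the ones that occur only once.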
Example 1
Even if the BDP parser recognizes every defined word and phrase, it considers the input data to have a bad pattern unless a specific sequence of categories is defined as a pattern. Say you define three categories called BRAND, COLOR, and SIZE. If these can occur in any order in the data, then there are six possible patterns to be defined to avoid bad patterns:
BRAND, COLOR, SIZE
BRAND, SIZE, COLOR
COLOR, BRAND, SIZE
COLOR, SIZE, BRAND
SIZE, COLOR, BRAND
SIZE, BRAND, COLOR
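The count of six is the number of orderings of three categories (3! = 6). A minimal Python sketch, using a hypothetical helper named `all_orderings`, enumerates them. It only lists category sequences and does not produce pattern definitions in BDP syntax.

```python
from itertools import permutations


def all_orderings(categories):
    """Return every ordering of the categories as a comma-separated string."""
    return [", ".join(order) for order in permutations(categories)]


patterns = all_orderings(["BRAND", "COLOR", "SIZE"])
for pattern in patterns:
    print(pattern)
print(len(patterns), "patterns")
```

The number of orderings grows factorially, so four free-order categories would already need 24 patterns.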
Example 2
Patterns can be developed so that you do not have to define all words and phrases in the input data. Using the categories from Example 1, you could omit the BRAND category definitions and use the following patterns (assuming all brand names are ALPHA values):
ALPHA, COLOR, SIZE
ALPHA, SIZE, COLOR
COLOR, ALPHA, SIZE
COLOR, SIZE, ALPHA
SIZE, COLOR, ALPHA
SIZE, ALPHA, COLOR
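The idea behind Example 2 can be sketched with a toy model in Python. This is an illustration under stated assumptions, not the BDP's actual algorithm. Words found in a hypothetical lookup table receive their category, any other purely alphabetic word falls back to ALPHA, and the resulting sequence is checked against the six patterns. The word list, the function names, and the `UNKNOWN` label are invented for the sketch.

```python
# Hypothetical word definitions: only COLOR and SIZE words are defined,
# so no brand names have to be listed.
WORD_CATEGORIES = {
    "RED": "COLOR",
    "BLUE": "COLOR",
    "SMALL": "SIZE",
    "LARGE": "SIZE",
}

# The six patterns from Example 2, with ALPHA standing in for BRAND.
VALID_PATTERNS = {
    "ALPHA, COLOR, SIZE",
    "ALPHA, SIZE, COLOR",
    "COLOR, ALPHA, SIZE",
    "COLOR, SIZE, ALPHA",
    "SIZE, COLOR, ALPHA",
    "SIZE, ALPHA, COLOR",
}


def classify(word):
    """Return the defined category, or ALPHA for an undefined alphabetic word."""
    word = word.upper()
    if word in WORD_CATEGORIES:
        return WORD_CATEGORIES[word]
    if word.isalpha():
        return "ALPHA"
    return "UNKNOWN"  # placeholder label for this sketch, not a BDP attribute


def pattern_of(line):
    """Build the category sequence for a whitespace-separated input line."""
    return ", ".join(classify(word) for word in line.split())


def is_valid(line):
    """True when the line's category sequence is one of the defined patterns."""
    return pattern_of(line) in VALID_PATTERNS


for line in ["Acme red small", "blue large Zenith", "Acme Zenith small"]:
    print(f"{line!r}: {pattern_of(line)} (valid: {is_valid(line)})")
```

In this model, a line with two undefined words, such as the last sample, yields ALPHA twice and matches none of the six patterns, so it would still surface as a bad pattern.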
Dummy Patterns
When the Customized Definitions table does not contain any entries with the syntax of a pattern, a sample (dummy) pattern is included based on categories you have defined. Dummy patterns are useful as a reference to understand the pattern syntax of your categories before pattern definitions are created.
Example
If you define three categories called BRAND, COLOR, and SIZE, the dummy pattern entry in the Customized Definitions table will look like the following:
* Dummy Pattern
*
'BRAND SIZE COLOR'
INSERT PATTERN MISC DEF
RECODE='BRAND SIZE COLOR'