You use substring patterns to identify specific data within an existing Business Data Parser (BDP) pattern. Substring patterns are useful when you parse large, free-form text fields and need to define only a subset of the data. For example, your line pattern might be "Alpha Alpha Alpha 1-Numeric Color Alpha1Special Numeric", and you might want only "Alpha 1-Numeric Color" to be included in the output.
How Substring Patterns Work
The BDP process looks up word combinations in the Customized Definitions table and matches them to patterns based on the sequence of categories (attribute types) in a row of data. When the Parser finds no pattern that matches the line, it checks for substring patterns. If it finds a matching substring pattern, the BDP defines a pattern based on that substring of the original line pattern. Any part of the line pattern that a substring pattern does not define is written to the BP_MISC_DATA output attribute.
Note the following guidelines:
- You add substring pattern entries in the Parser Tuner by manually entering them into the Customized Definitions table or by using the Quick Build or Build Entry tools. Substring pattern entries use the syntax SUBSTRING-PATTERN or SUB-PAT.
- You can define more than one substring pattern for a line pattern.
Example 1:
Say you are parsing business data that contains the string TWO LARGE RED NIKE SWEATSHIRTS WITH DISCOUNT, and the Customized Definitions table contains the following definitions:
'NIKE' INSERT MISC DEF ATT=BRAND
'NORTH FACE' INSERT MISC DEF ATT=BRAND
'LARGE' INSERT MISC DEF ATT=SIZE
'SMALL' INSERT MISC DEF ATT=SIZE
'RED' INSERT MISC DEF ATT=COLOR
'BLUE' INSERT MISC DEF ATT=COLOR
'SWEATSHIRTS' INSERT MISC DEF ATT=PRODUCT
'T SHIRTS' INSERT MISC DEF ATT=PRODUCT
'SIZE COLOR BRAND PRODUCT'
INSERT PATTERN MISC DEF
RECODE='SIZE COLOR BRAND PRODUCT'
For this data, the Parser identifies the token string ALPHA SIZE COLOR BRAND PRODUCT ALPHA ALPHA. The words TWO, WITH, and DISCOUNT are not defined in the Customized Definitions table, so they receive the intrinsic attribute ALPHA. Because the pattern 'SIZE COLOR BRAND PRODUCT' does not cover the whole token string, the line does not parse successfully unless another pattern definition corresponds to it.
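To make the lookup step concrete, here is a minimal Python sketch that maps the Example 1 string to a token string. It is an illustrative model, not the Parser's implementation: the `tokenize` function, the longest-match handling of multi-word entries such as 'NORTH FACE', and the rule that undefined words become NUMERIC (all digits) or ALPHA (anything else) are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Example 1 entries from the Customized Definitions table, as text -> category.
DEFINITIONS: Dict[str, str] = {
    "NIKE": "BRAND", "NORTH FACE": "BRAND",
    "LARGE": "SIZE", "SMALL": "SIZE",
    "RED": "COLOR", "BLUE": "COLOR",
    "SWEATSHIRTS": "PRODUCT", "T SHIRTS": "PRODUCT",
}


def tokenize(line: str, definitions: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (matched texts, category tokens) for a line."""
    words = line.split()
    max_len = max((len(key.split()) for key in definitions), default=1)
    texts: List[str] = []
    tokens: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest multi-word entry first (e.g. 'NORTH FACE', 'T SHIRTS').
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in definitions:
                texts.append(phrase)
                tokens.append(definitions[phrase])
                i += n
                break
        else:
            # Undefined word: fall back to an intrinsic attribute (assumed rule).
            word = words[i]
            texts.append(word)
            tokens.append("NUMERIC" if word.isdigit() else "ALPHA")
            i += 1
    return texts, tokens


texts, tokens = tokenize("TWO LARGE RED NIKE SWEATSHIRTS WITH DISCOUNT", DEFINITIONS)
print(" ".join(tokens))  # ALPHA SIZE COLOR BRAND PRODUCT ALPHA ALPHA
```

The same sketch yields YEAR ALPHA COLOR MODEL for the Example 2 string when you pass it the Example 2 definitions.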
Adding the following substring pattern instructs the Parser to recognize the tokens LARGE RED NIKE SWEATSHIRTS, to set the line type to I for ITEM instead of M for MISC, and to ignore the intrinsic ALPHA attributes:
'SIZE COLOR BRAND PRODUCT'
INSERT SUBSTRING-PATTERN MISC DEF ATT=ITEM
RECODE='SIZE COLOR BRAND PRODUCT'
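The lookup order described in How Substring Patterns Work can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the BDP's actual logic: `parse_line`, the dictionary layout of the pattern tables, the rule that the first matching substring pattern sets the line type, and the handling of lines that match nothing are all hypothetical. Applied to Example 1, the full pattern fails because the line also contains ALPHA tokens, so the substring pattern applies and the uncovered text goes to BP_MISC_DATA.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table layout: category sequence -> (line type, RECODE sequence).
PatternTable = Dict[Tuple[str, ...], Tuple[str, Tuple[str, ...]]]


def find(tokens: List[str], sub: Tuple[str, ...], covered: List[bool]) -> Optional[int]:
    """Return the first start index where sub matches tokens not yet covered."""
    n = len(sub)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == sub and not any(covered[i:i + n]):
            return i
    return None


def parse_line(texts: List[str], tokens: List[str],
               patterns: PatternTable, substring_patterns: PatternTable) -> dict:
    # 1. A full pattern must match the entire token string.
    if tuple(tokens) in patterns:
        line_type, recode = patterns[tuple(tokens)]
        return {"line_type": line_type, "fields": list(zip(recode, texts)), "BP_MISC_DATA": []}
    # 2. Otherwise, look for each substring pattern inside the line.
    covered = [False] * len(tokens)
    fields: List[Tuple[str, str]] = []
    line_type: Optional[str] = None
    for sub, (sub_type, recode) in substring_patterns.items():
        start = find(tokens, sub, covered)
        if start is None:
            continue
        line_type = line_type or sub_type  # assumption: first match sets the line type
        for offset, category in enumerate(recode):
            fields.append((category, texts[start + offset]))
            covered[start + offset] = True
    # 3. Text not covered by any substring pattern goes to BP_MISC_DATA
    #    (if nothing matched, this sketch returns line_type None).
    misc = [t for t, c in zip(texts, covered) if not c]
    return {"line_type": line_type, "fields": fields, "BP_MISC_DATA": misc}


texts = ["TWO", "LARGE", "RED", "NIKE", "SWEATSHIRTS", "WITH", "DISCOUNT"]
tokens = ["ALPHA", "SIZE", "COLOR", "BRAND", "PRODUCT", "ALPHA", "ALPHA"]
seq = ("SIZE", "COLOR", "BRAND", "PRODUCT")
patterns = {seq: ("M", seq)}            # INSERT PATTERN MISC DEF
substring_patterns = {seq: ("I", seq)}  # INSERT SUBSTRING-PATTERN MISC DEF ATT=ITEM
result = parse_line(texts, tokens, patterns, substring_patterns)
print(result)
```

A line whose tokens are exactly SIZE COLOR BRAND PRODUCT would still match the full pattern first in this model.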
Example 2:
Say you are parsing business data that contains the string 2010 FORD GREEN COUP, and the Customized Definitions table contains the following definitions:
'2010' INSERT MISC DEF ATT=YEAR
'GREEN' INSERT MISC DEF ATT=COLOR
'COUP' INSERT MISC DEF ATT=MODEL
For this data, the Parser identifies the token string YEAR ALPHA COLOR MODEL. Only three of the four tokens are defined (FORD is not in the table, so it receives the intrinsic attribute ALPHA), and no pattern definition corresponds to this string.
Adding the following substring pattern instructs the Parser to recognize the token 2010, recode ALPHA to MAKE, set the line type to V for VEHICLE instead of M for MISC, and ignore the tokens GREEN COUP:
'YEAR ALPHA'
INSERT SUB-PAT MISC DEF ATT=VEHICLE
RECODE='YEAR MAKE'
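The effect of RECODE in Example 2 can be illustrated with a small Python sketch. The `apply_substring_pattern` function is hypothetical. The sketch also assumes that RECODE relabels the matched tokens one-for-one, which holds for 'YEAR ALPHA' to 'YEAR MAKE' but is not a documented rule. It also assumes that the text outside the match goes to BP_MISC_DATA, as described in How Substring Patterns Work.

```python
from typing import Dict, List, Optional, Sequence


def apply_substring_pattern(texts: List[str], tokens: List[str],
                            substring: Sequence[str], recode: Sequence[str],
                            line_type: str) -> Optional[Dict[str, object]]:
    """Apply one substring pattern; return None if it does not occur in the line."""
    # Assumption for this sketch: RECODE relabels the matched tokens one-for-one.
    if len(substring) != len(recode):
        raise ValueError("RECODE needs one category per substring token")
    n = len(substring)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == list(substring):
            return {
                "line_type": line_type,
                # Pair each recoded category with the text it now labels.
                "fields": list(zip(recode, texts[start:start + n])),
                # Text outside the matched substring goes to BP_MISC_DATA.
                "BP_MISC_DATA": texts[:start] + texts[start + n:],
            }
    return None


result = apply_substring_pattern(
    ["2010", "FORD", "GREEN", "COUP"],
    ["YEAR", "ALPHA", "COLOR", "MODEL"],
    substring=("YEAR", "ALPHA"),  # 'YEAR ALPHA'
    recode=("YEAR", "MAKE"),      # RECODE='YEAR MAKE'
    line_type="V",                # ATT=VEHICLE
)
print(result)
```

In this model, FORD is labeled MAKE only because RECODE renames the ALPHA token at that position. The word itself never needs an entry in the Customized Definitions table.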
For information about line types and how the ATT= keyword works, see Assigning a Line Type with a Pattern for Business Data and Attributes.
Substring Pattern Process
Use the following process to add a substring pattern to the BDP Parser Tuner: