Use the Business Data Parser (BDP) Word Definitions tool to create the business data categories, along with the word and phrase definitions, that tell the BDP process how to identify and standardize your business data.
Note: A phrase is a series of adjacent words.
You can also add word and phrase definitions in the Parser Tuner. You must be familiar with parser syntax and conventions before adding or customizing definitions in the Parser Tuner. For more information, see Adding a Definition to a Custom Definition Table. You can also define masks. For information, see Adding Mask Definitions.
Definition Options
The Word Definitions tool gives you the following options to define your data for the parser process:
- Categories: Assign each word and phrase to a category. Categories are attribute types that help define the patterns the BDP uses to standardize data. When the BDP process runs, all data is placed in output attributes according to how the data is categorized. You can specify a maximum of 500 categories, which correspond to up to 500 output attributes with the default names BP_USER1 through BP_USER500.
- Position: Set the position to define the word/phrase location within the output data line.
- Recodes: Recode words and phrases to conform and standardize the data. For example, if numbers represent colors in your product color data, recode them as words.
- Classifications: Add classifications to a word or phrase to enhance the definition. For example, if you are parsing clothing product data, the word Sweatshirt can be added to the Product category and then assigned a classification of Casual.
- Synonyms: Create synonyms of original data values to correct misspellings and identify words in the output.
- Multiple-Phrases: The Assign Multiple Phrases window lets you associate the same definition with more than one word/phrase at a time.
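As a rough illustration of how these options fit together, the following Python sketch models one definition as a record. The class name `WordDefinition`, its field names, and the `output_value` helper are hypothetical and are not part of the BDP or the Parser Tuner; they only mirror the options listed above. The color code `01` and its recode `RED` are made-up sample data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WordDefinition:
    """Hypothetical record mirroring the Word Definitions options."""
    phrase: str                           # a word, or a series of adjacent words
    category: str                         # required for every word and phrase
    position: str = "DEF"                 # DEF (default), BEG, or END
    recode: Optional[str] = None          # standardized replacement value
    classification: Optional[str] = None  # extra descriptor for the definition
    synonyms: List[str] = field(default_factory=list)

    def output_value(self) -> str:
        # Assumption: a recode, when present, replaces the original value.
        return self.recode if self.recode is not None else self.phrase


# The Sweatshirt example from the Classifications option.
sweatshirt = WordDefinition("SWEATSHIRT", category="PRODUCT", classification="CASUAL")

# A numeric color code recoded as a word (sample data only).
color = WordDefinition("01", category="COLOR", recode="RED")
```

In this sketch, `sweatshirt.output_value()` returns the original word because no recode is set, while `color.output_value()` returns the recode value.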
To add word and phrase definitions
- Analyze words and phrases in the input attribute.
- From the Navigation or Quality Project View, right-click the Business Data Parser process and select Edit Process. The BDP editing window opens.
- Click Tools.
- Click Word Definitions. The Word Definitions window opens.
- Click Categories. The Output Categories window opens.
- Enter the names of your output categories in the Single and/or Multi columns using the following guidelines:
  - Single. The categories in the Single column do not allow concatenation in the output; you can specify only one word or phrase in a Single category. Create a maximum of 25 Single categories. The number for each category corresponds to the BP_USER1 - BP_USER25 output attributes.
  - Multi. The categories in the Multi column allow concatenation of words and phrases in the output. Therefore, all words and phrases found for a Multi output category are concatenated in the output attribute for that category. Create a maximum of 475 Multi categories. The number for each category corresponds to the BP_USER26 - BP_USER500 output attributes. Define Multi categories if you plan to extract multiple words/phrases using a substring pattern. (For more information, see Substring Patterns.)
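The numbering described in these guidelines can be sketched in Python as follows. The function name `output_attribute` and the `kind` strings are hypothetical. The rule that the Nth Multi category maps to BP_USER(25 + N) is an assumption inferred from the stated ranges, not something the tool documents as a formula.

```python
MAX_SINGLE = 25   # Single categories use BP_USER1 - BP_USER25
MAX_MULTI = 475   # Multi categories use BP_USER26 - BP_USER500


def output_attribute(kind: str, number: int) -> str:
    """Return the default output attribute name for a category slot.

    kind   -- "single" or "multi" (hypothetical labels for the two columns)
    number -- 1-based position of the category within its column
    """
    if kind == "single":
        if not 1 <= number <= MAX_SINGLE:
            raise ValueError("Single categories are limited to 1-25")
        return f"BP_USER{number}"
    if kind == "multi":
        if not 1 <= number <= MAX_MULTI:
            raise ValueError("Multi categories are limited to 1-475")
        # Assumption: Multi numbering continues directly after the Single range.
        return f"BP_USER{MAX_SINGLE + number}"
    raise ValueError(f"unknown category kind: {kind!r}")
```

Under these assumptions, the first Single category maps to BP_USER1 and the first Multi category maps to BP_USER26.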
Example
When deciding whether to use Single or Multi categories, consider how the output will display. For example, suppose you have the following input row: 2008 FORD FOCUS 2-DOOR HATCHBACK. The entries in the Customized Definitions table would look like the following, where the category MODEL occurs three times:
'2008' INSERT MISC DEF ATT=YEAR
'FORD' INSERT MISC DEF ATT=MAKE
'FOCUS' INSERT MISC DEF ATT=MODEL
'2-DOOR' INSERT MISC DEF ATT=MODEL
'HATCHBACK' INSERT MISC DEF ATT=MODEL
The associated row pattern would look like the following:
'YEAR MAKE MODEL MODEL MODEL' xxxPATTERN MISC DEF xxxRECODE='YEAR MAKE MODEL MODEL MODEL'
If all categories are defined as Single, the user-defined output attributes would populate as follows:
Attribute | Data
BP_USER1 | 2008
BP_USER2 | FORD
BP_USER3 | FOCUS
If the MODEL category is defined as Multi, the user-defined output attributes would populate as follows:
Attribute | Data
BP_USER1 | 2008
BP_USER2 | FORD
BP_USER26 | FOCUS 2-DOOR HATCHBACK
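The difference between the two outcomes above can be imitated with a short Python sketch. The `populate` function and its data structures are hypothetical and do not reflect how the BDP is implemented. In particular, keeping only the first word for a Single category is an assumption chosen to match the first table; the source does not say what happens to the remaining words.

```python
def populate(tokens, categories):
    """Fill output attributes from categorized words.

    tokens     -- list of (word, category) pairs in input order
    categories -- dict mapping category name to (kind, attribute name)
    """
    output = {}
    for word, category in tokens:
        kind, attribute = categories[category]
        if attribute not in output:
            output[attribute] = word          # first word for this attribute
        elif kind == "multi":
            output[attribute] += " " + word   # Multi: concatenate later words
        # Single: later words are not concatenated (assumed to be left out)
    return output


tokens = [
    ("2008", "YEAR"), ("FORD", "MAKE"),
    ("FOCUS", "MODEL"), ("2-DOOR", "MODEL"), ("HATCHBACK", "MODEL"),
]

all_single = {
    "YEAR": ("single", "BP_USER1"),
    "MAKE": ("single", "BP_USER2"),
    "MODEL": ("single", "BP_USER3"),
}
model_multi = {
    "YEAR": ("single", "BP_USER1"),
    "MAKE": ("single", "BP_USER2"),
    "MODEL": ("multi", "BP_USER26"),
}

single_result = populate(tokens, all_single)   # BP_USER3 holds "FOCUS"
multi_result = populate(tokens, model_multi)   # BP_USER26 holds "FOCUS 2-DOOR HATCHBACK"
```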
- Click OK.
- To define the first word/phrase, click the first field in the Word/Phrase column and enter a word/phrase or select one from the drop-down list of original data values. All word and phrase combinations are available in the drop-down list.
Note: To navigate the rows and columns, press the Tab or arrow keys or right-click with your mouse. Pressing the Enter/Return key while entering definitions saves your work and closes the tool.
Note the following guidelines:
  - Use the Filter Phrase list to filter the drop-down list of available words/phrases by entering a word, phrase, letter, or number. This restricts the list to entries that include the specified string.
  - When you add a word/phrase to the list of definitions, it is removed from the drop-down list.
  - When you delete a word/phrase from the list of definitions, it is added to the end of the drop-down list.
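The list behavior in these guidelines can be sketched in Python. The `PhrasePicker` class is hypothetical and only imitates the described behavior; whether the real filter is case-sensitive is not stated, so this sketch uses a plain case-sensitive substring match as an assumption.

```python
class PhrasePicker:
    """Hypothetical model of the Word/Phrase drop-down list."""

    def __init__(self, available):
        self.available = list(available)  # entries still in the drop-down list
        self.defined = []                 # entries already added as definitions

    def filter(self, text):
        # Keep only entries that include the specified string.
        return [phrase for phrase in self.available if text in phrase]

    def add(self, phrase):
        # Adding a definition removes the entry from the drop-down list.
        self.available.remove(phrase)
        self.defined.append(phrase)

    def delete(self, phrase):
        # Deleting a definition returns the entry to the end of the list.
        self.defined.remove(phrase)
        self.available.append(phrase)


picker = PhrasePicker(["FORD", "FOCUS", "2-DOOR"])
matches = picker.filter("FO")        # ["FORD", "FOCUS"]
picker.add("FORD")
after_add = list(picker.available)   # ["FOCUS", "2-DOOR"]
picker.delete("FORD")
after_delete = list(picker.available)  # ["FOCUS", "2-DOOR", "FORD"]
```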
- Click in the Category column and select a category from the drop-down list.
Note the following guidelines:
  - Assign a category to multiple words/phrases. Any values defined in the Word/Phrase column that do not yet have a category assigned are given the next selected category; for example, select a word/phrase for rows 10 through 14, then select the category BRAND from any of the uncategorized rows. The Category column for rows 10 through 14 will display BRAND.
  - This is a required field; you must select a category for each word and phrase.
  - Assign a definition to multiple words/phrases. Click Multi-Phrase.... In the Assign Multiple Phrases window, select the words/phrases to include, then select the category to which you want to add them. Optionally, assign all selected words/phrases to a position, recode, and classification.
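The first guideline, where every uncategorized word/phrase receives the next selected category, can be pictured with this Python sketch. The `assign_category` function, the row dictionaries, and the sample phrases are hypothetical stand-ins for the rows of the Word Definitions window.

```python
def assign_category(rows, category):
    """Give the selected category to every row that has a phrase but no category."""
    for row in rows:
        if row["phrase"] and row["category"] is None:
            row["category"] = category
    return rows


rows = [
    {"phrase": "FORD", "category": "MAKE"},   # already categorized, left alone
    {"phrase": "ACME", "category": None},     # uncategorized
    {"phrase": "GLOBEX", "category": None},   # uncategorized
    {"phrase": "", "category": None},         # empty row, ignored
]
assign_category(rows, "BRAND")
```

After the call, the two uncategorized rows carry the BRAND category, and the row that already had MAKE is unchanged.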
- To define a position for the word/phrase, do one of the following in the Pos column:
  - If the physical location of the word/phrase in the output row is irrelevant, accept the default position of DEF.
  - If the definition applies to the word/phrase only when it occurs at the beginning or end of the input attribute, select either BEG or END from the drop-down list.
  - This field is required.
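One way to read the three position values is sketched below in Python. The `phrase_applies` function is hypothetical, and splitting the input on whitespace is an assumption; the parser's actual tokenization and matching rules are not described in this section.

```python
def phrase_applies(tokens, phrase, position="DEF"):
    """Check whether a definition's phrase matches at the allowed location.

    tokens   -- words of the input attribute, in order
    phrase   -- words of the defined word/phrase, in order
    position -- "DEF" (anywhere), "BEG" (start only), or "END" (end only)
    """
    n = len(phrase)
    if n == 0 or n > len(tokens):
        return False
    if position == "BEG":
        return tokens[:n] == phrase
    if position == "END":
        return tokens[-n:] == phrase
    if position == "DEF":
        # Location is irrelevant: look for the phrase at every offset.
        return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))
    raise ValueError(f"unknown position: {position!r}")


tokens = "2008 FORD FOCUS 2-DOOR HATCHBACK".split()
year_at_start = phrase_applies(tokens, ["2008"], "BEG")         # True
make_at_start = phrase_applies(tokens, ["FORD"], "BEG")         # False
body_at_end = phrase_applies(tokens, ["HATCHBACK"], "END")      # True
make_model_anywhere = phrase_applies(tokens, ["FORD", "FOCUS"])  # True
```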
- (Optional) To recode the word/phrase, enter a recode value in the Recode column.
- (Optional) To add a classification, enter a classification value in the Classification column.
- (Optional) To create a synonym of a defined word/phrase, click Synonyms.... The Synonyms window opens. Enter or select a synonym in the Synonym column. In the Of column, select the defined word/phrase for which you are creating a synonym. For more information about using synonyms in the parser, see Synonyms.
Note: You cannot create a synonym for words or phrases that have not yet been defined.
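The synonym rule in the note above can be sketched in Python as a lookup from a synonym to its defined word/phrase. The function names `add_synonym` and `resolve`, and the misspelling used as sample data, are hypothetical.

```python
def add_synonym(synonyms, defined, synonym, of):
    """Register `synonym` as a synonym of the defined word/phrase `of`."""
    if of not in defined:
        # A synonym can only point at a word/phrase that is already defined.
        raise ValueError(f"{of!r} has not been defined yet")
    synonyms[synonym] = of


def resolve(word, synonyms):
    """Return the defined word/phrase for a synonym, or the word itself."""
    return synonyms.get(word, word)


defined = {"SWEATSHIRT"}
synonyms = {}
add_synonym(synonyms, defined, "SWETSHIRT", "SWEATSHIRT")  # corrects a misspelling
corrected = resolve("SWETSHIRT", synonyms)
unchanged = resolve("JACKET", synonyms)
```

Here `corrected` resolves to the defined word, while a word with no synonym entry is returned as is.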
- When you are finished defining words/phrases, press Enter/Return or click OK to save your work and close the tool.
- To verify your definitions in the Customized Definitions table, click Launch Parser Tuner... The Parser Tuner opens in a separate window. In the BDP edit window, the field next to the Launch Parser Tuner button displays an Application running message and lists any files that are transferred between the applications. You can work independently in the Parser Tuner and the Control Center at the same time, if necessary.
Note: Whenever you finish editing the BDP, before you run the process, always click Finish to save your work and close the editing window. If you close the edit window without clicking Finish, your definitions and edits will not be saved.