The first step in profiling and parsing unstructured business data is to run phrase analysis on the attribute. Phrase analysis searches for words and phrases using a word range and uniqueness threshold that you specify. You can also direct it to ignore certain words. Use the results to create a word definition table that is later imported for use in the BDP process.
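As a rough illustration of what phrase analysis counts, the following Python sketch tallies phrases within a word range, drops ignore words, and keeps only phrases whose count exceeds a frequency threshold. It is not the product's implementation: the function name `phrase_counts`, the tokenization rule, the sample descriptions, and the ignore words are all assumptions made for this example.

```python
import re
from collections import Counter


def phrase_counts(values, min_words=1, max_words=2, min_frequency=1, ignore_words=()):
    """Count phrases of min_words..max_words consecutive words across all values.

    Only phrases whose count is greater than min_frequency are returned.
    """
    ignore = {w.lower() for w in ignore_words}
    counts = Counter()
    for value in values:
        # Assumed tokenization: lowercase, then keep runs of letters, digits, apostrophes.
        words = [w for w in re.findall(r"[a-z0-9']+", value.lower()) if w not in ignore]
        # Slide a window of each allowed phrase length over the remaining words.
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c > min_frequency}


# Hypothetical attribute values and ignore words (not taken from the Sample entity).
descriptions = [
    "30 year fixed, interest only",
    "adjustable mortgage with interest only option",
    "the adjustable mortgage",
]
result = phrase_counts(descriptions, min_words=1, max_words=2,
                       min_frequency=1, ignore_words=["the", "with"])
for phrase, count in sorted(result.items()):
    print(count, phrase)
```

With a range of 1 to 2 words and a frequency count of over 1, only the words and phrases that occur at least twice in the sample values are reported. In this sketch, ignore words are removed before phrases are formed, so the words on either side of an ignored word become adjacent; the product may handle this differently.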
To analyze word phrases in an attribute
- Select the Entities tab on the Discover bar.
- Open the input entity (Sample) you use for profiling and parsing.
- Right-click the attribute Mortgage Description and select Attribute Properties. The Attribute Properties window opens.
- Select the Analysis tab.
- In the Derived Metadata Rules section, specify the numeric range and frequency as follows:
  Phrases, between 1 and 2 words per phrase, and a frequency count of over: 1
  Review the data to decide the shortest and longest phrases to analyze. In this sample, we want to analyze all single words and two-word phrases such as "interest only" and "adjustable mortgage," so the range is set to 1 and 2.
- (Optional) Click Configure Ignore words. Ignore word tables must be added or imported to the Control Center in advance. This sample uses the ignore word table ignore_mortgage.
- (Optional) In the Available Ignore Words list, select the ignore word table (ignore_mortgage) and click Add. The table appears in the Selected Ignore words list.
- (Optional) Click OK to close the Configure Ignore words window.
- Check Analyze Now and click OK.
- Run the analysis now or schedule a time for the job to run later.
- After the job finishes, right-click the Mortgage Description attribute in the Navigation View and select Drill down to Metadata.
- Double-click Phrases to open a List View of phrases for the attribute.
The phrases are analyzed. Now you can move on to Step 2, Create a Word Definition Table.