Analyzing Phrases in a BDP Attribute

Product family: Trillium > Trillium Discovery
Product name: Trillium Quality and Discovery, Trillium Control Center

Perform phrase analysis if you plan to add definitions to the Customized Definition table using the Word Definitions Tool, which can optionally use the results of phrase analysis as input. Phrase analysis is also useful as a data profiling tool.

How Phrase Analysis Works

Phrase Analysis identifies all unique words and phrases (series of adjacent words) that occur in an attribute. Use the results of the analysis to find and compare word phrases that represent similar string values but may have a different phrasing, spelling, or meaning. (For example, you may have two values that describe a product color, such as NVY and NAVY.)
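As an illustration of the comparison described above, the following sketch flags pairs of phrases from an analysis result that look like spelling variants of each other. It is not part of the product: the function name similar_pairs, the 0.8 threshold, and the use of Python's difflib module are assumptions chosen only to show the idea.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(phrases, threshold=0.8):
    """Return (a, b, ratio) for every pair of distinct phrases that look alike.

    Hypothetical helper: the threshold and the similarity measure are
    illustrative assumptions, not the product's matching logic.
    """
    pairs = []
    for a, b in combinations(sorted(set(phrases)), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


# Phrases as they might appear in a phrase analysis result (sample data).
candidates = similar_pairs(["NAVY", "NVY", "RED", "BLUE"])
print(candidates)  # NAVY and NVY are reported as a likely variant pair
```

Pairs reported this way are only candidates for review. Whether NVY and NAVY should share a definition is still a decision for whoever maintains the definitions.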

You specify phrase analysis by opening the attribute for editing and setting the following values:

Phrases, between x and y words per phrase, and a frequency count of over: z

where x and y are numbers between 1 and 99, and z is a number between 1 and 9999.
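To make the roles of x, y, and z concrete, here is a minimal sketch of the same idea outside the product. It assumes words are separated by whitespace and compared case-sensitively, which may differ from how Phrase Analysis tokenizes data. The function analyze_phrases and its parameter names are hypothetical.

```python
from collections import Counter


def analyze_phrases(values, min_words=1, max_words=5, over=1):
    """Count runs of min_words..max_words adjacent words across all values.

    Only phrases that occur more than `over` times are returned, mirroring
    the "frequency count of over" setting. Whitespace tokenization is an
    assumption made for this sketch.
    """
    counts = Counter()
    for value in values:
        words = value.split()
        for n in range(min_words, max_words + 1):
            # Slide a window of n adjacent words across the value.
            for start in range(len(words) - n + 1):
                counts[" ".join(words[start:start + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c > over}


sample = ["NAVY BLUE SHIRT", "NAVY BLUE PANTS", "NVY SHIRT"]
result = analyze_phrases(sample, min_words=1, max_words=2, over=1)
print(result)
```

With x = 1, y = 2, and z = 1, this sample keeps NAVY, BLUE, SHIRT, and NAVY BLUE (each occurring twice) and drops everything that occurs only once, including NVY.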

Note: You must run the Transformer process to populate the entity to be analyzed. There is no difference between the Transformer output entity and the BDP input entity; either can be analyzed and used as input for the Word Definitions Tool.

To analyze phrases in an attribute

  1. From the Navigation or Quality Project View of a BDP project, right-click the input entity and select Analyze.... The Attribute Selection window opens.
  2. Select the attribute you want to analyze and click OK. The scheduler notification bar opens.
  3. (Optional) Change the job name.
  4. Do one of the following:
    • To schedule the job to run immediately, click Now. The message closes and the job begins running in the background.
    • To schedule the job for a later time, click Later. The Set Date & Time scheduler window opens. To run the job immediately in the background, click Run Now. To schedule a time, select a start date on the calendar and a time for the job to run on that date, then click Submit to save the scheduled time.
    • Click Cancel to cancel the task.
  5. When analysis finishes, in the Navigation View expand the input entity, right-click the attribute that contains the business data, and select Attribute Properties. The Attribute Properties window opens.
  6. In the Derived Metadata Rules section on the Analysis tab, check Phrases, between and enter a numeric range that specifies how many words a phrase must contain to be analyzed. The default is between 1 and 5.
    Note: The higher the words-per-phrase value, the longer the analysis takes to run, so keep these numbers low. For example, if your data contains only single-word phrases, set the range between 0 and 1. If your data contains one-word and two-word phrases, set the range between 1 and 2. Phrases of two or more words are considered multi-word phrases.
  7. Choose a uniqueness indicator by entering a number after and a frequency count of over:. This specifies how many times a phrase must occur before it is listed in the output; the Phrase Analysis output only includes words and phrases that occur more than the specified number of times. The default is 1.
    Note: If you have a large volume of data, consider setting this to a higher value so that the output only includes words and phrases that occur many times. This improves performance and produces a smaller, more manageable volume of output.
  8. Select Analyze Now to perform phrase analysis on the attribute.
  9. Click OK to save your settings and close the window. The scheduler notification bar opens.
  10. (Optional) Change the job name.
  11. Do one of the following:
    • To schedule the job to run immediately, click Now. The message closes and the job begins running in the background.
    • To schedule the job for a later time, click Later. The Set Date & Time scheduler window opens. To run the job immediately in the background, click Run Now. To schedule a time, select a start date on the calendar and a time for the job to run on that date, then click Submit to save the scheduled time.
    • Click Cancel to cancel the task.

    All scheduled jobs run in the background. You can monitor their progress in the Background Tasks List View.

  12. After the analysis completes, view phrase analysis results.
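The note in step 6 about keeping the words-per-phrase range low can be illustrated with a rough count of how many candidate phrases a single value produces. This is a back-of-the-envelope sketch, assuming every run of adjacent words in the range is a candidate. The function phrase_candidates is hypothetical and says nothing about how the product does the work internally.

```python
def phrase_candidates(word_count, min_words, max_words):
    """Number of adjacent-word runs of min_words..max_words words in one value.

    A value with w words contains w - n + 1 runs of n words. A lower bound
    of 0 is treated like 1 here, since a zero-word phrase is empty
    (an assumption made for this sketch).
    """
    return sum(
        max(word_count - n + 1, 0)
        for n in range(max(min_words, 1), max_words + 1)
    )


narrow = phrase_candidates(10, 1, 2)  # 10 + 9
wide = phrase_candidates(10, 1, 5)    # 10 + 9 + 8 + 7 + 6
print(narrow, wide)  # 19 40
```

Under these assumptions, widening the range from 1-2 to 1-5 roughly doubles the number of candidates for a ten-word value, and every candidate then has to be counted across all rows.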