Standardizing Terms - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

Product type
Product family
Spectrum > Quality > Spectrum Quality
Product name
Spectrum Data Quality
Spectrum Data Quality Guide
Topic type
How Do I
First publish date

Inconsistent use of terminology can be a data quality issue that causes difficulty in parsing, lookups, and more. You can create a dataflow that finds terms in your data that are inconsistently used and standardize them. For example, if your data includes the terms "Incorporated", "Inc.", and Inc" in business names, you can create a dataflow to standardize on one form (for example, "Inc.").

Note: Before performing this procedure, your administrator must install the Data Normalization database containing standardized terms that you want to apply to your data. Instructions for installing databases can be found in the Installation Guide.
  1. In Enterprise Designer, create a new dataflow.
  2. Drag a source stage onto the canvas.
  3. Double-click the source stage and configure it. See the Dataflow Designer's Guide for instructions on configuring source stages.
  4. Drag a Table Lookup stage onto the canvas and connect it to the source stage.

    For example, if you were using a Read from File source stage, your dataflow would look like this:

    Read from File in dataflow
  5. Double-click the Table Lookup stage on the canvas.
  6. To specify the options for Table Lookup you create a rule. You can create multiple rules then specify the order in which you want to apply the rules. Click Add to create a rule.
  7. In the Action field, leave the default option Standardize selected.
  8. In the On field, leave Complete field selected if the whole field is the term you want to standardize. Or, choose Individual terms within a field to standardize individual words in the field.
  9. In the Source field, select the field you want to standardize.
  10. In the Destination field, select the field that you want to contain the standardized term. If you specify the same field as the source field, then the source field's value will be replaced with the standardized term.
  11. In the Table field, select the table that contains the standardized terms.
    Note: If you do not see the table you need, contact your system administrator. The Data Normalization database must be loaded.
  12. In the When table entry not found, set Destination's value to field, select Source's value.
  13. Click OK.
  14. Define additional rules if you want to standardize values in more fields. When you are done defining rules, click OK.
  15. Drag a sink stage onto the canvas and connect it to Table Lookup.

    For example, if you were using Write to File, your dataflow would look like this:

    Write to File in dataflow
  16. Double-click the sink stage and configure it.

    For information on configuring sink stages, see the Dataflow Designer's Guide.

You now have a dataflow that standardizes terms.