Configuring Options - dataflow_designer - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

First publish date
2007
ft:lastEdition
2024-03-04
ft:lastPublication
2024-03-04T22:52:13.486265

To specify the options for Advanced Transformer, you create a rule. You can create multiple rules and then specify the order in which you want to apply them. To create a rule:

  1. Double-click the instance of Advanced Transformer on the canvas. The Advanced Transformer Options dialog displays.
  2. Select the number of runtime instances and click OK. Use the Runtime Instances option to configure a dataflow to run multiple, parallel instances of a stage to potentially increase performance.
  3. Click the Add button. The Advanced Transformer Rule Options dialog displays.
    Note: If you add multiple transformer rules, you can use the Move Up and Move Down buttons to change the order in which the rules are applied.
  4. Select the type of transform action you wish to perform and click OK. The options are listed in the table below.
Table 1. Advanced Transformer Options

Option

Description

Source

Specifies the source input field to evaluate for scan and split.

Extract using

Select Table Data or Regular Expressions.

Select Table Data if you want to scan and split using the XML tables located at <Drive>:\Program Files\Precisely\Spectrum\server\modules\advancedtransformer\data. See Table Data Options below for more information about each option.

Select Regular Expressions if you want to scan and split using regular expressions. Regular expressions provide many additional options for splitting data. You can use the prepackaged regular expressions by selecting one from the list or you can construct your own using RegEx syntax.

For example, you could split data when the first numeric value is found, as in "John Smith 123 Main St." where "John Smith" would go in one field and "123 Main St." would go in another. See Regular Expressions Options below for more information about each option.
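The split described above can be illustrated outside the product. The following is a minimal sketch in Python, assuming a hypothetical helper named split_at_first_number; it is not the expression that Advanced Transformer ships with. Advanced Transformer evaluates expressions with Java's java.util.regex, but the simple pattern used here (\d) behaves the same way in both engines.

```python
import re

def split_at_first_number(value):
    """Split a string at the position of its first digit.

    Hypothetical helper for illustration only; not part of Spectrum.
    Returns (text before the first digit, text from the first digit on).
    """
    match = re.search(r"\d", value)
    if match is None:
        # No numeric value found: nothing is split off.
        return value, ""
    # Trim the space left behind at the end of the first part.
    return value[:match.start()].rstrip(), value[match.start():]

name, address = split_at_first_number("John Smith 123 Main St.")
print(name)     # John Smith
print(address)  # 123 Main St.
```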

Table Data Options

Non-extracted Data

Specifies the output field that you want to contain the transformed data. If you want to replace the original value, specify the same field in the Destination field as you did in the Source drop-down box.

You may also type in a new field name in the Destination field. If you type in a new field name, that field name will be available in stages in your dataflow that are downstream of Advanced Transformer.

Extracted Data

Specifies the output field where you want to put the extracted data.

You may type in a new field name in the Extracted Data field. If you type in a new field name, that field name will be available in stages in your dataflow that are downstream of Advanced Transformer.

Tokenization Characters

Specifies any special characters that you want to tokenize. Tokenization is the process of separating terms. For example, if you have a field with the data "Smith, John" you would want to tokenize the comma. This would result in these terms:

  • Smith
  • ,
  • John

Now that the terms are separated, the data can be split by scanning and extracting on the comma so that "Smith" and "John" are cleanly identified as the data to standardize.
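To make the effect of tokenization concrete, here is a minimal Python sketch. The function name tokenize and its tokenization_chars parameter are assumptions made for illustration; the sketch only mimics the behavior described above and does not reflect how Advanced Transformer is implemented.

```python
import re

def tokenize(value, tokenization_chars=","):
    """Separate a string into terms, treating each tokenization
    character as a term of its own. Hypothetical helper, for illustration."""
    if not tokenization_chars:
        # With no special characters, terms are separated by whitespace only.
        return value.split()
    special = re.escape(tokenization_chars)
    # Match one tokenization character, or a run of characters that are
    # neither whitespace nor tokenization characters.
    pattern = "[" + special + "]|[^\\s" + special + "]+"
    return re.findall(pattern, value)

print(tokenize("Smith, John"))  # ['Smith', ',', 'John']
```

Without the comma as a tokenization character, the same input would yield the terms "Smith," and "John", and the comma could not be scanned for separately.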

Table

Specifies the table that contains the terms on which to base the splitting of the field. For a list of tables, see Advanced Transformer Tables. For information about creating or modifying tables, see Introduction to Lookup Tables.

Lookup multiple word terms

Select this check box to enable multiple word searches within a given string. For example:

Input String = "Cedar Rapids 52401"
Business Rule = Identify "Cedar Rapids" in the string based on a table that contains the entry Cedar Rapids = US
Output = Identifies the presence of "Cedar Rapids" and places the terms into a new field, for example City.

For multiple word searches, the search stops at the first occurrence of a match.

Note: Selecting this option may adversely affect performance.
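The following Python sketch illustrates the difference between single-word and multiple-word lookups. The function find_term, the in-memory dictionary standing in for a lookup table, and the exact-match comparison are all assumptions made for illustration; the actual tables are XML files and the product's matching logic may differ.

```python
def find_term(value, table, multi_word=True):
    """Return (matched term, remaining text) for the first table term
    found in value, or (None, value) if nothing matches.
    Hypothetical helper, for illustration only."""
    words = value.split()
    # With multiple-word lookup enabled, try phrases up to the length of
    # the longest table entry; otherwise test one word at a time.
    longest = max((len(key.split()) for key in table), default=1)
    max_len = longest if multi_word else 1
    for start in range(len(words)):
        for length in range(max_len, 0, -1):
            if start + length > len(words):
                continue
            candidate = " ".join(words[start:start + length])
            if candidate in table:
                # The search stops at the first occurrence of a match.
                rest = words[:start] + words[start + length:]
                return candidate, " ".join(rest)
    return None, value

# Hypothetical table with the single entry from the example above.
table = {"Cedar Rapids": "US"}
print(find_term("Cedar Rapids 52401", table))
# ('Cedar Rapids', '52401')
print(find_term("Cedar Rapids 52401", table, multi_word=False))
# (None, 'Cedar Rapids 52401')
```

The nested loop also hints at why the option can affect performance: every starting word is tested against phrases of several lengths instead of a single word.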

Extract

Specifies the type of extraction to perform. Select one of these:

Extract term
Extracts the term identified by the selected table.
Extract N words to the right of the term
Extracts words to the right of the term. You specify the number of words to extract. For example, if you want to extract the two words to the right of the identified term, specify 2.
Extract N words to the left of the term
Extracts words to the left of the term. You specify the number of words to extract. For example, if you want to extract the two words to the left of the identified term, specify 2.

If you choose to extract words to the right or left of the term, you can specify if you want to include the term itself in the destination data or the extracted data. For example, if you have this field:

2300 BIRCH RD STE 100

and you want to extract "STE 100" and place it in the field specified in extracted data, you would choose to include the term in the extracted data field, thus including the abbreviation "STE" and the word "100".

If you select neither Destination nor Extracted data, the term will not be included and is discarded.
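The three outcomes for the term itself can be sketched in Python as follows. The function extract_right and its include_term_in parameter are hypothetical names, and the position of the term within the destination data is an assumption; the sketch only mirrors the behavior described above for "Extract N words to the right of the term".

```python
def extract_right(value, term, n, include_term_in=None):
    """Extract n words to the right of term.

    include_term_in may be "extracted", "destination", or None
    (the term is discarded). Hypothetical helper, for illustration only.
    Returns (destination data, extracted data).
    """
    words = value.split()
    if term not in words:
        return value, ""
    i = words.index(term)
    extracted = words[i + 1:i + 1 + n]
    before, after = words[:i], words[i + 1 + n:]
    if include_term_in == "extracted":
        extracted = [term] + extracted
        destination = before + after
    elif include_term_in == "destination":
        # Assumption: the term keeps its original position.
        destination = before + [term] + after
    else:
        destination = before + after
    return " ".join(destination), " ".join(extracted)

print(extract_right("2300 BIRCH RD STE 100", "STE", 1, "extracted"))
# ('2300 BIRCH RD', 'STE 100')
print(extract_right("2300 BIRCH RD STE 100", "STE", 1, "destination"))
# ('2300 BIRCH RD STE', '100')
print(extract_right("2300 BIRCH RD STE 100", "STE", 1))
# ('2300 BIRCH RD', '100')
```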

Regular Expressions Options

Regular Expressions

Select a prepackaged regular expression from the list or construct your own in the text box. Advanced Transformer supports standard RegEx syntax.

The Java 2 Platform contains a package called java.util.regex, enabling the use of regular expressions. For more information, go to: java.sun.com/docs/books/tutorial/essential/regex/index.html.

Ellipsis Button

Click this button to add or remove a regular expression.

Populate Group

After you have selected a predefined regular expression or typed a new one, click Populate Group to extract any Regex groups and place the complete expression, as well as any Regex groups found, into the Groups list.

Groups

This column shows the regular expressions for the selected Regular Expressions group.

For example, if you select the Date Regex expression, the following expression displays in the text box: (1[012]{1,2}|0?[1-9])[-/.]([12][0-9]|3[01]{1,2}|0?[1-9])[-/.](([0-9]{4})). This Regex expression has three parts, and the whole expression and each of its parts can be sent to a different output field. The entire expression is searched for in the source field, and if a match is found, the associated parts are moved to the assigned output fields. Suppose the source field is "On 12/14/2006" and you apply the Date expression to it, assigning the entire date ("12/14/2006") to the DATE field, "12" to the MONTH field, "14" to the DAY field, and "2006" to the YEAR field. Advanced Transformer looks for the date and, if it finds it, moves each part to its assigned output field.

Source Field: "On 12/14/2006"
DATE: "12/14/2006"
MONTH: "12"
DAY: "14"
YEAR: "2006"
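The group-to-field assignment can be reproduced with the Date expression quoted above. This is a minimal Python sketch; the function extract_date_fields and the dictionary it returns are assumptions made for illustration, with keys that follow the output field names in the example. Advanced Transformer uses Java's java.util.regex, but this expression behaves the same way in Python's re module.

```python
import re

# The Date expression as quoted in this topic.
DATE_PATTERN = (
    r"(1[012]{1,2}|0?[1-9])[-/.]"
    r"([12][0-9]|3[01]{1,2}|0?[1-9])[-/.]"
    r"(([0-9]{4}))"
)

def extract_date_fields(source):
    """Search source for the Date expression and map the whole match
    and its groups to output fields. Hypothetical helper, for illustration."""
    match = re.search(DATE_PATTERN, source)
    if match is None:
        return None
    return {
        "DATE": match.group(0),   # the entire expression
        "MONTH": match.group(1),
        "DAY": match.group(2),
        # The year is wrapped in two pairs of parentheses, so groups 3
        # and 4 capture the same text; group 3 is used here.
        "YEAR": match.group(3),
    }

print(extract_date_fields("On 12/14/2006"))
# {'DATE': '12/14/2006', 'MONTH': '12', 'DAY': '14', 'YEAR': '2006'}
```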

Output Field

Select an output field from the pull-down menu.