Standardization - EnterWorks - Fusion - Match_and_Merge - Precisely_EnterWorks - 3.1

EnterWorks Fusion

Product type
Software
Portfolio
Verify
Product family
EnterWorks
Product
Precisely EnterWorks > Match and Merge
Precisely EnterWorks > Fusion
Version
3.1
Language
English
Product name
Precisely EnterWorks
Title
EnterWorks Fusion
Copyright
2023
First publish date
2007

This section of the documentation is still under development. Additional information can be found at: https://infolinkcloud.com/doc/index.html.

Standardization is the process of normalizing values in a table, such as converting all uppercase letters to lowercase letters or removing all non-digit characters from a field containing telephone numbers. When a table is standardized, the values in the results table are generated from the values in the input table. The values may be copied directly from the input table to the result table, or they may be transformed by the application of expressions on the values.

For instructions on renaming, cloning, or deleting a standardization specification, see Specification Management.

How a Standardization Specification Operates

A standardization specification definition consists of:

  • Specification Name

  • Source Name: The data source.

  • Table: The input table

  • Key Column: The record id for the input table.

  • Target Table: The resulting table.

  • Columns: Each row in this section of the UI defines how one or more columns will be created in the results table and how their values will be determined.

    • Input column: Specifies one or more columns that will be processed.

    • Pre-expression: (Optional) A source query language expression that will be used to compute the column's values.

    • Function: (Optional) A Java function that will be used to compute the column's values.

    • Expression: A source query language expression that will be used to compute the column's values.

    • Result column name: The names of the columns that will be created in the target table to hold the computed values.

Broadly speaking, standardization specifications generate the resulting table by working left to right across the Column definition. The values of the Input column are used as input to the Pre-expression (if there is one). Those results are operated on by the Function (if there is one). Those results are used as input to the Expression, (if there is one). The final results are stored in the specified Result column.

The actual processing is as follows:

  • If a Function is specified for one or more columns, the value of the result column is computed as follows:

    1. For every column included in the specification, the Pre-expression values are computed in the source database as a single query.
    2. InfoLink iterates over the result of the query and applies the corresponding Function to each columns. The result of the function application is stored to the source as temporary table.
    3. Then another query is evaluated to evaluate Expression for each column.
  • If a Function is not specified for any column, the whole operation is run as a single query in the source. First Pre-expression is evaluated then Expression is evaluated for each column.

Note: When Function is specified for at least one column, a round trip of data from the source to Fusion and back is performed. Depending on the volume of data involved, there may be a performance penalty.

Creating a Standardize Specification

Standardize specifications are located in the navigation tree, in the Specifications section, in the Standardization subsection. Before you can create a specification, you must have configured your data source.

To create a standardization specification:

  1. In the Specifications section of the navigation tree, right-click Standardization and select Create standardization specification.
  2. A pop-up window will appear that has the following configuration options:
    • Specification name: The name of your new specification.

    • Select source: Select the source of the data to be standardized.

  3. When you are done editing the options, click Create. Your new specification will be displayed in the current window. To edit your specification, follow the instructions outlined in Edit a Standardization Specification.

Editing a Standardization Specification

To edit a standardization specification:

  1. In the application's navigation tree, click the standardization specification's name. Your specification will be displayed in the current window.
  2. Edit the specification's configuration options:
    • Specification Name: This field is pre-populated and cannot be edited.

    • Source Name: This field is pre-populated and may not be able to be changed.

    • Table: Enter the name of the input table you want to standardize. You can select the table name from the dropdown menu or start typing the name of the table to filter the dropdown menu.

      If your data source is of type EnterworksSource, the dropdown may list the tables by name or it may list them by repository number. To determine the number of an EnterWorks repository, in the New UI, open the repository in the Detail Editor. The number of the repository is the digits at the end of the Detail Editor's tab's URL.

    • Key Column: name of the record ID column in the input table. The values in this column must be unique and not null. This parameter is only required if you use Java-based functions in the columns section.

    • Target Table: The name of the result table the system will create to contain the standardized records.

    • Columns: Each row in this section defines how one or more columns will be created in the results table and how their values will be determined. (See Example Standardization Specification.)

      Specify one or more columns to be standardized.

      • To add a column, click the Add column button.

      • To delete a column, click the column's Delete button on the far right of the screen.

      The configuration options for columns are as follows:

      • Input column: Use the dropdown menu to indicate which columns of the input table will be acted upon:

        • The name of a particular column.

        • <any column>: All columns in the target table.

        • <expression>: Select this if the action you want to perform does not require values from the input table, for example, if you want to add an empty column to the result table.

      • Pre-expression: (Optional) An expression in the source query language (typically SQL) that will be performed on the column value. You can use the string literal ${col_name} to refer to the Pre-expression's input. (See Example Standardization Specification.)

      • Function: A function that is performed on the results of the of the Pre-expression.

      • Expression: An expression in the source query language (typically SQL) that will be performed on the column value. You can use the string literal ${col_name} to refer to the Expression's input. (See Example Standardization Specification.)

      • Result column name: The name or names of the column or columns that will store the resulting values. This can expressed as:

        • A text string that will be used as the name.

        • An expression that will be evaluated to create the name of one or more columns. For example, if the following is entered:

          ${col_name}

          the column in the result table will have the same name as the column in the input table.

Example Standardization Specification

This standardization specification will be used to prepare a table for deduplication, (a single source matching specification). The matching specification requires a column called data_id that contains an unique identifier. We will also add a column titled Name_st that contains the standardized values of the column Name.

Input column Pre-expression Function Expression Result column name
InternalRecordId     ${col_name} data_id
<any column>     ${col_name} ${col_name}
Name   StandString.java cast(${col_name} as nvarchar(900)) ${col_name}_st

The first row in the above table will cause Fusion to copy the values from the input table's column InternalRecordId to a column it creates in the target table called data_id. data_id will be a duplicate of InternalRecordId.

The second row copies all the columns from the input table to the target table. The columns in the target table will have the same names as the columns in the input table.

The third row applies the standardized function StandString.java to the values in the input table's Name column. It writes the results to a column in the target table called Name_st.

Executing a Standardization Specification

Once a standardization specification has been created, you can call it from a scenario using the GeneralStandardize operation or you can execute it directly by right-clicking the specification name in the navigation tree and selecting how you want it to be run.

When you right-click a specification's name to execute it, you can:

  • Execute: Executes the specification and displays the returned status in the Activity Log.

  • View results: Displays the result table in the Result tab.