Options - dataflow_designer - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

Product type: Software
Portfolio: Verify
Product family: Spectrum
Product: Spectrum > Quality > Spectrum Quality
Version: 23.1
Language: English
Product name: Spectrum Data Quality
Title: Spectrum Data Quality Guide
Topic type: Overview, Reference, Tips, How Do I
First publish date: 2007
Last edition: 2024-03-04
Last publication: 2024-03-04T22:52:13.486265

The table lists the options for the Duplicate Synchronization stage.

Option Name

Description / Valid Values

Group by

Specifies the field to use to create groups of records to synchronize. In cases where you have used a matching stage earlier in the dataflow, such as Interflow Match, Intraflow Match, or Transactional Match, you should select the CollectionNumber field to use the collections created by the matching stage as the groups. However, if you want to group records by some other field, choose the field here. For example, if you want to synchronize records that have the same value in the AccountNumber field, you would select AccountNumber.
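As an illustration of what grouping by a field means, the following Python sketch collects records that share the same AccountNumber value. The function name group_records and the sample records are hypothetical, and the sketch is not how the stage is implemented; it only shows the resulting groups.

```python
from collections import defaultdict

def group_records(records, group_by):
    """Collect records into groups keyed by the value of the group_by field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_by]].append(record)
    return dict(groups)

# Hypothetical sample data: two records share account A100.
records = [
    {"AccountNumber": "A100", "Name": "Mike Smith"},
    {"AccountNumber": "A100", "Name": "Michael Smith"},
    {"AccountNumber": "B200", "Name": "Ann Lee"},
]

groups = group_records(records, "AccountNumber")
print({key: len(members) for key, members in groups.items()})
# {'A100': 2, 'B200': 1}
```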

Sort

If you specify a field in the Group by field, check this box to sort the records by the value in the field you chose. This option is enabled by default.

Advanced

Click this button to specify sort performance options. By default, the sort performance options specified in Management Console, which are the default performance options for your system, are in effect. If you want to override your system's default performance options, check the Override sort performance options box then specify the values you want in these fields:

In memory record limit
Specifies the maximum number of data rows a sorter will hold in memory before it starts paging to disk. By default, a sort of 10,000 records or fewer is done in memory, and a sort of more than 10,000 records is performed as a disk sort. The maximum limit is 100,000 records. An in-memory sort is typically much faster than a disk sort, so set this value high enough that most sorts are done in memory and only large sets are written to disk.
Note: Be careful in environments where there are jobs running concurrently because increasing the In memory record limit setting increases the likelihood of running out of memory.
Maximum number of temporary files
Specifies the maximum number of temporary files that may be used by a sort process. Using a larger number of temporary files can result in better performance. However, the optimal number is highly dependent on the configuration of the server running Spectrum Technology Platform. You should experiment with different settings, observing the effect on performance of using more or fewer temporary files. To calculate the approximate number of temporary files that may be needed, use this equation:

(NumberOfRecords × 2) ÷ InMemoryRecordLimit = NumberOfTempFiles
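As a worked example of the equation above, the following Python sketch estimates the number of temporary files for a hypothetical job. The function name and the sample figures are illustrative assumptions and not part of the product. Rounding up is also an assumption, since the equation only gives an approximate figure.

```python
import math

def estimate_temp_files(number_of_records: int, in_memory_record_limit: int) -> int:
    """Approximate temp files: (NumberOfRecords x 2) / InMemoryRecordLimit, rounded up."""
    if in_memory_record_limit <= 0:
        raise ValueError("in_memory_record_limit must be positive")
    return math.ceil(number_of_records * 2 / in_memory_record_limit)

# Hypothetical job: 1,000,000 records with the default limit of 10,000.
print(estimate_temp_files(1_000_000, 10_000))  # 200
```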

Note: The maximum number of temporary files cannot be more than 1,000.
Note: The optimal sort performance settings depend on your server's hardware configuration. You can use this equation as a general guideline to produce good sort performance:

(InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords
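The guideline above can be checked for a candidate pair of settings with a short Python sketch. The function name is hypothetical. The two upper limits are the ones stated in this topic (100,000 records in memory and 1,000 temporary files), and the record counts are made-up sample values.

```python
MAX_IN_MEMORY_RECORD_LIMIT = 100_000  # maximum stated for "In memory record limit"
MAX_TEMP_FILES = 1_000                # maximum stated for temporary files

def settings_cover_sort(in_memory_record_limit, max_temp_files, total_records):
    """Return True if (InMemoryRecordLimit x MaxNumberOfTempFiles / 2) >= TotalNumberOfRecords."""
    if not 0 < in_memory_record_limit <= MAX_IN_MEMORY_RECORD_LIMIT:
        raise ValueError("in_memory_record_limit must be between 1 and 100,000")
    if not 0 < max_temp_files <= MAX_TEMP_FILES:
        raise ValueError("max_temp_files must be between 1 and 1,000")
    return in_memory_record_limit * max_temp_files / 2 >= total_records

# Hypothetical job of 1,000,000 records.
print(settings_cover_sort(10_000, 200, 1_000_000))  # True
print(settings_cover_sort(10_000, 100, 1_000_000))  # False
```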

Rules

Duplicate Synchronization rules determine which records should have their data copied to all other records in the collection.

To add a rule, select Rules in the rule hierarchy and click Add Rule.

If you specify multiple rules, you will have to select a logical operator to use between each rule. Choose And if you want the new rule and the previous rule to both pass in order for the condition to be met. Select Or if you want either the previous rule or the new rule to pass in order for the condition to be met.
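The effect of And and Or between rules can be sketched in Python as follows. This is a minimal illustration, assuming the rules are combined left to right with no operator precedence, which this topic does not specify. The names rules_pass, score_is_100, and status_not_empty, and the Status field, are hypothetical.

```python
def rules_pass(record, first_rule, remaining):
    """Combine rule results left to right using "And" / "Or" (assumed evaluation order)."""
    result = first_rule(record)
    for operator, rule in remaining:
        if operator == "And":
            result = result and rule(record)
        elif operator == "Or":
            result = result or rule(record)
        else:
            raise ValueError(f"unknown logical operator: {operator}")
    return result

# Two hypothetical rules.
score_is_100 = lambda r: r["MatchScore"] == 100
status_not_empty = lambda r: r["Status"] != ""

record = {"MatchScore": 100, "Status": ""}
print(rules_pass(record, score_is_100, [("And", status_not_empty)]))  # False
print(rules_pass(record, score_is_100, [("Or", status_not_empty)]))   # True
```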

Option Description

Field name

Specifies the name of the dataflow field whose value you want to evaluate to determine whether to filter the record.

Field Type

Specifies the type of data in the field. One of the following:

Non-Numeric
Choose this option if the field contains non-numeric data (for example, string data).
Numeric
Choose this option if the field contains numeric data (for example, double, float, and so on).

Operator

Specifies the type of comparison you want to use to evaluate the field. One of the following:

Contains
Determines if the field contains the value specified. For example, "sailboat" contains the value "boat".
Equal
Determines if the field contains the exact value specified.
Greater Than
Determines if the field value is greater than the value specified. This operation only works on numeric fields.
Greater Than Or Equal To
Determines if the field value is greater than or equal to the value specified. This operation only works on numeric fields.
Highest
Compares the field's value for all the records in the group and determines which record has the highest value in the field. For example, if the fields in the group contain values of 10, 20, 30, and 100, the record with the field value 100 would be selected. This operation only works on numeric fields. If multiple records are tied for the highest value, one record is selected.
Is Empty
Determines if the field contains no value.
Is Not Empty
Determines if the field contains any value.
Less Than
Determines if the field value is less than the value specified. This operation only works on numeric fields.
Less Than Or Equal To
Determines if the field value is less than or equal to the value specified. This operation only works on numeric fields.
Longest
Compares the field's value for all the records in the group and determines which record has the longest (in bytes) value in the field. For example, if the group contains the values "Mike" and "Michael", the record with the value "Michael" would be selected. If multiple records are tied for the longest value, one record is selected.
Lowest
Compares the field's value for all the records in the group and determines which record has the lowest value in the field. For example, if the fields in the group contain values of 10, 20, 30, and 100, the record with the field value 10 would be selected. This operation only works on numeric fields. If multiple records are tied for the lowest value, one record is selected.
Most Common
Determines if the field contains the value that occurs most frequently in this field among the records in the group. If two or more values are tied as the most common, no action is taken.
Not Equal
Determines if the field value is not the same as the value specified.
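The group-level operators (Highest, Lowest, Longest, and Most Common) differ from the others in that they compare values across the whole group. The following Python sketch mirrors the descriptions above using the same example values. The function names and the Balance field are hypothetical. Two details are assumptions: byte length is measured in UTF-8, and on a tie the first record encountered is chosen (the text says only that one record is selected).

```python
from collections import Counter

def highest(group, field):
    """Record with the highest numeric value; the first one wins a tie (assumption)."""
    return max(group, key=lambda r: r[field])

def lowest(group, field):
    """Record with the lowest numeric value; the first one wins a tie (assumption)."""
    return min(group, key=lambda r: r[field])

def longest(group, field):
    """Record whose value is longest in bytes, measured as UTF-8 (assumption)."""
    return max(group, key=lambda r: len(str(r[field]).encode("utf-8")))

def most_common(group, field):
    """Most frequent value, or None when the top values are tied (no action is taken)."""
    counts = Counter(r[field] for r in group).most_common(2)
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

balances = [{"Balance": 10}, {"Balance": 20}, {"Balance": 30}, {"Balance": 100}]
names = [{"Name": "Mike"}, {"Name": "Michael"}]

print(highest(balances, "Balance")["Balance"])  # 100
print(lowest(balances, "Balance")["Balance"])   # 10
print(longest(names, "Name")["Name"])           # Michael
print(most_common(names, "Name"))               # None
```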

Value type

Specifies the type of value you want to compare to the field's value. One of the following:

Note: This option is not available if you select the operator Highest, Lowest, or Longest.
Field
Choose this option if you want to compare another dataflow field's value to the field.
String
Choose this option if you want to compare the field to a specific value.

Value

Specifies the value to compare to the field's value. If you selected Field in the Value type field, select a dataflow field. If you selected String in the Value type field, type the value you want to use in the comparison.

Note: This option is not available if you select the operator Highest, Lowest, or Longest.

Actions

Actions determine which field to copy to other records in the group. To add an action, select Actions in the Duplicate Synchronization condition tree, then click Add Action. Use the following options to define the action.

Option Description

Source type

Specifies the type of data to copy to other records in the group. One of the following:

Field
Choose this option if you want to copy a value from a field to the other records in the group.
String
Choose this option if you want to copy a constant value to the other records in the group.

Source data

Specifies the data to copy to the other records in the group. If the source type is Field, select the field whose value you want to copy to the other records in the group. If the source type is String, specify a constant value to copy to the other records in the group.
Note: If the source data is null, it is not copied to the other records in the group. The other records retain their original values.

Destination

Specifies the field in the other records to which you want to copy the data specified in the Source data field. For example, if you want to copy the data to the AccountBalance field in all the other records in the group, you would specify AccountBalance.

A Duplicate Synchronization Rule and Action

This Duplicate Synchronization rule and action selects the record where the match score is 100 and copies the value of its AccountNumber field to the NewAccountNumber field in all the other records in the group.

Rule
Field Name: MatchScore
Field Type: Numeric
Operator: Equal
Value Type: String
Value: 100

Action
Source Type: Field
Source Data: AccountNumber
Destination: NewAccountNumber
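
The combined effect of this rule and action on one group can be sketched in Python. This is an illustration only, not the stage's implementation: the function name synchronize and the sample records are hypothetical, and picking the first record that passes the rule is an assumption. A null (None) source value is not copied, as described in the note for Source data.

```python
def synchronize(group, rule, source_field, destination_field):
    """Copy source_field from the record that passes the rule to all other records.

    The records are modified in place and the same list is returned.
    """
    selected = next((r for r in group if rule(r)), None)  # first match (assumption)
    if selected is None:
        return group  # no record passed the rule
    value = selected.get(source_field)
    if value is None:
        return group  # null source data: other records keep their original values
    for record in group:
        if record is not selected:
            record[destination_field] = value
    return group

# Hypothetical group produced by a matching stage.
group = [
    {"MatchScore": 100, "AccountNumber": "A100"},
    {"MatchScore": 92, "AccountNumber": "A10O"},
    {"MatchScore": 88, "AccountNumber": None},
]

synchronize(group, lambda r: r["MatchScore"] == 100, "AccountNumber", "NewAccountNumber")
print([r.get("NewAccountNumber") for r in group])  # [None, 'A100', 'A100']
```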