Options - dataflow_designer - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

Product type
Software
Portfolio
Verify
Product family
Spectrum
Product
Spectrum > Quality > Spectrum Quality
Version
23.1
Language
English
Product name
Spectrum Data Quality
Title
Spectrum Data Quality Guide
First publish date
2007
Last updated
2024-03-04
Published on
2024-03-04T22:52:13.486265

The following table lists the options for Best of Breed.

Option Name

Description / Valid Values

Group by

Specifies the field to use to create groups of records to merge into a single best of breed record, creating one best of breed record from each group. In cases where you have used a matching stage earlier in the dataflow, you should select the CollectionNumber field to use the collections created by the matching stage as the groups. However, if you want to group records by some other field, choose the field here. For example, if you want to merge all records that have the same value in the AccountNumber field into one best of breed record, you would select AccountNumber.

Sort

If you specify a field in the Group by field, check this box to sort the records by the value in the field you chose. This option is enabled by default.

Advanced

Click this button to specify sort performance options. By default, the sort performance options specified in Management Console, which are the default performance options for your system, are in effect. If you want to override your system's default performance options, check the Override sort performance options box then specify the values you want in these fields:

In memory record limit
Specifies the maximum number of data rows a sorter will hold in memory before it starts paging to disk. By default, a sort of 10,000 records or less will be done in memory and a sort of more than 10,000 records will be performed as a disk sort. The maximum limit is 100,000 records. Typically an in-memory sort is much faster than a disk sort, so this value should be set high enough so that most of the sorts will be in-memory sorts and only large sets will be written to disk.
Note: Be careful in environments where there are jobs running concurrently because increasing the In memory record limit setting increases the likelihood of running out of memory.
Maximum number of temporary files
Specifies the maximum number of temporary files that may be used by a sort process. Using a larger number of temporary files can result in better performance. However, the optimal number is highly dependent on the configuration of the server running Spectrum Technology Platform. You should experiment with different settings, observing the effect on performance of using more or fewer temporary files. To calculate the approximate number of temporary files that may be needed, use this equation:

(NumberOfRecords × 2) ÷ InMemoryRecordLimit = NumberOfTempFilesN

Note: The maximum number of temporary files cannot be more than 1,000.
Note: The optimal sort performance settings depends on your server's hardware configuration. You can use this equation as a general guideline to produce good sort performance:

(InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords

Note: The optimal sort performance settings depends on your server's hardware configuration. You can use this equation as a general guideline to produce good sort performance:

(InMemoryRecordLimit × MaxNumberOfTempFiles ÷ 2) >= TotalNumberOfRecords

Keep original records

Select this option to retain all records in the collection along with the best of breed record. Clear the option if you want only the best of breed record.

Use first record

Select this option if you want Best of Breed to automatically select the first record in the collection as the template record. The template record is the record upon which the best of breed record is based.

Define template record

Select this option to define rules for selecting the template record. For more information, see Defining Template Record Rules.