DUPKEYS Control Statement - mfx - 3.1

Syncsort™ MFX Programmers Guide

Product type: Software
Portfolio: Integrate
Product family: Syncsort™ Software
Product: Syncsort™ MFX > MFX
Version: 3.1
Language: English
Content type: Programmer’s Guide
Product name: Syncsort™ MFX
Title: Syncsort™ MFX Programmers Guide
Copyright: 2024
First publish date: 2010
Last updated: 2024-08-27
Published on: 2024-08-27T08:14:56.318001

The DUPKEYS control statement enables special processing for records with equal sort/merge control fields (keys). It can perform the following functions:

  • Sum specified numeric fields, place the sum in one record and delete the other records with the same key (SUM)

  • Compute the average of specified numeric fields, place the average in one record and delete the other records with the same key (AVG)

  • Determine the minimum or maximum value of specified numeric fields, place this value in one record and delete the other records with the same key (MIN,MAX)

  • Delete all but one of the records with equal keys (FIELDS=NONE)

  • Retain only records with keys that occur more than once (ALLDUPS)

  • Retain only the first record of those with keys that occur more than once (FIRSTDUP)

  • Retain only the last record of those with keys that occur more than once (LASTDUP)

  • Retain only the records with keys that occur only once (NODUPS)

Note that the retained record will not necessarily be the first record unless the EQUALS parameter is in effect.

The records deleted by DUPKEYS can optionally be written to a separate file.

The DUPKEYS control statement cannot be used with a SUM control statement, nor when FIELDS=COPY is specified on the SORT or MERGE control statement.

If you need to add other DUPKEYS functionality to an application with a SUM control statement, you must move the SUM specification to the DUPKEYS statement and remove the SUM statement. If XSUM was used, then XDUP should be specified and the JCL changed from using a SORTXSUM DD to a SORTXDUP DD.

DUPKEYS Control Statement Format

The format of the DUPKEYS control statement is illustrated below.
Figure 1. DUPKEYS Control Statement Format

Function Parameters (AVG, MAX, MIN, SUM)

Each field specified in the AVG, MAX, MIN, and SUM parameters is identified by its position p, length l, and format f, described as follows:
p The position value indicates the first byte of the field relative to the beginning of the input record after INREC and/or E15 processing, if specified, have completed. The field must begin on a byte boundary.
l The length value indicates the length of the field. The length must be an integral number of bytes. Refer to Table 1 for the permissible lengths.
f The optional format value indicates the data format. The formats that can be specified are listed in Table 1. If all the defined fields have the same format, you can specify the format value once by using the FORMAT=f subparameter. If you specify both individual f values and the FORMAT subparameter, the individual f values are used for the fields where they are specified.
Table 1. Allowed DUPKEYS Formats and Field Lengths

FORMAT CODE   PERMISSIBLE LENGTH
              SUM Fields           MIN or MAX Fields    AVG Fields
BI            2, 4, or 8 bytes     1 to 256 bytes       2, 4, or 8 bytes
FD*           4, 8, or 16 bytes    4, 8, or 16 bytes    4, 8, or 16 bytes
FI            2, 4, or 8 bytes     2, 4, or 8 bytes     2, 4, or 8 bytes
FL            4, 8, or 16 bytes    4, 8, or 16 bytes    4, 8, or 16 bytes
PD            1 to 16 bytes        1 to 16 bytes        1 to 10 bytes
ZD            1 to 31 bytes        1 to 31 bytes        1 to 18 bytes

Note: *A non-finite number in the data will cause a WER497A error.
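
To make the p, l, f notation concrete, the following Python sketch extracts a field from a fixed-length record by its 1-based position and length and decodes it according to its format code. It is an illustration only, not MFX code: the function names (extract_field, decode_pd, decode_zd, decode) and the sample record are hypothetical, and the sketch assumes the standard mainframe encodings (packed decimal with the sign in the last nibble, EBCDIC zoned decimal with the sign in the zone of the last byte, BI as unsigned and FI as signed big-endian binary).

```python
def extract_field(record: bytes, p: int, l: int) -> bytes:
    """Return the l-byte field whose first byte is at 1-based position p."""
    if p < 1 or p + l - 1 > len(record):
        raise ValueError("field lies outside the record")
    return record[p - 1 : p - 1 + l]


def decode_pd(field: bytes) -> int:
    """Packed decimal: two digits per byte, last nibble holds the sign."""
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in field[:-1])
    digits += str(field[-1] >> 4)
    negative = (field[-1] & 0x0F) in (0x0B, 0x0D)
    return -int(digits) if negative else int(digits)


def decode_zd(field: bytes) -> int:
    """Zoned decimal: one digit per byte, zone of the last byte is the sign."""
    digits = "".join(str(b & 0x0F) for b in field)
    negative = (field[-1] >> 4) in (0x0B, 0x0D)
    return -int(digits) if negative else int(digits)


def decode(field: bytes, f: str) -> int:
    """Decode a field for the integer format codes modelled in this sketch."""
    if f == "PD":
        return decode_pd(field)
    if f == "ZD":
        return decode_zd(field)
    if f == "BI":
        return int.from_bytes(field, "big", signed=False)
    if f == "FI":
        return int.from_bytes(field, "big", signed=True)
    raise ValueError(f"format {f!r} is not modelled in this sketch")


# Hypothetical 11-byte record: 5-byte key, 2-byte PD, 2-byte ZD, 2-byte BI.
record = b"KEY01" + bytes([0x12, 0x3C]) + bytes([0xF4, 0xF2]) + (7).to_bytes(2, "big")

pd_value = decode(extract_field(record, 6, 2), "PD")   # field 6,2,PD
zd_value = decode(extract_field(record, 8, 2), "ZD")   # field 8,2,ZD
bi_value = decode(extract_field(record, 10, 2), "BI")  # field 10,2,BI
```

For a variable-length record, the same position arithmetic would count the 4-byte Record Descriptor Word, so the first data byte is position 5.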

AVG Parameter (Optional)

Use the AVG parameter to specify numeric fields to contain the average value calculated from all records with the same control fields. Multiple fields separated by commas may be specified in the same parameter. The results of the AVG parameter will be truncated for all data formats except FL.

If overflow or underflow occurs during AVG calculations, the duplicate-keyed records will not be deleted and none of the AVG, MAX, MIN, or SUM functions will be performed.

Adding AVG to an existing MAXSORT application could cause the generation of additional intermediate output files (SORTOU00 or SORTOUnn). This occurs because AVG delays DUPKEYS processing until the final MAXSORT merge pass.
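
The effect of truncation on an averaged field can be sketched in Python as follows. The text above says only that results are truncated for formats other than FL; this sketch assumes truncation toward zero for negative values, which is not confirmed by this guide. The function name truncated_average is hypothetical.

```python
def truncated_average(values: list[int]) -> int:
    """Average of integer field values with the fractional part discarded.

    Assumption: truncation is toward zero (the guide does not say which
    direction is used for negative results).
    """
    if not values:
        raise ValueError("at least one value is required")
    total = sum(values)
    quotient = abs(total) // len(values)
    return quotient if total >= 0 else -quotient


# Four equally-keyed records whose AVG field holds 10, 11, 12, and 12:
# the exact mean is 11.25, so the retained record would carry 11.
avg_field = truncated_average([10, 11, 12, 12])
```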

MAX Parameter (Optional)

Use the MAX parameter to specify numeric fields to retain the maximum value among all records with the same control fields. Multiple fields separated by commas may be specified in the same parameter. Equally-keyed records are processed pair by pair. For the MAX parameter, the values in the MAX fields are compared; the highest MAX value is placed in one of the records, and the other equally-keyed records are deleted. The sorted data will be reduced to one record per sort key value.

MIN Parameter (Optional)

Use the MIN parameter to specify numeric fields to retain the minimum value among all records with the same control fields. Multiple fields separated by commas may be specified in the same parameter. Equally-keyed records are processed pair by pair. For the MIN parameter, the values in the MIN fields are compared; the lowest MIN value is placed in one of the records, and the other equally-keyed records are deleted. The sorted data will be reduced to one record per sort key value.

SUM Parameter (Optional)

Use the SUM parameter to specify numeric fields to contain the summed value for all records with the same control fields. Multiple fields separated by commas may be specified in the same parameter. Equally-keyed records are processed pair by pair. For the SUM parameter, the values in the SUM fields are added, the sum is placed in one of the records, and the other record is deleted. The sorted data will be reduced to one record per sort key value if arithmetic overflow does not occur during the summing process.

If the sum of any of the specified SUM fields in any two equally-keyed records overflows the size of the field, the duplicate-keyed record will not be deleted and none of the AVG, MAX, MIN, or SUM functions will be performed.
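
The interaction between pair-by-pair summing and overflow can be modelled roughly as below. This is one plausible reading of the behavior described above, not a statement of how MFX is implemented: the function name sum_equal_keyed is hypothetical, the sketch covers a single PD field only, and it assumes that when a pair cannot be summed, the accumulated record is left as is and summing restarts with the next record. A PD field of l bytes holds 2l - 1 decimal digits, which gives the capacity used here.

```python
def sum_equal_keyed(values: list[int], length: int) -> list[int]:
    """Model SUM on one PD field for a single group of equally-keyed records.

    values: the field value in each record of the group, in order.
    length: the PD field length in bytes (capacity is 2*length - 1 digits).
    Returns the field values of the records that remain after summing.
    """
    limit = 10 ** (2 * length - 1) - 1
    remaining = []
    acc = values[0]
    for value in values[1:]:
        if abs(acc + value) > limit:
            # Assumed overflow handling: the pair is left unsummed, so the
            # accumulated record survives and summing restarts here.
            remaining.append(acc)
            acc = value
        else:
            acc += value
    remaining.append(acc)
    return remaining


no_overflow = sum_equal_keyed([100, 200, 300], length=2)    # fits in 3 digits
with_overflow = sum_equal_keyed([600, 300, 200], length=2)  # 900 + 200 > 999
```

In the second call the group is not reduced to a single record, which matches the statement that the output has one record per key only when no overflow occurs.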

FIELDS Parameter (Optional)

The only valid value for FIELDS is NONE. Specify FIELDS=NONE only if no arithmetic functions are desired. The sorted data will be reduced to one record per sort key value.

ALLDUPS Parameter (Optional)

Use the ALLDUPS parameter to specify that only records with sort/merge fields that occur more than once are retained.

FIRSTDUP Parameter (Optional)

Use the FIRSTDUP parameter to specify that only the first record of those with sort/merge fields that occur more than once should be retained. If the NODUPS parameter is also specified, all records with sort/merge fields that occur exactly once are also retained. For a merge, the first record will be the first record from the lowest numbered input file.

LASTDUP Parameter (Optional)

Use the LASTDUP parameter to specify that only the last record of those with sort/merge fields occurring more than once should be retained. If the NODUPS parameter is also specified, all records with sort/merge fields occurring exactly once are also retained. For a merge, the last record will be the last record from the highest numbered input file.

NODUPS Parameter (Optional)

Use the NODUPS parameter to specify that only records with sort/merge fields that occur exactly once are retained.
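
The four selection parameters above, and the permitted combinations with NODUPS, can be compared side by side with a small Python model. It is a sketch under stated assumptions: the input is already in key order, the original order within each key group is preserved (as with EQUALS), and the function name select and the sample records are hypothetical.

```python
from itertools import groupby


def select(records, key, mode):
    """Model ALLDUPS, FIRSTDUP, LASTDUP, and NODUPS on key-ordered records.

    mode is a set of parameter names, for example {"FIRSTDUP", "NODUPS"}.
    """
    retained = []
    for _, group in groupby(records, key=key):
        group = list(group)
        if len(group) == 1:
            # The key occurs exactly once: kept only when NODUPS is given.
            if "NODUPS" in mode:
                retained.extend(group)
        elif "ALLDUPS" in mode:
            retained.extend(group)
        elif "FIRSTDUP" in mode:
            retained.append(group[0])
        elif "LASTDUP" in mode:
            retained.append(group[-1])
    return retained


# Hypothetical (key, sequence) records, already sorted by key.
records = [("A", 1), ("A", 2), ("B", 3), ("C", 4), ("C", 5), ("C", 6)]
by_key = lambda r: r[0]

alldups = select(records, by_key, {"ALLDUPS"})
firstdup = select(records, by_key, {"FIRSTDUP"})
lastdup = select(records, by_key, {"LASTDUP"})
nodups = select(records, by_key, {"NODUPS"})
firstdup_nodups = select(records, by_key, {"FIRSTDUP", "NODUPS"})
```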

XDUP Parameter (Optional)

Specify the XDUP parameter if you want records deleted by DUPKEYS processing to be written to a data set defined by the SORTXDUP DD statement. These records will be written to SORTXDUP at the time of DUPKEYS processing. The records will not undergo OUTREC, E35, and OUTFIL processing because such processing occurs after DUPKEYS processing.

The DCB BLKSIZE of the SORTIN data set will not be used to determine the BLKSIZE of the SORTXDUP data set. System determined blocksize will be used when enabled and appropriate. Unblocked output will be generated if system determined blocksize has been disabled and an explicitly specified blocksize has not been provided in the JCL.

The XDUP file will be sequenced in the same order as the SORTOUT file.

Note that XDUP may increase system requirements:
  • Adding XDUP to an existing sort application may result in an increase in the amount of SORTWORK space required. This occurs because XDUP delays all DUPKEYS processing until Phase 3.

  • Adding XDUP to an existing MAXSORT application could cause the generation of additional intermediate output files (SORTOU00 or SORTOUnn). This occurs because XDUP delays DUPKEYS processing until the final MAXSORT merge pass.

  • XDUP may require additional main memory. Specify a region size of 512K or more.
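
As a rough model of how XDUP splits the data, the following Python sketch applies FIELDS=NONE processing to key-ordered records and routes the deleted records to a second list that stands in for the SORTXDUP data set. The function name fields_none_with_xdup is hypothetical, and the sketch assumes EQUALS is in effect so that the first record of each group is the one retained.

```python
from itertools import groupby


def fields_none_with_xdup(records, key):
    """Keep one record per key; collect the deleted records separately.

    Returns (sortout, sortxdup). Both lists stay in key order because the
    input is processed sequentially.
    """
    sortout, sortxdup = [], []
    for _, group in groupby(records, key=key):
        first, *rest = group
        sortout.append(first)   # retained record (assumes EQUALS)
        sortxdup.extend(rest)   # records deleted by DUPKEYS processing
    return sortout, sortxdup


# Hypothetical (key, sequence) records, already sorted by key.
records = [("A", 1), ("A", 2), ("B", 3), ("C", 4), ("C", 5), ("C", 6)]
sortout, sortxdup = fields_none_with_xdup(records, key=lambda r: r[0])
```

Every input record lands in exactly one of the two lists, so their combined length equals the input length.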

General Considerations for DUPKEYS

  • If NOEQUALS is in effect, the record which is retained during arithmetic processing (AVG, MAX, MIN, SUM), or FIELDS=NONE processing, is determined arbitrarily. If EQUALS is in effect, the record which is retained is the first record read in a SORT application; in a MERGE, the retained record will be from the lowest-numbered input file. The EQUALS parameter can be specified on the SORT or MERGE control statement or as a PARM option.

  • The arithmetic functions (AVG, MAX, MIN, SUM), FIELDS=NONE, ALLDUPS, FIRSTDUP, LASTDUP, and NODUPS are mutually exclusive, except that NODUPS can be specified with either FIRSTDUP or LASTDUP.

  • AVG, MAX, MIN, or SUM arithmetic cannot be performed on a SORT or MERGE control field. An AVG, MAX, MIN, or SUM field cannot include all or part of a SORT or MERGE control field.

  • AVG, MAX, MIN, and SUM fields may not overlap each other.

  • Each AVG, MAX, MIN and SUM parameter may be used only once.

  • If any variable-length record does not contain all of the AVG, MAX, MIN, or SUM fields, none of the arithmetic functions will be performed for that record.

  • Non-AVG, non-MAX, non-MIN, and non-SUM fields remain unchanged and are retained from the record which contains the average, maximum, minimum, or sum value, respectively.

  • If overflow or underflow occurs during AVG or SUM calculations for a set of equally-keyed records, no functions are performed on those records and none of them is deleted. MAX and MIN calculations are also suspended among those records. AVG, MAX, MIN, and SUM arithmetic resumes when a subsequent set of records with equal control fields can be averaged or summed without overflow or underflow. Further processing is determined by the option selected at installation through the SUMOVFL parameter or at run time through the OVFLO parameter. If the RC16 option has been selected, processing terminates with a WER049A critical error. For the RC0 option (the delivered default) or the RC4 option, average or sum processing continues and a WER049I message is issued for the first occurrence only. To avoid arithmetic overflow with SUM, use the INREC control statement to insert zeros of the proper format immediately before the SUM field. For example, for a PD field, use nZ to insert binary zeros.

  • Remember that the first 4 bytes of variable-length records are reserved for the Record Descriptor Word, so the first byte of the data portion of the record is byte 5.

  • DUPKEYS is incompatible with an incore sort. If you specify the DUPKEYS control statement, allocate SORTWKxx data sets in the JCL or use the DYNALLOC feature for dynamic SORTWK allocation. If no JCL SORTWKs are provided and DYNALLOC is disabled by default, DUPKEYS will cause DYNALLOC to be enabled.

  • When AVG or SUM arithmetic is performed on FL fields, user-issued SPIE macros are not permitted and exit routines must not produce exponent overflow or underflow. Because of the numeric rounding performed by the hardware, the exact average or sum depends on the order in which the fields are calculated. The average or sum may therefore vary slightly between executions.

  • By default, the sign byte of a positive averaged or summed ZD field will be converted to printable format. If you want to disable this action, use the NZDPRINT PARM option. Refer to ZDPRINT.

  • Adding ALLDUPS, FIRSTDUP, LASTDUP or NODUPS to an existing sort application may result in an increase in the amount of SORTWORK space required. This occurs because these functions delay all DUPKEYS processing until Phase 3.

  • Adding ALLDUPS, FIRSTDUP, LASTDUP or NODUPS to an existing MAXSORT application could cause the generation of additional intermediate output files (SORTOU00 or SORTOUnn). This occurs because these functions delay DUPKEYS processing until the final MAXSORT merge pass.
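
The overflow-avoidance advice in the list above (inserting binary zeros immediately before a PD SUM field) works because leading X'00' bytes in packed decimal are simply leading zero digits. The Python sketch below demonstrates that property; it is not MFX code, and the helper names decode_pd and pd_max are hypothetical.

```python
def decode_pd(field: bytes) -> int:
    """Packed decimal: two digits per byte, last nibble holds the sign."""
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in field[:-1])
    digits += str(field[-1] >> 4)
    negative = (field[-1] & 0x0F) in (0x0B, 0x0D)
    return -int(digits) if negative else int(digits)


def pd_max(length: int) -> int:
    """Largest magnitude a PD field of the given byte length can hold."""
    return 10 ** (2 * length - 1) - 1


original = bytes([0x99, 0x9C])      # +999, the largest 2-byte PD value
widened = b"\x00" * 2 + original    # two binary-zero bytes placed in front

same_value = decode_pd(widened) == decode_pd(original)
old_capacity = pd_max(len(original))   # 3 digits
new_capacity = pd_max(len(widened))    # 7 digits
```

The widened field carries the same value with room for four more digits. In an actual job the SUM field definition would presumably need to start at the inserted zeros and use the larger length; that detail is an inference and is not shown in this section.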

Sample DUPKEYS Control Statement

The following DUPKEYS statement deletes records with equal control fields but places arithmetic sum, minimum, maximum, and average values of some fields in the retained record.
Figure 2. Sample DUPKEYS Statement

When the control fields are equal, this sample statement sums the ZD field beginning in byte 20 and the FI field beginning in byte 32; selects the minimum value of the ZD field beginning in byte 40; selects the maximum value of the ZD field beginning in byte 48; averages the PD field beginning in byte 56 and the PD field beginning in byte 64; and then deletes the equal-keyed record.
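
The combined effect described for the sample statement can be approximated with the Python model below. It does not reproduce the statement's syntax. The dictionary field names (sum20, sum32, min40, max48, avg56, avg64) are hypothetical labels derived from the byte positions mentioned above, and the sketch assumes that EQUALS is in effect, that no overflow occurs, and that averages are truncated toward zero.

```python
from itertools import groupby


def collapse(records, key, sum_fields=(), min_fields=(), max_fields=(), avg_fields=()):
    """Reduce key-ordered records to one record per key with aggregates."""
    result = []
    for _, group in groupby(records, key=lambda r: r[key]):
        group = list(group)
        kept = dict(group[0])  # other fields come from the retained record
        for f in sum_fields:
            kept[f] = sum(r[f] for r in group)
        for f in min_fields:
            kept[f] = min(r[f] for r in group)
        for f in max_fields:
            kept[f] = max(r[f] for r in group)
        for f in avg_fields:
            total = sum(r[f] for r in group)
            quotient = abs(total) // len(group)  # assumed truncation toward zero
            kept[f] = quotient if total >= 0 else -quotient
        result.append(kept)
    return result


# Hypothetical input, already sorted on the control field "key".
records = [
    {"key": "A", "sum20": 5, "sum32": 10, "min40": 7, "max48": 3, "avg56": 10, "avg64": 4},
    {"key": "A", "sum20": 6, "sum32": 20, "min40": 2, "max48": 9, "avg56": 13, "avg64": 5},
    {"key": "B", "sum20": 1, "sum32": 1, "min40": 1, "max48": 1, "avg56": 1, "avg64": 1},
]

output = collapse(
    records,
    key="key",
    sum_fields=("sum20", "sum32"),
    min_fields=("min40",),
    max_fields=("max48",),
    avg_fields=("avg56", "avg64"),
)
```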