/PARTITION - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:01:45.019000

Purpose

To define a partition scheme that can be applied to a target.

Format

/PARTITION distribution

where

distribution by_size_only | by_count_only | by_condition | by_field }
by_size_only size_constraint
by_count_only count_constraint [ROUNDROBIN maximum_size_per_round]
size_constraint PARTITIONSIZE [COMPRESSED] maximum_partition_size
count_constraint PARTITIONCOUNT number_of_partitions
by_condition BYCONDITION condition [,condition…] [UNSELECTEDRECORDS partition_suffix] [NOEMPTY]
by_field BYFIELD [size_constraint | count_constraint]
maximum_size_per_round maximum_size
maximum_partition_size  maximum_size
maximum_size  r_number RECORDS | b_number BYTES | k_number KILOBYTES | m_number MEGABYTES | g_number GIGABYTES | t_number TERABYTES

Arguments

x_number

A positive non-zero integer that indicates the maximum amount of data, in one of the following units, which is written either in total or in each round to each partition:

records - r_number bytes - b_number kilobyes - k_number megabytes - m_number gigabytes - g_number terabytes - t_number

number_of_partitions A positive non-zero integer that indicates the exact number of partitions that will be created. Depending on the data and other options, some partitions may be empty.
condition The name of a condition defined in a /CONDITION option or a level 88 in a COBOL copybook supplied through a /DATADICTIONARY option, that is evaluated to determine whether a record gets selected into a given partition.
partition_suffix A character string that is the suffix to be added to the target file name. The resulting file name is used for the partition where all records that are not selected by any of the conditions are put.

Location

This option may appear anywhere in the task definition.

Notes

The /PARTITION option is applicable to the last output file that is listed before the partition statement.

Target partitioning is not supported for XML, Indexed, Micro Focus, Tableau Data Extract, or QlikView data eXchange files.

For Apache Avro and Apache Parquet file formats, target partitioning is supported only with the following distributions (see the syntax rules above for definitions):

  • by_count_only
  • by_condition
  • by_field, with count_constraint option only

Distribution by size only

A partition scheme based on size only is one constrained by the maximum size of each partition.

The maximum_partition_size determines the maximum amount of data that can be written to each partition. As records are not broken across partitions, the amount of data written to a given partition might be less than the maximumum amount specified. A minimum of 1 record is written to each partition.

Distribution by count only

A partition scheme based on count only is one constrained by the number of partitions.

The number_of_partitions determines the exact number of partitions that are created. As records are not broken across partitions, the amount of data written to a given partition might be less than the maximumum amount specified. Depending on the amount of data, some partitions might be empty. A minimum of 1 record is written to each partition.

If number_of_partitions is not specified, Connect ETL applies a default value, which could change between multiple executions of the task.

The maximum_size_per_round determines the amount of data that is written to a partition before moving to the next partition.

Distribution by condition

A partition scheme based on condition is one in which a condition is specified for each partition. The number of conditions determines the number of partitions. For each target record, the conditions are evaluated in sequence to determine whether the record goes into the corresponding partition. Once a condition is true for a record, subsequent conditions are not evaluated.

Records that are not selected by any of the specified conditions are dropped unless the UNSELECTEDRECORDS keyword is specified. In this case, they go into a separate partition.

Empty partitions are not created when the NOEMPTY keyword is specified.

Distribution by field

A partition scheme based on field is one in which all records with the same field values are distributed to the same partition. Field based partitioning can be used with sort and merge applications only. The sort and merge field values are used to partition the file. Field based distribution cannot be specified when the task includes the /SUMMARIZE option.

When PARTITIONCOUNT is specified, Connect ETL attempts to distribute the sets of records with equal field values such that each partition has approximately the same number of records.

When PARTITIONSIZE is specified, Connect ETL attempts to create partitions of the specified size, while keeping all records with the same field values together.

When neither PARTITIONCOUNT nor PARTITIONSIZE is specified, each partition contains the records for one set of sort or merge field values.

Compressed Size

When the COMPRESSED attribute is specified, target compressed size is considered when partitioning output files.

Examples

/OUTFILE Customers.dat STLF
/PARTITION partitionsize 200 megabytes 
The target file Customers.dat will be split into partitions that are each about 200 megabytes in size and are named Customers_1.dat, Customers_2.dat etc. The number of partitions will depend on the amount of data output. The last partition may be smaller in size.
/OUTFILE Customers.dat STLF
/PARTITION partitioncount 20 roundrobin 250 kilobytes 
The target file Customers.dat is split into 20 partitions named Customers_1.dat, Customers_2.dat etc. 250 kilobytes of data is written to one partition before moving to the next one, and so on.
/OUTFILE Customers.dat STLF
/PARTITION conditional bycondition IsCountryUSA, IsCountryNotUSA unselectedrecords 
CountryUnknown noempty 
All records that satisfy the condition IsCountryUSA go into the first partition called Customers_IsCountryUSA.dat. If this condition is not true, then the condition IsCountryNotUSA is evaluated, and records that satisfy this condition are put in the second partition named Customers_IsCountryNotUSA.dat. Any remaining records go into the last partition that is called Customers_CountryUnknown.dat. Empty partitions are not created.
/OUTFILE Customers.dat STLF
/PARTITION SortFieldBased byfield partitioncount 10 

The target file Customers.dat is split into 10 partitions named Customers_1.dat, Customers_2.dat etc. where all the records with the same sort or merge field values are sent to the same partition. Connect ETL tries to distribute data such that the partitions are approximately the same size.