/COLLATINGSEQUENCE - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:01:45.019000

Purpose

To set a default collating sequence for the task or define a custom collating sequence.

Format

/COLLATINGSEQUENCE

{default}

{new_seq}

where

default =

{standard_seq}

{name }

new_seq = base_seq [seq_mod...] [invalid_chars]
standard_seq = {LOCALE } {ASCII } {FASCII} {EBCDIC } {FEBCDIC} {MULTINATIONAL }
base_seq = {standard_seq } {next_spec[, next_spec ...]}
seq_mod = {MODIFICATION pos_changes} {NULLCHAR null_chars }
next_spec = {single_char} {double_char} {char_range }
single_char = {"a" } {X"xx" }
double_char = {"aa" } {X"xxxx" }
char_range = "b" - "e"
pos_changes = pos_change[,pos_change ...]
pos_change = moved_char {=} {>} comp_pos {<}
moved_char = {single_char} {double_char}
comp_pos = {single_char} {double_char}
null_chars = {next_spec[,next_spec ...]} {NONPRINTABLE } {NONDICTIONARY }
invalid_chars = {HIGH} {LOW }

Arguments

name

The name of the collating sequence.

The name assigned to the sequence must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

a The character representation of a single character.
xx The hexadecimal representation of a single character.
aa The character representation of a double character, i.e. a ’character’ which takes up two bytes of storage, but which is to be collated as if it occupies only a single byte, e.g. "CH".
xxxx The hexadecimal representation of a double character. The first two hexadecimal digits represent the first half of the double character while the last two hexadecimal digits represent the second half of the double character, e.g. x"4348".
b The first character in a range (double character not allowed).
e The last character in a range (double character not allowed).

Location

The option may appear anywhere in the task definition.

Defaults

If a default collating sequence is not explicitly specified in the task, the ASCII standard sequence is assumed to be the default.

Notes

Defining a Default Collating Sequence

You can define a default collating sequence for the task. The default collating sequence takes effect for character field comparisons for which you do not assign an explicit collating sequence.

To define a standard collating sequence as the default, specify DEFAULT followed by a standard sequence name.

To define a custom collating sequence as the default, first define the custom collating sequence and then assign this new collating sequence as the default.

Defining a Custom Collating Sequence

You can define a custom collating sequence when none of the standard sequences provide the collating result that you require during character value comparisons.

When defining a custom collating sequence, first define a base sequence then apply modifications that can include character repositioning, null character definition, and case folding.

Base Sequence Definition

To specify the base sequence for your custom sequence, you can specify one of the standard sequences (i.e. LOCALE, ASCII, EBCDIC, MULTINATIONAL) or you can define your own. To define your own base sequence, simply specify the characters (single, double, range) in the order in which you want them to collate, low to high. For example,
/COLLATINGSEQUENCE special_seq1 "a" - "l", "ll", "m" - "z", "A" - "L", "LL", "M" -"Z", "0" - "9"

The collating sequence special_seq1 defined above includes no special or non-printable characters nor does it have any modifications to the base sequence. The characters excluded from it are regarded as invalid if any are encountered when collating fields according to this sequence. You can specify, through the invalid_chars argument of /COLLATINGSEQUENCE, where you want invalid characters to collate. The choices are for the invalid characters to appear either below (LOW) or above (HIGH) the characters that are specified in the collating sequence definition. If you leave the invalid_chars argument unspecified, Connect ETL handles invalid characters as null characters (for the definition of a null character, see below).

Collating values in a field according to the alternative sequence of the above example results in the following (assuming ascending ordering and padding with spaces):
a*****
affect
almond
allure
amends
****de
ladder
lambda
llamas
Affect

Sequence Modifications

There are three different types of changes which you can make to your base sequence. First, characters can be repositioned within the collating sequence. Second, characters can be ignored during the collating process by defining them as null. Third, the distinction between the uppercase and lowercase characters may be removed.

Character Repositioning

Character repositioning is handled through the clause MODIFICATION pos_changes in the /COLLATINGSEQUENCE option. This is the only way to make two characters (other than uppercase and lowercase equivalents) collate as if they have the same value. The "=" operator is used to define a pair of characters as equivalent. You can also insert a character between two adjacent positions in your base sequence by using the ">" or "<" operators. The following /COLLATINGSEQUENCE statement presents a case of character repositioning:
/COLLATINGSEQUENCE special_seq2 ascii modification "ll" = "l", "LL" = "L", "0" > "9"
special_seq2 modifies the standard ASCII collating sequence. Its effect consists in allowing the double "ll" character to collate equal to the singleton "l" and in repositioning the "0" character so that it collates after "9" but before ":". Here is an example of collating values according to the alternative sequence special_seq2 (assuming ascending ordering and padding with spaces):
Affect
abend1
abend9
abend0
llamas
lamase
lambda
There are a few rules you must obey when specifying reposition modifications to a base sequence:
  1. The character used as the comparison position (on the right of the "=", ">" or "<" operator) must already be assigned a place in the collating sequence either as part of a standard base sequence or by a prior modification to the sequence. When the comparison character is a double character, the requirement is satisfied if the double character has been assigned a place in the sequence or the two characters that comprise the double character have both been assigned places separately.
  2. A character can be repositioned only once.

Null Character Definition

The second type of modification which you can apply to a base sequence is to specify "null" characters. These are established through the clause NULLCHAR null_chars in the /COLLATINGSEQUENCE option. A character defined as null is ignored when ordering values; null characters have the effect of reducing the number of significant characters in a field.

You either specify a predefined set of characters as null (i.e. NONPRINTABLE, NONDICTIONARY) or you specify your own list of null characters. The specification of a list of null characters is similar to the definition of a base sequence. For example,
/COLLATINGSEQUENCE special_seq3 ascii nullchar "*", "0" - "9"
modifies the standard ASCII collating sequence by nullifying the asterisk and all digits. Collating values according to the alternative sequence special_seq3 results in the following (assuming ascending ordering and padding with spaces):
****7
ol0de
older
olive
ol5ve
****p

The NONDICTIONARY set of characters includes all those which are not alphabetic, digits, spaces, or tab characters. For all base sequences other than LOCALE, the NONDICTIONARY characters are those for 8-bit ASCII. See Reference/Collating sequences reference in the Connect help. For the LOCALE base sequence, the NONDICTIONARY characters are those that are not defined as alphanumeric in the current locale.

NONDICTIONARY and NONPRINTABLE cannot be used with multi-byte character locales.

Case Folding

The third method of modifying a base sequence is to use the FOLDED argument to fold all lowercase characters into their uppercase equivalents. Each lowercase character then collates equal to its uppercase counterpart. FOLDED cannot be used with multi-byte character locales.

Options that Reference Collating Sequence

The collating sequence name defined within the /COLLATINGSEQUENCE option can be referenced within and subsequently used in making comparisons by the /SUMMARY option.

Examples

/COLLATINGSEQUENCE DEFAULT LOCALE
The above option specifies that the collating sequence defined by the system locale be used as the default collating sequence for the task.
/COLLATINGSEQUENCE seq1 "0" - "9", "A" - "Z", "a" - "z" 
modification "*" = "0" nullchar "(" , ")" low
seq1 defines a collating sequence that orders the numeric characters (0 - 9) before the alphabetic characters. Among the alphabetic characters, the uppercase characters (A - Z) are ordered before their lowercase counterparts (a - z). The asterisk (*) has the same collating value as zero (0). The opening and closing parentheses are defined as null and is therefore ignored in the collating process. The invalid characters, i.e. those characters that are excluded from the definition of seq1, is ordered low in the collating sequence.
/COLLATINGSEQUENCE seq2 ascii modification "SS" = "B" folded
seq2 modifies the standard ASCII collating sequence in two ways. Firstly, the uppercase double s (SS) is given the same collating value as the uppercase b (B). Secondly, all lowercase characters are folded into their uppercase equivalents.
/COLLATINGSEQUENCE seq3 ascii nullchar nondictionary

seq3 defines a collating sequence that excludes the set of NONDICTIONARY characters from the ASCII sequence.