Purpose
To set a default collating sequence for the task or define a custom collating sequence.
Format
/COLLATINGSEQUENCE |
{default} {new_seq} |
where
default | = |
{standard_seq} {name } |
new_seq | = | base_seq [seq_mod...] [invalid_chars] |
standard_seq | = | {LOCALE } {ASCII } {FASCII} {EBCDIC } {FEBCDIC} {MULTINATIONAL } |
base_seq | = | {standard_seq } {next_spec[, next_spec ...]} |
seq_mod | = | {MODIFICATION pos_changes} {NULLCHAR null_chars } |
next_spec | = | {single_char} {double_char} {char_range } |
single_char | = | {"a" } {X"xx" } |
double_char | = | {"aa" } {X"xxxx" } |
char_range | = | "b" - "e" |
pos_changes | = | pos_change[,pos_change ...] |
pos_change | = | moved_char {=} {>} comp_pos {<} |
moved_char | = | {single_char} {double_char} |
comp_pos | = | {single_char} {double_char} |
null_chars | = | {next_spec[,next_spec ...]} {NONPRINTABLE } {NONDICTIONARY } |
invalid_chars | = | {HIGH} {LOW } |
Arguments
name |
The name of the collating sequence. The name assigned to the sequence must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
a | The character representation of a single character. |
xx | The hexadecimal representation of a single character. |
aa | The character representation of a double character, i.e. a ’character’ which takes up two bytes of storage, but which is to be collated as if it occupies only a single byte, e.g. "CH". |
xxxx | The hexadecimal representation of a double character. The first two hexadecimal digits represent the first half of the double character while the last two hexadecimal digits represent the second half of the double character, e.g. x"4348". |
b | The first character in a range (double character not allowed). |
e | The last character in a range (double character not allowed). |
Location
The option may appear anywhere in the task definition.
Defaults
If a default collating sequence is not explicitly specified in the task, the ASCII standard sequence is assumed to be the default.
Notes
Defining a Default Collating Sequence
You can define a default collating sequence for the task. The default collating sequence takes effect for character field comparisons for which you do not assign an explicit collating sequence.
To define a standard collating sequence as the default, specify DEFAULT followed by a standard sequence name.
To define a custom collating sequence as the default, first define the custom collating sequence and then assign this new collating sequence as the default.
Defining a Custom Collating Sequence
You can define a custom collating sequence when none of the standard sequences provide the collating result that you require during character value comparisons.
When defining a custom collating sequence, first define a base sequence then apply modifications that can include character repositioning, null character definition, and case folding.
Base Sequence Definition
/COLLATINGSEQUENCE special_seq1 "a" - "l", "ll", "m" - "z", "A" - "L", "LL", "M" -"Z", "0" - "9"
The collating sequence special_seq1 defined above includes no special or non-printable characters nor does it have any modifications to the base sequence. The characters excluded from it are regarded as invalid if any are encountered when collating fields according to this sequence. You can specify, through the invalid_chars argument of /COLLATINGSEQUENCE, where you want invalid characters to collate. The choices are for the invalid characters to appear either below (LOW) or above (HIGH) the characters that are specified in the collating sequence definition. If you leave the invalid_chars argument unspecified, Connect ETL handles invalid characters as null characters (for the definition of a null character, see below).
a*****
affect
almond
allure
amends
****de
ladder
lambda
llamas
Affect
Sequence Modifications
There are three different types of changes which you can make to your base sequence. First, characters can be repositioned within the collating sequence. Second, characters can be ignored during the collating process by defining them as null. Third, the distinction between the uppercase and lowercase characters may be removed.
Character Repositioning
/COLLATINGSEQUENCE special_seq2 ascii modification "ll" = "l", "LL" = "L", "0" > "9"
Affect
abend1
abend9
abend0
llamas
lamase
lambda
- The character used as the comparison position (on the right of the "=", ">" or "<" operator) must already be assigned a place in the collating sequence either as part of a standard base sequence or by a prior modification to the sequence. When the comparison character is a double character, the requirement is satisfied if the double character has been assigned a place in the sequence or the two characters that comprise the double character have both been assigned places separately.
- A character can be repositioned only once.
Null Character Definition
The second type of modification which you can apply to a base sequence is to specify "null" characters. These are established through the clause NULLCHAR null_chars in the /COLLATINGSEQUENCE option. A character defined as null is ignored when ordering values; null characters have the effect of reducing the number of significant characters in a field.
/COLLATINGSEQUENCE special_seq3 ascii nullchar "*", "0" - "9"
****7
ol0de
older
olive
ol5ve
****p
The NONDICTIONARY set of characters includes all those which are not alphabetic, digits, spaces, or tab characters. For all base sequences other than LOCALE, the NONDICTIONARY characters are those for 8-bit ASCII. See Reference/Collating sequences reference in the Connect help. For the LOCALE base sequence, the NONDICTIONARY characters are those that are not defined as alphanumeric in the current locale.
NONDICTIONARY and NONPRINTABLE cannot be used with multi-byte character locales.
Case Folding
The third method of modifying a base sequence is to use the FOLDED argument to fold all lowercase characters into their uppercase equivalents. Each lowercase character then collates equal to its uppercase counterpart. FOLDED cannot be used with multi-byte character locales.
Options that Reference Collating Sequence
The collating sequence name defined within the /COLLATINGSEQUENCE option can be referenced within and subsequently used in making comparisons by the /SUMMARY option.
Examples
/COLLATINGSEQUENCE DEFAULT LOCALE
/COLLATINGSEQUENCE seq1 "0" - "9", "A" - "Z", "a" - "z"
modification "*" = "0" nullchar "(" , ")" low
/COLLATINGSEQUENCE seq2 ascii modification "SS" = "B" folded
/COLLATINGSEQUENCE seq3 ascii nullchar nondictionary
seq3 defines a collating sequence that excludes the set of NONDICTIONARY characters from the ASCII sequence.