/INPIPE - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
ft:locale
en-US
Product name
Connect ETL
ft:title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
ft:lastEdition
2023-09-11
ft:lastPublication
2023-09-11T19:01:45.019000

Purpose

The /INPIPE option defines a pipe as a source of records.

Format

/INPIPE pipe_identifier [ALIAS alias] pipe_type [pipe_attribute …] [record_limit …] [record_layouts]
where
pipe_identifier = {pipe_name } {task_alias PREVIOUSTASK} {pipe_descriptor OPEN } {— }
pipe_type = {STREAM CRLF } {STLF } {STCR } {FIXED record_length [record_alignment] } {VARIABLE [LITTLEENDIAN] [record_alignment]} {VARIABLE DISTRIBUTABLE } {MAINFRAME FIXED record_length } {MAINFRAME FTPBINARY } {MAINFRAME VARIABLE [BLOCKED] } {MAINFRAME VARIABLE DISTRIBUTABLE } {FORTRANUNFORMATTED }
record_alignment = {ALIGNED2 } {ALIGNED4 } {UNALIGNED}
pipe_attribute = {compress_attribute} {encoding_attribute} {field_delimiters }
compress_attribute = {COMPRESSED } {UNCOMPRESSED}
encoding_attribute = [BOM] [ENCODING encoding|character_set]
field_delimiters = FIELDSEPARATOR field_separator [ENCLOSEDBY leading_character [,trailing_character]]
field_separator = {separator } {UNIXSORTDEFAULT} {NONE }
record_limit = {FDISCARDFIRST num_records} {FDISCARDAFTER num_records} {FDISCARDBLANK } {FDISCARDSHORT min_length }
record_layouts =

LAYOUT {layout_info } {(layout_info [, layout_info …])}

layout_info = layout_name [ALIAS layout_alias]

Arguments

pipe_name

The pathname of the local or remote named pipe that provides input records to the task.

On windows systems, this is specified as \\<server>\pipe\<actual_pipe_name>. For details on specifying a file path, see File Name and Syntax Requirements.

task_alias The alias or task name of a previous task, which provides input records to the current task.
pipe_descriptor The descriptor of the open pipe, which provides input records to the task.
Indicator for standard input. Tells dmexpress that the source records are piped from standard input.
alias

A name you assign to the pipe, and which you will use to refer to the pipe in other dmexpress options.

The name assigned to a pipe must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

record_length The length in bytes associated with the following /INPIPE record type clauses: FIXED, MAINFRAME FIXED.
encoding

The encoding argument can be used for encoding a text field of data with one of the following built-in character sets: ASCII, EBCDIC, Locale, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, or UTF-32LE.

Note: When Locale is specified, the encoding used is that which is defined by the system locale where dmexpress runs.

This encoding value can be specified as a built-in character set keyword without quotes or as a string as outlined in File Name and Syntax Requirements.

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

character_set

The character_set argument can be used for encoding a text field of data with one of the ICU library character sets.

This character set value must be specified as a string as outlined in File Name and Syntax Requirements.

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

separator

A single byte or multibyte character string that separates adjacent fields in delimited text records.

You can specify the character string in single-quoted or double-quoted format as outlined in Constants.

At runtime, field separators are treated as follows:

  • Character text constant field separators are treated as locale encoded.
  • Hex text constant field separators are treated as binary representations of the source or target encoding. Hex text constant field separators within the /inpipe option are encoded in the same encoding as specified in the source pipe. 
leading_character The character in the field-delimited text records that optionally precedes the data in each delimited field.
trailing_character The character in the field-delimited text records that option­ally follows the data in each delimited field.
num_records The number of records to be bypassed from the beginning of the data supplied through the pipe. These records are not released to the task.
min_length Specifies the minium length requirement for records to avoid being discarded as per the FDISCARDSHORT clause.
layout_name The name of a record layout that defines the fields of the records in the source pipe. A record layout can be defined in a /delimitedrecordlayout or /recordlayout option or defined in an external metadata file using /datadictionary.
layout_alias

A name you assign to the layout, and which you will use, in lieu of the original layout name, to refer to the fields of records originating from this source in other dmexpress options. When a layout alias is defined, you can no longer reference the fields using the original layout name directly in your task.

The name assigned to a source pipe’s layout must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

Location

The /INPIPE option can be listed anywhere in the task definition for the following task types: copy, sort, merge, and aggregates.

For a join task, the /INPIPE option must be listed as follows:
  • Left side – Before the /JOINKEYS options that defines the left side.
  • Right side - After the /JOINKEYS option that defines the left side and before the /JOINKEYS option that defines the right side.

Defaults

When no inputs are present, dmexpress takes its input from standard input.

Field Delimiters

When you provide neither a a single byte or multibyte field separator nor an enclosing character, Connect ETL treats a sequence of one or more spaces and tabs (the UNIX default) as the delimiter in field-delimited text records. The separator is treated as part of the following field. When you do not provide a field separator, but you do supply an enclosing character, Connect ETL assumes that the separator in field-delimited text records is the comma.

Compression

When no compression attribute is specified, the data is treated as UNCOMPRESSED.

Encoding

The default encoding is ASCII. For MAINFRAME FTPBINARY, the default encoding is EBCDIC.

Notes

Number of Input Pipes

For each input pipe, use a separate /INPIPE option.

Pipe Type

For information on supported pipe types, see Source and Target File Types and Records.

Record Length

You must provide record length values for the following pipe types:
  • Fixed record length (FIXED)
  • Mainframe fixed record length (MAINFRAME FIXED)

To specify a maximum record length for variable length records, use the /INMAXRECORDLENGTH option, which specifies the maximum source record length per task.

Record Alignment

If all records are aligned on 2-byte boundaries, you have to supply the ALIGNED2 keyword except for fixed format records of even length. If all records are aligned on 4-byte boundaries, you have to supply the ALIGNED4 keyword except for fixed format records whose length is a multiple of 4 bytes. The default is that records are not aligned.

Record Limit Option Arguments

The following /INPIPE option arguments can be specified for the source pipe and are applied in the following processing sequence regardless of the order in which they are listed within the option:
  1. FDISCARDSHORT - All records shorter than the specified minimum length are discarded.
  2. FDISCARDFIRST - The specified number of records are discarded starting from the first record in the supplied data. If FDISCARDSHORT is specified, short records that are rejected as per the FDISCARDSHORT specification do not count towards that number.
  3. FDISCARDBLANK - All blank records are discarded. A record is considered to be blank if either the record length is 0 or all bytes in the record are space characters.
  4. FDISCARDAFTER - Only the specified number of records are retained for further processing.

Encoding and Byte Order Mark (BOM)

Use the ENCODING option to describe the character encoding of your source pipe. The encoding will be used to interpret stream record terminators as well as any field delimiters or enclosing characters specifed. An encoding of LOCALE will use the system’s character set for this interpretation. An encoding specification is supported for the following types of data:

  • Fixed record length
  • Stream, carriage-return record terminator
  • Stream, MS-DOS(CR-LF) record terminator
  • Stream, UNIX(LF) record terminator
  • Variable record length with 2-byte prefix
  • Variable record Hadoop distributable format
  • Mainframe fixed record format
  • Mainframe ftpbinary format
  • Mainframe variable record formt
  • Mainframe variable record Hadoop distributable format
  • Mainframe variable with block descriptor record format

If a task has multiple input pipes, they must have the same encoding.

Specify the BOM keyword when the data supplied through the pipe starts with a Unicode Byte Order Mark (BOM), which specifies the endian and character encoding information for the rest of the supplied data. This mark can appear in the first few bytes and is not treated as part of the data. More information regarding Unicode encodings and the BOM can be found at http://unicode.org/faq/utf_bom.html.

If the ENCODING option is specified, it overrides the encoding detected from the BOM. If the encoding is specified as UTF-16 or UTF-32, the byte order of the encoding is determined from the BOM. If there is no BOM specified, the encoding would be treated as big-endian.

Compatible encoding types

In some cases, you may choose to specify an encoding that is different from the actual encoding of the data. For instance, ASCII will almost always provide better performance than any other encoding. In order to do so, you need to understand whether the actual encoding is compatible with the specified encoding.

An encoding is compatible with another if the first encoding can be used in place of the other without significant data loss or corruption. However, it may still corrupt data or cause data loss if used for certain ranges of characters. For example, if data is encoded in UTF-8, you may be able to use ASCII as a compatible encoding when all of your characters are expected to be in the 7-bit ASCII range (0x00 – 0x7f). However, since character representations differ between ASCII and UTF-8 beyond this range, undesirable results may occur if you have data above this range and interchange these encodings. An incompatible encoding has no overlapping character ranges and if used would definitely give undesired results. ASCII, LOCALE, and UTF-8 may all be compatible to each other; the rest of the encodings are incompatible.

Field Delimiters 

A field-delimited text record consists of consecutive fields, with adjacent fields separated by a single byte or multibyte field separator. When the field separator is the UNIX default, each blank in a sequence of blanks is part of the following field.

When the field separator is not the default you must specify it through the FIELDSEPARATOR separator argument. The specified separator is assumed to be in locale and is converted to the encoding of the source pipe.

Each separator is considered to separate two adjacent fields; therefore, consecutive separators indicate an empty field (one whose length is zero). A non-default separator is not considered part of the following field.

When one or more blank characters (spaces and tabs) separate fields in a text record, and the blanks are part of the data, specify FIELDSEPARATOR UNIXSORTDEFAULT. For this type of separator (the UNIX default), each blank character in a sequence of blanks is taken as part of the following field. It is not possible to have an empty field with this type of separator.

When the data supplied through the pipe does not contain field-delimited text records, indicate this by specifying FIELDSEPARATOR NONE. This is necessary when there are two or more sources in the task and there is a mixture of delimited records and fixed-position records. If you do not indicate that a fixed-position source in such an application is not delimited, Connect ETL applies the default and assumes the source has delimited records with the default separator. With FIELDSEPARATOR NONE specified for a fixed-position source, when you reference a delimited field in a record from the source, the field either contains the complete record (the first field) or is empty (the second and subsequent fields).

The combination of the UNIX default separator and enclosing characters is not supported.

Each field in a field-delimited text record may be enclosed by two enclosing characters. When fields are enclosed, specify the ENCLOSEDBY clause. Any leading and trailing enclosing characters are not part of the field. Two consecutive trailing enclosing characters within the field are treated as a single enclosing character. The leading enclosing character cannot be a blank (tab or space). The leading enclosing character and the trailing enclosing character cannot be a prefix of the field separator. When the leading and the trailing characters are the same, you do not need to specify the trailing character.

Connect ETL ignores any characters between an enclosing character terminating a field and the following field separator or the end of the record. When the first nonblank character following a field separator (or the first nonblank character in the record for the first field) is not a leading enclosing character, Connect ETL assumes that the field is not enclosed.

For sources and targets with multi-byte Locale encodings, the byte sequence of the converted field separator must not occur as part of another multi-byte character. For example, for Japanese locales, the set of valid separator characters are:

  • Japanese Shift-JIS: 0x00 – 0x3F
  • Japanese EUC: 0x00 – 0x7F

Record Layout

Use the LAYOUT clause to designate the record layout that defines the fields of the records supplied through the pipe.

Fields in a record layout cannot be referenced until the record layout is associated with the input pipe.

The specification of multiple record layout names are supported with the /INPIPE option; however, one and only one fixed position record layout must be specified for an input pipe of type MAINFRAME FTPBINARY.

Pipe Alias

When you wish to include or omit records based on which source they originate from, you must give aliases to some or all of the sources. For an /INPIPE option, this alias is simply a shorthand way of referring to the input pipe. You then code the alias as part of a comparison in a /CONDITION definition.

Compressed Pipes

dmexpress accepts bzip2 and gzip compressed source data. The bzip2 format is defined at http://www.bzip.org; the gzip format is described in RFC 1952.

Compatibility with Other dmexpress Options

Per source record limiting criteria, which is specified through the FDISCARDSHORT, FDISCARDFIRST, FDISCARDBLANK, FDISCARDAFTER option arguments in the /INPIPE option, cannot be specified in the same task definition that specifies general bulk filtering for source records through the DISCARDSHORT, DISCARDFIRST, DISCARDBLANK, DISCARDAFTER option arguments in the /FILTER option.

Examples

/inpipe sort1_pipe stlf

The option specifies that the input records coming from the pipe sort1_pipe are UNIX text (LF-terminated) records.

/inpipe trans_data alias transactions variable layout trans 

This option assigns an alias, transactions, to the input pipe trans_data. The pipe contains variable length records. Fields in the records are described by record layout trans.

/inpipe sorted_delim_data stlf fieldseparator "," enclosedby """"

The records supplied through the sorted_delim_data pipe are UNIX text (LF record terminator) records with enclosed delimited fields. The comma (,) is used as the field separator and the double quote (") is used as the leading and trailing enclosing character.