Purpose
The /INPIPE option defines a pipe as a source of records.Format
/INPIPE | pipe_identifier [ALIAS alias] pipe_type [pipe_attribute …] [record_limit …] [record_layouts] |
pipe_identifier | = | {pipe_name } {task_alias PREVIOUSTASK} {pipe_descriptor OPEN } {— } |
pipe_type | = | {STREAM CRLF } {STLF } {STCR } {FIXED record_length [record_alignment] } {VARIABLE [LITTLEENDIAN] [record_alignment]} {VARIABLE DISTRIBUTABLE } {MAINFRAME FIXED record_length } {MAINFRAME FTPBINARY } {MAINFRAME VARIABLE [BLOCKED] } {MAINFRAME VARIABLE DISTRIBUTABLE } {FORTRANUNFORMATTED } |
record_alignment | = | {ALIGNED2 } {ALIGNED4 } {UNALIGNED} |
pipe_attribute | = | {compress_attribute} {encoding_attribute} {field_delimiters } |
compress_attribute | = | {COMPRESSED } {UNCOMPRESSED} |
encoding_attribute | = | [BOM] [ENCODING encoding|character_set] |
field_delimiters | = | FIELDSEPARATOR field_separator [ENCLOSEDBY leading_character [,trailing_character]] |
field_separator | = | {separator } {UNIXSORTDEFAULT} {NONE } |
record_limit | = | {FDISCARDFIRST num_records} {FDISCARDAFTER num_records} {FDISCARDBLANK } {FDISCARDSHORT min_length } |
record_layouts | = |
LAYOUT {layout_info } {(layout_info [, layout_info …])} |
layout_info | = | layout_name [ALIAS layout_alias] |
Arguments
pipe_name |
The pathname of the local or remote named pipe that provides input records to the task. On windows systems, this is specified as \\<server>\pipe\<actual_pipe_name>. For details on specifying a file path, see File Name and Syntax Requirements. |
task_alias | The alias or task name of a previous task, which provides input records to the current task. |
pipe_descriptor | The descriptor of the open pipe, which provides input records to the task. |
– | Indicator for standard input. Tells dmexpress that the source records are piped from standard input. |
alias |
A name you assign to the pipe, and which you will use to refer to the pipe in other dmexpress options. The name assigned to a pipe must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
record_length | The length in bytes associated with the following /INPIPE record type clauses: FIXED, MAINFRAME FIXED. |
encoding |
The encoding argument can be used for encoding a text field of data with one of the following built-in character sets: ASCII, EBCDIC, Locale, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, or UTF-32LE. Note: When Locale is specified, the encoding used is that which is defined by the system locale where dmexpress runs. This encoding value can be specified as a built-in character set keyword without quotes or as a string as outlined in File Name and Syntax Requirements. For additional information, see the following Connect help topics: "Data encoding" and "Text data type." |
character_set |
The character_set argument can be used for encoding a text field of data with one of the ICU library character sets. This character set value must be specified as a string as outlined in File Name and Syntax Requirements. For additional information, see the following Connect help topics: "Data encoding" and "Text data type." |
separator |
A single byte or multibyte character string that separates adjacent fields in delimited text records. You can specify the character string in single-quoted or double-quoted format as outlined in Constants. At runtime, field separators are treated as follows:
|
leading_character | The character in the field-delimited text records that optionally precedes the data in each delimited field. |
trailing_character | The character in the field-delimited text records that optionally follows the data in each delimited field. |
num_records | The number of records to be bypassed from the beginning of the data supplied through the pipe. These records are not released to the task. |
min_length | Specifies the minium length requirement for records to avoid being discarded as per the FDISCARDSHORT clause. |
layout_name | The name of a record layout that defines the fields of the records in the source pipe. A record layout can be defined in a /delimitedrecordlayout or /recordlayout option or defined in an external metadata file using /datadictionary. |
layout_alias |
A name you assign to the layout, and which you will use, in lieu of the original layout name, to refer to the fields of records originating from this source in other dmexpress options. When a layout alias is defined, you can no longer reference the fields using the original layout name directly in your task. The name assigned to a source pipe’s layout must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
Location
The /INPIPE option can be listed anywhere in the task definition for the following task types: copy, sort, merge, and aggregates.
For a join task, the /INPIPE option must be listed as follows:- Left side – Before the /JOINKEYS options that defines the left side.
- Right side - After the /JOINKEYS option that defines the left side and before the /JOINKEYS option that defines the right side.
Defaults
When no inputs are present, dmexpress takes its input from standard input.
Field Delimiters
When you provide neither a a single byte or multibyte field separator nor an enclosing character, Connect ETL treats a sequence of one or more spaces and tabs (the UNIX default) as the delimiter in field-delimited text records. The separator is treated as part of the following field. When you do not provide a field separator, but you do supply an enclosing character, Connect ETL assumes that the separator in field-delimited text records is the comma.
Compression
When no compression attribute is specified, the data is treated as UNCOMPRESSED.
Encoding
The default encoding is ASCII. For MAINFRAME FTPBINARY, the default encoding is EBCDIC.
Notes
Number of Input Pipes
For each input pipe, use a separate /INPIPE option.
Pipe Type
For information on supported pipe types, see Source and Target File Types and Records.
Record Length
- Fixed record length (FIXED)
- Mainframe fixed record length (MAINFRAME FIXED)
To specify a maximum record length for variable length records, use the /INMAXRECORDLENGTH option, which specifies the maximum source record length per task.
Record Alignment
If all records are aligned on 2-byte boundaries, you have to supply the ALIGNED2 keyword except for fixed format records of even length. If all records are aligned on 4-byte boundaries, you have to supply the ALIGNED4 keyword except for fixed format records whose length is a multiple of 4 bytes. The default is that records are not aligned.
Record Limit Option Arguments
- FDISCARDSHORT - All records shorter than the specified minimum length are discarded.
- FDISCARDFIRST - The specified number of records are discarded starting from the first record in the supplied data. If FDISCARDSHORT is specified, short records that are rejected as per the FDISCARDSHORT specification do not count towards that number.
- FDISCARDBLANK - All blank records are discarded. A record is considered to be blank if either the record length is 0 or all bytes in the record are space characters.
- FDISCARDAFTER - Only the specified number of records are retained for further processing.
Encoding and Byte Order Mark (BOM)
Use the ENCODING option to describe the character encoding of your source pipe. The encoding will be used to interpret stream record terminators as well as any field delimiters or enclosing characters specifed. An encoding of LOCALE will use the system’s character set for this interpretation. An encoding specification is supported for the following types of data:
- Fixed record length
- Stream, carriage-return record terminator
- Stream, MS-DOS(CR-LF) record terminator
- Stream, UNIX(LF) record terminator
- Variable record length with 2-byte prefix
- Variable record Hadoop distributable format
- Mainframe fixed record format
- Mainframe ftpbinary format
- Mainframe variable record formt
- Mainframe variable record Hadoop distributable format
- Mainframe variable with block descriptor record format
If a task has multiple input pipes, they must have the same encoding.
Specify the BOM keyword when the data supplied through the pipe starts with a Unicode Byte Order Mark (BOM), which specifies the endian and character encoding information for the rest of the supplied data. This mark can appear in the first few bytes and is not treated as part of the data. More information regarding Unicode encodings and the BOM can be found at http://unicode.org/faq/utf_bom.html.
If the ENCODING option is specified, it overrides the encoding detected from the BOM. If the encoding is specified as UTF-16 or UTF-32, the byte order of the encoding is determined from the BOM. If there is no BOM specified, the encoding would be treated as big-endian.
Compatible encoding types
In some cases, you may choose to specify an encoding that is different from the actual encoding of the data. For instance, ASCII will almost always provide better performance than any other encoding. In order to do so, you need to understand whether the actual encoding is compatible with the specified encoding.
An encoding is compatible with another if the first encoding can be used in place of the other without significant data loss or corruption. However, it may still corrupt data or cause data loss if used for certain ranges of characters. For example, if data is encoded in UTF-8, you may be able to use ASCII as a compatible encoding when all of your characters are expected to be in the 7-bit ASCII range (0x00 – 0x7f). However, since character representations differ between ASCII and UTF-8 beyond this range, undesirable results may occur if you have data above this range and interchange these encodings. An incompatible encoding has no overlapping character ranges and if used would definitely give undesired results. ASCII, LOCALE, and UTF-8 may all be compatible to each other; the rest of the encodings are incompatible.
Field Delimiters
A field-delimited text record consists of consecutive fields, with adjacent fields separated by a single byte or multibyte field separator. When the field separator is the UNIX default, each blank in a sequence of blanks is part of the following field.
When the field separator is not the default you must specify it through the FIELDSEPARATOR separator argument. The specified separator is assumed to be in locale and is converted to the encoding of the source pipe.
Each separator is considered to separate two adjacent fields; therefore, consecutive separators indicate an empty field (one whose length is zero). A non-default separator is not considered part of the following field.
When one or more blank characters (spaces and tabs) separate fields in a text record, and the blanks are part of the data, specify FIELDSEPARATOR UNIXSORTDEFAULT. For this type of separator (the UNIX default), each blank character in a sequence of blanks is taken as part of the following field. It is not possible to have an empty field with this type of separator.
When the data supplied through the pipe does not contain field-delimited text records, indicate this by specifying FIELDSEPARATOR NONE. This is necessary when there are two or more sources in the task and there is a mixture of delimited records and fixed-position records. If you do not indicate that a fixed-position source in such an application is not delimited, Connect ETL applies the default and assumes the source has delimited records with the default separator. With FIELDSEPARATOR NONE specified for a fixed-position source, when you reference a delimited field in a record from the source, the field either contains the complete record (the first field) or is empty (the second and subsequent fields).
The combination of the UNIX default separator and enclosing characters is not supported.
Each field in a field-delimited text record may be enclosed by two enclosing characters. When fields are enclosed, specify the ENCLOSEDBY clause. Any leading and trailing enclosing characters are not part of the field. Two consecutive trailing enclosing characters within the field are treated as a single enclosing character. The leading enclosing character cannot be a blank (tab or space). The leading enclosing character and the trailing enclosing character cannot be a prefix of the field separator. When the leading and the trailing characters are the same, you do not need to specify the trailing character.
Connect ETL ignores any characters between an enclosing character terminating a field and the following field separator or the end of the record. When the first nonblank character following a field separator (or the first nonblank character in the record for the first field) is not a leading enclosing character, Connect ETL assumes that the field is not enclosed.
For sources and targets with multi-byte Locale encodings, the byte sequence of the converted field separator must not occur as part of another multi-byte character. For example, for Japanese locales, the set of valid separator characters are:
- Japanese Shift-JIS: 0x00 – 0x3F
- Japanese EUC: 0x00 – 0x7F
Record Layout
Use the LAYOUT clause to designate the record layout that defines the fields of the records supplied through the pipe.
Fields in a record layout cannot be referenced until the record layout is associated with the input pipe.
The specification of multiple record layout names are supported with the /INPIPE option; however, one and only one fixed position record layout must be specified for an input pipe of type MAINFRAME FTPBINARY.
Pipe Alias
When you wish to include or omit records based on which source they originate from, you must give aliases to some or all of the sources. For an /INPIPE option, this alias is simply a shorthand way of referring to the input pipe. You then code the alias as part of a comparison in a /CONDITION definition.
Compressed Pipes
dmexpress accepts bzip2 and gzip compressed source data. The bzip2 format is defined at http://www.bzip.org; the gzip format is described in RFC 1952.
Compatibility with Other dmexpress Options
Per source record limiting criteria, which is specified through the FDISCARDSHORT, FDISCARDFIRST, FDISCARDBLANK, FDISCARDAFTER option arguments in the /INPIPE option, cannot be specified in the same task definition that specifies general bulk filtering for source records through the DISCARDSHORT, DISCARDFIRST, DISCARDBLANK, DISCARDAFTER option arguments in the /FILTER option.
Examples
/inpipe sort1_pipe stlf
The option specifies that the input records coming from the pipe sort1_pipe are UNIX text (LF-terminated) records.
/inpipe trans_data alias transactions variable layout trans
This option assigns an alias, transactions, to the input pipe trans_data. The pipe contains variable length records. Fields in the records are described by record layout trans.
/inpipe sorted_delim_data stlf fieldseparator "," enclosedby """"
The records supplied through the sorted_delim_data pipe are UNIX text (LF record terminator) records with enclosed delimited fields. The comma (,) is used as the field separator and the double quote (") is used as the leading and trailing enclosing character.