/INFILE, /FILE - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:01:45.019000

Purpose

The /INFILE option defines a file as a source of records.

The /FILE option defines an auxiliary data file such as a lookup source file.

Format

/INFILE | /FILE

file_identifier [EXPANDWILDCARDS] [SERVERCONNECTION connection]

[ALIAS alias] file_type [file_attribute …] [record_limit …] [record_layouts]

where
file_identifier = {file_name } {mainframe_dataset} {task_alias PREVIOUSTASK} {file_descriptor OPEN }
file_type = {STREAM CRLF } {STLF } {STCR } {XML } {FIXED record_length [record_alignment] } {VARIABLE [LITTLEENDIAN] [record_alignment]} {VARIABLE DISTRIBUTABLE } {MFFIXED record_length } {MFVARIABLE } {MFLINESEQUENTIAL } {INDEXED } {MAINFRAME FIXED record_length } {MAINFRAME FTPBINARY } {MAINFRAME VARIABLE [BLOCKED] } {MAINFRAME VARIABLE DISTRIBUTABLE } {MAINFRAME VSAM } {FORTRANUNFORMATTED }
record_alignment = {ALIGNED2 } {ALIGNED4 } {UNALIGNED}
file_attribute = {compress_attribute} {encoding_attribute} {header } {field_delimiters }
compress_attribute = {COMPRESSED } {UNCOMPRESSED}
encoding_attribute = [BOM] [ENCODING encoding|character_set]
header =

[HEADER SIZE number_of_records [NAMESRECORD names_record [SKIPFIELDS fields_to_skip]

LAYOUTNAME header_layout]]

field_delimiters = FIELDSEPARATOR field_separator [ENCLOSEDBY leading_character [,trailing_character]]
field_separator = {separator } {UNIXSORTDEFAULT} {NONE }
record_limit = {FDISCARDFIRST num_records} {FDISCARDAFTER num_records} {FDISCARDBLANK } {FDISCARDSHORT min_length }
record_layouts =

LAYOUT {layout_info } {(layout_info [, layout_info…])}

layout_info = layout_name [EXTERNAL] [ALIAS layout_alias]

Arguments

file_name

The pathname of the file which provides input records for the task.

For details on specifying a file path, see File Name and Syntax Requirements .

mainframe_dataset

The full pathname to the location of the mainframe dataset or file residing on the HFS (UNIX) partition.

The path name base directory must include the /MVS virtual directory or slash (/), which indicates that the file is located on the MVS or z/OS partition, or the /HFS virtual directory, which indicates that the file is located on the HFS partition:

  • For files located on the MVS system partition, which is also known as the z/OS partition, the path name must include either the virtual directory /MVS or slash (/) as the base directory. The high-level qualifier (HLQ), which identifies the user's catalog or folder, must follow the virtual base directory. In the path, references to datasets in libraries must include a slash (/) to separate the library name from the dataset name.
  • For files located on the HFS system partition, the path name must include the virtual directory /HFS as the base directory.

For details on specifying a file path, see File Name and Syntax Requirements .

task_alias The alias or task name of a previous task, which provides input records to the current task.
file_descriptor The descriptor of the open file, which provides input records to the task.
connection A string or identifier that specifies the connection or its alias used to connect to the system where the file is located. The connection needs to be specified using a /serverconnection option.
alias

A name you assign to the file, and which you will use to refer to the file in other dmexpress options.

The name assigned to a file must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

record_length The length in bytes associated with the following /INFILE record type clauses: FIXED, MFFIXED, MAINFRAME FIXED.
encoding

The encoding argument can be used for encoding a text field of data with one of the following built-in character sets: ASCII, EBCDIC, Locale, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, or UTF-32LE.

Note: When Locale is specified, the encoding used is that which is defined by the system locale where dmexpress runs.

This encoding value can be specified as a built-in character set keyword without quotes or as a string as outlined in File Name and Syntax Requirements .

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

character_set

The character_set argument can be used for encoding a text field of data with one of the ICU library character sets.

This character set value must be specified as a string as outlined in File Name and Syntax Requirements .

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

number_of_records The number of records that constitute the header.
names_record The record number of the header record that contains the field names. This record defines the data layout of the header.
fields_to_skip The number of fields at the beginning of the header’s names record that do not contain field names.
header_layout

The name of the header layout that is defined by the header’s names record. The name can be used to reference the header layout from other dmexpress options.

The name must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help

separator

A single byte or multibyte character string that separates adjacent fields in delimited text records.

You can specify the character string in single-quoted or double-quoted format as outlined in Constants

At runtime, field separators are treated as follows:

  • Character text constant field separators are treated as locale encoded.
  • Hex text constant field separators are treated as binary representations of the source or target encoding. Hex text constant field separators within the /infile option are encoded in the same encoding as specified in the source file.
leading_character The character in the field-delimited text records that optionally precedes the data in each delimited field.
trailing_character The character in the field-delimited text records that optionally follows the data in each delimited field.
num_records The number of records to be bypassed from the beginning of the file. These records are not released to the task.
min_length Specifies the minium length requirement for records to avoid being discarded as per the FDISCARDSHORT clause.
layout_name The name of a record layout that defines the fields of the records in the source file. A record layout can be defined in a /DELIMITEDRECORDLAYOUT or /RECORDLAYOUT option or defined in an external metadata file using /DATADICTIONARY.
layout_alias

A name you assign to the layout, and which you will use, in lieu of the original layout name, to refer to the fields of records originating from this source in other dmexpress options. When a layout alias is defined, you can no longer reference the fields using the original layout name directly in your task.

The name assigned to a source file’s layout must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

Location

The /INFILE option can be listed anywhere in the task definition for the following task types: copy, sort, merge, and aggregates.

For a join task, the /INFILE option must be listed as follows:
  • Left side – Before the /JOINKEYS options that defines the left side.
  • Right side - After the /JOINKEYS option that defines the left side and before the /JOINKEYS option that defines the right side.
The /FILE option can be listed anywhere in the task definition for all tasks.

Defaults

When no inputs are present, dmexpress takes its input from standard input, which can be specified through the /INPIPE option.

Field Delimiters

When you provide neither a single byte or multibyte field separator nor an enclosing character, Connect ETL treats a sequence of one or more spaces and tabs (the UNIX default) as the delimiter in field-delimited text records. The separator is treated as part of the following field.

When you do not provide a field separator, but you do supply an enclosing character, Connect ETL assumes that the separator in field-delimited text records is the comma.

Compression

When no compression attribute is specified, the file is treated as UNCOMPRESSED.

Encoding

The default encoding for non-XML files except MAINFRAME FTPBINARY is ASCII. For XML files, the default encoding is the one specified in the encoding declaration in the file, when present, and UTF-8 otherwise. For MAINFRAME FTPBINARY, the default encoding is EBCDIC.

Wildcard Expansion

When EXPANDWILDCARDS is not specified, wildcards will not be expanded in the file name.

Notes

The file defined through the /FILE option does not provide source records directly to the task. Instead, it may be used in other source options as an auxiliary input to the record extraction process. For instance, a file defined through the /FILE option can be used as the look-in file in a Lookup function.

Number of Input Files

For each input file, use a separate /INFILE option.

An aggregate task that has an XML source file can not have any other source.

File Type

For information on supported file types, see Source and Target File Types and Records.

Record Length

You must provide record length values for the following file types:
  • Fixed record length (FIXED)
  • Micro Focus sequential with fixed record length (MFFIXED)
  • Mainframe fixed record length (MAINFRAME FIXED)

    To specify a maximum record length for variable length records, use the /INMAXRECORDLENGTH option, which specifies the maximum source record length per task

Record Alignment

If all records are aligned on 2-byte boundaries, you have to supply the ALIGNED2 keyword except for fixed format records of even length. If all records are aligned on 4-byte boundaries, you have to supply the ALIGNED4 keyword except for fixed format records whose length is a multiple of 4 bytes. The default is that records are not aligned.

Record Limit Option Arguments

The following /INFILE option arguments can be specified for the source file and are applied in the following processing sequence regardless of the order in which they are listed within the option:
  1. FDISCARDSHORT - All records shorter than the specified minimum length are discarded.
  2. FDISCARDFIRST - The specified number of records are discarded from the files specified in this /INFILE option, starting from the first record in the first input file.

    If the first file does not have enough records, records are discarded from the second file and from subsequent files until the specified number of records are discarded.

  3. FDISCARDBLANK - All blank records are discarded. A record is considered to be blank if either the record length is 0 or all bytes in the record are space characters.
  4. FDISCARDAFTER - Only the specified number of records are retained for further processing.

Encoding and Byte Order Mark (BOM)

Use the ENCODING option to describe the character encoding of your source file. The encoding will be used to interpret stream record terminators as well as any field delimiters or enclosing characters specifed. An encoding of LOCALE will use the system’s character set for this interpretation. An encoding specification is supported for the following file types:
  • Fixed record length
  • Stream, carriage-return record terminator
  • Stream, MS-DOS(CR-LF) record terminator
  • Stream, UNIX(LF) record terminator
  • Variable record length with 2-byte prefix
  • Variable record Hadoop distributable format
  • XML
  • Mainframe fixed record format
  • Mainframe ftpbinary format
  • Mainframe variable record format
  • Mainframe variable record Hadoop distributable format
  • Mainframe variable with block descriptor record format
  • Mainframe VSAM record format

    See the section XML files, below, for the encoding of XML files.

    If a task has multiple non-XML input files, they must have the same encoding.

    Specify the BOM keyword when the file contains a Unicode Byte Order Mark (BOM), which specifies the endian and character encoding information of the file. This mark appears in the first few bytes of the file and is not treated as part of the data. More information regarding Unicode encodings and the BOM can be found at http://unicode.org/faq/utf_bom.html.

    If the ENCODING option is specified, it overrides the encoding detected from the BOM. If the encoding is specified as UTF-16 or UTF-32, the byte order of the encoding is determined from the BOM. If there is no BOM specified, the encoding would be treated as big-endian.

Compatible Encoding Types

In some cases, you may choose to specify an encoding that is different from the actual encoding of the data. For instance, ASCII will almost always provide better performance than any other encoding. In order to do so, you need to understand whether the actual encoding is compatible with the specified encoding.

An encoding is compatible with another if the first encoding can be used in place of the other without significant data loss or corruption. However, it may still corrupt data or cause data loss if used for certain ranges of characters. For example, if data is encoded in UTF-8, you may be able to use ASCII as a compatible encoding when all of your characters are expected to be in the 7-bit ASCII range (0x00 – 0x7f). However, since character representations differ between ASCII and UTF-8 beyond this range, undesirable results may occur if you have data above this range and interchange these encodings. An incompatible encoding has no overlapping character ranges and if used would definitely give undesired results. ASCII, LOCALE, and UTF-8 may all be compatible to each other; the rest of the encodings are incompatible.

XML Files

When the record format is XML, all the options under file_attribute are ignored, except the file_size, block_size and compress_attribute.

LAYOUT must be specified for XML input; layout must be the root of the XML document. The layout/structure of the XML document must be described by one of the following:
  • A Document Type Definition (DTD) inlined in the XML document;
  • A DTD referenced by the root element of the XML document;
  • An XML Schema referenced by the root element of the XML document;
  • A DTD or an XML Schema made available through the file specified in /DATADICTIONARY.
A schema that is found embedded or referenced in the XML input file (local schema) takes precedence over a schema that is made available through /DATADICTIONARY (external schema), unless EXTERNAL is specified, in which case external layouts take precedence.

When layout is not specified, Connect ETL will use a layout defined in a local schema. If the schema has more than one layout, Connect ETL will use the layout whose name matches the root of the source XML document.

For an XML layout, all the XML fields composing the layout that are found at run-time in the XML input will be extracted. If you want to extract only some XML fields, you need to specify an extracted XML layout through /XMLEXTRACT.

A composite field defined in an XML layout can not be referenced anywhere in dmexpress except in a /REFORMAT that defines a new XML target layout. The contents of such fields used in any other context are undefined.

When data is extracted from a nullable XML field, the nullability information is also retrieved unless suppressed using the /XMLEXTRACT NOTNULLABLE argument.

Encoding of XML files

The encoding of an XML file is defined through the encoding declaration as part of the metadata. If the document does not have any encoding declaration, an XML document, by default, is assumed to be encoded in UTF-8. The encoding of the document is overridden by specifying the ENCODING option. The BOM option is ignored for XML files, since the file determines whether a BOM is needed or not.
The ENCODING ASCII option can be used:
  • To improve the performance of data processing when the encoding declaration specifies UTF-8 encoding, but the data contains only 7-bit ASCII characters;
  • When the encoding declaration specifies an unknown encoding, but the characters contained in the data are compatible with 8-bit ASCII (ISO-8859-1);
  • When you want to treat data as a sequence of bytes, not as characters, regardless of the encoding.

    The ENCODING LOCALE option can be used only when the encoding declaration specifies an unknown encoding. In that case, you need to make sure that your locale encoding, and the document encoding are compatible.

Field Delimiters

A field-delimited text record consists of consecutive fields, with adjacent fields separated by a single byte or multibyte field separator. When the field separator is the UNIX default, each blank in a sequence of blanks is part of the following field. When the field separator is not the default you must specify it through the FIELDSEPARATOR separator argument. The specified separator is assumed to be in locale and is converted to the encoding of the source file.

Each separator is considered to separate two adjacent fields; therefore, consecutive separators indicate an empty field (one whose length is zero). A non-default separator is not considered part of the following field.

When one or more blank characters (spaces and tabs) separate fields in a text record, and the blanks are part of the data, specify FIELDSEPARATOR UNIXSORTDEFAULT. For this type of separator (the UNIX default), each blank character in a sequence of blanks is taken as part of the following field. It is not possible to have an empty field with this type of separator.

When a file does not contain field-delimited text records, indicate this by specifying FIELDSEPARATOR NONE. This is necessary when there are two or more source files in the task and there is a mixture of delimited records and fixed-position records. If you do not indicate that a fixed-position file in such an application is not delimited, Connect ETL applies the default and assumes the file has delimited records with the default separator. With FIELDSEPARATOR NONE specified for a fixed-position file, when you reference a delimited field in a record from the file, the field either contains the complete record (the first field) or is empty (the second and subsequent fields).

The combination of the UNIX default separator and enclosing characters is not supported.

Each field in a field-delimited text record may be enclosed by two enclosing char­acters. When fields are enclosed, specify the ENCLOSEDBY clause. Any lead­ing and trailing enclosing characters are not part of the field. Two consecutive trailing enclosing characters within the field are treated as a single enclosing character. The leading enclosing character cannot be a blank (tab or space). The leading enclosing character and the trailing enclosing character cannot be a prefix of the field separator. When the leading and the trailing characters are the same, you do not need to specify the trailing character.

Connect ETL ignores any characters between an enclosing character terminating a field and the following field separator or the end of the record. When the first nonblank character following a field separator (or the first nonblank character in the record for the first field) is not a leading enclosing character, Connect ETL assumes that the field is not enclosed.
For sources and targets with multi-byte Locale encodings, the byte sequence of the converted field separator must not occur as part of another multi-byte character. For example, for Japanese locales, the set of valid separator characters are:
  • Japanese Shift-JIS: 0x00 – 0x3F
  • Japanese EUC: 0x00 – 0x7F

Record Layout

Use the LAYOUT clause to designate the record layout that defines the fields of the records in the file.

Fields in the record layout cannot be referenced until the record layout is associated with the input file.

The specification of multiple record layout names is supported with the /INFILE option; however, one and only one fixed position record layout must be specified for an input file of type MAINFRAME FTPBINARY.

File Alias

When you want to include or omit records based on which source they originate from, you must give aliases to some or all of the sources. For an /INFILE option, this alias is simply a shorthand way of referring to the input file. You then code the alias as part of a comparison in a /CONDITION definition.

The file alias is mandatory in a /FILE option. dmexpress functions that require a file argument should be supplied with the quoted file alias of a file defined using /FILE.

Compressed Files

Connect ETL accepts bzip2 and gzip compressed source files. The bzip2 format is defined at http://www.bzip.org; the gzip format is described in RFC 1952.

The COMPRESSED argument cannot be specified for an indexed, VSAM, MicroFocus line sequential, or Micro Focus variable file.

Header and header layout

A header clause can be specified for the input file. A header is not considered to be part of the data and does not count towards any execution statistics. The header does not affect filters that are based on the number of input records (e.g. FDISCARDFIRST, DISCARDFIRST, FDISCARDAFTER, DISCARDAFTER, etc.). If the header contains a record that defines the field names for the data records in the file, a layout name can be assigned to those fields using the LAYOUTNAME clause.

Headers cannot be specified if the file type is indexed, XML, Mainframe, Micro Focus, or Fortran Unformatted.

A header containing a header layout can only be specified for a source file that is delimited and has a stream or a fixed length record format. The field names for the data records in the file should be separated using the delimiter and enclosing characters that are specified for the source.

When a header containing a layout is specified for the source, EXTERNAL can not be specified for the header layout. If the same header layout is specified for two or more sources, the same field names must be present in each source.

Remote files

Specify the connection that should be used to connect to the system where the file is located. When the host provided as part of a connection and the host in the file access do not match, a warning will be issued, and the host used will be the one in the connection.

Wildcard Expansion

If EXPANDWILDCARDS is specified and the file name contains a wildcard pattern, then Connect ETL will process all the files that match the given pattern. File names that match the wildcard pattern will be collated according to the current locale and processed in that order. See the topic Syntax of wildcard characters in the Connect help for information on the wildcards supported by Connect ETL.
Note: For legacy UNIX command-line usage, the file name must be enclosed in double-quotes ("") when specifying EXPANDWILDCARDS to protect the wildcard. Otherwise, the UNIX shell will do the file expansion.

All files that match the pattern must have the same file type and the same attributes such as record format, field separator, enclosing characters, record length, record alignment, compression, among others.

Any file level record filtering options that are specified such as FDISCARDFIRST or FDISCARDAFTER will be separately applied to each file.

Compatibility with Other dmexpress Options

File level record limiting criteria, which is specified through the FDISCARDSHORT, FDISCARDFIRST, FDISCARDBLANK, FDISCARDAFTER option arguments in the /INFILE option, cannot be specified in the same task definition that specifies general bulk filtering for source records through the DISCARDSHORT, DISCARDFIRST, DISCARDBLANK, DISCARDAFTER option arguments in the /FILTER option.

Examples

/infile sort1.in stlf
The option specifies that the input records come from the file sort1.in, which is a UNIX text (LF record terminator) file.
/INFILE year90/payables/current/trans.dat alias transactions 
          variable layout trans
This option assigns an alias, transactions, to the input file trans.dat in the directory year90/payables/current. The file contains variable length records. Fields in the records are described by record layout trans.
/INFILE tape1.in fixed 1024
The name of the input file is tape1.in. It contains fixed format records of length 1024 bytes.
/INFILE transact*.txt EXPANDWILDCARDS fixed 214 layout transaction
The option specifies that the input records come from all the files that match the pattern transact*.txt, for example transact.txt, transact1.txt, transaction.txt. All the matching files have fixed length records of length 214 bytes. The record layout transaction defines the fields in the records.
/INFILE report.dat stlf fdiscardfirst 5 fdiscardblank
The option specifies that the first five records and all blank records in the UNIX text (LF record terminator) file report.dat are to be bypassed.
/INFILE sort.in2 fixed 79 aligned2 
The option specifies that the input records are aligned on a 2-byte boundary. The file is a fixed record length file. The length of each record is 79 bytes.
/INFILE sort.in stlf fieldseparator ":" 
The option specifies that the fields of the input records are separated by a colon.
/INFILE corporate_db indexed 
The file corporate_db is an indexed file to be used to supply input records.
/INFILE sortdelim.dat stlf fieldseparator "," enclosedby """"
The file sortdelim.dat is a field-delimited UNIX text file with enclosed fields. The comma (,) is used as the field separator and the double quote (") is used as the leading and trailing enclosing character.
/INFILE sortdelim.dat stream crlf fieldseparator "," enclosedby ":", ";"
The file sortdelim.dat is a field-delimited, MS-DOS text (CR-LF record terminator) file with enclosed fields. The comma(,) is used as the field separator , the leading enclosing character is the colon(:) and the trailing enclosing character is the semi-colon(;).
/INFILE sales_new york.txt XML LAYOUT I"SalesEntries" 
          /INFILE sales_newjersey.txt XML LAYOUT I"SalesEntries" ALIAS SalesNJ
The options specify that the input records come from two XML files, which have the same XML layout SalesEntries. Since the names of the layouts are the same for both input files, an alias is used to resolve the name conflict and be able to reference XML fields in the rest of the task.
/INFILE "sample.txt" STLF FIELDSEPARATOR "," BOM ENCODING UTF-16LE LAYOUT layout1 
          ALIAS alias
File Sample.txt contains a BOM and the file is encoded in UTF-16LE. The records in the file can be mapped using record layout which is aliased as alias1. The alias name should be used in field references later.
/infile "/MVS/wwcdmx/INVAR" MAINFRAMEUNBLOCKEDVARIABLE 32756 SERVERCONNECTION
          MainframeConnection encoding EBCDIC 
The option specifies that the input records come from the mainframe variable file INVAR on the z/OS partition of a mainframe system in user wwcdmx’s catalog with the maximum record length equal to 32756 bytes and is accessed via the remote server connection defined in MainframeConnection. The data is EBCDIC encoded.
Note: If the file is on a mainframe and accessed via a FTP server connection, the virtual directories /DMX_FTP_HFS and /DMX_FTP_MVS are supported for backward compatibility for older tasks.
/INFILE "invar2.ftpbinary.dat" MAINFRAME FTPbinary encoding EBCDIC layout transaction

This option specifies that the file, invar2.ftpbinary.dat, which is a mainframe variable record file without record descriptor words (RDWs), is input as a source file of dmexpress file type MAINFRAME FTPBINARY. Mainframe variable record files without RDWs are generated by the File Transfer Protocol (FTP) when transferring a mainframe variable file in BINARY mode without the LOCSITE RDW option. The data is EBCDIC encoded. The fixed position record layout, transaction, is associated with invar2.ftpbinary.dat.