Purpose
The /INFILE option defines a file as a source of records.
The /FILE option defines an auxiliary data file such as a lookup source file.
Format
/INFILE | /FILE |
file_identifier [EXPANDWILDCARDS] [SERVERCONNECTION connection] [ALIAS alias] file_type [file_attribute …] [record_limit …] [record_layouts] |
file_identifier | = | {file_name } {mainframe_dataset} {task_alias PREVIOUSTASK} {file_descriptor OPEN } |
file_type | = | {STREAM CRLF } {STLF } {STCR } {XML } {FIXED record_length [record_alignment] } {VARIABLE [LITTLEENDIAN] [record_alignment]} {VARIABLE DISTRIBUTABLE } {MFFIXED record_length } {MFVARIABLE } {MFLINESEQUENTIAL } {INDEXED } {MAINFRAME FIXED record_length } {MAINFRAME FTPBINARY } {MAINFRAME VARIABLE [BLOCKED] } {MAINFRAME VARIABLE DISTRIBUTABLE } {MAINFRAME VSAM } {FORTRANUNFORMATTED } |
record_alignment | = | {ALIGNED2 } {ALIGNED4 } {UNALIGNED} |
file_attribute | = | {compress_attribute} {encoding_attribute} {header } {field_delimiters } |
compress_attribute | = | {COMPRESSED } {UNCOMPRESSED} |
encoding_attribute | = | [BOM] [ENCODING encoding|character_set] |
header | = |
[HEADER SIZE number_of_records [NAMESRECORD names_record [SKIPFIELDS fields_to_skip] LAYOUTNAME header_layout]] |
field_delimiters | = | FIELDSEPARATOR field_separator [ENCLOSEDBY leading_character [,trailing_character]] |
field_separator | = | {separator } {UNIXSORTDEFAULT} {NONE } |
record_limit | = | {FDISCARDFIRST num_records} {FDISCARDAFTER num_records} {FDISCARDBLANK } {FDISCARDSHORT min_length } |
record_layouts | = |
LAYOUT {layout_info } {(layout_info [, layout_info…])} |
layout_info | = | layout_name [EXTERNAL] [ALIAS layout_alias] |
Arguments
file_name |
The pathname of the file which provides input records for the task. For details on specifying a file path, see File Name and Syntax Requirements . |
mainframe_dataset |
The full pathname to the location of the mainframe dataset or file residing on the HFS (UNIX) partition. The path name base directory must include the /MVS virtual directory or slash (/), which indicates that the file is located on the MVS or z/OS partition, or the /HFS virtual directory, which indicates that the file is located on the HFS partition:
For details on specifying a file path, see File Name and Syntax Requirements . |
task_alias | The alias or task name of a previous task, which provides input records to the current task. |
file_descriptor | The descriptor of the open file, which provides input records to the task. |
connection | A string or identifier that specifies the connection or its alias used to connect to the system where the file is located. The connection needs to be specified using a /serverconnection option. |
alias |
A name you assign to the file, and which you will use to refer to the file in other dmexpress options. The name assigned to a file must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
record_length | The length in bytes associated with the following /INFILE record type clauses: FIXED, MFFIXED, MAINFRAME FIXED. |
encoding |
The encoding argument can be used for encoding a text field of data with one of the following built-in character sets: ASCII, EBCDIC, Locale, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, or UTF-32LE. Note: When Locale is specified, the encoding used is that which is defined by the system locale where dmexpress runs. This encoding value can be specified as a built-in character set keyword without quotes or as a string as outlined in File Name and Syntax Requirements . For additional information, see the following Connect help topics: "Data encoding" and "Text data type." |
character_set |
The character_set argument can be used for encoding a text field of data with one of the ICU library character sets. This character set value must be specified as a string as outlined in File Name and Syntax Requirements . For additional information, see the following Connect help topics: "Data encoding" and "Text data type." |
number_of_records | The number of records that constitute the header. |
names_record | The record number of the header record that contains the field names. This record defines the data layout of the header. |
fields_to_skip | The number of fields at the beginning of the header’s names record that do not contain field names. |
header_layout |
The name of the header layout that is defined by the header’s names record. The name can be used to reference the header layout from other dmexpress options. The name must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help |
separator |
A single byte or multibyte character string that separates adjacent fields in delimited text records. You can specify the character string in single-quoted or double-quoted format as outlined in Constants. At runtime, field separators are treated as follows:
|
leading_character | The character in the field-delimited text records that optionally precedes the data in each delimited field. |
trailing_character | The character in the field-delimited text records that optionally follows the data in each delimited field. |
num_records | The number of records to be bypassed from the beginning of the file. These records are not released to the task. |
min_length | Specifies the minium length requirement for records to avoid being discarded as per the FDISCARDSHORT clause. |
layout_name | The name of a record layout that defines the fields of the records in the source file. A record layout can be defined in a /DELIMITEDRECORDLAYOUT or /RECORDLAYOUT option or defined in an external metadata file using /DATADICTIONARY. |
layout_alias |
A name you assign to the layout, and which you will use, in lieu of the original layout name, to refer to the fields of records originating from this source in other dmexpress options. When a layout alias is defined, you can no longer reference the fields using the original layout name directly in your task. The name assigned to a source file’s layout must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
Location
The /INFILE option can be listed anywhere in the task definition for the following task types: copy, sort, merge, and aggregates.
For a join task, the /INFILE option must be listed as follows:- Left side – Before the /JOINKEYS options that defines the left side.
- Right side - After the /JOINKEYS option that defines the left side and before the /JOINKEYS option that defines the right side.
Defaults
When no inputs are present, dmexpress takes its input from standard input, which can be specified through the /INPIPE option.
Field Delimiters
When you provide neither a single byte or multibyte field separator nor an enclosing character, Connect ETL treats a sequence of one or more spaces and tabs (the UNIX default) as the delimiter in field-delimited text records. The separator is treated as part of the following field.
When you do not provide a field separator, but you do supply an enclosing character, Connect ETL assumes that the separator in field-delimited text records is the comma.Compression
When no compression attribute is specified, the file is treated as UNCOMPRESSED.
Encoding
The default encoding for non-XML files except MAINFRAME FTPBINARY is ASCII. For XML files, the default encoding is the one specified in the encoding declaration in the file, when present, and UTF-8 otherwise. For MAINFRAME FTPBINARY, the default encoding is EBCDIC.
Wildcard Expansion
When EXPANDWILDCARDS is not specified, wildcards will not be expanded in the file name.
Notes
The file defined through the /FILE option does not provide source records directly to the task. Instead, it may be used in other source options as an auxiliary input to the record extraction process. For instance, a file defined through the /FILE option can be used as the look-in file in a Lookup function.
Number of Input Files
For each input file, use a separate /INFILE option.
An aggregate task that has an XML source file can not have any other source.File Type
For information on supported file types, see Source and Target File Types and Records.
Record Length
- Fixed record length (FIXED)
- Micro Focus sequential with fixed record length (MFFIXED)
- Mainframe fixed record length (MAINFRAME FIXED)
To specify a maximum record length for variable length records, use the /INMAXRECORDLENGTH option, which specifies the maximum source record length per task
Record Alignment
If all records are aligned on 2-byte boundaries, you have to supply the ALIGNED2 keyword except for fixed format records of even length. If all records are aligned on 4-byte boundaries, you have to supply the ALIGNED4 keyword except for fixed format records whose length is a multiple of 4 bytes. The default is that records are not aligned.
Record Limit Option Arguments
- FDISCARDSHORT - All records shorter than the specified minimum length are discarded.
- FDISCARDFIRST - The specified number of records are discarded from the files specified
in this /INFILE option, starting from the first record in the first input file.
If the first file does not have enough records, records are discarded from the second file and from subsequent files until the specified number of records are discarded.
- FDISCARDBLANK - All blank records are discarded. A record is considered to be blank if either the record length is 0 or all bytes in the record are space characters.
- FDISCARDAFTER - Only the specified number of records are retained for further processing.
Encoding and Byte Order Mark (BOM)
- Fixed record length
- Stream, carriage-return record terminator
- Stream, MS-DOS(CR-LF) record terminator
- Stream, UNIX(LF) record terminator
- Variable record length with 2-byte prefix
- Variable record Hadoop distributable format
- XML
- Mainframe fixed record format
- Mainframe ftpbinary format
- Mainframe variable record format
- Mainframe variable record Hadoop distributable format
- Mainframe variable with block descriptor record format
- Mainframe VSAM record format
See the section XML files, below, for the encoding of XML files.
If a task has multiple non-XML input files, they must have the same encoding.
Specify the BOM keyword when the file contains a Unicode Byte Order Mark (BOM), which specifies the endian and character encoding information of the file. This mark appears in the first few bytes of the file and is not treated as part of the data. More information regarding Unicode encodings and the BOM can be found at http://unicode.org/faq/utf_bom.html.
If the ENCODING option is specified, it overrides the encoding detected from the BOM. If the encoding is specified as UTF-16 or UTF-32, the byte order of the encoding is determined from the BOM. If there is no BOM specified, the encoding would be treated as big-endian.
Compatible Encoding Types
In some cases, you may choose to specify an encoding that is different from the actual encoding of the data. For instance, ASCII will almost always provide better performance than any other encoding. In order to do so, you need to understand whether the actual encoding is compatible with the specified encoding.An encoding is compatible with another if the first encoding can be used in place of the other without significant data loss or corruption. However, it may still corrupt data or cause data loss if used for certain ranges of characters. For example, if data is encoded in UTF-8, you may be able to use ASCII as a compatible encoding when all of your characters are expected to be in the 7-bit ASCII range (0x00 – 0x7f). However, since character representations differ between ASCII and UTF-8 beyond this range, undesirable results may occur if you have data above this range and interchange these encodings. An incompatible encoding has no overlapping character ranges and if used would definitely give undesired results. ASCII, LOCALE, and UTF-8 may all be compatible to each other; the rest of the encodings are incompatible.
XML Files
When the record format is XML, all the options under file_attribute are ignored, except the file_size, block_size and compress_attribute.
- A Document Type Definition (DTD) inlined in the XML document;
- A DTD referenced by the root element of the XML document;
- An XML Schema referenced by the root element of the XML document;
- A DTD or an XML Schema made available through the file specified in /DATADICTIONARY.
When layout is not specified, Connect ETL will use a layout defined in a local schema. If the schema has more than one layout, Connect ETL will use the layout whose name matches the root of the source XML document.
For an XML layout, all the XML fields composing the layout that are found at run-time in the XML input will be extracted. If you want to extract only some XML fields, you need to specify an extracted XML layout through /XMLEXTRACT.
A composite field defined in an XML layout can not be referenced anywhere in dmexpress except in a /REFORMAT that defines a new XML target layout. The contents of such fields used in any other context are undefined.
When data is extracted from a nullable XML field, the nullability information is also retrieved unless suppressed using the /XMLEXTRACT NOTNULLABLE argument.
Encoding of XML files
The encoding of an XML file is defined through the encoding declaration as part of the metadata. If the document does not have any encoding declaration, an XML document, by default, is assumed to be encoded in UTF-8. The encoding of the document is overridden by specifying the ENCODING option. The BOM option is ignored for XML files, since the file determines whether a BOM is needed or not.- To improve the performance of data processing when the encoding declaration specifies UTF-8 encoding, but the data contains only 7-bit ASCII characters;
- When the encoding declaration specifies an unknown encoding, but the characters contained in the data are compatible with 8-bit ASCII (ISO-8859-1);
- When you want to treat data as a sequence of bytes, not as characters, regardless of
the encoding.
The ENCODING LOCALE option can be used only when the encoding declaration specifies an unknown encoding. In that case, you need to make sure that your locale encoding, and the document encoding are compatible.
Field Delimiters
A field-delimited text record consists of consecutive fields, with adjacent fields separated by a single byte or multibyte field separator. When the field separator is the UNIX default, each blank in a sequence of blanks is part of the following field. When the field separator is not the default you must specify it through the FIELDSEPARATOR separator argument. The specified separator is assumed to be in locale and is converted to the encoding of the source file.Each separator is considered to separate two adjacent fields; therefore, consecutive separators indicate an empty field (one whose length is zero). A non-default separator is not considered part of the following field.
When one or more blank characters (spaces and tabs) separate fields in a text record, and the blanks are part of the data, specify FIELDSEPARATOR UNIXSORTDEFAULT. For this type of separator (the UNIX default), each blank character in a sequence of blanks is taken as part of the following field. It is not possible to have an empty field with this type of separator.When a file does not contain field-delimited text records, indicate this by specifying FIELDSEPARATOR NONE. This is necessary when there are two or more source files in the task and there is a mixture of delimited records and fixed-position records. If you do not indicate that a fixed-position file in such an application is not delimited, Connect ETL applies the default and assumes the file has delimited records with the default separator. With FIELDSEPARATOR NONE specified for a fixed-position file, when you reference a delimited field in a record from the file, the field either contains the complete record (the first field) or is empty (the second and subsequent fields).
The combination of the UNIX default separator and enclosing characters is not supported.Each field in a field-delimited text record may be enclosed by two enclosing characters. When fields are enclosed, specify the ENCLOSEDBY clause. Any leading and trailing enclosing characters are not part of the field. Two consecutive trailing enclosing characters within the field are treated as a single enclosing character. The leading enclosing character cannot be a blank (tab or space). The leading enclosing character and the trailing enclosing character cannot be a prefix of the field separator. When the leading and the trailing characters are the same, you do not need to specify the trailing character.
Connect ETL ignores any characters between an enclosing character terminating a field and the following field separator or the end of the record. When the first nonblank character following a field separator (or the first nonblank character in the record for the first field) is not a leading enclosing character, Connect ETL assumes that the field is not enclosed.- Japanese Shift-JIS: 0x00 – 0x3F
- Japanese EUC: 0x00 – 0x7F
Record Layout
Use the LAYOUT clause to designate the record layout that defines the fields of the records in the file.
Fields in the record layout cannot be referenced until the record layout is associated with the input file.The specification of multiple record layout names is supported with the /INFILE option; however, one and only one fixed position record layout must be specified for an input file of type MAINFRAME FTPBINARY.
File Alias
When you want to include or omit records based on which source they originate from, you must give aliases to some or all of the sources. For an /INFILE option, this alias is simply a shorthand way of referring to the input file. You then code the alias as part of a comparison in a /CONDITION definition.
The file alias is mandatory in a /FILE option. dmexpress functions that require a file argument should be supplied with the quoted file alias of a file defined using /FILE.Compressed Files
Connect ETL accepts bzip2 and gzip compressed source files. The bzip2 format is defined at http://www.bzip.org; the gzip format is described in RFC 1952.
The COMPRESSED argument cannot be specified for an indexed, VSAM, MicroFocus line sequential, or Micro Focus variable file.Header and header layout
A header clause can be specified for the input file. A header is not considered to be part of the data and does not count towards any execution statistics. The header does not affect filters that are based on the number of input records (e.g. FDISCARDFIRST, DISCARDFIRST, FDISCARDAFTER, DISCARDAFTER, etc.). If the header contains a record that defines the field names for the data records in the file, a layout name can be assigned to those fields using the LAYOUTNAME clause.
Headers cannot be specified if the file type is indexed, XML, Mainframe, Micro Focus, or Fortran Unformatted.A header containing a header layout can only be specified for a source file that is delimited and has a stream or a fixed length record format. The field names for the data records in the file should be separated using the delimiter and enclosing characters that are specified for the source.
When a header containing a layout is specified for the source, EXTERNAL can not be specified for the header layout. If the same header layout is specified for two or more sources, the same field names must be present in each source.Remote files
Specify the connection that should be used to connect to the system where the file is located. When the host provided as part of a connection and the host in the file access do not match, a warning will be issued, and the host used will be the one in the connection.
Wildcard Expansion
All files that match the pattern must have the same file type and the same attributes such as record format, field separator, enclosing characters, record length, record alignment, compression, among others.
Any file level record filtering options that are specified such as FDISCARDFIRST or FDISCARDAFTER will be separately applied to each file.Compatibility with Other dmexpress Options
File level record limiting criteria, which is specified through the FDISCARDSHORT, FDISCARDFIRST, FDISCARDBLANK, FDISCARDAFTER option arguments in the /INFILE option, cannot be specified in the same task definition that specifies general bulk filtering for source records through the DISCARDSHORT, DISCARDFIRST, DISCARDBLANK, DISCARDAFTER option arguments in the /FILTER option.
Examples
/infile sort1.in stlf
/INFILE year90/payables/current/trans.dat alias transactions
variable layout trans
/INFILE tape1.in fixed 1024
/INFILE transact*.txt EXPANDWILDCARDS fixed 214 layout transaction
/INFILE report.dat stlf fdiscardfirst 5 fdiscardblank
/INFILE sort.in2 fixed 79 aligned2
/INFILE sort.in stlf fieldseparator ":"
/INFILE corporate_db indexed
/INFILE sortdelim.dat stlf fieldseparator "," enclosedby """"
/INFILE sortdelim.dat stream crlf fieldseparator "," enclosedby ":", ";"
/INFILE sales_new york.txt XML LAYOUT I"SalesEntries"
/INFILE sales_newjersey.txt XML LAYOUT I"SalesEntries" ALIAS SalesNJ
/INFILE "sample.txt" STLF FIELDSEPARATOR "," BOM ENCODING UTF-16LE LAYOUT layout1
ALIAS alias
/infile "/MVS/wwcdmx/INVAR" MAINFRAMEUNBLOCKEDVARIABLE 32756 SERVERCONNECTION
MainframeConnection encoding EBCDIC
/INFILE "invar2.ftpbinary.dat" MAINFRAME FTPbinary encoding EBCDIC layout transaction
This option specifies that the file, invar2.ftpbinary.dat, which is a mainframe variable record file without record descriptor words (RDWs), is input as a source file of dmexpress file type MAINFRAME FTPBINARY. Mainframe variable record files without RDWs are generated by the File Transfer Protocol (FTP) when transferring a mainframe variable file in BINARY mode without the LOCSITE RDW option. The data is EBCDIC encoded. The fixed position record layout, transaction, is associated with invar2.ftpbinary.dat.