/OUTFILE - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:01:45.019000

Purpose

To specify a file as a target for records.

Format

/OUTFILE file_identifier [SERVERCONNECTION connection] [disposition] file_type [file_attribute …] [parameters]

where

file_identifier = {file_name } {file_descriptor OPEN}
disposition = {OVERWRITE} {APPEND }
file_type = {STREAM CRLF } {STLF } {STCR } {XML } {FIXED record_length [record_alignment] } {VARIABLE [LITTLEENDIAN] [record_alignment]} {VARIABLE DISTRIBUTABLE } {MFFIXED record_length } {MFVARIABLE } {MFLINESEQUENTIAL } {INDEXED } {MAINFRAME FIXED record_length } {MAINFRAME VARIABLE [BLOCKED block_size] } {MAINFRAME VARIABLE DISTRIBUTABLE } {FORTRANUNFORMATTED } {TABLEAU } {APACHEAVRO } {APACHEPARQUET } {QLIKVIEW }
file_attribute   {compress_attribute} {encoding_attribute} {field_delimiters } {record_numbering } {HEADER [ALWAYS] }
record_alignment = {ALIGNED2 } {ALIGNED4 } {UNALIGNED}
compress_attribute   {COMPRESSED [SOURCECOMPRESSIONFORMAT|BZIP2|GZIP] [HIGHCOMPRESSION] } {UNCOMPRESSED }
encoding attribute   ENCODING encoding_value [BOM]
encoding_value = {SOURCEENCODING} {encoding } {character_set }
field_delimiters   [FIELDSEPARATOR separator] [enclosedby_setting]
enclosedby_settings   {ENCLOSEDBY leading_character [,trailing_character]} {NOTENCLOSED }
record_numbering = RECORDNUMBER [START recnum_start] [EACHPARTITION]
parameters = PARAMETERS parameter [,parameter...]
parameter = parameter_name parameter_value (space separated)

Arguments

file_name

The pathname of the file which is to receive records from the task.

For details on specifying a file path, see File Name and Syntax Requirements .

file_descriptor The descriptor of the open file, which is to receive records from the task.
connection A string or identifier that specifies the connection or its alias used to connect to the system where the file is located.
record_length The length in bytes associated with the following /OUTFILE target record type clauses: FIXED, MFFIXED, MAINFRAME FIXED.
block_size The block size in bytes for MAINFRAME VARIABLE BLOCKED, which is the mainframe variable record length with block descriptor record format. The block size is required and ranges in value from 9 to 32760.
encoding

The encoding argument can be used for encoding a text field of data with one of the following built-in character sets: ASCII, EBCDIC, Locale, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, or UTF-32LE.

Note: When Locale is specified, the encoding used is that which is defined by the system locale where dmexpress runs.

This encoding value can be specified as a built-in character set keyword without quotes or as a string as outlined in Constants

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

character_set

The character_set argument can be used for encoding a text field of data with one of the ICU library character sets.

This character set value must be specified as a string as outlined in File Name and Syntax Requirements .

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

separator

A single byte or multibyte character string that separates adjacent fields in delimited text records.

You can specify the character string in single-quoted or double-quoted format as outlined in Constants.  

At runtime, field separators are treated as follows:

  • Character text constant field separators are treated as locale encoded.
  • Hex text constant field separators are treated as binary representations of the source or target encoding. Hex text constant field separators within the /OUTFILE option are encoded in the same encoding as specified in the target file.
leading_character The character in the field-delimited text records that optionally precedes the data in each delimited field.
trailing_character The character in the field-delimited text records that optionally follows the data in each delimited field.
recnum_start The number assigned to the first record in the file, for record numbering. The minimum value is zero. The default is 1.
parameter_name The name of the parameter
parameter_value The value of the parameter

Location

The option may appear anywhere in the task definition.

When no outputs are present, Connect ETL writes its output to standard output, which can be specified through the /OUTPIPE option.

Header

A header containing the data field names can be generated for the target by specifying the HEADER keyword.

A header can only be generated if:
  • the target file has a stream based record format, and
  • the target file is delimited

The generated header consists of a single record that lists the names of the fields present in the data. The names are listed in the same order as that of the data fields. The field names are delimited and enclosed using the same characters as the target data.

The field names inserted into the header are obtained from the corresponding field names of the first reformatted layout that is specified for this target (see /REFORMAT). An error is reported if a reformat is not specified for the target or if a positional reformat is specified. An error is reported if a composite field or an array field is used in the target’s first reformat as it prevents the names from being obtained.

The generated header will be inserted prior to any data records. The header does not count towards any execution statistics that may be calculated. It also does not affect any options that depend on the number of output records (e.g. target record number). The header is considered to be disjoint from the data records.

If the output file is partitioned, the generated header will be present in all the partitions.

If ALWAYS is specified for the HEADER, then the header will be generated in the target even when the source is empty.

Field Separator

The default field separator for the output is the same as the single byte or multibyte field separator of the first input that is different than UNIXSORTDEFAULT. If all inputs have either a UNIX default separator or no separator, then the default output field separator is a space. If none of the inputs are delimited, the output is not delimited.

Record Alignment

When you do not provide the record alignment, the attribute is taken from the first source specified via /INFILE, /INPIPE, or /INBUFFER.

Compression

When neither COMPRESSED nor UNCOMPRESSED is specified, the attribute is taken from the first source specified via /INFILE or /INPIPE.

When COMPRESSED is specified without SOURCECOMPRESSIONFORMAT, BZIP2 or GZIP, Connect ETL generates output compressed in gzip format.

The HIGHCOMPRESSION option will minimize the size of the target data, but this may impact the performance. By default, the decision is made to optimize for speed.

Encoding and Byte Order Mark (BOM)

Use the ENCODING option to describe the character encoding of your target file.

If the ENCODING keyword is not specified in a non-XML outfile, the outfile encoding is the same as the encoding of the first non-XML source file or pipe when they are present. If a non-XML source file or pipe is not present, the target encoding defaults to ASCII.

If the ENCODING keyword is specified in a non-XML outfile, data and stream record terminators as well as any field delimiters or enclosing characters are treated in the given encoding.

If UTF-16 or UTF-32 is specified for a non-XML target file, the byte order will be determined by the byte order of the first non-XML source file or pipe. However, if the first non-XML source file or pipe is not a variation of UTF-16 or UTF-32, respectively, the byte order is assumed to be big-endian.

By default, an XML target file has the same encoding as the first XML source file, if present. Otherwise, the XML target file encoding is set to be ASCII. An XML target file with a UTF-16 encoding specified will assume the byte order of the first XML source file, if it is a UTF-16 variation or assume big-endian byte order otherwise.

When BOM is specified, Connect ETL will create a byte order mark on your target file. More information regarding Unicode encodings and the Byte Order Mark can be found at http://unicode.org/faq/utf_bom.html.

Notes

The /OUTFILE option defines a file as a target for records.

Number of Output Files

Connect ETL writes the output records to one or more files. The number of output files is not restricted by Connect ETL. For each output file, use a separate /OUTFILE option.

File Type

For information on supported file types, see Source and Target File Types and Records.

XML

An XML target file requires a reformat to be specified by /REFORMAT with MAPPINGS using a target XML layout.

For a mapped reformat, no metadata is created. If the mapped layout originates from an XML Schema, the target file will be in the form of an XML instance document. The document may or may not be schema-valid, depending on the data written. Connect ETL does not validate the data against the constraints specified in the schema. If the mapped layout originates from an XML DTD, the target file will be a plain well-formed XML document containing only tag names, attributes and content.

When metadata is generated, it is in the same location as the output file with the extension .dtd or .xsd for DTD and Schema respectively. The target data file will contain a link to the external metadata file.

All data written to the XML document is in displayable form, encoded in the specified encoding. The ENCODING option applies to both XML targets and XML metadata. If ENCODING LOCALE is used, the encoding written in the encoding declaration of the XML document is system-dependent. It is recommended that you use one of the other encodings, as they are system-independent. UTF-32 encoding is not allowed with XML targets.

Tableau Data Extract

For Tableau Data Extract file targets:

  • the file_identifier argument must be a file_name
  • the file_type argument must be TABLEAU
  • all file_attribute arguments except record_numbering are ignored
  • a reformat must be specified using the /REFORMAT option with the target layout specified as EXTERNAL (or the deprecated TABLEAU)

Apache File Formats

For Apache file format targets:

  • the file_identifier argument must be a file_name
  • the file_type argument must be APACHEAVRO or APACHEPARQUET
  • all file_attribute arguments except record_numbering are ignored
  • a reformat must be specified using the /REFORMAT option, with the target_layout_type specified as EXTERNAL

The following optional parameters are available for Apache file formats:

Parameter Name Parameter Value Default
Avro CODEC NULL, DEFLATE, or SNAPPY NULL
BLOCKSIZE <value in KB> 16
Parquet CODEC UNCOMPRESSED or GZIP UNCOMPRESSED
BLOCKSIZE <value in MB> 128
PAGESIZE <value in MB> 8
DICTIONARYENCODING TRUE or FALSE FALSE

QlikView data eXchange

For QlikView data eXchange file targets:

  • The file_identifier argument must be a file_name
  • The file_type argument must be QLIKVIEW
  • All file_attribute arguments except record_numbering are ignored
  • A reformat must be specified using the /REFORMAT option with the target layout specified as EXTERNAL

Existence of Output File

Flat file

If the file you specify to hold the output from dmexpress does not exist, a new file is created. If the file exists, you must specify either OVERWRITE or APPEND. If OVERWRITE is specified, the output records will overwrite whatever data may exist in the file. If APPEND is specified, the output records will be added to the end of the file.

You cannot specify APPEND for the following cases:

  • A file that exists and is accessed remotely using SERVERCONNECTION
  • A Tableau Data Extract file
  • An XML file
  • A file with a BOM
  • An Apache format file
  • A QlikView data eXchange file

For an XML file, OVERWRITE applies to both the target data file and the target metadata file.

Indexed and VSAM

An indexed or VSAM target file must exist before the task runs. Connect ETL does not support overwriting records that are already in such a file. See Clerity MTP and UniKix TPE documentation for more information on the VSAM file organization.

Record Length

In the /OUTFILE option, you must provide record length values for the following file types: FIXED, MFFIXED, MAINFRAME FIXED.

File Separator

A field-delimited text record consists of consecutive fields, with adjacent fields separated by single byte or multibyte separator characters. When you want the field separator to be different than the default, specify the output separator through the separator argument.

When the ENCLOSEDBY characters are different from input, a target reformat is required.

Record Alignment

If you want all records in an output file to be aligned on 2-byte boundaries, supply the ALIGNED2 keyword except for fixed format records of even length. If you want records to be aligned on 4-byte boundaries, supply the ALIGNED4 keyword except for fixed format records whose length is a multiple of 4 bytes.

Record Numbering

When you want to add a record number to the records in the file, specify the RECORDNUMBER argument. To start the numbering at other than 1, provide START recnum_start.

If you do not specify a /REFORMAT option for the file dmexpress will automatically add a record number to the beginning of each record. The number appears as a 10- digit, unsigned decimal number. When the records in the file are field-delimited text records, leading zeroes in the number do not appear and a field separator follows the number.

If you do provide a /REFORMAT, dmexpress does not automatically add a record number. See /REFORMAT for details on how to add the record number to a reformatting.

Compressed Files

If COMPRESSED is specified with SOURCECOMPRESSIONFORMAT, then the output is compressed in the same compression format as the first compressed source specified with /INFILE or /INPIPE. SOURCECOMPRESSIONFORMAT is not allowed when there is no compressed source specified with either /INFILE or /INPIPE.

When COMPRESSED is specified with BZIP2, Connect ETL generates a compressed bzip2 file as documented at http://www.bzip.org.

When COMPRESSED is specified with GZIP, Connect ETL generates a compressed gzip file that conforms to the RFC 1952 standard.

The COMPRESSED argument cannot be specified for an indexed, VSAM, Micro Focus, Tableau Data Extract, or QlikView data eXchange file.

Encoding

When SOURCEENCODING is specified as the encoding_value, the encoding specified for the task sources is applied to the task targets. As SOURCEENCODING is a keyword, no quotes are required.

Nullable fields

When a NULL value is output to a field in a file, the result is an empty field. This is defined as a field of length 0 for a delimited field, and space padded for a fixed length field. For a field of binary data type, each byte of the field will be set to hex value "20".

Examples

/outfile sort1.out stlf
The option specifies that the output from the task is to be written to the UNIX text (LF record terminator) file sort1.out.
/OUTFILE year2004/payables/current/trans_sorted.dat variable
The option requests that the output file trans_sorted.dat be created in the directory year2004/payables/current. The format of the records in the file is to be variable.
/OUTFILE tape1.out fixed 1024 
The name of the output file is tape1.out. It is to contain fixed format records of length 1024 bytes.
/OUTFILE customers.out stlf append 
This is a request for Connect ETL to append output records to the UNIX text (LF record terminator) file customers.out without overwriting the current contents of the file.
/OUTFILE out.dat variable aligned2 
The output file out.dat is to contain variable length records aligned on a 2-byte boundary.
/OUTFILE corporate_db indexed 
The file corporate_db is an existing indexed file to be used to hold the output records. It exists and is empty.
/OUTFILE numbered.dat stlf recordnumber
The output file numbered.dat is to contain numbered records. When the records in the file are not reformatted, the record number field appears at the start of the record. By default, the number is an unsigned decimal field of 10 digits.
/OUTFILE output_utf8.txt stream crlf encoding utf-8
The output file output_utf8.txt is a MS-DOS text file that contains records terminated by a combination of carriage return and linefeed. The record terminators, enclosing characters and field delimiters are to be encoded in UTF-8.
/OUTFILE output_utf8.txt stream crlf bom encoding utf-8
The output file output_utf8.txt is a MS-DOS text file that contains records terminated by a combination of carriage return and linefeed. The file should have a byte order mark (BOM) at the beginning of the file indicating an encoding of UTF-8 and the record terminators, enclosing characters and field delimiters are to be encoded in UTF-8.
/OUTFILE output_ebcdic.txt mainframe variable blocked 12000 encoding ebcdic

The output file, output_ebcdic.txt, is a mainframe variable record length with block descriptor file encoded in ebcdic. Each block of the file holds up to 12000 bytes.