/RECORDLAYOUT - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:01:45.019000

Purpose

To describe the layout of the input record in terms of consecutive fields.

Format

/RECORDLAYOUT {field[, field ...]}

where

field = {composite_field } {elementary_field}
composite_field = name { field [,field … ]}[REPEAT number [DETERMINEDBY determined_by_field_name]]

Note that the braces ({ }) in composite_field have to appear as shown. They do not indicate a choice of argument in this case.

elementary_field = {name[format_length][sequence][ENCODING encoding|character_set] [nullability] [REPEAT number [DETERMINEDBY determined_by_field_name]]} {FILLER length [BITS] }
format_length = {format [length]} {length format }
format =

{CHARACTER } {numeric_format } {DATETIME (datetime_pattern)} {BIT }

{LOB }

nullability = {NOTNULLABLE } {NULLIFEMPTY } {NULLBYTEPREFIX}

Arguments

name

The name of the record layout or field in the record layout, which you will use to reference the record or the field in other dmexpress options.

The assigned name must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

number The number of times a field is repeated consecutively in the input record.
determined_by_field_name The name of the field in the input record layout, which is used to determine the size of the variable length array. This field must be positioned before the variable length array in the input record layout; its value must be numeric and must be less than or equal to the maximum number of element occurrences in the array as specified by number.
length

The length of a field. The length is in bytes for all data types except bit, when the length is in bits.

The acceptable values for the length of a field with one of the recognized data types can be found in Reference/Data types reference/Connect ETL data types in the Connect help.

The length may be omitted only for a variable length character field at the end of the input record, and when the data type is DFLOAT, FLOAT, BIT, or DATETIME. The default date/time field length is specified by the date/time pattern and the default bit field length is 1 bit.

The length does not include the null indicator byte prefix if the NULLBYTEPREFIX keyword is specified for nullability.

encoding

The encoding argument can be used for encoding a text field of data with one of the following built-in character sets: ASCII, EBCDIC, Locale, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, or UTF-32LE.

Note: When Locale is specified, the encoding used is that which is defined by the system locale where Connect ETL runs.

This encoding value can be specified as a built-in character set keyword without quotes or as a string as outlined in File Name and Syntax Requirements .

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

character_set

The character_set argument can be used for encoding a text field of data with one of the ICU library character sets.

This character set value must be specified as a string as outlined in File Name and Syntax Requirements.

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

numeric_format The type of numeric data held in the field. The default data type is number unless the position is a bit position, in which case the default data type is bit.

A list of acceptable values for the number format can be found in Reference/Data types reference/Connect ETL data types/Number data type in the Connect help.

Location

The option may appear anywhere in the task definition.

Describing the Record Layout

To reference fields in the record layout, the record layout must first be associated with a source file through the /INFILE option.

Positional record layouts as well as delimited record layouts must be named and must have at least one defined layout field.

With the /RECORDLAYOUT option you can describe your input fields without explicitly specifying their positions within the record. Connect ETL calculates the starting position of each field from the size of the fields that precede it. You can then refer to fields defined through /RECORDLAYOUT in the same way as fields defined through, /DATADICTIONARY, DELIMITEDRECORDLAYOUT and /VALUE.

The record layout consists of elementary fields and composite fields. For example, if your records contain dates in the format yyyy/mm/dd, you could describe this data via elementary fields year, month, day and composite field date, as follows:
/recordlayout date {year char 4, 
                    filler char 1, 
                    month char 2, 
                    filler char 1, 
                    day char 2} 
                         date_sent char 6 

If all the elementary fields of a composite field have data type bit, then the com­posite field has data type bit, otherwise the composite field has data type charac­ter. In the example above therefore, the field date has data type character and length 10. Note that the fields date, year, month and day can all be referenced similarly in the task. The filler fields cannot be referenced in the task. A composite field can contain other composite fields. The record layout name can be referenced just like any other composite field.

You can create a second type of composite field by using the REPEAT number clause to define an array. You can then refer to a single member of the array by fol­lowing the field name by the appropriate subscript, in square brackets. A refer­ence to the field name without any subscript in square brackets references the entire array as a composite field. The data type of an array is determined in the same way as other composite fields, i.e., if all the fields of the array have data type bit, then the array has data type bit, otherwise the array has data type character. References to arrays and array members are described more fully in the Connect help.

Bit Fields

When you need to have access to individual bits, or strings of consecutive bits, in the input record you can specify fields with a data type of BIT. The length you specify is the number of bits in the field. When two elementary bit fields follow one another in /RECORDLAYOUT without an intervening FILLER, Connect ETL assumes that the fields are contiguous in the record. When an elementary non-bit field follows a bit field which does not end on a byte boundary Connect ETL assumes that the non-bit field starts on the next byte boundary.

Date/Time Fields

A date/time elementary field consists of a calendar date component and/or a time component. The contents are in printable form. For example, the date component may appear as month/day/year with the month as two digits, the day as 2 digits and the year as 4 digits. The time component may appear as hour:minute:second with the hour as two digits using the 24-hour clock, the minutes as 2 digits and the seconds as 2 digits. 5 minutes and 9 seconds after 8 o'clock in the morning of August 4, 2000 appears as 08/04/2000 08:05:09 for this format. You would represent this date/time date_pattern as (MM0/DD0/YEAR HH0:MI0:SE0). For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

LOB Fields

LOB fields are character fields that contain the file name of the file where the actual LOB data is stored. The encoding of the field is determined by the encoding of the file that contains it and is locale if that file is not encoded. The name of the LOB file will be converted to locale before accessing the file. If the file name is relative, the files are assumed to be located relative to the file that contains the LOB field. If multiple such files are defined, the files are located with respect to the first non-XML source.

Encoding

The endianness of the encoding schemes UTF-16 and UTF-32 is determined by the BOM setting of the first /INFILE for a non-XML file. If a UTF-16LE or UTF-16BE BOM is detected in the first non-XML file, any field defined as UTF-16 will inherit the corresponding endian order. Similarly, if a UTF-32LE or UTF-32BE BOM is detected in the first non-XML file, any field defined as UTF-32 will inherit the corresponding endian order. Otherwise any field defined as UTF-16 or UTF-32 will be assumed to have a big-endian order. When the encoding is not defined, the file encoding of the first non-XML file specified in /INFILE determines the field encoding. In the absence of /INFILE, the encoding defaults to ASCII.

Nullability

The following nullability types are currently supported for positional fields:

  • NOTNULLABLE: the field will never be considered to be NULL.
  • NULLIFEMPTY: the field will be considered to be NULL if the field contents are empty. Positional fields are considered empty when they are filled with spaces and/or tabs. The NULLIFEMPTY attribute cannot be applied to BIT fields.
  • NULLBYTEPREFIX: a separate byte prefixes the field and indicates whether the field is NULL. A byte prefix of \x01 indicates that the field contains a value, and a byte prefix of \x00 indicates that the field does not contain a value (i.e. it is NULL).

If nullability is not explicitly specified for a displayable field, the default nullability setting will depend on the /NULLIF option. If the /NULLIF option is specified with the EMPTY keyword for the corresponding displayable data type, the field will be treated as NULL if it is empty. If the /NULLIF option is not specified, or if the EMPTY keyword is not specified for the corresponding displayable data type, the field will be considered not nullable. For a list of displayable data types, see the /NULLIF option.

If nullability is not explicity specified for a binary field, it will default to NOTNULLABLE.

Bit fields with NULLBYTEPREFIX must start on a byte boundary.

Examples

/recordlayout employee { filler 2, emp_id integer 4, filler 23, 
emp_name character 40 }
This option describes an input record layout employee. filler 2 and filler 23 cover the input record fields that will not be referenced in other options. The option identifies by name only those fields that will be needed elsewhere in the task. The first of these, emp_id, starts at position 3 and is a 4-byte binary integer. The second field, emp_name, is a 40-byte character field starting at position 30.
/RECORDLAYOUT customer { cust_code int 2, regular_payment uint 2, 
month_amount uint 2 repeat 12,month_paid bit repeat 12, filler 4 bits, 
date_received char 6 repeat 12, 
reminder_sent bit repeat 12 }
The record layout customer described by this option consists of seven fields, six named fields and one filler. The first field, cust_code, starts at the first byte of the record and is a 2-byte long signed binary integer. This is followed at byte 3 by the second field, regular_payment, which is a 2-byte unsigned integer. The remaining named fields define arrays in the record layout. The month_amount array defines an array containing twelve items, as indicated by repeat 12. Each item in the array is a 2-byte unsigned binary integer. The first item in the array immediately follows regular_payment at byte 5. The complete array occupies bytes 5 through 28. This is followed by the month_paid array, which consists of twelve items each of which is 1 bit long and occupies from the first bit of the 29th byte through the 4th bit of the 30th byte. The remaining 4 bits of the 30th byte are described by filler 4 bits. Bytes 31 through 102 of the record are occupied by the date_received array, consisting of twelve 6-byte character items. The record layout is completed by the reminder_sent array, which contains 12 items, each of which is 1 bit. The reminder_sent field occupies from the first bit of the 103rd byte through the 4th bit of the 104th byte. The remaining 4 bits of the 104th byte are a filler.
/RECORDLAYOUT book { book_id an 6, author {first_name 20, middle_initial 1, 
last_name 20 }, title 50, publisher 40, date_published datetime (mm/dd/year) 10 }
This /RECORDLAYOUT option defined the record layout book that describes an input record containing information on one book in a library. The record starts with a unique identifier for the book (book_id, a 6-byte unsigned decimal number). Starting in byte 7 of the record is the author information. This consists of the 20-character first_name field, the 1- character middle_initial field and the 20-character last_name field. The author information can also be referenced as a 41-character field, author, which is a com­posite field consisting of the first name, middle initial and last name. The 50-char­acter title field, the 40-character publisher field, and the 10-byte date_published field complete the record layout. date_published contains a date consisting of the month with any leading zero removed, the day with sup­pressed leading zero and the 4-digit year. The three components are separated by slashes.
/recordlayout customer
{ cust_code int 2, regular_payment uint 2, 
  monthly_account 
     {month_amount uint 2, 
	  date_received datetime (mm0/dd0/year), 
	  month_paid bit, 
	  reminder {reminder_sent bit, 
	            filler 7 bits, 
				date_sent datetime(mm0/dd0/year) 10 
				}repeat 2, 
				}repeat 12
}

The /RECORDLAYOUT option in this example defines the record layout customer that describes a record containing one year’s payment records for a customer. The first field in the record is a customer’s identifier (cust_code), held as a 2-byte binary integer. The amount of the cus­tomer’s regular payment amount (regular_payment) is held in bytes 3 and 4 as an unsigned binary integer.

Starting in byte 5 is the 12-entry array monthly_account. Each monthly_account entry contains the payment record for a single month. Each month’s entry is 35 bytes long and consists of the amount received for the monthly payment (month_amount, a 2-byte unsigned binary integer), the date payment was received (date_received, a 10-byte date field), a 1-bit flag indicating if the monthly payment is completely paid (month_paid), and a 2-entry reminder array. Each of the reminder array entries is 11 bytes long and consists of a 1-bit reminder_sent flag, a 7-bit filler and a 10-byte date_sent field. The length of the record represented by this record layout is 424 bytes.
/RECORDLAYOUT client { client_id 10 en notnullable, client_name 30 nullifempty, 
address 50, zip_code int 4 nullbyteprefix }
The option in this example describes an input layout client containing client information. The field client_id will never be considered to be NULL. The field client_name will be considered NULL if the field contains only spaces and/or tabs. The field address will be considered NULL only if the /NULLIF CHARACTER EMPTY option is present and the field contains only spaces and/or tabs. The field zip_code is prefixed with a single byte indicating whether or not the data is NULL. The null byte indicator will be in position 91 and the integer data for the zip_code field will be be in positions 92-95.
/recordlayout purchase_order { purchase_code 12, purchase_country 30, 
purchase_description 400 encoding utf-32le }
In this record layout, the character fields purchase_code and purchase_country have no explicit character encoding associated with the field and purchase_description will have the data encoded in a UTF-32, little-endian character encoding.
/RECORDLAYOUT cust_code en, regular_payment en, month_amount en repeat 12, 
month_paid repeat 12, date_received repeat 12, reminder_sent repeat 12 
determinedby cust_code

The record layout described by this option consists of 50 fields. The first field, cust_code, is a numeric field. The second field, regular_payment, is also numeric. The remaining fields define arrays in the record layout. The month_amount array defines an array containing twelve delimited fields, as indicated by repeat 12. Each field in the array is numeric. The first field in the array immediately follows regular_payment. The complete array occupies fields 3 through 14. This is followed by the month_paid array, which consists of the character fields 15 - 26. Fields 27 through 38 of the record are occupied by the date_received array. The reminder_sent array is a variable length array that can contain up to 12-character fields. The determinedby field is required when defining a variable length array because it contains the size of the array. The determinedby field, cust_code, is positioned before the variable length array in the input record layout; its value is numeric and must be less than or equal to the maximum number of element occurrences in the array.