/DELIMITEDRECORDLAYOUT - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:01:45.019000

Purpose

To describe a layout of the input records in terms of named consecutive delimited fields.

Format

/DELIMITEDRECORDLAYOUT name { field [, field ... ]}

where

field = {composite_field } {elementary_field}
composite_field = name { field [, field ... ]} [REPEAT number [DETERMINEDBY determined_by_field_name]]

Note that the braces ({}) in composite_field and in the /DELIMITEDRECORDLAYOUT syntax definition must appear where shown. They do not indicate a choice of argument in this case.

elementary_field =

{name [format] [ENCODING encoding|character_set] [nullability] [REPEAT number [DETERMINEDBY determined_by_field_name]]}

{FILLER [filler_count]}

format = {CHARACTER } {number_format } {DATETIME (datetime_pattern)} {LOB }
nullability = {NOTNULLABLE} {NULLIFEMPTY [EXCLUDE WHITESPACE]}
number_format = EN | SN

Arguments

name

The name of the record layout or a field in the record layout, which you use to reference the record or the field in other dmexpress options.

The assigned name must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

number The number of times a field is repeated consecutively in the input record.
determined_by_field_name The name of the field in the input record layout, which is used to determine the size of the variable length array. This field must be positioned before the variable length array in the input record layout; its value must be numeric and must be less than or equal to the maximum number of element occurrences in the array as specified by number.
filler count The number of delimited fields that make up an unnamed portion of the record. You can omit the filler count to indicate a variable number of fields at the end of the record.
encoding

The encoding argument can be used for encoding a text field of data with one of the following built-in character sets: ASCII, EBCDIC, Locale, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, or UTF-32LE.

Note: When Locale is specified, the encoding used is that which is defined by the system locale where dmexpress runs.

This encoding value can be specified as a built-in character set keyword without quotes or as a string as outlined in File Name and Syntax Requirements.

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

character_set

The character_set argument can be used for encoding a text field of data with one of the ICU library character sets.

This character set value must be specified as a string as outlined in File Name and Syntax Requirements.

For additional information, see the following Connect help topics: "Data encoding" and "Text data type."

datetime_pattern The pattern describing the contents of a date/time field. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

Location

This option may appear anywhere in the task definition.

Notes

To reference fields in the record layout, the record layout must first be associated with a source file.

Delimited record layouts as well as positional record layouts must be named and must have at least one defined layout field.

Describing the record layout

So that Connect ETL can extract key fields, reformat records, make selections based on field values, etc., it is necessary for you to provide a map of the data in the records input to your task. One of the ways you can provide such a map when the record is composed of delimited fields is through the /DELIMITEDRECORDLAYOUT option. The basic building block of the map is an elementary field, which describes one delimited field in an input record. An ele­mentary field is not subdivided further. In many cases, your input record layout consists of a simple list of elementary fields.

When an elementary field in the input record is repeated several times in the record without any intervening fields or fillers, you can conveniently specify the field only once and add the REPEAT argument to indicate how many times it is repeated. A repeated field is referred to as an array. When all elements of an array are referenced as a single field, it is treated as a character field.

When two or more consecutive elementary fields in your input record are subdivi­sions of a larger field, it is sometimes convenient to name this larger field and refer to it in other options. A field made up of several elementary fields is referred to as a composite field. For example, a composite field student_name may consist of three elementary fields: first_name, middle_initial and last_name. In another option such as /REFORMAT you may be able to refer to one field student_name rather than its three elementary fields. In this example, you are not required to specify the composite field in addition to the elementary fields but doing so may save you time.

A composite field can have other composite fields as components, as well as ele­mentary fields. For example, a composite field home_address may consist of four elementary fields: address, city, state and zip. Another composite field, personal_info, may consist of three fields: student_name (a composite field), home_address (a composite field) and telephone (an elementary field).

The elementary fields that make up a composite field do not have to have the same data types. You cannot assign a data type to a composite field. A composite field containing one or more elementary fields is treated as if it is of character type, irrespective of the data types of its component elementary fields. The record layout name can be referenced just like any other composite field.

When a composite field in the input record is repeated several times in the record without any intervening fields or fillers, you can conveniently specify the field only once and add the REPEAT argument to indicate how many times it is repeated.

Date/time fields

A date/time elementary field consists of a calendar date component and/or a time component. The contents are in printable form. For example, the date component may appear as month/day/year with the month as two digits, the day as 2 digits and the year as 4 digits. The time component may appear as hour:minute:sec­ond with the hour as two digits using the 24-hour clock, the minutes as 2 digits and the seconds as 2 digits. 5 minutes and 9 seconds after 8 o'clock in the morning of August 4, 2004 appears as 08/04/2004 08:05:09 for this format. You would rep­resent this date/time datetime_pattern as (MM0/DD0/YEAR HH0:MI0:SE0). For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

LOB fields

The content of LOB fields is the filename as a character string in locale encoding. If the file that this layout is associated with has a different encoding, then the contents of the LOB field will be converted to locale before accessing the data file. Unless an absolute path is specified in the filename, the files are assumed to be in the same location as the source. If multiple sources are defined, the files are located with respect to the first non-XML source.

Encoding

The endianness of the encoding schemes UTF-16 and UTF-32 is determined by the BOM setting of the first /INFILE for a non-XML file. If a UTF-16LE or UTF-16BE BOM is detected in the first non-XML file, any field defined as UTF-16 will inherit the corresponding endian order. Similarly, if a UTF-32LE or UTF-32BE BOM is detected in the first non-XML file, any field defined as UTF-32 will inherit the corresponding endian order. Otherwise any field defined as UTF-16 or UTF-32 will be assumed to have a big-endian order. When the encoding is not defined, the file encoding of the first non-XML file specified in /INFILE determines the field encoding.

Nullability

The following nullability types are supported:

  • NOTNULLABLE: the field can never be NULL.
  • NULLIFEMPTY [EXCLUDE WHITESPACE]: the field is considered NULL if the field contents have either an empty string or whitespace. Delimited fields are considered empty when they have a length of 0 (two delimiters next to each other), or when they are filled with spaces and/or tabs. The EXCLUDE WHITESPACE argument can be used to treat only an empty string as NULL.

If nullability is not explicitly specified, the default nullability setting will depend on the /NULLIF option. If the /NULLIF option is specified with the EMPTY keyword for the corresponding data type, the field will be treated as NULL if it is empty. If the /NULLIF option is not specified, or if the EMPTY keyword is not specified for the corresponding data type, the field will be considered not nullable.

Examples

/DELIMITEDRECORDLAYOUT layout1 { filler 1, emp_id en, filler 3, emp_name }
This option describes an input record layout, layout1. filler 1 and filler 3 cover the input record fields that will not be referenced in other options. The option identifies by name only those fields that will be needed elsewhere in the task. The first of these, emp_id, is the second delimited field in the record and is numeric. The second named field, emp_name, is the sixth delimited field and contains character data.
/DELIMITEDRECORDLAYOUT store { location char, filler 3, store_code char, manager_code char, emp_count en, filler 4, p_ind char } 
The option describes an input record layout, store, with five named fields. The first field, location, contains character data. filler 3 covers the next 3 delimited fields, fields 2 - 4. The fifth field is the character field store_code. Another character field, manager_code, follows. The seventh field, emp_count, con­tains numeric data. filler 4 identifies an unnamed stretch of 4 fields. p_ind is the final field, the twelfth.
/DELIMITEDRECORDLAYOUT employee { employee_id en, employee_name character encoding UTF-16 }
The record layout, employee, described by this option consists of 2 fields. The first field, employee_id, is a numeric field. The second field, employee_name is a character field encoded in UTF-16.
/DELIMITEDRECORDLAYOUT customer { cust_code en, regular_payment en, month_amount en repeat 12, month_paid repeat 12, date_received repeat 12, reminder_sent repeat 12 }
The record layout, customer, described by this option consists of 50 fields. The first field, cust_code, is a numeric field. The second field, regular_payment, is also numeric. The remaining fields define arrays in the record layout. The month_amount array defines an array containing twelve delimited fields, as indicated by repeat 12. Each field in the array is numeric. The first field in the array immediately follows regular_payment. The complete array occupies fields 3 through 14. This is followed by the month_paid array, which consists of the character fields 15 - 26. Fields 27 through 38 of the record are occupied by the date_received array. The record layout is completed by the reminder_sent array, which contains 12-character fields.
/DELIMITEDRECORDLAYOUT book { book-id en, author {first_name, middle_initial, last_name}, title, publisher, date_published datetime (mm/dd/year) }
This /DELIMITEDRECORDLAYOUT option defines the record layout book, which describes an input record containing information on one book in a library. The record starts with a unique identifier for the book (book_id, a number). Starting at the second field of the record is the author information. This consists of the character first_name field, the middle_initial field and the last_name field. The author informa­tion can also be referenced as a multiple-delimited field, author, which is a composite field consisting of the first name, middle initial and last name. The character fields title and publisher and the date field date_published com­plete the record layout. date_published contains a date consisting of the month with any leading zero removed, the day with suppressed leading zero and the 4-digit year. The three components are separated by slashes.
/delimitedrecordlayout customer
              {cust_code en, regular_payment en, 
			   monthly_account 
			   {month_amount en, 
			   date_received datetime (mm0/dd0/year), 
			   month_paid char, 
			   reminder 
			   {reminder_sent char, 
			   date_sent datetime (mm0/dd0/year)
			   }repeat 2 
			   }repeat 12
			   }

The option in this example defines the record layout customer, which describes a record containing one year's payment records for a customer. The first field in the record is a customer's identifier (cust_code, a number). The amount of the customer's regular payment amount (regular_payment, also a number) is held in the second field.

Starting at the third field is the 12-entry array monthly_account. Each monthly_account entry contains the payment record for a single month. Each month's entry is 7 fields long and consists of the amount received for the monthly payment (month_amount, a number), the date payment was received (date_received), a flag indicating if the monthly payment is com­pletely paid (month_paid), and a 2-entry reminder array. Each of the reminder array entries has 2 fields and consists of a reminder_sent flag and the date_sent field. The record represented by this layout has 86 fields.
/DELIMITEDRECORDLAYOUT client { client_id en notnullable, client_name nullifempty, address }
The option describes a delimited input layout client containing client information. The field client_id is never treated as NULL. The field client_name is considered NULL if the field is empty. The field address is considered NULL only if the field is empty and the /NULLIF CHARACTER EMPTY option is present.
/DELIMITEDRECORDLAYOUT purchase_order { purchase_code, purchase_country, purchase_description encoding utf-8 }
This example describes a delimited input layout purchase_order containing international purchasing information in which the purchase_code and purchase_country have no explicit character encoding associated with the field's data and purchase_description will have the data encoded in UTF-8.
/DELIMITEDRECORDLAYOUT cust_code en, regular_payment en, month_amount en repeat 12, month_paid repeat 12, date_received repeat 12, reminder_sent repeat 12 determinedby cust_code

The record layout described by this option consists of 50 fields. The first field, cust_code, is a numeric field. The second field, regular_payment, is also numeric. The remaining fields define arrays in the record layout. The month_amount array defines an array containing twelve delimited fields, as indicated by repeat 12. Each field in the array is numeric. The first field in the array immediately follows regular_payment. The complete array occupies fields 3 through 14. This is followed by the month_paid array, which consists of the character fields 15 - 26. Fields 27 through 38 of the record are occupied by the date_received array. The reminder_sent array is a variable length array that can contain up to 12-character fields. The determinedby field is required when defining a variable length array because it contains the size of the array. The determinedby field, cust_code, is positioned before the variable length array in the input record layout; its value is numeric and must be less than or equal to the maximum number of element occurrences in the array.