Purpose
To describe a layout of the input records in terms of named consecutive delimited fields.
Format
/DELIMITEDRECORDLAYOUT | name { field [, field ... ]} |
where
field | = | {composite_field } {elementary_field} |
composite_field | = | name { field [, field ... ]} [REPEAT number [DETERMINEDBY determined_by_field_name]] |
Note that the braces ({}) in composite_field and in the /DELIMITEDRECORDLAYOUT syntax definition must appear where shown. They do not indicate a choice of argument in this case.
elementary_field | = |
{name [format] [ENCODING encoding|character_set] [nullability] [REPEAT number [DETERMINEDBY determined_by_field_name]]} {FILLER [filler_count]} |
format | = | {CHARACTER } {number_format } {DATETIME (datetime_pattern)} {LOB } |
nullability | = | {NOTNULLABLE} {NULLIFEMPTY [EXCLUDE WHITESPACE]} |
number_format | = | EN | SN |
Arguments
name |
The name of the record layout or a field in the record layout, which you use to reference the record or the field in other dmexpress options. The assigned name must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
number | The number of times a field is repeated consecutively in the input record. |
determined_by_field_name | The name of the field in the input record layout, which is used to determine the size of the variable length array. This field must be positioned before the variable length array in the input record layout; its value must be numeric and must be less than or equal to the maximum number of element occurrences in the array as specified by number. |
filler count | The number of delimited fields that make up an unnamed portion of the record. You can omit the filler count to indicate a variable number of fields at the end of the record. |
encoding |
The encoding argument can be used for encoding a text field of data with one of the following built-in character sets: ASCII, EBCDIC, Locale, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, or UTF-32LE. Note: When Locale is specified, the encoding used is that which is defined by the system locale where dmexpress runs. This encoding value can be specified as a built-in character set keyword without quotes or as a string as outlined in File Name and Syntax Requirements. For additional information, see the following Connect help topics: "Data encoding" and "Text data type." |
character_set |
The character_set argument can be used for encoding a text field of data with one of the ICU library character sets. This character set value must be specified as a string as outlined in File Name and Syntax Requirements. For additional information, see the following Connect help topics: "Data encoding" and "Text data type." |
datetime_pattern | The pattern describing the contents of a date/time field. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
Location
This option may appear anywhere in the task definition.
Notes
To reference fields in the record layout, the record layout must first be associated with a source file.
Delimited record layouts as well as positional record layouts must be named and must have at least one defined layout field.
Describing the record layout
So that Connect ETL can extract key fields, reformat records, make selections based on field values, etc., it is necessary for you to provide a map of the data in the records input to your task. One of the ways you can provide such a map when the record is composed of delimited fields is through the /DELIMITEDRECORDLAYOUT option. The basic building block of the map is an elementary field, which describes one delimited field in an input record. An elementary field is not subdivided further. In many cases, your input record layout consists of a simple list of elementary fields.
When an elementary field in the input record is repeated several times in the record without any intervening fields or fillers, you can conveniently specify the field only once and add the REPEAT argument to indicate how many times it is repeated. A repeated field is referred to as an array. When all elements of an array are referenced as a single field, it is treated as a character field.
When two or more consecutive elementary fields in your input record are subdivisions of a larger field, it is sometimes convenient to name this larger field and refer to it in other options. A field made up of several elementary fields is referred to as a composite field. For example, a composite field student_name may consist of three elementary fields: first_name, middle_initial and last_name. In another option such as /REFORMAT you may be able to refer to one field student_name rather than its three elementary fields. In this example, you are not required to specify the composite field in addition to the elementary fields but doing so may save you time.
A composite field can have other composite fields as components, as well as elementary fields. For example, a composite field home_address may consist of four elementary fields: address, city, state and zip. Another composite field, personal_info, may consist of three fields: student_name (a composite field), home_address (a composite field) and telephone (an elementary field).
The elementary fields that make up a composite field do not have to have the same data types. You cannot assign a data type to a composite field. A composite field containing one or more elementary fields is treated as if it is of character type, irrespective of the data types of its component elementary fields. The record layout name can be referenced just like any other composite field.
When a composite field in the input record is repeated several times in the record without any intervening fields or fillers, you can conveniently specify the field only once and add the REPEAT argument to indicate how many times it is repeated.
Date/time fields
A date/time elementary field consists of a calendar date component and/or a time component. The contents are in printable form. For example, the date component may appear as month/day/year with the month as two digits, the day as 2 digits and the year as 4 digits. The time component may appear as hour:minute:second with the hour as two digits using the 24-hour clock, the minutes as 2 digits and the seconds as 2 digits. 5 minutes and 9 seconds after 8 o'clock in the morning of August 4, 2004 appears as 08/04/2004 08:05:09 for this format. You would represent this date/time datetime_pattern as (MM0/DD0/YEAR HH0:MI0:SE0). For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.
LOB fields
The content of LOB fields is the filename as a character string in locale encoding. If the file that this layout is associated with has a different encoding, then the contents of the LOB field will be converted to locale before accessing the data file. Unless an absolute path is specified in the filename, the files are assumed to be in the same location as the source. If multiple sources are defined, the files are located with respect to the first non-XML source.
Encoding
The endianness of the encoding schemes UTF-16 and UTF-32 is determined by the BOM setting of the first /INFILE for a non-XML file. If a UTF-16LE or UTF-16BE BOM is detected in the first non-XML file, any field defined as UTF-16 will inherit the corresponding endian order. Similarly, if a UTF-32LE or UTF-32BE BOM is detected in the first non-XML file, any field defined as UTF-32 will inherit the corresponding endian order. Otherwise any field defined as UTF-16 or UTF-32 will be assumed to have a big-endian order. When the encoding is not defined, the file encoding of the first non-XML file specified in /INFILE determines the field encoding.
Nullability
The following nullability types are supported:
- NOTNULLABLE: the field can never be NULL.
- NULLIFEMPTY [EXCLUDE WHITESPACE]: the field is considered NULL if the field contents have either an empty string or whitespace. Delimited fields are considered empty when they have a length of 0 (two delimiters next to each other), or when they are filled with spaces and/or tabs. The EXCLUDE WHITESPACE argument can be used to treat only an empty string as NULL.
If nullability is not explicitly specified, the default nullability setting will depend on the /NULLIF option. If the /NULLIF option is specified with the EMPTY keyword for the corresponding data type, the field will be treated as NULL if it is empty. If the /NULLIF option is not specified, or if the EMPTY keyword is not specified for the corresponding data type, the field will be considered not nullable.
Examples
/DELIMITEDRECORDLAYOUT layout1 { filler 1, emp_id en, filler 3, emp_name }
/DELIMITEDRECORDLAYOUT store { location char, filler 3, store_code char, manager_code char, emp_count en, filler 4, p_ind char }
/DELIMITEDRECORDLAYOUT employee { employee_id en, employee_name character encoding UTF-16 }
/DELIMITEDRECORDLAYOUT customer { cust_code en, regular_payment en, month_amount en repeat 12, month_paid repeat 12, date_received repeat 12, reminder_sent repeat 12 }
/DELIMITEDRECORDLAYOUT book { book-id en, author {first_name, middle_initial, last_name}, title, publisher, date_published datetime (mm/dd/year) }
/delimitedrecordlayout customer
{cust_code en, regular_payment en,
monthly_account
{month_amount en,
date_received datetime (mm0/dd0/year),
month_paid char,
reminder
{reminder_sent char,
date_sent datetime (mm0/dd0/year)
}repeat 2
}repeat 12
}
The option in this example defines the record layout customer, which describes a record containing one year's payment records for a customer. The first field in the record is a customer's identifier (cust_code, a number). The amount of the customer's regular payment amount (regular_payment, also a number) is held in the second field.
/DELIMITEDRECORDLAYOUT client { client_id en notnullable, client_name nullifempty, address }
/DELIMITEDRECORDLAYOUT purchase_order { purchase_code, purchase_country, purchase_description encoding utf-8 }
/DELIMITEDRECORDLAYOUT cust_code en, regular_payment en, month_amount en repeat 12, month_paid repeat 12, date_received repeat 12, reminder_sent repeat 12 determinedby cust_code
The record layout described by this option consists of 50 fields. The first field, cust_code, is a numeric field. The second field, regular_payment, is also numeric. The remaining fields define arrays in the record layout. The month_amount array defines an array containing twelve delimited fields, as indicated by repeat 12. Each field in the array is numeric. The first field in the array immediately follows regular_payment. The complete array occupies fields 3 through 14. This is followed by the month_paid array, which consists of the character fields 15 - 26. Fields 27 through 38 of the record are occupied by the date_received array. The reminder_sent array is a variable length array that can contain up to 12-character fields. The determinedby field is required when defining a variable length array because it contains the size of the array. The determinedby field, cust_code, is positioned before the variable length array in the input record layout; its value is numeric and must be less than or equal to the maximum number of element occurrences in the array.