/DBOUTPUT - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide


Purpose

To define a database table as a target of records.

Format

/DBOUTPUT database {table_load|sql_load} [options] [parameters]

where

table_load = TABLE dbtable [disposition] [teradata_settings] COLUMNS columnlist
sql_load = SQL sql_text value [,value...]
disposition = {INSERT |
               TRUNCATEANDINSERT |
               MATCH match_type [update_clause] [insert_clause] |
               CDC MATCH match_type CDCOPERATIONFIELD cdcoperation_field_name CDCSEQUENCEFIELD cdcsequence_field_name}
match_type = {PRIMARYKEY | CONDITION expression}
expression = column = value [AND column = value...]
update_clause = WHENMATCHED UPDATE {ALL | column [,column...]}
insert_clause = WHENNOTMATCHED INSERT ALL
teradata_settings = [UTILITY utility_name] [SESSIONS sessions] [INSTANCES instances] [SLEEP sleep] [TENACITY tenacity] [ERRLIMIT errlimit] [WORKDATABASE work_database] [GENERATEJOBSCRIPT]
columnlist = column=value [,column=value...]
value = {fieldname | valuename}
options =

[COMMITINTERVAL {number RECORDS | ENDOFTASK}]

[ABORTIF ERROR]

parameters = PARAMETERS parameter [,parameter...]
parameter = parameter_name parameter_value

Arguments

database The name or alias of a database identified by a /DBCONNECTION option. You must use the alias if one was defined. The name can be in the form of a string or identifier.
dbtable The name of the table or view in database that is the target of records. The name can be in the form of a string or identifier.
column The name of a column in the table. For elementary fields, column can be a string or an identifier; see Constants. For additional information on naming and formatting conventions, see the Connect help topics under Syntax Reference. For composite fields or array fields, column must be an identifier. The column name must be fully qualified; the field separator between the subfields and element fields must be a period, ‘.’. When specifying an entire array, do not include square brackets, ‘[]’. For additional information on identifiers, see the Connect help topic, "Naming conventions for dmexpress command options." When mapping to Hive structures and arrays, refer to Mapping composite fields and array fields to Hive target structures and arrays. In columnlist, column is the column to which the field is written. In the UPDATE clause, column is a column from columnlist that is updated.
sql_text A SQL statement in the form of a string. The SQL statement describes the destination of records. You can specify the string in any valid format. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.
cdcoperation_field_name

The name of the Connect CDC for z/OS source field that stores the value identifying the operation performed on the originating source record, which is accessed from the Connect ETL capture agent in the read phase of Connect ETL Change Data Capture (CDC).

The operation values are the following: I - Insert; D - Delete; B - Before update; and U - Update.

By default, these values are stored in the Connect CDC for z/OS metadata field, DMXCDC_OPERATIONFIELD.

After Connect CDC for z/OS batch processing, the value applicable to the operation within each record of the Connect CDC for z/OS source dictates how the record is applied to the target database table.

Note: For CDC, match_type must be based on a conditional expression. After the keyword MATCH, you must specify CONDITION.

See CDC example and see the Connect help topic, "Target Database Table dialog."

cdcsequence_field_name

The name of the Connect CDC for z/OS source field that stores the value Connect for Big Data references to determine the order in which to consolidate occurrences of the same record in the Connect CDC for z/OS source during Connect CDC for z/OS batch processing.

By default, the name of the Connect CDC for z/OS metadata field that stores the referenced value is DMXCDC_SEQUENCEFIELD.

See CDC example and the Connect help topic, "Target Database Table dialog."

utility_name The name of the Teradata utility that should be used for the load – "fastload", "mload", "tpump", "tptload", "tptupdate" or "tptstream".
sessions The utility "sessions" as defined in the Teradata Utility Reference. If not specified, the utility default is in effect for fastload, multiload and TPT, and 1 is used for tpump.
instances Number of TPT operator instances, as defined in the Teradata Utility Reference. Valid only when utility_name is "tptload", "tptupdate" or "tptstream". If not specified, the default is 1.
sleep The utility "sleep" as defined in the Teradata Utility Reference. If not specified, the utility default will be in effect.
tenacity The utility "tenacity" as defined in the Teradata Utility Reference. If not specified, the utility default will be in effect.
errlimit The utility "errlimit" as defined in the Teradata Utility Reference. If not specified, the appropriate value is passed to the utility such that it aborts on the first rejected record.
work_database The database where dmexpress temporary tables and utility log, error, and work tables are to be created. When not specified, the database of the target table is used. If the database of the target table is empty, that is, the target table is not fully qualified, the work tables are not qualified with any database and are created in the default database.
fieldname The name of a field or column. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.
valuename The value of a field or column.
number The number of rows to be used for the commit interval.
parameter_name The name of the parameter.
parameter_value The value of the parameter as a quoted string.

Location

The option may appear anywhere in the task definition.

Defaults

Teradata Utility

By default, Connect ETL decides which Teradata utility to use based on the task definition and the target table properties. Any utility settings that are specified apply to the selected utility.

Notes

Mapping Fields to Database Columns

When a table is specified, you map fields to database columns by name. When an SQL statement is specified, you provide an ordered list of fields that map to the columns specified by the SQL statement.

You can map a field to a database column as long as their data types are compatible. See the appropriate topic under Data types reference/DBMS data types in the Connect help for details on compatible data types.
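
For instance, a minimal sketch of both mapping styles, assuming a /DBCONNECTION named salesdb and hypothetical table and field names:

/DBOUTPUT salesdb TABLE orders INSERT COLUMNS order_id = src_order_id, amount = src_amount
/DBOUTPUT salesdb SQL "insert into orders values(?, ?)" src_order_id, src_amount

In the TABLE form, each field is mapped to a column by name; in the SQL form, the fields are listed in the order of the parameter markers in the statement.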

Mapping composite fields and array fields to Hive target structures and arrays

Mapping considerations exist when mapping from source composite fields and array fields to corresponding Hive target database table column structures and arrays.

Hive structures

When mapping from a source composite field to a corresponding Hive target structure, consider the following:

  • You cannot directly map an entire source composite field to a target structure in one comprehensive mapping.
  • Each individual member of a source composite field must be mapped to a corresponding member in the target structure.
  • If a given subfield of a composite field is not mapped, a NULL is inserted in the corresponding member of the target structure.

Hive arrays

When mapping from a source array field to a corresponding Hive target array, consider the following:

  • A source array field can be a fixed length array or a variable length array from a delimited record layout or positional record layout and can be mapped to a target variable length array.
  • Individual elements of an array are referenced using enclosing square brackets with a subscript; for example, [2].
  • When specifying an entire array, do not include enclosing square brackets with a subscript for any of the array fields of the chain of fields.
  • The number of dimensions of a source array field must be equal to the number of dimensions of the corresponding target array.
  • The number of array dimensions is indicated by the number of array fields without a subscript in the chain of fields.
As no subscripts are specified, the following chain of fields is an example of a Hive three-dimensional array:
location.monthly_data.daily_data.rainfall
Adding a subscript to the chain of fields reduces the Hive three-dimensional array to a two-dimensional array:
location.monthly_data[3].daily_data.rainfall
Adding another subscript reduces the two-dimensional array to a one-dimensional array:
location[5].monthly_data[3].daily_data.rainfall
  • Provided that the elements of a source array field are of the same data type as the elements of a target array, you can directly map an entire source array field to a target array in one comprehensive mapping.

Note: When mapping an entire source array field to a Hive database table column that is an elementary text field, not an array field, the source array field is mapped as a single text field.

  • One, all, or any combination of individual elements of a source array field can be mapped to a corresponding element or elements in a target array.

Example: /DBOUTPUT mapping specification

Consider the following delimited record layout and Hive target database table specifications:
/DELIMITEDRECORDLAYOUT recordlayout1
{
field1,
field2 EN,
field9_compos {
field3,
field4,
field5 EN,
field6 EN REPEAT 10 DETERMINEDBY recordlayout1.field9_compos.field5
},
field10_compos {
subfield1 repeat 10,
subfield2 EN,
subfield3
},
field11 EN repeat 5
}
The Hive target database table has 3 columns, which are defined as follows:
c1 string
a1 array<struct<c2:string,c3:string, c4:float,a2:array<int>>>
s1 struct<a3:array<int>, c4:string, c5:string>

As per the record layout and the definition of the Hive database table columns, the following /DBOUTPUT mapping specifications are either valid or invalid:

  • Valid
    c1 = field1
    s1.c4 = field9_compos.field3
  • Invalid

Example A

The following specification is invalid because a1 is a Hive structure, field10_compos is a composite field, and only element fields of Connect ETL composite fields can be mapped to element fields of Hive structures:
a1 = field10_compos

Example B

The following specification is invalid because field11 is not a Hive column:
field11 = a1[0].a2
Reversing the equation, such that a1[0].a2 is on the left side of the equal sign and field11 is on the right, corrects the specification. Mapping field11, a Connect ETL one-dimensional array of data type edited numeric, to a1[0].a2, a Hive one-dimensional array of data type integer, results in the following valid mapping specification:
a1[0].a2 = field11

Example C

The following specification is invalid because field11 is a Connect ETL one-dimensional array and a1.a2 is a Hive two-dimensional array:
a1.a2 = field11

The number of dimensions of a source array field must be equal to the number of dimensions of the corresponding target array.

Example D

Although the Hive side of the equation is a structure of arrays, the Connect ETL side of the equation is a composite of arrays, and the number of array dimensions on both sides of the equation is equal, the following specification is invalid because the data types do not match:
s1.a3 = field10_compos.subfield1

The Connect ETL composite of arrays, field10_compos.subfield1, consists of an array of text; the Hive structure of arrays, s1.a3, consists of an array of integers.

Treatment of Mapped ASCII Fields as Locale

By default, any field or value defined in Connect ETL in ASCII encoding is output to the database as if encoded with the database client character set. Because ASCII characters are a subset of many locale character sets, this behavior improves performance by eliminating the need for a conversion from ASCII to locale encoding.

Nullable columns

When nullable data is output to a database table column that is also nullable, the null indicator is preserved. When nullable data is output to a database table column that is not nullable, the record is rejected when the value is NULL. To avoid this, you can define an explicit representation for NULL values by using a conditional /VALUE option, and then reference the named value definition instead of referencing the nullable column directly.
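
As a sketch only, with hypothetical connection, table, field, and value names, and with the conditional expression left as a placeholder (see the /VALUE documentation for its exact syntax), the approach looks like this:

/VALUE amount_or_zero <expression that yields 0 when src_amount is NULL, and src_amount otherwise>
/DBOUTPUT salesdb TABLE orders INSERT COLUMNS amount = amount_or_zero

The column mapping references the named value amount_or_zero rather than the nullable field src_amount, so no NULL reaches the not-nullable column.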

Existence of target database table

The target table must exist in the database prior to running the task.

Disposition

You can specify the action Connect ETL is to take when there are already rows in the database table. See Supported dispositions for target database tables in the Connect help for information on dispositions supported for each DBMS.

Insert only

To preserve any rows already in the table and insert the target records as additional rows, specify the INSERT disposition. You need to ensure that when Connect ETL adds the new rows, no unique key conflicts occur. This is the default disposition.

Truncate and insert

To remove all the rows that exist in the table prior to running your task, and then insert the target records, specify the TRUNCATEANDINSERT disposition. If a task has multiple instances of the same database table being used as a target, then TRUNCATEANDINSERT cannot be used in any of the instances.

Update and/or insert

To have Connect ETL decide whether target records should be inserted as new rows or should update existing rows, specify MATCH for the disposition. If MATCH PRIMARYKEY is specified, then each target record is matched to existing records in the table based on primary key or primary index. If MATCH CONDITION is specified, target records are matched based on the specified conditional expression. All columns that are used to match target records must be mapped in columnlist. You can then specify the action Connect ETL is to take:

  • To update existing rows, specify the WHENMATCHED UPDATE clause. You can choose to update all the columns that are part of the columnlist by specifying UPDATE ALL, or you can specify the subset of columns that are to be updated.
  • To insert new rows, specify the WHENNOTMATCHED INSERT ALL clause.

If a task uses the same database table as target multiple times but with different dispositions, for example Insert only for one target and Update and/or insert for another, the sets of records output to each target must be disjoint. Otherwise, the results are undefined.
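
A sketch of a primary-key merge, with hypothetical connection, table, and field names, that updates every mapped column on a match and inserts new rows otherwise:

/DBOUTPUT salesdb TABLE orders
    MATCH PRIMARYKEY
    WHENMATCHED UPDATE ALL
    WHENNOTMATCHED INSERT ALL
    COLUMNS order_id = src_order_id,
            amount = src_amount,
            status = src_status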

Apply changes (CDC)

Indicates the type of operation to be performed on the target database table during the apply phase of Connect ETL Change Data Capture (CDC).

During the apply phase of Connect CDC for z/OS, Connect executes each of the downstream Connect tasks, through which the changed data is processed via Connect CDC for z/OS batch processing.

Using Connect CDC for z/OS metadata and primary keys from the source database tables, Connect for Big Data processes the delta changes among the records within a given Connect CDC for z/OS source by consolidating multiple occurrences of the same record to one record. For each downstream Connect for Big Data task, Connect for Big Data applies each of the records from each Connect CDC for z/OS source to the target database table.

Commit behavior

If COMMITINTERVAL is not specified, the commit behavior is determined by Connect ETL at runtime. If COMMITINTERVAL ENDOFTASK is specified, Connect ETL commits the data into the target table at the end of the task. If COMMITINTERVAL number RECORDS is specified, Connect ETL performs a commit after every number of records.

If the disposition is Truncate and insert, the truncate is committed first, separately from the insert. Therefore, if the task aborts, it is possible that the target table is empty because truncate was committed.

If COMMITINTERVAL number RECORDS is specified, Connect ETL may not be able to use the fastest write method.
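
As a sketch, with hypothetical names, and with the option placed after the column list as shown in the Format production:

/DBOUTPUT salesdb TABLE orders INSERT
    COLUMNS order_id = src_order_id, amount = src_amount
    COMMITINTERVAL 10000 RECORDS

/DBOUTPUT salesdb TABLE orders INSERT
    COLUMNS order_id = src_order_id, amount = src_amount
    COMMITINTERVAL ENDOFTASK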

Teradata utility settings

For Teradata targets, you can specify which Teradata utility to use, and change its settings. Use GENERATEJOBSCRIPT to create a file containing the utility commands when the task runs.

Refer to the Teradata Utilities Reference for more information on the utilities and settings.
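
A sketch that pins the utility and some of its settings, with hypothetical database, table, and field names; the teradata_settings appear between the disposition and the column list, as in the Format production:

/DBOUTPUT teradata_db TABLE daily_sales INSERT
    UTILITY "tptload" SESSIONS 8 INSTANCES 2 ERRLIMIT 1 GENERATEJOBSCRIPT
    COLUMNS sale_id = src_sale_id,
            amount = src_amount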

Error handling

By default, Connect ETL reports any errors that occur while loading records into the database table, and execution continues. If you want dmexpress to abort in this case, specify ABORTIF ERROR. The state of the target table depends on the commit behavior.
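
A sketch, with hypothetical names, that makes the task abort on the first load error; ABORTIF ERROR is an option, so it follows the column list in the Format production:

/DBOUTPUT salesdb TABLE orders INSERT
    COLUMNS order_id = src_order_id
    ABORTIF ERROR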

Using a target database table in a program

You cannot use /DBOUTPUT when running dmexpress from a program.

Obtaining Target Record Number

To obtain a target record number, create and call the Connect ETL target record function, TargetRecordNumber().

Apache Avro/Parquet backed Hive tables

For Apache Avro or Apache Parquet backed Hive tables, where the SQL specifies STORED AS AVRO or STORED AS PARQUET, respectively, the following optional parameters are available:

Format    Parameter Name       Parameter Value             Default
Avro      CODEC                NULL, DEFLATE, or SNAPPY    NULL
          BLOCKSIZE            <value in KB>               16
Parquet   CODEC                UNCOMPRESSED or GZIP        UNCOMPRESSED
          BLOCKSIZE            <value in MB>               128
          PAGESIZE             <value in MB>               8
          DICTIONARYENCODING   TRUE or FALSE               FALSE
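
A sketch for a Parquet-backed Hive table, with hypothetical database, table, and field names, that sets the Parquet-specific parameters listed above:

/DBOUTPUT hive_db TABLE clickstream INSERT
    PARAMETERS CODEC "GZIP", BLOCKSIZE "256", PAGESIZE "8", DICTIONARYENCODING "TRUE"
    COLUMNS event_id = src_event_id,
            event_time = src_event_time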

Examples

/DBOUTPUT transaction_database table sorted_transactions insert columns id = transaction_id 
This option specifies that target records are to be inserted into the table sorted_transactions in the database transaction_database. The field transaction_id is mapped to the database column id.
/DBOUTPUT transaction_database table sorted_transactions truncateandinsert columns client_name = full_name, id = transaction_id
The option specifies that all rows in the table sorted_transactions in the database transaction_database are to be deleted prior to inserting the target records into the table. The field full_name is mapped to the database column client_name and the field transaction_id is mapped to the database column id.
/DBOUTPUT productiondb SQL " Insert into transaction_summary values(?, ?, ?)" ss_customer_id, ss_customer_name, ss_total_purchases
This option specifies that the contents of the table transaction_summary in the ODBC database productiondb are to be updated according to the text in the SQL statement. The option specifies an explicit mapping, using parameter markers, from Connect ETL fields ss_customer_id, ss_customer_name and ss_total_purchases to columns in the table.
/DBOUTPUT productiondb SQL
    " Insert into transaction_summary
      values(:1,:2,:3)"
    ss_customer_id,
    ss_customer_name,
    ss_total_purchases
This option specifies that the contents of the table transaction_summary in the Oracle database productiondb are to be updated according to the text in the SQL statement. The option specifies an explicit mapping, using parameter markers, from Connect ETL fields ss_customer_id, ss_customer_name and ss_total_purchases to columns in the table.
/DBOUTPUT productiondb TABLE transactions
    match primarykey
    whenmatched update department, price
    whennotmatched insert all
    columns productID = fieldID,
            department = fieldDept,
            price = fieldPrice
The option specifies that the output records are either updated or inserted into the table transactions in the database productiondb. The primary key for the table, productID, is matched against the output field fieldID to determine whether the record is updated or inserted.
/DBOUTPUT productiondb TABLE customer
    match condition customer.BirthDate = recLayout.BirthDate
                AND customer.Name = recLayout.Name
    whenmatched update Address
    whennotmatched insert all
    columns customer.BirthDate = recLayout.BirthDate,
            customer.Name = recLayout.Name,
            customer.Address = recLayout.Address
The option specifies that the output records are either updated or inserted into the table customer in the database productiondb. When the columns BirthDate and Name match the target fields of the same name, the record is considered to be matched, and the Address field is updated. Otherwise, a new record is created with the Name, BirthDate and Address.
/DBOUTPUT customer_db TABLE aggregated_daily INSERT
    PARAMETERS MAXPARALLELSTREAMS "3", TEMPDIRECTORY "/tmp"
    COLUMNS name = customer_name,
            date = transaction_date,
            amount = total_sales_on_date,
            num_transactions = transaction_count
Greenplum, Netezza, and Vertica database specific: This option specifies that the maximum number of parallel streams that can be established to load data to the Greenplum, Netezza, or Vertica database table aggregated_daily is 3 and the directory to which error log records are written is /tmp.
/DBOUTPUT customer_db TABLE aggregated_daily INSERT
    PARAMETERS MAXPARALLELSTREAMS "3", CODEC "snappy",
    BLOCKSIZE "20"
    COLUMNS name = customer_name,
            date = transaction_date,
            amount = total_sales_on_date,
            num_transactions = transaction_count
Hive database specific: This option specifies that the maximum number of parallel streams that can be established to load data to the Avro-backed Hive table aggregated_daily is 3, the compression algorithm is snappy, and the block size in kilobytes of the Avro objects is 20.
/DBOUTPUT customer_db TABLE aggregated_daily TRUNCATEANDINSERT
    PARAMETERS VERTICACOPYENFORCELENGTH "true"
    COLUMNS name = customer_name,
            date = transaction_date,
            amount = total_sales_on_date,
            num_transactions = transaction_count
Vertica database specific: This option enables the Vertica COPY ENFORCELENGTH option, which changes the behavior of the COPY LOCAL load so that records with data exceeding the maximum length of the corresponding character type column are rejected. For additional information, see the “Connecting to Vertica Databases” topic in the Connect help.
/DBOUTPUT customer_db TABLE aggregated_daily INSERT
    PARAMETERS MAXPARALLELSTREAMS "3", LOADMETHOD "Conventional"
    COLUMNS name = customer_name,
            date = transaction_date,
            amount = total_sales_on_date,
            num_transactions = transaction_count
Oracle database specific: This option specifies that the maximum number of parallel streams that can be established to load data to the Oracle database table aggregated_daily is 3 and the load method used is the conventional path load.
/DBOUTPUT HiveDB TABLE dmxcdc_t1 CDC
    MATCH CONDITION source_id = dmxcdc_t1.id
    CDCOPERATIONFIELD dmxcdc_operationfield
    CDCSEQUENCEFIELD dmxcdc_sequencefield
    COLUMNS sourcef1 = hivef1,
            sourcef2 = hivef2

This option specifies that records output from Connect CDC for z/OS batch processing are to be applied to the target table dmxcdc_t1 in the Hive target database HiveDB.

The sequence field, which is stored in dmxcdc_sequencefield, determines the order in which to consolidate duplicates of the same record in the Connect CDC for z/OS source during Connect CDC for z/OS batch processing.

The value applicable to the operation, which is stored in dmxcdc_operationfield, is used to determine how each of the consolidated source records is applied to the applicable target columns. The values applicable to the operation field of a Connect CDC for z/OS source can include the following: I – Insert; D – Delete; B – Before update; and U – Update.

The ID column of the source record, source_id, is compared against the ID columns of the rows of the target database table, dmxcdc_t1.id, to determine whether a match exists. Depending on the values stored in dmxcdc_operationfield, source records are applied to the applicable target columns: sourcef1 is applied to hivef1 and sourcef2 is applied to hivef2.