Purpose
To define a database table as a target of records.
Format
/DBOUTPUT database {table_load | sql_load} [options] [parameters]
where
table_load = TABLE dbtable [disposition] [teradata_settings] COLUMNS columnlist
sql_load = SQL sql_text value [,value...]
disposition = {INSERT |
TRUNCATEANDINSERT |
MATCH match_type [update_clause] [insert_clause] |
CDC MATCH match_type CDCOPERATIONFIELD cdcoperation_field_name CDCSEQUENCEFIELD cdcsequence_field_name}
match_type = {PRIMARYKEY | CONDITION expression}
expression = column = value [AND column = value...]
update_clause = WHENMATCHED UPDATE {ALL | column [,column...]}
insert_clause = WHENNOTMATCHED INSERT ALL
teradata_settings = [UTILITY utility_name] [SESSIONS sessions] [INSTANCES instances] [SLEEP sleep] [TENACITY tenacity] [ERRLIMIT errlimit] [WORKDATABASE work_database] [GENERATEJOBSCRIPT]
columnlist = column=value [,column=value...]
value = {fieldname | valuename}
options = [COMMITINTERVAL {number RECORDS | ENDOFTASK}] [ABORTIF ERROR]
parameters = PARAMETERS parameter [,parameter...]
parameter = parameter_name parameter_value
Arguments
database | The name or alias of a database identified by a /DBCONNECTION option. You must use the alias if you defined one. The name can be in the form of a string or identifier. |
dbtable | The name of the table or view in database that is the target of records. The name can be in the form of a string or identifier. |
column | The name of a column in the table. For elementary fields, column can be a string or an identifier; see Constants. For additional information on naming and formatting conventions, see the Connect help topics under Syntax Reference. For composite fields or array fields, column must be an identifier. The column name must be fully qualified; the separator between subfields and element fields must be a period, ‘.’. When specifying an entire array, do not include square brackets, ‘[]’. For additional information on identifiers, see the Connect help topic, "Naming conventions for dmexpress command options." When mapping to Hive structures and arrays, refer to Mapping composite fields and array fields to Hive target structures and arrays. In columnlist, column is the column to which the field is written. In the UPDATE clause, column is a column from columnlist that is updated. |
sql_text | A SQL statement in the form of a string. The SQL statement describes the destination of records. You can specify the string in any valid format. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
cdcoperation_field_name | The name of the Connect CDC for z/OS source field that stores the operation performed on the originating source record, as accessed from the Connect ETL capture agent in the read phase of Connect ETL Change Data Capture (CDC). The operation values include the following: I - Insert; D - Delete; B - Before update; U - Update. By default, these values are stored in the Connect CDC for z/OS metadata field, DMXCDC_OPERATIONFIELD. After Connect CDC for z/OS batch processing, the operation value within each record of the Connect CDC for z/OS source dictates how the record is applied to the target database table. Note: For CDC, match_type must be based on a conditional expression; after the keyword MATCH, you must specify CONDITION. See the CDC example and the Connect help topic, "Target Database Table dialog." |
cdcsequence_field_name | The name of the Connect CDC for z/OS source field that stores the value Connect for Big Data references to determine the order in which to consolidate occurrences of the same record in the Connect CDC for z/OS source during Connect CDC for z/OS batch processing. By default, this value is stored in the Connect CDC for z/OS metadata field, DMXCDC_SEQUENCEFIELD. See the CDC example and the Connect help topic, "Target Database Table dialog." |
utility_name | The name of the Teradata utility to use for the load: "fastload", "mload", "tpump", "tptload", "tptupdate", or "tptstream". |
sessions | The utility "sessions" as defined in the Teradata Utility Reference. If not specified, the utility default is in effect for fastload, multiload and TPT, and 1 is used for tpump. |
instances | Number of TPT operator instances, as defined in the Teradata Utility Reference. Valid only when utility_name is "tptload", "tptupdate" or "tptstream". If not specified, the default is 1. |
sleep | The utility "sleep" as defined in the Teradata Utility Reference. If not specified, the utility default will be in effect. |
tenacity | The utility "tenacity" as defined in the Teradata Utility Reference. If not specified, the utility default will be in effect. |
errlimit | The utility "errlimit" as defined in the Teradata Utility Reference. If not specified, the appropriate value is passed to the utility such that it aborts on the first rejected record. |
work_database | The database where dmexpress temporary tables and utility log, error, and work tables are created. When not specified, the database of the target table is used. If the database of the target table is empty, i.e., the target table name is not fully qualified, then the work tables are not qualified with any database and are created in the default database. |
fieldname | The name of a field or column. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
valuename | The value of a field or column. |
number | The number of rows to be used for the commit interval. |
parameter_name | The name of the parameter. |
parameter_value | The value of the parameter as a quoted string. |
Location
The option may appear anywhere in the task definition.
Defaults
Teradata Utility
By default, Connect ETL decides which Teradata utility to use based on the task definition and the target table properties. Any utility settings that are specified apply to the selected utility.
Notes
Mapping Fields to Database Columns
When a table is specified, you map fields to database columns by name. When an SQL statement is specified, you provide an ordered list of fields that map to the columns referenced by the SQL statement.
You can map a field to a database column as long as their data types are compatible. See the appropriate topic under Data types reference/DBMS data types in the Connect help for details on compatible data types.
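For instance, the two forms can be sketched as follows, with hypothetical database, table, and field names; the TABLE form maps fields to columns by name, while the SQL form lists fields in the order of the parameter markers:
/DBOUTPUT sales_db TABLE daily_totals INSERT
COLUMNS region = rec_region,
total = rec_total
/DBOUTPUT sales_db SQL "insert into daily_totals values(?, ?)"
rec_region, rec_total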
Mapping composite fields and array fields to Hive target structures and arrays
Mapping considerations exist when mapping from source composite fields and array fields to corresponding Hive target database table column structures and arrays.
Hive structures
When mapping from a source composite field to a corresponding Hive target structure, consider the following:
- You cannot directly map an entire source composite field to a target structure in one comprehensive mapping.
- Each individual member of a source composite field must be mapped to a corresponding member in the target structure, as shown in the sketch after this list.
- If a given subfield of a composite field is not mapped, a NULL is inserted in the corresponding member of the target structure.
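For instance, a minimal sketch with hypothetical names: to map a source composite field comp, with members m1 and m2, to a Hive structure column s1, with members c1 and c2, each member is mapped individually:
/DBOUTPUT hive_db TABLE target_table INSERT
COLUMNS s1.c1 = comp.m1,
s1.c2 = comp.m2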
Hive arrays
When mapping from a source array field to a corresponding Hive target array, consider the following:
- A source array field can be a fixed-length or variable-length array from a delimited or positional record layout, and can be mapped to a target variable-length array.
- Individual elements of an array are referenced using enclosing square brackets with a subscript; for example, [2].
- When specifying an entire array, do not include enclosing square brackets with a subscript for any of the array fields of the chain of fields.
- The number of dimensions of a source array field must be equal to the number of dimensions of the corresponding target array.
- The number of array dimensions is indicated by the number of array fields without a subscript in the chain of fields. For example, assuming location, monthly_data, and daily_data are all array fields:
location.monthly_data.daily_data.rainfall refers to a three-dimensional array.
location.monthly_data[3].daily_data.rainfall refers to a two-dimensional array.
location[5].monthly_data[3].daily_data.rainfall refers to a one-dimensional array.
- Provided that the elements of a source array field are of the same data type as the elements of a target array, you can directly map an entire source array field to a target array in one comprehensive mapping.
Note: When mapping an entire source array field to a Hive database table column that is an elementary text field, not an array field, the source array field is mapped as a single text field.
- One, all, or any combination of individual elements of a source array field can be mapped to a corresponding element or elements in a target array.
Example: /DBOUTPUT mapping specification
/DELIMITEDRECORDLAYOUT recordlayout1
{
field1,
field2 EN,
field9_compos {
field3,
field4,
field5 EN,
field6 EN REPEAT 10 DETERMINEDBY recordlayout1.field9_compos.field5
},
field10_compos {
subfield1 repeat 10,
subfield2 EN,
subfield3
},
field11 EN repeat 5
}
The Hive database table columns are defined as follows:
c1 string
a1 array<struct<c2:string,c3:string,c4:float,a2:array<int>>>
s1 struct<a3:array<int>, c4:string, c5:string>
Given the record layout and the definition of the Hive database table columns, the following /DBOUTPUT mapping specifications are valid or invalid as indicated:
- Valid
c1 = field1
s1.c4 = field9_compos.field3
- Invalid
Example A
a1 = field10_compos
An entire source composite field cannot be directly mapped to a target structure in one comprehensive mapping; each member of field10_compos must be mapped individually.
Example B
field11 = a1[0].a2
a1[0].a2 = field11
The target column must appear on the left side of a mapping, and when specifying an entire array, no array field in the chain may include a subscript.
Example C
a1.a2 = field11
The number of dimensions of a source array field must be equal to the number of dimensions of the corresponding target array.
Example D
s1.a3 = field10_compos.subfield1
The Connect ETL composite of arrays, field10_compos.subfield1, is an array of text; the Hive structure of arrays, s1.a3, is an array of integers, so the element data types do not match.
Treatment of Mapped ASCII Fields as Locale
By default, any field or value defined in Connect ETL in ASCII encoding is output to the database as if encoded with the database client character set. Because ASCII characters are a subset of many locale character sets, this behavior improves performance by eliminating the need for a conversion from ASCII to locale encoding.
Nullable columns
When nullable data is output to a database table column that is also nullable, the null indicator is preserved. When nullable data is output to a database table column that is not nullable, the record is rejected when the value is NULL. To avoid this, you can define an explicit representation for NULL values by using a conditional /VALUE option, and then referencing the named value definition instead of referencing the nullable column directly.
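For instance, a minimal sketch of the mapping side, with hypothetical names; here status_or_default is assumed to be a named value defined elsewhere by a conditional /VALUE option that substitutes a non-NULL default when the source field is NULL (see the /VALUE option reference for the exact syntax):
/DBOUTPUT orders_db TABLE orders INSERT
COLUMNS order_id = rec_id,
status = status_or_default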
Existence of target database table
The target table must exist in the database prior to running the task.
Disposition
You can specify the action Connect ETL is to take when there are already rows in the database table. See Supported dispositions for target database tables in the Connect help for information on dispositions supported for each DBMS.
Insert only
To preserve any rows already in the table and insert the target records as additional rows, specify the INSERT disposition. You need to ensure that when Connect ETL adds the new rows, no unique key conflicts occur. This is the default disposition.
Truncate and insert
To remove all the rows that exist in the table prior to running your task, and then insert the target records, specify the TRUNCATEANDINSERT disposition. If a task has multiple instances of the same database table being used as a target, TRUNCATEANDINSERT cannot be used in any of the instances.
Update and/or insert
To have Connect ETL decide whether target records should be inserted as new rows or should update existing rows, specify MATCH for the disposition. If MATCH PRIMARYKEY is specified, then each target record is matched to existing records in the table based on primary key or primary index. If MATCH CONDITION is specified, target records are matched based on the specified conditional expression. All columns that are used to match target records must be mapped in columnlist. You can then specify the action Connect ETL is to take:
- To update existing rows, specify the WHENMATCHED UPDATE clause. You can choose to update all the columns that are part of the columnlist by specifying UPDATE ALL, or you can specify the subset of columns that are to be updated.
- To insert new rows, specify the WHENNOTMATCHED INSERT ALL clause.
If a task uses the same database table as a target multiple times but with different dispositions, for example, Insert only for one target and Update and/or insert for another, the sets of records output to each target must be disjoint. Otherwise, the results are undefined.
Apply changes (CDC)
Specifies that the operations recorded in the Connect CDC for z/OS source are applied to the target database table during the apply phase of Connect ETL Change Data Capture (CDC).
During the apply phase of Connect CDC for z/OS, Connect executes each of the downstream Connect tasks, through which the changed data is processed via Connect CDC for z/OS batch processing.
Using Connect CDC for z/OS metadata and primary keys from the source database tables, Connect for Big Data processes the delta changes among the records within a given Connect CDC for z/OS source by consolidating multiple occurrences of the same record to one record. For each downstream Connect for Big Data task, Connect for Big Data applies each of the records from each Connect CDC for z/OS source to the target database table.
Commit behavior
If COMMITINTERVAL is not specified, the commit behavior is determined by Connect ETL at runtime. If COMMITINTERVAL ENDOFTASK is specified, Connect ETL commits the data into the target table at the end of the task. If COMMITINTERVAL number RECORDS is specified, Connect ETL performs a commit after every number of records.
If the disposition is Truncate and insert, the truncate is committed first, separately from the insert. Therefore, if the task aborts, it is possible that the target table is empty because truncate was committed.
If COMMITINTERVAL number RECORDS is specified, Connect ETL may not be able to use the fastest write method.
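For instance, a sketch with hypothetical names that requests a commit after every 10000 records:
/DBOUTPUT sales_db TABLE daily_totals INSERT
COLUMNS region = rec_region,
total = rec_total
COMMITINTERVAL 10000 RECORDS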
Teradata utility settings
For Teradata targets, you can specify which Teradata utility to use, and change its settings. Use GENERATEJOBSCRIPT to create a file containing the utility commands when the task runs.
Refer to the Teradata Utilities Reference for more information on the utilities and settings.
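For instance, a sketch with hypothetical names that selects the TPT load operator with four sessions and two instances, and writes the generated job script:
/DBOUTPUT teradata_db TABLE sales_fact INSERT
UTILITY "tptload" SESSIONS 4 INSTANCES 2 GENERATEJOBSCRIPT
COLUMNS sale_id = rec_sale_id,
amount = rec_amount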
Error handling
By default, Connect ETL reports any errors that occur while loading records into the database table, and execution continues. If you want dmexpress to abort in this case, specify ABORTIF ERROR. The state of the target table depends on the commit behavior.
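For instance, a sketch with hypothetical names that commits only at the end of the task and aborts on the first load error:
/DBOUTPUT sales_db TABLE daily_totals INSERT
COLUMNS region = rec_region,
total = rec_total
COMMITINTERVAL ENDOFTASK ABORTIF ERROR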
Using a target database table in a program
You cannot use /DBOUTPUT when running dmexpress from a program.
Obtaining Target Record Number
To obtain a target record number, create and call the Connect ETL target record function, TargetRecordNumber().
Apache Avro/Parquet backed Hive tables
For Apache Avro or Apache Parquet backed Hive tables, where the SQL specifies STORED AS AVRO or STORED AS PARQUET, respectively, the following optional parameters are available:
Format | Parameter Name | Parameter Value | Default
---|---|---|---
Avro | CODEC | NULL, DEFLATE, or SNAPPY | NULL
Avro | BLOCKSIZE | <value in KB> | 16
Parquet | CODEC | UNCOMPRESSED or GZIP | UNCOMPRESSED
Parquet | BLOCKSIZE | <value in MB> | 128
Parquet | PAGESIZE | <value in MB> | 8
Parquet | DICTIONARYENCODING | TRUE or FALSE | FALSE
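For instance, a sketch with hypothetical names for a Parquet-backed Hive target, using GZIP compression and dictionary encoding:
/DBOUTPUT hive_db TABLE parquet_sales INSERT
PARAMETERS CODEC "GZIP", DICTIONARYENCODING "TRUE"
COLUMNS name = customer_name,
amount = total_sales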
Examples
/DBOUTPUT transaction_database table sorted_transactions insert columns id = transaction_id
/DBOUTPUT transaction_database table sorted_transactions truncateandinsert columns client_name = full_name, id = transaction_id
/DBOUTPUT productiondb SQL " Insert into transaction_summary values(?, ?, ?)" ss_customer_id, ss_customer_name, ss_total_purchases
/DBOUTPUT productiondb SQL
" Insert into transaction_summary
values(:1,:2,:3)"
ss_customer_id,
ss_customer_name,
ss_total_purchases
/DBOUTPUT productiondb TABLE transactions
match primarykey
whenmatched update department, price
whennotmatched insert all
columns productID = fieldID,
department = fieldDept,
price = fieldPrice
/DBOUTPUT productiondb TABLE customer
match condition customer.BirthDate = recLayout.BirthDate
AND customer.Name = recLayout.Name
whenmatched update Address
whennotmatched insert all
columns customer.BirthDate = recLayout.BirthDate,
customer.Name = recLayout.Name,
customer.Address = recLayout.Address
/DBOUTPUT customer_db TABLE aggregated_daily INSERT
PARAMETERS MAXPARALLELSTREAMS "3", TEMPDIRECTORY
"/tmp"
COLUMNS name = customer_name,
date = transaction_date,
amount = total_sales_on_date,
num_transactions = transaction_count
/DBOUTPUT customer_db TABLE aggregated_daily INSERT
PARAMETERS MAXPARALLELSTREAMS "3", CODEC "snappy",
BLOCKSIZE "20"
COLUMNS name = customer_name,
date = transaction_date,
amount = total_sales_on_date,
num_transactions = transaction_count
/DBOUTPUT customer_db TABLE aggregated_daily TRUNCATEANDINSERT
PARAMETERS VERTICACOPYENFORCELENGTH "true"
COLUMNS name = customer_name,
date = transaction_date,
amount = total_sales_on_date,
num_transactions = transaction_count
/DBOUTPUT customer_db TABLE aggregated_daily INSERT
PARAMETERS MAXPARALLELSTREAMS "3", LOADMETHOD
"Conventional"
COLUMNS name = customer_name,
date = transaction_date,
amount = total_sales_on_date,
num_transactions = transaction_count
/DBOUTPUT HiveDB TABLE dmxcdc_t1 CDC
MATCH CONDITION source_id = dmxcdc_t1.id
CDCOPERATIONFIELD dmxcdc_operationfield
CDCSEQUENCEFIELD dmxcdc_sequencefield
COLUMNS sourcef1 = hivef1,
sourcef2 = hivef2
This option specifies that records output from Connect CDC for z/OS batch processing are to be applied to the target table, dmxcdc_t1, in the Hive target database, HiveDB.
The sequence field, which is stored in dmxcdc_sequencefield, determines the order in which to consolidate duplicates of the same record in the Connect CDC for z/OS source during Connect CDC for z/OS batch processing.
The value applicable to the operation, which is stored in dmxcdc_operationfield, is used to determine how each of the consolidated source records is applied to the applicable target columns. The operation values of a Connect CDC for z/OS source can include the following: I - Insert; D - Delete; B - Before update; U - Update.
The ID column of the source record, source_id, is compared against the ID columns of the rows of the target database table, dmxcdc_t1.id, to determine whether a match exists. Depending on the values stored in dmxcdc_operationfield, source records are applied to the applicable target columns: sourcef1 is applied to hivef1 and sourcef2 is applied to hivef2.