/HCATINPUT - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:01:45.019000

Purpose

To define an HCatalog table as a source of records.

Format

/HCATINPUT database table_extract [parameters]

where

table_extract = TABLE dbtable [COLUMNS column[,column...]] [extract_options] [ALIAS alias]
extract_options = [EXTRACTAS ENCODING encoding]
parameters = PARAMETERS FILTER scan_filter

Arguments

database The alias of an hcatalog connection defined through /HCATCONNECTION.
dbtable The name of the table in database that is the source of records. The name can be in the form of a string or identifier.
column The name of a column to be extracted from the table. The name can be in the form of a string or identifier.
alias

An optional alternative name you assign to the database table. You can use this alias name to reference the table from other options. The name must adhere to the rules described for an identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see syntax reference in the Connect help.

When you do not assign an alias, use the dbtable argument to reference the table from other options.

When you define an alias for the source, reference the source only through the alias; you can no longer reference the table name directly in other options.

encoding The encoding of the character data in the table. See Data encoding in the Connect help for a list of the valid values.
scan_filter An optional hcatalog scan filter to describe the desired partitions to read from a partitioned hcatalog table. See details in Notes below.

Location

The /HCATINPUT option can be listed anywhere in the task definition for the following task types: copy, sort, merge, and aggregate.

For a join task, the /HCATINPUT option must be listed as follows:

  • Left side – Before the /JOINKEYS option that defines the left side.
  • Right side – After the /JOINKEYS option that defines the left side and before the /JOINKEYS option that defines the right side.

Notes

The /HCATINPUT option allows you to define an HCatalog table as a source of records. For sort tasks only, a source table can also be used as a target table.

Number of sources

If you have a source table in an aggregate task, it must be the only source.

Determining which columns are extracted

By default, Connect ETL extracts all columns from the table. To extract only a subset of the columns in the table, for example to improve the performance of the task, specify the list of columns to be extracted via the COLUMNS clause.

You can reference the columns from the source table in other options where you can reference field names from a record layout. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

The format of extracted columns

Connect ETL extracts columns into fields with types and lengths that are appropriate to the columns' native formats in the table. When you do not specify the extraction type and length explicitly, DMEpress extracts a column with a default type and length as listed in the topic Conversion between HCatalog data types and Connect ETL data types in the Connect help. Note that the default type and length may change in a future release of Connect ETL. Hence, depending on your task, it may be preferable to specify an explicit type and length for the extraction through a /HCATCOLUMNSEXTRACT option.

You can also convert the data to a specific format explicitly via the /VALUE option and include the named value definition in a /REFORMAT option.

Character encoding

If the character encoding is not set when extracting a text column from an HCatalog table, Connect ETL attempts to extract the text column in the database server’s character encoding.

Record layout

When a record originating from an HCatalog table is output to a file, buffer, or procedure, the default record layout is defined as a concatenation of the extracted columns specified in the /HCATINPUT option.

Nullable columns

See Nullable columns under /HCATCOLUMNSEXTRACT.

Filter parameter

An optional HCatalog scan filter can be specifed on one or more partition columns of partitioned HCatalog tables.

  • Although partition columns can be of different types, HCatalog restricts filtering to partition columns of type 'string' only.
  • The filter can contain the operators 'and', 'or', 'like', '()', '=', '<>' (not equal), '<', '>', '<=' and '>='.
  • Each filter value must be enclosed by double quotes ("), with the whole filter parameter enclosed by single quotes ('), or the other way around.

Using a source HCatalog table in a program

You cannot use /HCATINPUT when running Connect ETL from a program.

Examples

/HCATinput hcat_emp_db TABLE contact_info COLUMNS cell_phone,email
This option specifies that the source records are to be read from the table contact_info in the database hcat_emp_db. Only the cell_phone and email columns are to be extracted.
/HCATINPUT hcat_sales_db TABLE sales COLUMNS "sttcol" PARAMETERS FILTER 
'dtcol>="20110924" and dtcol<="20110930"'

This option specifies reading the column sttcol from the table sales in the database hcat_sales_db with the filter 'dtcol>="20110924" and dtcol<="20110930"'.