Purpose
To define the format of the data extracted from columns in an HCatalog table.
Format
/HCATCOLUMNSEXTRACT TABLE hcattable extracted_column[,extracted_column…]
where
extracted_column | = | columnname [extract_option…] |
extract_option | = | {EXTRACTAS dataspec [ENCODING encoding] [TREATAS ASCII] } {NULLABLE|NOTNULLABLE } {ALIAS alias } |
dataspec | = | {datatype length} {length datatype} |
datatype | = | {[CHARACTER] } {numeric_format } {DATETIME (datetime_pattern)} |
Arguments
dbtable | The name or alias of a source hcatalog table defined in a /HCATINPUT option. |
columnname |
The name of a column to be extracted from the table. The name can be in the form of a string or identifier. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
encoding |
The scheme used for encoding extracted CHARACTER column data. Encoding specifies the character encoding in which the text data is extracted. If necessary, the text data is converted to the specified encoding. See dmexpress data types/Text data type in the Connect help for a list of valid values. If EXTRACTAS encoding is not defined, the extracted character encoding is the encoding defined by /HCATINPUT. |
alias |
A name you assign to the extracted table column, which you will use to reference the column in other options. For a summary of valid naming and formatting conventions for identifiers and constants, see syntax reference in the Connect help. When you define an alias for a column, reference the column through the alias only; you can no longer reference the column name directly in other options. |
length | The length of the extracted column. The length is in bytes for all supported data types and does not include the length of the null indicator. The acceptable values for the length of an extracted column with one of the recognized data types can be found in the topic Conversion between hcatalog data types and dmexpress data types in the Connect help. The length can be omitted when the datatype is DFLOAT or FLOAT, as DFLOAT and FLOAT have default lengths. To extract a VARCHAR column as a variable length field, omit the data type and length. |
numeric_format | The format of the numeric data extracted from the column. See the topic Conversion between hcatalog data types and dmexpress data types in the Connect help. |
datetime_pattern |
A mask that describes the layout of an extracted date/time field. For a summary of valid naming and formatting conventions for identifiers and constants, see syntax reference in the Connect help. |
Location
The option may appear anywhere in the task definition.
Notes
The columns specified in the /HCATINPUT option are extracted using the default formats. To ensure that the data in a column is extracted in a specific format, use the /HCATCOLUMNSEXTRACT option to explicitly specify the extract format. The extracted column must be one of the columns read from the HCatalog table. The format assigned to an extracted column must be compatible with the data type of the database column, as defined in the topic Conversion between HCatalog data types and Connect ETL data types in the Connect help. Certain conversions could result in an overflow, which would be flagged with a message when the task is run.
You cannot specify multiple extraction formats for the same column in the same task.
Treating Extracted Text as if Encoded in ASCII
When TREATAS ASCII is specified for any CHARACTER column, the text data is extracted in the specified or default encoding and is treated in all subsequent operations as if it were encoded in ASCII. No character encoding conversion takes place as a result of the TREATAS ASCII specification.
Nullable columns
The HCatalog API does not specify the nullability of an HCatalog table column, so Connect ETL assumes that all table columns are nullable. When extracting data from a nullable HCatalog table column, NULL values are retained unless you specify the NOTNULLABLE keyword, in which case null values are replaced with spaces.
Note the following about processing NULL values in Connect ETL:
- NULL values sort before all other values when sorted in ascending order and after all other values when sorted in descending order.
- NULL values are treated as an empty string when involved in string concatenation.
- Unless stated otherwise, operations involving a NULL value produce a NULL value, though you can define the result using a conditional field.
- When a NULL value is output to a file, buffer, or procedure, the output field content is empty, which is defined according to the field type as: a delimited field of length 0; a fixed length field filled with spaces; or a binary field in which each byte is set to hex value "20" (an ASCII space).
Referencing HCatalog Table Columns
You can refer to an HCatalog table column by the fully qualified name tablename. columnname or as columnextractalias. tablename should match the alias of an HCatalog table or should match the name of the table defined using /HCATINPUT when an alias is not provided. columnname should match the name of the column. columnextractalias should match the alias assigned to the column name while specifying the extraction format. A partially qualified reference can be used as long as the reference can be uniquely identified.
Examples
/HCATcolumnsextract TABLE client_facts amount EXTRACTAS int 8 ALIAS int_amount
/HCATcolumnsextract TABLE client_facts client_name EXTRACTAS CHAR 10 NULLABLE
/HCATcolumnsextract TABLE client name EXTRACTAS CHAR 10 ENCODING UTF-8 NULLABLE
/HCATcolumnsextract TABLE client name EXTRACTAS CHAR 10 ENCODING UTF-8 TREATAS ASCII
This option specifies that extracted character data from the name column of the client table will be encoded in UTF-8 and will be handled as ASCII in all subsequent operations.