Purpose
To specify the fields to be extracted from an XML layout.
Format
/XMLEXTRACT layout { xmlfield [, xmlfield ... ] }
where
xmlfield | = | [name] [NOTNULLABLE] [ {xmlfield [, xmlfield ... ] } ] [NORMALIZE] |
Arguments
layout | The name or alias of an XML layout as defined in a /INFILE, /INPIPE, or /DATADICTIONARY option. |
name |
The name of the XML field that you want to extract from the XML layout. For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help. |
Location
The option may appear anywhere in the task definition.
Notes
By default, all the XML fields found in an XML layout at run-time will be extracted. If you want to extract only some fields, specify an extracted XML layout through /XMLEXTRACT.
If a composite XML field is not followed by { xmlfield [, xmlfield ... ] }, then all the fields of the composite field will be extracted.
The NORMALIZE keyword can be specified only for an XML field that is repeating. It indicates that such an element is extracted in the form of normalized records. Connect ETL requires that NORMALIZE be specified with any repeating element selected for extraction.
Nullable XML fields
When data is extracted from a nullable XML field, the nullability information is retrieved by default. If you specify the NOTNULLABLE keyword, the nullability information is not retrieved and the extracted data is filled with empty string when the value in the XML field is NULL.
Examples
/INFILE sales.txt XML LAYOUT I"SalesEntries"
/XMLEXTRACT I"SalesEntries" { I"Name" NORMALIZE}
/XMLEXTRACT I"SalesProducts" {I"Product" {I"ID", I"Name",
I"Location"{I"Section"}} NORMALIZE}
/XMLEXTRACT I"SalesProducts" {I"Product" {I"ID", I"Product$" ALIAS ProductName}
NORMALIZE}
/XMLEXTRACT I"SalesProfits" {I"Profits"
{I"Branch", I"QuarterlyProfit(1)", I"QuarterlyProfit(2)"} NORMALIZE}
/XMLEXTRACT I"CLIENTS"
{I"SEQUENCE$" { I"NAME", I"CITY", I"STATE"} NORMALIZE}
/XMLEXTRACT I"CLIENTS" {I"SEQUENCE$"{ I"NAME",
{I"CHOICE$" {I"CHECKING", I"SAVINGS"} NORMALIZE} NORMALIZE}
The XML fields "NAME", "CHECKING" and "SAVINGS" should be extracted. Since the fields "CHECKING" and "SAVINGS" form a repeating choice without a name, the unnamed choice is identified as "CHOICE$". Since "Name" and "CHOICE$" form a repeating sequence without a name, the unnamed sequence is identified as "SEQUENCE$". "SEQUENCE$" and "CHOICE$" are repeating, so NORMALIZE is specified for both. The XML fields "NAME", "CHECKING" and "SAVINGS" that repeat at nested levels in the DTD will be extracted as one record per repetition.