/XMLEXTRACT - Connect_ETL - 9.13

Connect ETL Data Transformation Language (DTL) Guide

Product type
Software
Portfolio
Integrate
Product family
Connect
Product
Connect > Connect (ETL, Sort, AppMod, Big Data)
Version
9.13
Language
English
Product name
Connect ETL
Title
Connect ETL Data Transformation Language (DTL) Guide
Copyright
2023
First publish date
2003
Last updated
2023-09-11
Published on
2023-09-11T19:01:45.019000

Purpose

To specify the fields to be extracted from an XML layout.

Format

/XMLEXTRACT layout { xmlfield [, xmlfield ... ] }

where

xmlfield = [name] [NOTNULLABLE] [ {xmlfield [, xmlfield ... ] } ] [NORMALIZE]
Note: The braces ({ }) in /XMLEXTRACT and in xmlfield must appear where shown. They do not indicate a choice of arguments.

Arguments

layout The name or alias of an XML layout as defined in a /INFILE, /INPIPE, or /DATADICTIONARY option.
name

The name of the XML field that you want to extract from the XML layout.

For a summary of valid naming and formatting conventions for identifiers and constants, see Syntax reference in the Connect help.

Location

The option may appear anywhere in the task definition.

Notes

By default, all the XML fields found in an XML layout at run-time will be extracted. If you want to extract only some fields, specify an extracted XML layout through /XMLEXTRACT.

If a composite XML field is not followed by { xmlfield [, xmlfield ... ] }, then all the fields of the composite field will be extracted.

The NORMALIZE keyword can be specified only for an XML field that is repeating. It indicates that such an element is extracted in the form of normalized records. Connect ETL requires that NORMALIZE be specified with any repeating element selected for extraction.

Nullable XML fields

When data is extracted from a nullable XML field, the nullability information is retrieved by default. If you specify the NOTNULLABLE keyword, the nullability information is not retrieved and the extracted data is filled with empty string when the value in the XML field is NULL.

Examples

/INFILE sales.txt XML LAYOUT I"SalesEntries" 
/XMLEXTRACT I"SalesEntries" { I"Name" NORMALIZE}
The XML field "Name" should be extracted from the input file specified in the /INFILE option. The field "Name" that repeats in the DTD will be extracted as one record per repetition.
/XMLEXTRACT I"SalesProducts" {I"Product" {I"ID", I"Name", 
I"Location"{I"Section"}} NORMALIZE}
The XML fields "ID", "Name" and "Section" should be extracted from the input file. The composite XML fields that they are part of are "Product" and "Location". All the other fields that are part of these composite fields are not extracted. The field "Product" that repeats in the DTD will be extracted as one record per repetition.
/XMLEXTRACT I"SalesProducts" {I"Product" {I"ID", I"Product$" ALIAS ProductName} 
NORMALIZE}
The XML fields "ID" and "Product$" should be extracted. Since the XML element "Product" has both an attribute and unnamed content, the unnamed content needs to be referred to as "Product$". The XML field "Product" that repeats in the DTD will be extracted as one record per repetition.
/XMLEXTRACT I"SalesProfits" {I"Profits"
 {I"Branch", I"QuarterlyProfit(1)", I"QuarterlyProfit(2)"} NORMALIZE}
The XML fields "Branch", "QuarterlyProfit(1)" and I"QuarterlyProfit(2)" should be extracted. Since the XML element "QuarterlyProfit" appears multiple times at the same level as part of the composite XML field "Profits", the XML fields that need to be extracted have to be identified by indicating which occurrence to extract. The XML field "Profits" that repeats in the DTD will be extracted as one record per repetition.
/XMLEXTRACT I"CLIENTS" 
                    {I"SEQUENCE$" { I"NAME", I"CITY", I"STATE"} NORMALIZE}
The options specify that the XML fields "NAME", "CITY" and "STATE" should be extracted. Since the fields "NAME", "CITY" and "STATE" form a repeating sequence without a name, the repeating unnamed sequence is identified as "SEQUENCE$". Also, because "SEQUENCE$" is a repeating sequence, it is specified with NORMALIZE. The XML fields "NAME", "CITY" and "STATE" that repeat at nested levels in the DTD will be extracted as one record per repetition.
/XMLEXTRACT I"CLIENTS" {I"SEQUENCE$"{ I"NAME", 
{I"CHOICE$" {I"CHECKING", I"SAVINGS"} NORMALIZE} NORMALIZE}

The XML fields "NAME", "CHECKING" and "SAVINGS" should be extracted. Since the fields "CHECKING" and "SAVINGS" form a repeating choice without a name, the unnamed choice is identified as "CHOICE$". Since "Name" and "CHOICE$" form a repeating sequence without a name, the unnamed sequence is identified as "SEQUENCE$". "SEQUENCE$" and "CHOICE$" are repeating, so NORMALIZE is specified for both. The XML fields "NAME", "CHECKING" and "SAVINGS" that repeat at nested levels in the DTD will be extracted as one record per repetition.