Read From XML - data_integration_1 - 23.1

Version: 23.1
Product name: Precisely Spectrum
First publish date: 2007
Last publication: 2023-06-02

The Read from XML stage reads an XML file into a job or subflow. It defines the file's path and data format, including XML schema and data element details.

Simple XML elements are converted to flat fields and passed on to the next stage. Simple XML data consists of records made up of XML elements that contain only data and no child elements. For example, this is a simple XML data file:

<customers>
    <customer>
        <name>Sam</name>
        <gender>M</gender>
        <age>43</age>
        <country>United States</country>
    </customer>
    <customer>
        <name>Jeff</name>
        <gender>M</gender>
        <age>32</age>
        <country>Canada</country>
    </customer>
    <customer>
        <name>Mary</name>
        <gender>F</gender>
        <age>61</age>
        <country>Australia</country>
    </customer>
</customers>

Notice that in this example each record contains simple XML elements such as <name>, <gender>, <age>, and <country>. None of the elements contain child elements.

The Read from XML stage automatically flattens simple data like this because most stages require data to be in a flat format. If you want to preserve the hierarchical structure, use an Aggregator stage after Read from XML to convert the data to hierarchical data.
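The flattening described above can be sketched in a few lines of Python. This is only an illustration using the standard library, not how the Read from XML stage is implemented; the function name `flatten_simple_records` is hypothetical, and the data is a shortened copy of the customers example.

```python
import xml.etree.ElementTree as ET

XML_DATA = """
<customers>
    <customer>
        <name>Sam</name>
        <gender>M</gender>
        <age>43</age>
        <country>United States</country>
    </customer>
    <customer>
        <name>Jeff</name>
        <gender>M</gender>
        <age>32</age>
        <country>Canada</country>
    </customer>
</customers>
"""


def flatten_simple_records(xml_text, record_tag):
    """Return one flat dict per record element, keyed by child element name."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        # Each child holds only text (no child elements), so it maps
        # directly to a flat field.
        rows.append({child.tag: (child.text or "").strip() for child in record})
    return rows


rows = flatten_simple_records(XML_DATA, "customer")
for row in rows:
    print(row)
```

Each `<customer>` element yields one row, and every value is read as text; in the stage itself, the data type of each field is set on the Fields tab.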

Complex XML elements remain in hierarchical format and are passed on as a list field. Because many stages require data to be in a flat format, you may have to flatten complex XML to make the data usable by downstream stages. See Flattening Complex XML Elements for more information.
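To make the distinction concrete, the following Python sketch reads a record that contains a complex element. The `<orders>` data, the function names, and the dotted field names are all hypothetical, and the flattening shown is just one possible approach, not the procedure documented in Flattening Complex XML Elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical complex record: <orders> contains child elements of its own.
XML_DATA = """
<customers>
    <customer>
        <name>Sam</name>
        <orders>
            <order><id>1</id><total>9.50</total></order>
            <order><id>2</id><total>20.00</total></order>
        </orders>
    </customer>
</customers>
"""


def read_record(record):
    """Simple children become flat fields; complex children become lists."""
    row = {}
    for child in record:
        if len(child) == 0:
            # No child elements: a simple element, stored as a flat field.
            row[child.tag] = (child.text or "").strip()
        else:
            # Has child elements: kept hierarchical as a list of sub-records.
            row[child.tag] = [read_record(item) for item in child]
    return row


def flatten_list_field(row, list_field):
    """Emit one flat row per list item, repeating the parent's other fields."""
    parent = {k: v for k, v in row.items() if k != list_field}
    return [
        {**parent, **{f"{list_field}.{k}": v for k, v in item.items()}}
        for item in row[list_field]
    ]


root = ET.fromstring(XML_DATA)
hierarchical = [read_record(c) for c in root.iter("customer")]
flat = [r for row in hierarchical for r in flatten_list_field(row, "orders")]
print(hierarchical)
print(flat)
```

In this sketch the hierarchical form keeps `orders` as a list inside a single customer record, while the flattened form produces one row per order with the customer name repeated.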

Note: Read From XML does not support the XML types xs:anyType and xs:anySimpleType.

File Properties Tab

Table 1. File Properties Tab

Option Name

Description

   

Schema file

Specifies the path to an XSD schema file. Click the ellipsis button (...) to locate the file you want. The schema file must be on the server for the data file to be validated against the schema. If the schema file is not on the server, validation is disabled.

Alternatively, you can specify an XML file instead of an XSD file. If you specify an XML file, the schema is inferred from the structure of the XML file. Using an XML file instead of an XSD file has some limitations:

  • The XML file cannot be larger than 1 MB. If the XML file is more than 1 MB in size, try removing some of the data while maintaining the structure of the XML.
  • The data file will not be validated against the inferred schema.

Note: If the Spectrum Technology Platform server is running on Linux, remember that file names and paths are case sensitive on that platform.
   

Data file

Specifies the path to the XML data file. Click the ellipsis button (...) to locate the file you want.

Note: If the Spectrum Technology Platform server is running on Linux, remember that file names and paths are case sensitive on that platform.
   

Preview

Displays a preview of the schema or XML file. When you specify an XSD file, the tree structure reflects the selected XSD. Once you specify both a schema file and a data file, you can click on the schema elements in bold to see a preview of the data that the element contains.

   

Fields Tab

Table 2. Fields Tab

Option Name

Description

Filter

Filters the list of elements and attributes to make it easier to browse. The filter has no effect on which fields are included in the output.

XPath

The XPath column displays the XPath expression for the element or attribute. It is displayed for information purposes only. For more information about XPath, review this page.
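As a rough illustration of what the XPath column conveys, this Python sketch lists one path per distinct element or attribute in a document and then selects values with a path. It uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset; the helper name `element_paths` is hypothetical, and the sample data is the address file used in the example later in this topic.

```python
import xml.etree.ElementTree as ET

XML_DATA = """
<addresses>
    <address>
        <addressline1>One Global View</addressline1>
        <city>Troy</city>
        <state>NY</state>
        <postalcode>12128</postalcode>
    </address>
    <address>
        <addressline1>1825B Kramer Lane</addressline1>
        <city>Austin</city>
        <state>TX</state>
        <postalcode>78758</postalcode>
    </address>
</addresses>
"""


def element_paths(element, prefix=""):
    """Return distinct XPath-style paths for an element and its descendants."""
    path = f"{prefix}/{element.tag}"
    paths = [path]
    # Attributes are addressed with the @name syntax.
    paths.extend(f"{path}/@{name}" for name in element.attrib)
    for child in element:
        for child_path in element_paths(child, path):
            # Repeated records share the same paths, so list each path once.
            if child_path not in paths:
                paths.append(child_path)
    return paths


root = ET.fromstring(XML_DATA)
paths = element_paths(root)
for p in paths:
    print(p)

# ElementTree paths are relative to the element they are called on.
cities = [e.text for e in root.findall("./address/city")]
print(cities)
```

Both `<address>` records contribute the same paths, so each path appears once, much as the Fields tab lists each element or attribute once rather than once per record.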

Field

The name that will be used in the dataflow for the element or attribute. To change the field name, double-click and type the field name you want.

Type

The data type to use for the field. For more information, see Field Data Types.

Include

Specifies whether to make this field available in the dataflow or to exclude it.

Modify

Click this button to change the field name. For fields with a numeric or date data type, this button also lets you modify number and date/time formats.

Example: Simple XML File

In this example, you want to read this file into a dataflow:

<addresses>
    <address>
        <addressline1>One Global View</addressline1>
        <city>Troy</city>
        <state>NY</state>
        <postalcode>12128</postalcode>
    </address>
    <address>
        <addressline1>1825B Kramer Lane</addressline1>
        <city>Austin</city>
        <state>TX</state>
        <postalcode>78758</postalcode>
    </address>
</addresses>

Here, you could choose to include the <addressline1>, <city>, <state>, and <postalcode> elements. Because <address> is the common parent element of these four elements, one record is created for each <address> element.
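The result can be sketched in Python as follows. This is an approximation for illustration only: the `FIELD_SETTINGS` mapping loosely mirrors the Field and Include columns of the Fields tab, and both the dataflow field names (such as `StateProvince`) and the function name `read_records` are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_DATA = """
<addresses>
    <address>
        <addressline1>One Global View</addressline1>
        <city>Troy</city>
        <state>NY</state>
        <postalcode>12128</postalcode>
    </address>
    <address>
        <addressline1>1825B Kramer Lane</addressline1>
        <city>Austin</city>
        <state>TX</state>
        <postalcode>78758</postalcode>
    </address>
</addresses>
"""

# Hypothetical settings: element name -> (dataflow field name, include flag).
FIELD_SETTINGS = {
    "addressline1": ("AddressLine1", True),
    "city": ("City", True),
    "state": ("StateProvince", True),
    "postalcode": ("PostalCode", True),
}


def read_records(xml_text, parent_tag, settings):
    """Build one record per parent element from its included children."""
    root = ET.fromstring(xml_text)
    records = []
    for parent in root.iter(parent_tag):
        record = {}
        for child in parent:
            # Elements without a setting are treated as excluded.
            field_name, include = settings.get(child.tag, (child.tag, False))
            if include:
                record[field_name] = (child.text or "").strip()
        records.append(record)
    return records


records = read_records(XML_DATA, "address", FIELD_SETTINGS)
for record in records:
    print(record)
```

Two records come out, one per `<address>` element. Setting an include flag to `False` drops that field from every record without changing the number of records.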