Convert Tabular XML - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 Analyze
Version
Latest
Language
English
Product name
Data360 Analyze
Title
Data360 Analyze Server Help
Copyright
2024
First publish date
2016
Last updated
2024-11-28
Published on
2024-11-28T15:26:57.181000

Transforms a field containing XML tabular data into the native BRD format used by Data360 Analyze nodes.

If you want to transform data from a file (rather than from a field), see the Tabular XML File node.

A "tabular" XML field is one in which a single record of data is stored inside a single XML element, where the fields of the XML record are in attributes on this record element or in sub-elements underneath the record element. This node works best for situations where all data for each output record is underneath this record element. For other situations, see XML Data.

To configure this node:

  1. In the Data property, enter the name of the input field that contains the XML tabular data.
  2. In the RecordXPath property, enter an XPath expression to identify where the records are stored in the XMLfile. XPath is a syntax for defining parts of an XML file. This XPath is case sensitive.
  3. Identify the fields and values within the specified record by entering absolute XPaths in the FieldNameXPath and FieldValueXPath properties.
Note: Field names and values are matched based upon the order in which they are found in the XML file. Therefore, even if the fields inside of an XML record change order between records, the node will output the values in the correct field. Also, even if the order of unions in the FieldNameXPath are different from the order in the FieldValueXPath, the values are still output to the correct fields.

Examples

There are a few simple XPath expressions that can read a large variety of XML files, as in the following examples where the input data is a school's student database that stores the name, student ID, and grade of all students.

Example 1

You have the following input XML data:

<students>

<student>

<name value="Peter"/><id value="1001"/><grade value="1"/>

</student>

<student>

<name value="Mary"/> <id value="1002"/><grade value="1"/>

</student>

<student>

<name value="John"/><id value="1003"/> <grade value="1"/>

</student>

<student>

<name value="Thomas"/><id value="2001"/><grade value="2"/>

</student>

<student>

<name value="Luke"/> <id value="2002"/> <grade value="2"/>

</student>

</students>

The following property values will output the name, ID, and grade in a table format. In the RecordXPath property, enter:

/students/student

In the FieldNameXPath property, enter:

/students/student/*

In the FieldNameFunction property, enter:

local-name()

In the FieldValueXPath property, enter:

/students/student/*/@value

The output is as follows:

Name Id Grade
Peter 1001 1
Mary 1002 1
John 1003 1
Thomas 2001 2
Luke 2002 2

Example 2

Consider the same school database, but it has been reorganized, as follows:

<students>

<student>

<name>Peter</name><id>1001</id><grade>1</grade>

</student>

<student>

<name>Mary</name><id>1002</id><grade>1</grade>

</student>

<student>

<name>John</name><id>1003</id><grade>1</grade>

</student>

<student>

<name>Thomas</name><id>2001</id><grade>2</grade>

</student>

<student>

<name>Luke</name><id>2002</id><grade>2</grade>

</student>

</students>

You could produce the same output as with the previous example by using the following property settings. In the RecordXPath property, enter:

/students/student

In the FieldNameXPath property, enter:

/students/student/*

In the FieldNameFunction property, enter:

local-name()

In the FieldValueXPath property, enter:

/students/student/*/text()

Example 3

Consider the same school database, but it has been reorganized, as follows:

<students>

<student>

<column>

<name>Name</name><value>Peter</value>

</column>

<column>

<name>Id</name><value>1001</value>

</column>

<column>

<name>Grade</name><value>1</value>

</column>

</student>

<student>

<column>

<name>Name</name><value>Mary</value>

</column>

<column>

<name>Id</name><value>1002</value>

</column>

<column>

<name>Grade</name><value>1</value>

</column>

</student>

<student>

<column>

<name>Name</name><value>John</value>

</column>

<column>

<name>Id</name><value>1003</value>

</column>

<column>

<name>Grade</name><value>1</value>

</column>

</student>

<student>

<column>

<name>Name</name><value>Thomas</value>

</column>

<column>

<name>Id</name><value>2001</value>

</column>

<column>

<name>Grade</name><value>2</value>

</column>

</student>

<student>

<column>

<name>Name</name><value>Luke</value>

</column>

<column>

<name>Id</name><value>2002</value>

</column>

<column>

<name>Grade</name><value>2</value>

</column>

</student>

</students>

You could produce the same output as with the previous examples by using the following property settings. In the RecordXPath property, enter:

/students/student

In the FieldNameXPath property, enter:

/students/student/column/name/text()

Leave the FieldNameFunction property blank.

In the FieldValueXPath property, enter:

/students/student/column/value/text()

Example 4

In this example, the input data is in simple, text-only elements that may or may not have sub-elements. The firstName and lastName are simple elements, and address has two simple sub-elements, as follows:

<people>

<person>

<firstName>Mary</firstName>

<lastName>Hunt</lastName>

<address>

<line1>321 Summer Street</line1>

<line2>Boston, MA 02210</line2>

</address>

</person>

</people>

The following property values will output the text of each simple element in a field named after the element's tag. In the RecordXPath property, enter:

/people/person

In the FieldNameXPath property, enter:

/people/person//*[not(*)]

In the FieldNameFunction property, enter:

local-name()

In the FieldValueXPath property, enter:

/people/person//*[not(*)]/text()

Using the above property values, the output is as follows:

firstName lastName line1 line2
Mary Hunt 321 Summer Street Boston, MA 02210

Example 5

This example matches people to a list of their friends. The list can contain a variable number of friends:

<people>

<person>

<name>Rocco</name>

<friends>

<friend>

<name>Nicola</name>

</friend>

<friend>

<name>Michelle</name>

</friend>

<friend>

<name>Akshu</name>

</friend>

</friends>

</person>

<person>

<name>Marty</name>

<friends>

<friend>

<name>Sam</name>

</friend>

<friend>

<name>Ryan</name>

</friend>

</friends>

</person>

</people>

Since the list can contain a variable number of friends, we want to output all of the data in the friends tag as a single cell and then use a second Convert Tabular XML node to produce a new record for each friend.In the first Convert Tabular XML node, configure the properties as follows:

In the RecordXPath property, enter:

/people/person

In the FieldNameXPath property, enter:

/people/person/*

In the FieldNameFunction property, enter:

local-name()

In the FieldValueXPath property, enter:

/people/person/*[not(*)]/text()|/people/person/*[*]|/people/person/*[not(node())]

Leave the FieldValueFunction property blank.

The output from the first Convert Tabular XML node is as follows:

name friends
Rocco

<friends>

<friend>

<name>Nicola</name>

</friend>

<friend>

<name>Michelle</name>

</friend>

<friend>

<name>Akshu</name>

</friend>

</friends>

Marty

<friends>

<friend>

<name>Sam</name>

</friend>

<friend>

<name>Ryan</name>

</friend>

</friends>

Join this output data to a second Convert Tabular XML node and configure the properties as follows:

In the Data property, enter:

friends

In the RecordXPath property, enter:

/people/person

Leave the FieldNameXPath property blank. In this case, this property is left blank and the FieldNameFunction property has a string value (see below); this is because using the local name of /friends/friend/name would conflict with the name field in our input data. See also the FieldNameFunction property description below for a more generalized example of how to avoid name collisions.

In the FieldNameFunction property, enter:

"Friend"

In the FieldValueXPath property, enter:

/friends/friend/name/text()

Set the PassThroughFields property to Unused.

The output from this second ConvertTabular XMLnode is as follows:

name friend
Rocco Nicola
Rocco Michelle
Rocco Akshu
Marty Sam
Marty Ryan

Properties

Data

Specify the name of the input field that contains the XML data. A value is required for this property.

RecordXPath

Specify an XPath expression to identify where the records are stored in the XMLfile (XPath is a syntax for defining parts of an XML file).

For example, if the records are located within the following XML tags, <students> <student>, you would enter: /students/student

The node creates a new output record when the RecordXPath's opening tag is found. When the closing tag is found, this node outputs all the data it has read from this XML file.

This XPath is case sensitive. A value is required for this property.

FieldNameXPath

Optionally specify an absolute XPath expression to identify the names of the fields to be output. In nearly all cases, the XML value that you enter in this property should be located underneath the value specified in the RecordXPath property. This XPath is case-sensitive.

For example, if you entered /students/student in the RecordXPath property, you would enter /students/student/* in this property.

If this property is blank, then if the value specified in the FieldValueXPath property is used. If the FieldValueXPath property is blank, the value specified in the RecordXPath property is used.

The field names returned from this expression must conform to the naming syntax allowed in the BRD output record. For example, if the returned field name is a complex element containing angle brackets (< >) or commas (,), then the node will error. The FieldNameFunction property can provide a way to translate from full elements to simple element names.

FieldNameFunction

Optionally specify an XPath function to run against the fields that match the value specified in the FieldNameXPath property. The return value of this function specifies the name of the fields.

If you leave this property blank, the behavior of the node depends on the return type of the FieldNameXPath. If the value specified in the FieldNameXPath property returns an element, then the node uses the name of that tag as the name of this field. This behavior is equivalent to setting this property to 'local-name()' for this particular field. However, if FieldNameXPath returns a text value, usually from an attribute or from an XPath ending with the 'text()' path, then the node uses that string as the name of the field. This behavior is equivalent to having this property contain 'string()' for this particular field.

A value of 'local-name()' in this property uses the name of the returned elements as the field name. The 'string()' function sets the names of the fields to the body of text that the nodes contain.

In addition to the above examples, any valid XPath function is acceptable. For example, a counter could be appended to the field names in order to make them all unique, or a prefix could be added to mark which type of XML file the records came from.

For example, the input XML is as follows:

<employees> <employee> <name>John Hobbes</name>

<supervisor> <name>Robert Bradley</name> </supervisor> </employee> </employees>

To output all elements with the name 'name', since there are two fields named 'name', one for the employees and one for the supervisor, configure the FieldNameFunction property to avoid the name collision, as follows:

concat(local-name(parent::node()), " ", local-name())

FieldValueXPath

Optionally specify an absolute XPath expression that identifies the values to output. In nearly all cases, the XML value that you enter in this property should be located underneath the value specified in the RecordXPath property. This XPath is case-sensitive.

This property should select the element, and the FieldValueFunction property should contain 'string()'. With these property settings, the node will output only the text within the element tags. For more complex elements, such as lists, it is usually better to select the entire XML element in the FieldValueXPath property while leaving the FieldValueFunction property blank. This way, the node will output the list as a single element such that a second Convert Tabular XML node can be used to output a new record for each element in the list.

If this property is blank, if a value is specified in the FieldNameXPath property it is used. If the FieldValueXPath property is blank, the value specified in the RecordXPath property is used.

FieldValueFunction

Optionally specify an XPath function to run against the fields that match the value specified in the FieldValueXPath property. The return value of this function specifies the value of the fields. Any XPath function can be used in this statement.

When selecting text values from elements, enter 'string()' in this property. When selecting complex elements, leave this property blank.

Note: If you leave this property blank and append 'text()' to the end of the FieldValueXPath property, this will cause errors if there are empty tags because they will match the FieldNameXPath property but not the FieldValueXPath property.

For example, if the XML input is as follows:

<catalog> <cd> <title/>

<artist>Bonnie Tyler</artist>

<country>UK</country>

. . . </cd> </catalog>

With the following property settings:

RecordXPath: /catalog/cd

FieldNameXPath: /catalog/cd/*

FieldNameFunction: local-name()

FieldValueXPath: /catalog/cd/*/text()

FieldValueFunction: <blank>

The FieldNameXPath would pick up the title tag while the FieldValueXPath would ignore it since the title tag has no text. These settings would cause an error. In order to process this data correctly, the following settings are needed:

RecordXPath: /catalog/cd

FieldNameXPath: /catalog/cd/*

FieldNameFunction: local-name()

FieldValueXPath: /catalog/cd/*

FieldValueFunction: string()

In this case, the title tag is picked up by the FieldNameXPath and FieldValueXPath since the tag matches /catalog/cd/*. The string() function then returns an empty string ("") for the title, thus avoiding the error in the first example.

ConcatenationMode

Optionally specify what to do when tags under the records as specified by the RecordXPath property are not exactly the same in the XML data. Choose from:

  • Exact - The tags in all of the processed records must be the same, and all those tags will be used as the output metadata. If the tags are not exactly the same in all of the processed records, the node will fail.
  • Union - All tags of all processed records will be used as the output metadata.
  • Intersection - Only the tags that are common to all of the processed records will be used as the output metadata. If no tags exist in all of the records, the node will fail.

The default value is Exact.

Namespaces

Optionally specify XML namespaces to identify elements in the XML data (a namespace is a set of symbols that allow you to organize objects of varying kinds, so that these objects can then be referred to by name). Each new namespace declaration should be on a new line and can be in any of the three following formats, where xmlns is the xmlns attribute in XML data, <prefix> is a user-defined prefix for the namespace (which can be different from the prefix used in the XML data), and <uri> is the URI for the namespace which must match a namespace uri in the XML data:

  1. xmlns:<prefix>="<uri>"
  2. <prefix>="<uri>"
  3. <prefix>

If <uri> is missing in the declaration, the corresponding prefix will be associated with the default namespace.

Note: The Convert Tabular XML node does not use the namespace URL to look up information. The purpose of using an URI is to give the namespace a unique name.

For example, in the XML data, a namespace is defined as: xmlns:foo="http://Namespace-name-Foo", and in this property, a namespace is defined as: xmlns:foofoo="http://Namespace-name-Foo". All entities prefixed by "foofoo" in the Xpath Expression properties in this node will be matched against all entities prefixed by "foo" in the XML data.

If duplicate prefixes are specified in the property, the latter ones will overwrite the previous ones.

TrimWhitespace

Optionally specify whether leading and trailing whitespace from each output field is trimmed.

The default value is True.

PassThroughFields

Optionally specify which input fields will "pass through" the node unchanged from the input to the output. Choose from:

  • All - Passes through all input data fields to the output.
  • None - Passes none of the input data fields to the output; as such, only the fields created by the node appear on the output.
  • Used - Passes through all fields that the node used to create the output.
  • Unused - Passes through all fields that the node did not use to create the output.

The default value is Unused.

InputPrefix

Optionally specify a prefix to prepend to all input fields before they are passed to the output. If blank, no prefix is added. This feature serves two purposes:

  • First, it helps separate the input fields that are passed-through the node from the output fields that are generated by the node.
  • Second, it avoids field name collisions if an input field has the same name as a generated field.

If this property contains a value and the PassThroughFields property is set to None, the node will error.

The default value is blank.

NullValueBehavior

Optionally specify what to do if the input data field specified in the Data property is NULL, an empty string (""), or contains only whitespace. Choose from:

  • Error - The node errors and stops processing when the specified input field contains a NULL value, an empty string, or all whitespace.
  • Log - The node logs that this situation occurred and continues processing without outputting a record.
  • Ignore - The node continues to process the input data without any output or log message.
  • Emit - The record is output as per the PassThroughFields property definition. All generated fields will contain NULL. If the first record contains a NULL or blank entry, the node will error as it will be unable to define the generated fields for that record.

The default value is Error.

Inputs and outputs

Inputs: in1.

Outputs: out1.