Tabular XML File - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 Analyze
Version
Latest
Language
English
Product name
Data360 Analyze
Title
Data360 Analyze Server Help
Copyright
2024
First publish date
2016
Last updated
2024-11-28
Published on
2024-11-28T15:26:57.181000
Table 1. Changes in this topic
Change type Description
Updated in version 3.16.0 Added SubstituteInvalidCharacters property.

Transforms an input XML file which contains tabular data into the native BRD format used by Data360 Analyze nodes.

This particular variant of the Tabular XML node reads a file in from disk via either a single static filename or a list of filenames read from the input pin. In all other respects, it behaves exactly as defined in the description below.

The Tabular XML nodes transform "tabular" XML data into a BRD. A"tabular" XML file is one in which a single "record" of data is stored inside a single XML element. The "fields" of the XML record could either be in attributes on this record element or in subelements underneath the record element. This node works best for situations where all data for each output record is underneath this record element.

The Tabular XML nodes can take in XML either from a file on disk or as a string of XML data in a field of an input BRD.

In order to use the Tabular XML nodes, the user first needs to identify the element that defines the records in the XML. The XPath to this element is placed in the RecordXPath property. Once the record element has been identified, then the user needs to identify the fields and values within that record via the FieldNameXPath and FieldValueXPath properties. FieldNameXPath and FieldValueXPath must be absolute XPaths.Field names and values are matched based upon the order in which they are found in the XML file. Because of this, even if the fields inside of an XML record change order between records, the node will output the values in the correct field. Also, even if the order of unions in the FieldNameXPath are different from the order in the FieldValueXPath, the values are still output to the correct fields.

There are a few simple XPath expressions that can read a large variety of XML files. Here are a few examples:

Example 1:

Let's say that a school has a student database that stores the name, student ID, and grade of all the students in the following XML:


      
<students>
	<student>
		<name value="Peter"/>
		<id value="1001"/>
		<grade value="1"/>
	</student>	
	<student>		
		<name value="Mary"/>		
		<id value="1002"/>		
		<grade value="1"/>	
	</student>	
	<student>		
		<name value="John"/>		
		<id value="1003"/>		
		<grade value="1"/>	
	</student>	
	<student>		
		<name value="Thomas"/>		
		<id value="2001"/>		
		<grade value="2"/>	
	</student>	
	<student>		
		<name value="Luke"/>		
		<id value="2002"/>		
		<grade value="2"/>	
	</student>
</students>

    

The following property values will output the name, ID, and grade in a table format:


RecordXPath:	   /students/student
FieldNameXPath:    /students/student/*
FieldNameFunction: local-name()
FieldValueXPath:   /students/student/*/@value

The output table will be the following:


Name		Id	Grade
Peter		1001	1
Mary		1002	1
John		1003	1
Thomas		2001	2
Luke		2002	2

Example 2:

Let's imagine that the above school database has been reorganized as below:


      
<students>	
	<student>		
		<name>Peter</name>		
		<id>1001</id>		
		<grade>1</grade>	
	</student>	
	<student>		
		<name>Mary</name>		
		<id>1002</id>		
		<grade>1</grade>	
	</student>	
	<student>		
		<name>John</name>	
		<id>1003</id>		
		<grade>1</grade>	
	</student>	
	<student>		
		<name>Thomas</name>		
		<id>2001</id>		
		<grade>2</grade>
	</student>	
	<student>		
		<name>Luke</name>		
		<id>2002</id>		
		<grade>2</grade>	
	</student>
</students>

    

The user could get out the same table as before using the following property settings:


RecordXPath:	   	/students/student
FieldNameXPath:    	/students/student/*
FieldNameFunction: 	local-name()
FieldValueXPath:   	/students/student/*/text()

Example 3:

Again, let's assume that we have the same data, but the school has rearranged it into the following format:


<students>	
	<student>		
		<column>			
			<name>Name</name>			
			<value>Peter</value>		
		</column>	
		<column>		
			<name>Id</name>		
			<value>1001</value>	
		</column>	
		<column>		
			<name>Grade</name>
			<value>1</value>
		</column>	
	</student>
	<student>	
		<column>
			<name>Name</name>
			<value>Mary</value>	
		</column>	
		<column>		
			<name>Id</name>	
			<value>1002</value>	
		</column>	
		<column>		
			<name>Grade</name>	
			<value>1</value>	
		</column>
	</student>
	<student>	
		<column>		
			<name>Name</name>	
			<value>John</value>	
		</column>	
		<column>	
			<name>Id</name>	
			<value>1003</value>	
		</column>	
		<column>	
			<name>Grade</name>	
			<value>1</value>
		</column>	
	</student>
	<student>	
		<column>	
			<name>Name</name>
			<value>Thomas</value>	
		</column>	
		<column>		
			<name>Id</name>	
			<value>2001</value>	
		</column>	
		<column>		
			<name>Grade</name>	
			<value>2</value>
		</column>	
	</student>
	<student>	
		<column>		
			<name>Name</name>	
			<value>Luke</value>	
		</column>	
		<column>		
			<name>Id</name>	
			<value>2002</value>	
		</column>	
		<column>		
			<name>Grade</name>	
			<value>2</value>
		</column>	
	</student>
</students>

The following settings will output an identical table to the first and second cases:


RecordXPath:	   	/students/student
FieldNameXPath:    	/students/student/column/name/text()
FieldNameFunction:	<blank>
FieldValueXPath:	/students/student/column/value/text()

Example 4:

Let's say that you have all of your data in simple, text-only elements that may or may not have subelements. In this example, firstName and lastName are simple elements while address has two simple subelements.


<people>	
	<person>		
		<firstName>Mary</firstName>		
		<lastName>Hunt</lastName>		
		<address>			
			<line1>321 Summer Street</line1>			
			<line2>Boston, MA  02210</line2>		
		</address>
	</person>
</people>

You could use the following XPaths to output the text of each simple element in a field named after the element's tag. Thus, these properties:


RecordXPath:       /people/person
FieldNameXPath:    /people/person//*[not(*)]
FieldNameFunction: local-name()
FieldValueXPath:   /people/person//*[not(*)]/text()

This would produce the following output data:


firstName	lastName	line1			line2
Mary		Hunt		321 Summer Street	Boston, MA  02210

Example 5:

In this case, we are matching people to a list of their friends. This list can contain a variable number of friends.


<people>	
	<person>		
		<name>Rocco</name>		
		<friends>			
			<friend>				
				<name>Nicola</name>			
			</friend>			
			<friend>				
				<name>Michelle</name>			
			</friend>			
			<friend>				
				<name>Akshu</name>			
			</friend>		
		</friends>	
	</person>	
	<person>		
		<name>Marty</name>		
		<friends>			
			<friend>				
				<name>Sam</name>			
			</friend>			
			<friend>				
				<name>Ryan</name>			
			</friend>		
		</friends>	
	</person>
</people>

Since the list can contain a variable number of friends, we want to output all of the data in the friends tag as a single cell and then use a follow-on Convert Tabular XML node to produce a new record for each friend.Our first Tabular XML node would have the following property values:


RecordXPath:        /people/person
FieldNameXPath:     /people/person/*
FieldNameFunction:  local-name()
FieldValueXPath:    /people/person/*[not(*)]/text()|/people/person/*[*]|/people/person/*[not(node())]
FieldValueFunction: <blank>

This first node would produce the following output:


name	friends
Rocco	<friends>			
			<friend>
				<name>Nicola</name>			
			</friend>			
			<friend>				
				<name>Michelle</name>			
			</friend>			
			<friend>				
				<name>Akshu</name>			
			</friend>		
		</friends>
Marty	<friends>			
			<friend>				
				<name>Sam</name>			
			</friend>			
			<friend>				
				<name>Ryan</name>			
			</friend>		
		</friends>

Our second node would then have the following configuration:


Data:              friends
RecordXPath:       /friends/friend
FieldNameXPath:    <intentionally blank>FieldNameFunction: "Friend"
FieldValueXPath:   /friends/friend/name/text()
PassThroughFields: Unused

This node will produce the following output:


name	friend
Rocco	Nicola
Rocco	Michelle
Rocco	Akshu
Marty	Sam
Marty	Ryan

You'll notice that we left the FieldNameXPath blank and put a string in the FieldNameFunction. We do this since using the local name of /friends/friend/name would conflict with the name field in our input data. There is a more generalized example of how to avoid name collisions in the FieldNameFunction node help.

Properties

File

Click the folder icon and browse to the XML file that you want to import.

Choose the (from Field) variant of this property to look up values from an input field with the name specified, opening each filename found in turn. A value is required for this property.

Note: If you are using the server edition and wish to use a file on your local machine, you must first upload it to the Data360 Analyze server. See Upload data files for more details.

RecordXPath

Specify an XPath expression that identifies where records are stored in the XML file. The node creates a new output record when the RecordXPath opening tag is found. When the closing tag is found, this node outputs all the data it has read from this XML file. This XPath is case sensitive.

A value is required for this property.

FieldNameXPath

Optionally specify an absolute XPath expression that identifies the names of the fields to be output. In nearly all cases, this property should have a value that is located underneath RecordXPath. This XPath is case-sensitive.

If this property is blank, then FieldNameXPath will be given the value of FieldValueXPath if it is filled in. Otherwise, FieldNameXPath is set to the value of RecordXPath.

The field names returned from this expression must conform to the naming syntax allowed in the BRD output record. For example, if the returned field name is a complex element containing angle brackets (<, >) or commas (,), then the node will error. The FieldNameFunction property can provide a way to translate from full elements to simple element names.

FieldNameFunction

Optionally specify an XPath function to run against the nodes that match FieldNameXPath. The return value of this function specifies the name of the fields.

When this property is blank, the behavior of this node depends upon the return type of FieldNameXPath. If FieldNameXPath returns an element, then this node will use the name of that tag as the name of this field. This behavior is equivalent to setting this property to "local-name()" for this particular field. However, if FieldNameXPath returns a text value, usually from an attribute or from an XPath ending with the "text()" path, then the node simply uses that string as the name of the field. In other words, this behavior is equivalent to having this property contain "string()" for this particular field.

A value of "local-name()" in this property would use the name of the returned elements as the field name. The "string()" function will set the names of the fields to the body of text that the nodes contain.

In addition to the above examples, the user can use any XPath functions in this statement. For example, a counter could be appended to the field names in order to make them all unique, or a prefix could be added to mark which type of XML file the records came from. Any valid XPath function is acceptable.

For example, let's say that the following is the input XML:

<employees> 
	<employee> 
		<name>John Hobbes</name>
		<supervisor>
			<name>Robert Bradley</name> 
		</supervisor> 
	</employee> 
</employees>

If we want to output all the elements with the name "name", then we will have a problem since there are two fields named "name": one for the employees and one for the supervisor. However, this name collision is avoided if FieldNameFunction is set to the following:

concat(local-name(parent::node()), " ", local-name())

FieldValueXPath

Optionally specify an absolute XPath expression that identifies the values to be output. In nearly all cases, this property should have a value that is located underneath RecordXPath. This XPath is case sensitive.

In order to read out simple elements, this property should select the element, and the FieldValueFunction property should contain "string()". Under these settings, the node will output only the text within the element tags. For more complex elements, such as lists, it is usually better to select the entire XML element in the FieldValueXPath while leaving the FieldValueFunction blank. This way, the node will output the list as a single element such that a follow-up Convert Tabular XML node can output a new record for each element in the list. This final case is covered in Example 5 of the node description.

If this property is blank, then FieldValueXPath will be given the value of FieldNameXPath if it is filled in. Otherwise, FieldValueXPath is set to the value of RecordXPath.

FieldValueFunction

Optionally specify an XPath function to run against the nodes that match FieldValueXPath. The return value of this function specifies the value of the fields. This property can be left blank if no function is to be run against the field value nodes. Any XPath function can be used in this statement.

When selecting text values from elements, this property should be set to "string()". When selecting complex elements, this property should be left blank.

Some users might be tempted to leave this property blank and append a "text()" node to the end of the FieldValueXPath. Please bear in mind that this will cause errors if there are empty tags (i.e., <tag/>) because they will match the FieldNameXPath but not the FieldValueXPath.

To demonstrate this error, let's say that we have the following XML input:

<catalog> 
	<cd> 
		<title/>
		<artist>Bonnie Tyler</artist>
		<country>UK</country>
		... 
	</cd>
</catalog>

With the following settings:

  • RecordXPath: /catalog/cd
  • FieldNameXPath: /catalog/cd/*
  • FieldNameFunction: local-name()
  • FieldValueXPath: /catalog/cd/*/text()
  • FieldValueFunction: <blank>

the FieldNameXPath would pick up the title tag while the FieldValueXPath would ignore it since the title tag has no text. These settings would cause an error. In order to process this data correctly, the following settings are needed:

  • RecordXPath: /catalog/cd
  • FieldNameXPath: /catalog/cd/*
  • FieldNameFunction: local-name()
  • FieldValueXPath: /catalog/cd/*
  • FieldValueFunction: string()

Under this setup, the title tag is picked up by the FieldNameXPath and FieldValueXPath since the tag matches /catalog/cd/*. The string() function then returns an empty string ("") for the title, thus avoiding the error in the first example.

Namespaces

Optionally specify any namespaces used to identify unambiguously an element in the XML data.

Each line contains one namespace declaration. The namespace declaration can have one of the following syntaxes:

1. xmlns:<prefix>="<uri>" 2. <prefix>="<uri>" 3. <prefix>

where

  • xmlns is the xmlns attribute in XML data
  • <prefix> is a user-defined prefix for the namespace, and <prefix> here can be different from the prefix used in the XML data
  • <uri> is the URI for the namespace, and it has to match a namespace uri in the XML data

If <uri> is missing in the declaration, the corresponding prefix will be associated with the default namespace.

Note that Convert Tabular XML does not look up for information in the namespace <uri> .

Example:

If in the XML data, a namespace is defined as: xmlns:foo="http://Namespace-name-Foo", and in this property, a namespace is defined as: xmlns:foofoo="http://Namespace-name-Foo". All entities prefixed by "foofoo" in the Xpath Expression properties in this node will be matched against all entities prefixed by "foo" in the XML data.

If duplicate prefixes are specified in the property, the latter ones will overwrite the previous ones.

TrimWhitespace

Optionally specify whether to trim leading and trailing whitespace from each output field.

The default value is True.

PassThroughFields

Optionally specify which input fields will "pass through" the node unchanged from the input to the output. Choose from:

  • All - Passes through all the input data fields to the output.
  • None - Passes none of the input data fields to the output; as such, only the fields created by the node appear on the output.
  • Used - Passes through all the fields that the node used to create the output.
  • Unused - Passes through all the fields that the node did not use to create the output.

The default value is Unused.

NullValueBehavior

Optionally specify how this node will handle situations in which the input data field is NULL, an empty string (""), or contains only whitespace. For Tabular XML File, this property is applied to values in the File property. For Convert Tabular XML, this property is applied to input values in the Data property. Choose from:

  • Error - The node will throw an error and stop processing when the input field evaluates to a NULL, an empty string, or all whitespace
  • Log - The node logs that this situation occurred and continues processing without outputting a record
  • Ignore - The node continues to process the input data without any output or log message.
  • Emit - The node emits the output record per the PassThroughFields property. All generated fields will contain NULL. If the first record contains a NULL or blank entry, then the node will fail as it will be unable to define the generated fields for that record.

The default value is Error.

SubstituteInvalidCharacters

There are some reserved characters which cannot appear in metadata. This optional property defines what to do if invalid characters appear in the input data and are to be used in the record metadata.

  • True - Invalid characters are substituted for acceptable BRD metadata characters.
  • False - The node errors and stops processing.

The default value is False.

Note: This affects both characters that are reserved characters in the metadata (for example, newline, ':') and characters which are not in the Data360 Analyze server's character set. If this property is set to True, the node will substitute such characters into a valid alternative such that the field names in the metadata are valid.

Inputs and outputs

Inputs: 1 optional (input fields).

Outputs: data.