This deprecated node reads and converts an XML file, given a designated file specification.
XPath expressions select specific sections of the XML file to parse, and user-defined code then translates each selected segment into the BRD format.
There are three main steps for setting up this node. First, the user must set the number and names of the output pins. The number of output pins determines the number of independent tables of data that can be output by this node. Second, the user needs either to provide the filename of a single XML file to be parsed or provide the name of the input column containing the names of the files to be parsed. Third, the user needs to provide some Python code that contains rules for parsing the XML. This third item is discussed in more detail in the Code property.
Properties
File
Click the folder icon and browse to the XML file that you want to import.
If you provide a value, you should leave the property FilenameExpr blank.
FilenameExpr
Specify an expression which identifies the input column that contains the filenames of the input XML files. If you provide a value, you should leave the property File blank.
FilenameOutputField
Optionally specify the name of a field to add to the output that contains the corresponding input filename. If left blank, no filename field is added to the output.
Code
The Code property allows the user to define rules to parse XML. This property accepts multiple XPath queries and, for each query, calls the associated function on every match. Let's imagine that we have the following XML file:
<corporateDirectory>
    <division type="Corporate">
        <entry>
            <name>Antone</name>
            <title>Director</title>
            <city>Boston</city>
        </entry>
        <entry>
            <name>Michelle</name>
            <title>Business Analyst</title>
            <city>Melbourne</city>
        </entry>
        <!-- ...thousands of additional entries... -->
    </division>
    <division type="Engineering">
        <entry>
            <name>Takeshi</name>
            <title>Software Engineer</title>
            <city>Kyoto</city>
        </entry>
        <!-- ...thousands of additional entries... -->
    </division>
</corporateDirectory>
Let's output a table with the name, title, and division of all employees, one employee per row. In this case, it is not feasible to parse /corporateDirectory/division in one function since that would use too much RAM. Therefore, we use two functions: one to retrieve /corporateDirectory/division/@type and another to output /corporateDirectory/division/entry along with @type. These two functions load only small parts of the file into RAM.
Here's the code to do this:
    # Shared between handlers, so each entry picks up the most recent division type.
    data = {}

    @attributeHandler('/corporateDirectory/division/@type')
    def TypeHandler(attr):
        data['Division'] = attr.nodeValue

    @elementHandler('/corporateDirectory/division/entry')
    def EntryHandler(element):
        data['Name'] = element.name
        data['Title'] = element.title
        # Write the record to the first output pin (index 0).
        outputRecord(data, 0)
And here is the output table:
Name      Title              Division
Antone    Director           Corporate
Michelle  Business Analyst   Corporate
...
Takeshi   Software Engineer  Engineering
...
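The memory argument above can be tried outside the node with plain Python. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree.iterparse instead of the node's attributeHandler, elementHandler, and outputRecord functions, so it is an analogy and not the node's implementation. The name stream_entries and the inline SAMPLE data are made up for illustration, and elements are matched by tag name only, not by full XPath.

```python
import io
import xml.etree.ElementTree as ET

# Small stand-in for the directory file shown earlier (illustrative data only).
SAMPLE = b"""<corporateDirectory>
  <division type="Corporate">
    <entry><name>Antone</name><title>Director</title><city>Boston</city></entry>
    <entry><name>Michelle</name><title>Business Analyst</title><city>Melbourne</city></entry>
  </division>
  <division type="Engineering">
    <entry><name>Takeshi</name><title>Software Engineer</title><city>Kyoto</city></entry>
  </division>
</corporateDirectory>"""


def stream_entries(source):
    """Yield one dict per <entry> without building the whole tree first."""
    division = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "division":
            # Attributes are already available when the start tag is reported.
            division = elem.get("type")
        elif event == "end" and elem.tag == "entry":
            yield {
                "Name": elem.findtext("name"),
                "Title": elem.findtext("title"),
                "Division": division,
            }
            # Drop the entry's children and text once the row has been emitted.
            elem.clear()
        elif event == "end" and elem.tag == "division":
            # Drop the emptied entry elements still attached to the division.
            elem.clear()


rows = list(stream_entries(io.BytesIO(SAMPLE)))
for row in rows:
    print(row["Name"], row["Title"], row["Division"], sep="\t")
```

As in the node code, the division type is remembered when it is first seen and attached to every entry that follows, so no division ever has to be held in memory as a whole.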
ExtraFieldBehavior
Optionally specify the behavior of this node when the user attempts to output data in columns that are not defined in the output metadata. Choose from:
- Error - If the user attempts to output a column that is not in the output metadata, then this node fails. The log will report information on the offending column.
- Log - Behaves just like the Error case, but the node does not fail. Rather, it continues processing the rest of the XML file.
- Ignore - The extraneous column is ignored completely. No message is written to the log.
The default value is Error.
MissingFieldBehavior
Optionally specify the behavior of this node when no data is provided for a column in the output metadata. Choose from:
- Error - If the user does not provide any data for a column in the output metadata, then this node fails. The log will report information on which column was skipped.
- Log - Behaves just like the Error case, but the node does not fail. Rather, it continues processing the rest of the XML file.
- Ignore - The missing column is ignored completely. No message is written to the log.
The default value is Ignore.
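The interaction of ExtraFieldBehavior and MissingFieldBehavior can be illustrated with a small stand-alone sketch. It is not the node's code: the function apply_field_policies, the logger name, and the choice to fill a missing column with None are all assumptions made for illustration. Only the three option names and the two default values come from the descriptions above.

```python
import logging

log = logging.getLogger("xml_node_sketch")  # hypothetical logger name


def apply_field_policies(record, columns, extra="Error", missing="Ignore"):
    """Check a record against the output columns, then return the output row."""
    checks = (
        ("extra", extra, sorted(set(record) - set(columns))),
        ("missing", missing, [c for c in columns if c not in record]),
    )
    for label, policy, names in checks:
        if policy not in ("Error", "Log", "Ignore"):
            raise ValueError("unknown behavior: %r" % policy)
        if not names or policy == "Ignore":
            continue  # nothing to report, or reporting is switched off
        message = "%s column(s): %s" % (label, ", ".join(names))
        if policy == "Error":
            raise ValueError(message)  # stands in for the node failing
        log.warning(message)  # "Log": report the problem and keep going
    # Assumption: extra columns are dropped and missing columns become None.
    return {c: record.get(c) for c in columns}


columns = ["Name", "Title", "Division"]
row = apply_field_policies(
    {"Name": "Takeshi", "Title": "Software Engineer", "City": "Kyoto"},
    columns,
    extra="Ignore",
)
print(row)
```

With the default settings the same call would raise an error because of the City column, while the absent Division column would pass silently.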
TypeCoercionBehavior
Optionally specify the behavior of the XML node when the output metadata is set implicitly and a data value does not map to a BRD type. Choose from:
- Error - If the type of any data value in the output does not match a BRD type, then this node fails. The log will report the name of the key that failed.
- Coerce to Unicode - If the type of any data value in the output does not match a BRD type, then this data value is converted to Unicode before it is output. The type of the output metadata for this key is set to Unicode, and all future output to this field is coerced to Unicode.
- Coerce to String - If the type of any data value in the output does not match a BRD type, then this data value is converted to an ASCII string before it is output. The type of the output metadata for this key is set to string, and all future output to this field is coerced to string.
The default value is Error.
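The "sticky" part of the coercion options, where one unmappable value changes how every later value for that field is written, is easiest to see in code. The sketch below is written for Python 3 and rests on several assumptions: the class name Coercer is hypothetical, SUPPORTED_TYPES is only a stand-in for whatever Python types the node actually maps to BRD types, and replacing non-ASCII characters with "?" is a guess at what conversion to an ASCII string involves.

```python
import datetime

# Assumption: a stand-in for the Python types that map to a BRD type.
SUPPORTED_TYPES = (bool, int, float, str, bytes, datetime.date)

BEHAVIORS = ("Error", "Coerce to Unicode", "Coerce to String")


class Coercer:
    """Applies one TypeCoercionBehavior setting to successive output values."""

    def __init__(self, behavior="Error"):
        if behavior not in BEHAVIORS:
            raise ValueError("unknown behavior: %r" % behavior)
        self.behavior = behavior
        self.coerced = set()  # fields whose output type was switched to text

    def convert(self, field, value):
        if value is None:
            return None
        if field not in self.coerced:
            if isinstance(value, SUPPORTED_TYPES):
                return value  # maps to an output type, pass through unchanged
            if self.behavior == "Error":
                raise TypeError("no BRD type for field %r" % field)
            self.coerced.add(field)  # from now on this field is always text
        text = str(value)
        if self.behavior == "Coerce to String":
            # Assumption: characters outside ASCII are replaced with "?".
            return text.encode("ascii", "replace").decode("ascii")
        return text


coercer = Coercer("Coerce to Unicode")
first = coercer.convert("Tags", ["a", "b"])  # a list has no direct mapping
later = coercer.convert("Tags", 5)           # coerced because "Tags" is sticky
other = coercer.convert("Count", 5)          # untouched field keeps its type
print(repr(first), repr(later), repr(other))
```

The point of the sketch is the coerced set: once a field lands in it, values that would otherwise map cleanly are still written as text, which matches the description of future output to that field.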
Inputs and outputs
Inputs: 1 optional.
Outputs: out1, multiple optional.