Provides a node interface for using the Lavastorm Data Reader (LDR) to acquire data.
In general, all that should be required to use this node is to populate the InputSpec property with a DRIX and the OutputSpec property with a DROX, and specify the file(s) to import.
Supports most well-structured, deterministic file formats.
Properties
Filename
Optionally click the folder icon and browse to the file that you want to import.
Use this property when you are only importing a single file. If you use this property, do not also use the FilenameFieldExpr property.
FilenameFieldExpr
Optionally specify the name of the input field containing the absolute filenames to load.
Used for cases when multiple data files are to be loaded. The data files will all be read with the same input and output specifications.
The output metadata for each of the output files must be the same (no extra fields in arrays etc). Cannot be used in conjunction with the Filename property.
InputSpec
The specification in this property defines the structure of the file that is to be read.
This must conform to the LDR Input Specification xsd, and is validated against the xsd. If a DRIX file has been constructed for this data format and stored outside the node on disk, this DRIX field will simply contain a primaryField that references a type from a library included via an include tag. For example:
<drix>
<include name="MySpec" minimumVersion="1"/>
<primaryField name="field" type="FileType"/>
</drix>
This would look for the MySpec library, with a version >= 1, in the specified library search paths.
Once found, the specification is loaded, and the FileType type in the library is used as the primary field.
This means that the type declaration FileType must contain sufficient information to parse the input file.
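As a rough illustration of what the DRIX example above declares, the following Python sketch parses it with the standard library and extracts the included library and the primary field. This is not how the LDR itself loads specifications. The helper names, the integer treatment of minimumVersion, and the default of 0 when it is absent are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The DRIX example from this section, unchanged.
DRIX = """<drix>
<include name="MySpec" minimumVersion="1"/>
<primaryField name="field" type="FileType"/>
</drix>"""


def summarize_drix(drix_text):
    """Return (includes, primary_field) for a DRIX document (hypothetical helper)."""
    root = ET.fromstring(drix_text)
    includes = [
        # Assumption: versions are integers and default to 0 when not given.
        (inc.get("name"), int(inc.get("minimumVersion", "0")))
        for inc in root.findall("include")
    ]
    primary = root.find("primaryField")
    return includes, (primary.get("name"), primary.get("type"))


def version_is_acceptable(found_version, minimum_version):
    """Mirror the 'version >= minimumVersion' rule described above."""
    return found_version >= minimum_version


includes, primary_field = summarize_drix(DRIX)
print(includes)
print(primary_field)
```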
OutputSpec
Optionally specify the structure of the output.
This must conform to the LDR Output Specification xsd, and is validated against the xsd. In general, this DROX field will simply contain details of the outputs of this node, and the fields to include in these outputs.
A simple example output would be:
<drox>
  <output name="Output1" mapping="Mapping1"/>
  <mapping name="Mapping1">
    <includes>
      <fields>
        <pattern pattern="file.record.#.*"/>
      </fields>
    </includes>
    <excludes>
      <fields>
        <pattern pattern="file.record.field1.*"/>
      </fields>
    </excludes>
  </mapping>
</drox>
In this example, we are saying that we want to output all of the emittable fields lying under constructed types under the file.record field, except for all of the emittable fields under the file.record.field1 field.
When using multiple input files, it is generally recommended that the auto-generated fileId be included in each of the outputs.
A number of different properties affect how the outputs defined in the output specification are mapped to the node output pins.
In general, the name specified in the output specification must match a node output pin name for it to be written to that pin.
If, however, there is no output pin with the same name as a DROX output, then there are a number of options:
1. If MetadataPinName is set, then the associated Metadata output pin will be used and will contain the names of all outputs that are not written to a pin.
   a. If, in addition to this, WriteUnpinnedOutputs is set to True, then the unpinned outputs will be written to file and referenced in the output specified by the MetadataPinName.
   b. If, however, WriteUnpinnedOutputs is set to False, then the output is not written to file, but the output name is written to the Metadata pin.
2. If MetadataPinName is not set, then no Metadata output pin is used, and all unpinned outputs will be ignored, with no indication in the output that they have been ignored.
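The options above amount to a small decision table. The following Python sketch restates it under stated assumptions: route_output and its return strings are hypothetical and only model the documented behaviour; they are not part of the node.

```python
def route_output(output_name, pin_names, metadata_pin_name=None,
                 write_unpinned_outputs=True):
    """Describe what happens to one DROX output (hypothetical helper).

    pin_names is the set of node output pin names; the other two arguments
    stand in for the MetadataPinName and WriteUnpinnedOutputs properties.
    """
    if output_name in pin_names:
        return "written to pin"
    if metadata_pin_name is None:
        # No Metadata pin: the unpinned output is dropped without any trace.
        return "ignored silently"
    if write_unpinned_outputs:
        return "written to file, file referenced in metadata"
    return "not written, name listed in metadata"


pins = {"Output1", "meta"}
print(route_output("Output1", pins, "meta"))
print(route_output("Output2", pins, "meta", True))
print(route_output("Output2", pins, "meta", False))
print(route_output("Output2", pins, None))
```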
WriteUnpinnedOutputs
Optionally specify how to handle outputs which do not have associated output pins.
If a MetadataPinName is set, and WriteUnpinnedOutputs is set to True, then these outputs will be written to file and the filename referenced in the metadata output.
If a MetadataPinName is set, and WriteUnpinnedOutputs is set to False, then these outputs will be ignored, but the fact that they are ignored is noted in the metadata output.
If a MetadataPinName is not set, WriteUnpinnedOutputs has no effect.
The default value is True.
DefaultByteOrder
Optionally specify the byte order used to read the data file. This is then the default byte order for all types in the LDR. For types where byte order is important, this can generally be overridden via the bigEndian parameter on the corresponding type. Choose from:
- BIG_ENDIAN
- LITTLE_ENDIAN
- NATIVE (based on the native byte order of the machine on which the Data360 Analyze Server is running)
The default value is BIG_ENDIAN.
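To see why the byte order matters, the following Python sketch decodes the same four bytes under each of the three settings. It uses only the Python standard library and is independent of the LDR; decode_uint is a hypothetical helper written for this example.

```python
import sys

# Four raw bytes as they might appear in a binary data file.
raw = bytes([0x00, 0x00, 0x01, 0x02])


def decode_uint(data, byte_order):
    """Decode an unsigned integer using a DefaultByteOrder-style name."""
    order = {
        "BIG_ENDIAN": "big",
        "LITTLE_ENDIAN": "little",
        "NATIVE": sys.byteorder,  # byte order of the machine running this code
    }[byte_order]
    return int.from_bytes(data, order, signed=False)


print(decode_uint(raw, "BIG_ENDIAN"))     # 0x00000102
print(decode_uint(raw, "LITTLE_ENDIAN"))  # 0x02010000
print(decode_uint(raw, "NATIVE"))
```

The same bytes yield very different values, which is why a wrong default typically shows up as implausible numbers rather than as a parse failure.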
LibrarySearchPaths
Optionally specify the absolute path of the directories containing any referenced DRIX files.
Each path entry must be on a new line.
All search paths must exist at a location on the file system where the Data360 Analyze server is located.
For example:
{{nil.brain.home}}/lib/ldr/commonTypes
{{nil.brain.home}}/lib/ldr/converters/asn1
MetadataPinName
Optionally specify the name of the output pin to be used for Metadata output.
Metadata contains a list of all of the different files, and file types that were read, and a list of the outputs and files that were created, and outputs that were ignored. If this property is not set, no Metadata output is created.
LdrClasspaths
Optionally specify entries that are to be added to the classpath used by the node. Any external jars required by the node should be listed in this property. Each entry must be on a new line.
BigToString
Optionally specify whether java.math.BigInteger and java.math.BigDecimal fields should be converted to Strings and output as such. The numeric types of the internal BRD format (double, long) cannot hold numeric values longer than 8 bytes. If these types are not converted to String, then they must be output as byte array data.
The default value is True.
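The 8-byte limit can be illustrated with a short Python sketch. It shows a value that needs more than 64 bits, the precision lost when it is forced into a double, and the two output forms described above. emit_big_integer is a hypothetical helper, and the two's-complement byte layout is an assumption for illustration; the actual byte array layout used by the node is not specified here.

```python
# A value that does not fit in a signed 8-byte long.
big_value = 2**64 + 1


def emit_big_integer(value, big_to_string=True):
    """Return the value as a string, or as raw bytes when conversion is off."""
    if big_to_string:
        return str(value)
    # Assumption: big-endian two's complement, with room for the sign bit.
    length = (value.bit_length() + 8) // 8
    return value.to_bytes(length, "big", signed=True)


print(big_value.bit_length())             # more than 64 bits are needed
print(int(float(big_value)) == big_value) # a double cannot hold it exactly
print(emit_big_integer(big_value))
print(emit_big_integer(big_value, big_to_string=False).hex())
```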
StringToUnicode
Optionally specify whether or not all Java Strings should be output as Unicode in the output records.
The default value is False.
ByteArrayToBase64
Optionally specify whether or not byte arrays can be output as Base64 encoded strings. When this is set to False, byte arrays cannot be output by this node.
The default value is False.
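For readers unfamiliar with Base64, the following Python sketch shows what such an encoded string looks like and that the original bytes can be recovered from it. emit_byte_array is a hypothetical helper that only mimics the documented behaviour of this property; the exception it raises is an assumption, not the node's actual error.

```python
import base64


def emit_byte_array(data, byte_array_to_base64=False):
    """Return a Base64 string for the bytes, or fail when the option is off."""
    if not byte_array_to_base64:
        # Models "byte arrays cannot be output by this node".
        raise ValueError("byte arrays cannot be output when ByteArrayToBase64 is False")
    return base64.b64encode(data).decode("ascii")


payload = bytes([0x00, 0xFF, 0x10])
encoded = emit_byte_array(payload, byte_array_to_base64=True)
print(encoded)
print(base64.b64decode(encoded) == payload)
```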
ErrorPinName
Optionally specify the name of the output to which error and warning details are sent.
An error output pin must always be present in the Data reader node.
MaxNumberRecoverableErrors
Optionally specify the number of errors that can be encountered on each file, before the file is considered corrupt/not matching the specification, and processing on this file stops.
The behavior of the node after reaching the error threshold depends on the FailOnErroredFile property.
When this property is set to -1, there is no maximum threshold, and the node can continue encountering recoverable errors until the file is read.
The threshold settings are used to determine what is classified as a recoverable error to contribute to the count, and what can be ignored.
RecoverableErrorThreshold
Optionally specify the threshold for which errors contribute to the recoverable error count.
For all errors which have an ErrorLevel greater than or equal to the error threshold, the running count of recoverable errors encountered is incremented.
The default value is recoverableError.
LoggableErrorThreshold
Optionally specify the threshold for which errors are logged.
For all errors which have an ErrorLevel greater than or equal to the log threshold, a new entry will be added to the error log.
The default value is warning.
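The interaction between the two thresholds and the maximum error count can be sketched in Python as follows. The ordering of the error levels (ignore < info < warning < recoverableError < fatalError) is an assumption based on the order in which the levels are listed in this document, and classify and limit_reached are hypothetical helpers, not LDR functions.

```python
# Assumed ordering, lowest to highest severity.
LEVELS = ["ignore", "info", "warning", "recoverableError", "fatalError"]


def classify(error_level, recoverable_threshold="recoverableError",
             loggable_threshold="warning"):
    """Say whether an error is counted and/or logged under the given thresholds."""
    rank = LEVELS.index
    return {
        "counted": rank(error_level) >= rank(recoverable_threshold),
        "logged": rank(error_level) >= rank(loggable_threshold),
    }


def limit_reached(recoverable_count, max_recoverable_errors):
    """True once the per-file count reaches the maximum; -1 means no maximum."""
    if max_recoverable_errors == -1:
        return False
    return recoverable_count >= max_recoverable_errors


print(classify("info"))
print(classify("warning"))
print(classify("recoverableError"))
print(limit_reached(5, 5), limit_reached(100, -1))
```

With the default thresholds, a warning is logged but does not contribute to the recoverable error count.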
SignAndOverflowReportingLevel
Optionally specify the error level to use when there is a *possibility* of Sign and Overflow errors occurring. Choose from:
- ignore
- info
- warning
- recoverableError
- fatalError
When set to ignore, no sign and overflow checking is performed, therefore no sign and overflow errors will be logged.
When set to any other error level, sign and overflow checking is performed. If there is a *possibility* of a sign or overflow error (due to, for example, a short argument being passed to a property expecting a byte), then the message will be logged with the specified level.
If an actual error occurs (for example where an unsigned short value of 65535 is passed to a property expecting an unsigned byte), then the SignAndOverflowErrorLevel property is used to determine how the error should be logged.
If not set, the default for this property is based on the value of the ls.brain.node.(node).mathOverflowBehavior property.
The following ls.brain.node.(node).mathOverflowBehavior -> SignAndOverflowReportingLevel mapping is performed:
ignore -> ignore
log -> warning
error -> warning
SignAndOverflowErrorLevel
Optionally specify the error level to use when a sign or overflow error occurs in the data. Choose from:
- ignore
- info
- warning
- recoverableError
- fatalError
When set to ignore, no sign and overflow errors will be logged in the case of actual sign and overflow conversions. Note that the SignAndOverflowReportingLevel property determines whether or not sign and overflow checks are performed, and how to log the *possibility* of sign and overflow errors.
Therefore, even if this property is set to fatalError, no sign and overflow errors will be logged if the SignAndOverflowReportingLevel property is set to ignore. Further, even if this property is set to ignore, if the SignAndOverflowReportingLevel property is set to fatalError and there is the *possibility* of a sign or overflow error, this *possibility* will force a fatal error.
When this property and the SignAndOverflowReportingLevel property are both set to something other than ignore, and an actual error occurs (for example where an unsigned short value of 65535 is passed to a property expecting an unsigned byte), this property is used to determine how the error should be logged.
If not set, the default for this property is based on the value of the ls.brain.node.(node).mathOverflowBehavior property.
The following ls.brain.node.(node).mathOverflowBehavior -> SignAndOverflowErrorLevel mapping is performed:
ignore -> ignore
log -> warning
error -> fatalError
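The relationship between SignAndOverflowReportingLevel and SignAndOverflowErrorLevel can be summarised in a short Python sketch. The two dictionaries copy the default mappings listed in this document; effective_level is a hypothetical helper that models which level applies to a given event, and the "possible" and "actual" labels are names chosen for this example.

```python
# Defaults derived from ls.brain.node.(node).mathOverflowBehavior, as listed above.
REPORTING_DEFAULTS = {"ignore": "ignore", "log": "warning", "error": "warning"}
ERROR_LEVEL_DEFAULTS = {"ignore": "ignore", "log": "warning", "error": "fatalError"}


def effective_level(kind, reporting_level, error_level):
    """Return the level used for an event; kind is "possible" or "actual"."""
    if reporting_level == "ignore":
        # No sign and overflow checking is performed at all.
        return "ignore"
    if kind == "possible":
        return reporting_level
    if kind == "actual":
        return error_level
    raise ValueError(f"unknown kind: {kind}")


behavior = "error"
reporting = REPORTING_DEFAULTS[behavior]
error = ERROR_LEVEL_DEFAULTS[behavior]
print(effective_level("possible", reporting, error))
print(effective_level("actual", reporting, error))
print(effective_level("actual", "ignore", "fatalError"))
```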
DeprecatedErrorLevel
Optionally specify the error level with which deprecation errors will be raised.
The default value is warning.
FailOnErroredFile
Optionally specify the operation of the node after encountering a file which is corrupt/not matching specification.
This can happen in two cases:
1. The number of recoverable errors encountered in the file reaches the maximum set in the MaxNumberRecoverableErrors property.
2. A non-recoverable error is encountered in the file (file cannot be correctly scanned).
This property is ignored when the node is used in single file mode. In that mode, the node will error after reaching the maximum number of recoverable errors, or after encountering a non-recoverable error.
Single file mode applies whenever the data file is specified through the Filename property.
If an input file list is provided through the FilenameFieldExpr property, this is considered the multi-file case, even if only one file is specified in the input.
In the case of multiple files:
When this property is set to True, the node will error.
When this property is set to False, the file is skipped, a note is written to the error output pin (if this exists), and reading progresses to the next file.
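The single-file and multi-file cases can be modelled with a small Python simulation. Everything here is hypothetical: read_files, the set of corrupt file names, and the RuntimeError stand in for the node's real behaviour and only illustrate the rules described above.

```python
def read_files(files, corrupt_files, single_file_mode, fail_on_errored_file):
    """Simulate the node's reaction to errored files (illustrative only)."""
    read, notes = [], []
    for name in files:
        if name not in corrupt_files:
            read.append(name)
            continue
        # FailOnErroredFile is ignored in single file mode: the node always errors.
        if single_file_mode or fail_on_errored_file:
            raise RuntimeError(f"node failed on errored file: {name}")
        # Multi-file mode with FailOnErroredFile=False: skip and carry on.
        notes.append(f"skipped errored file: {name}")
    return read, notes


read, notes = read_files(
    ["a.dat", "b.dat", "c.dat"], {"b.dat"},
    single_file_mode=False, fail_on_errored_file=False,
)
print(read)
print(notes)
```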
CreateParseTrace
Optionally specify whether or not the LDR is to be used in debug mode.
If this is set to True, then an associated ParseOutputSpecification must be provided to define what is to be debugged.
Using the LDR in debug mode can greatly impact performance, and lead to extremely large debug parse log files being written to disk if not used correctly.
The parse trace provides an account of all of the fields that the LDR is attempting to parse, where it is successful, and where parsing fails.
The default value is False.
ParseOutputSpecification
Optionally specify what to provide debug parse information for.
Allows for specific field investigation, investigation between specific byte locations, and can be restricted to a maximum number of records output.
Using the LDR in debug mode can greatly impact performance, and lead to extremely large debug parse log files being written to disk if not used correctly.
This parse trace specification is ignored if CreateParseTrace is not set to True.
The parse trace specification has the following structure:
<debug>
Top level container of all configuration.
Must contain one (and only one) of the elements <filename> and <output>.
May contain one element <filePositions> - must occur after the <filename> or <output> element
May contain one element <maxOutput> - must occur after the <filename> or <output> element
May contain any number of <fieldRestriction> elements - must occur after the <filename> or <output> element.
<filename>
Specifies a fully qualified filename to which the debug output is to be written.
Either this, or the <output> element must exist, but not both.
e.g. <filename>/home/tmp/debugOutput.brd</filename>
<output>
Specifies the name of a node output pin to which the debug output is to be written.
Either this, or the <filename> element must exist, but not both.
e.g. <output>debug</output>
<filePositions>
Specifies for which bytes of the file debug output is to be produced.
If not present, then debug information will be produced for all bytes in the file.
Attributes:
byteStart - Specifies the first byte in the file for which debug information is to be produced (optional, defaults to first byte in file)
byteEnd - Specifies the last byte in the file for which debug information is to be produced (optional, defaults to last byte in file)
e.g. <filePositions byteStart="1000" byteEnd="1100"/>
<maxOutput>
Specifies the maximum number of output records produced.
e.g. <maxOutput>1000</maxOutput>
<fieldRestriction>
Specifies for which fields debug output is to be produced.
If not present, debug output will be produced for all fields.
Attributes:
fieldName - The name of the field which is to be output (required)
matchOption
How the fieldName attribute is to be matched against the field names in the DRIX (required).
Allowable options:
- "exact" - field name in DRIX must match the fieldName attribute exactly
- "contains" - field name in DRIX must contain the string in the fieldName attribute
- "regex" - field name in DRIX must match the regular expression in the fieldName attribute
nestLevel
For each field that matches a fieldRestriction, specifies for how many nested levels of subFields under this field debug output will be produced (optional).
If not provided, all subFields under a matching field will produce debug output.
When multiple fieldRestrictions are present, output will be produced when any of the fieldRestrictions are satisfied.
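To make the matching rules concrete, the following Python sketch assembles a parse trace specification from the elements described above and evaluates the fieldRestriction elements against a few field names. The field names (header, record, field12, trailer) are invented for this example, and the helpers are hypothetical. Treating "regex" as a match of the whole field name is an assumption; the LDR may apply the expression differently.

```python
import re
import xml.etree.ElementTree as ET

# A specification built only from the elements and attributes described above.
SPEC = """<debug>
  <output>debug</output>
  <filePositions byteStart="1000" byteEnd="1100"/>
  <maxOutput>1000</maxOutput>
  <fieldRestriction fieldName="header" matchOption="exact"/>
  <fieldRestriction fieldName="rec" matchOption="contains" nestLevel="1"/>
  <fieldRestriction fieldName="field[0-9]+" matchOption="regex"/>
</debug>"""


def matches(field_name, restriction):
    """Apply one fieldRestriction element to a field name."""
    target = restriction.get("fieldName")
    option = restriction.get("matchOption")
    if option == "exact":
        return field_name == target
    if option == "contains":
        return target in field_name
    if option == "regex":
        # Assumption: the expression must match the whole field name.
        return re.fullmatch(target, field_name) is not None
    raise ValueError(f"unknown matchOption: {option}")


def produces_debug_output(field_name, spec_text):
    """True if any restriction is satisfied, or if there are no restrictions."""
    restrictions = ET.fromstring(spec_text).findall("fieldRestriction")
    if not restrictions:
        return True
    return any(matches(field_name, r) for r in restrictions)


for name in ["header", "record", "field12", "trailer"]:
    print(name, produces_debug_output(name, SPEC))
```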
Inputs and outputs
Inputs: multiple optional.
Outputs: multiple optional.