Imports a file that contains data specified by one or more COBOL copybook specifications.
In general, when only one copybook is to be used, the COBOL File node should be used. However, if multiple copybooks are used, this node can be used to configure the DRIX and DROX to read the file.
The default InputSpec property shows how a DRIX can be constructed to read a file specified by a single copybook; the same approach can be used to read a file specified by multiple copybooks. To define a field that reads a fixed length COBOL record specified by the copybook at "c:\myCopybookFile.txt", simply use:
<field name="MyCopybookRecord" type=".cobol.SimpleCobolRecord">
{{^BaseGeneratedTypeArgs^}}
<typeArg name="cobolSpecFilename" value="c:\myCopybookFile.txt"/>
</field>
To read a variable length COBOL record specified by the copybook at "c:\myCopybookFile.txt", simply use:
<field name="MyCopybookRecord" type=".cobol.SimpleCobolRecord">
{{^BaseGeneratedTypeArgs^}}
{{^VariableTypeArg^}}
<typeArg name="cobolSpecFilename" value="c:\myCopybookFile.txt"/>
</field>
This node becomes even more useful when multiple copybooks are to be read. For example, consider the case where we have the following data format:
File ::= SubFile*
SubFile ::= CPID Record newline (\r\n)
Record ::= RecordType1 | RecordType2 | RecordType3
Where:
- RecordType1 is specified by the copybook {{^copybookDir^}}/cp1.txt.
- RecordType2 is specified by the copybook {{^copybookDir^}}/cp2.txt.
- RecordType3 is specified by the copybook {{^copybookDir^}}/cp3.txt.
And:
- CPID == 1 indicates a RecordType1.
- CPID == 2 indicates a RecordType2.
- CPID == 3 indicates a RecordType3.
Assume also that these SubFiles are placed into 2KB blocks.
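Concretely, each SubFile then consists of a one-byte CPID (read as a .integer.UInt8 in the DRIX below), followed by the bytes of the copybook record it identifies, followed by a \r\n terminator, and consecutive SubFiles are packed into 2048-byte blocks.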
Then, assuming the copybookDir property has been set somewhere and the files are fixed length, the following DRIX could be used to parse the data file:
<drix xmlns="http://schemas.lavastorm.com/ldr/drix">
<include library="cobol"/>
<include library="newline"/>
<include library="blocked"/>
<include library="integer"/>
<library name="cobolNode" version="000.000.000.001">
<namespace name="cobolNode">
<repeatRange min="1" max="unbounded">
<field type=".blocked.BlockedRecords{.cobolNode.SubFile}">
<arg name="blockSize" value="2048"/>
</field>
</repeatRange>
<field name="cpid" type=".integer.UInt8" readRequired="true"/>
<or>
<field name="recordType1" type="RecordType1">
<arg name="cpid">
<fromField field="cpid"/>
</arg>
</field>
<field name="recordType1" type="RecordType2">
<arg name="cpid">
<fromField field="cpid"/>
</arg>
</field>
<field name="recordType3" type="RecordType3">
<arg name="cpid">
<fromField field="cpid"/>
</arg>
</field>
</or>
<skip type=".string.newline.AsciiWindowsNewline"/>
<param name="cpid" javaType="int"/>
<field name="record" type=".cobol.SimpleCobolRecord">
{{^BaseGeneratedTypeArgs^}}
<typeArg name="cobolSpecFile">
<expr>"{{^copybookDir^}}/cp"+param().cpid()+".txt"</expr>
</typeArg>
</field>
<test expected="1">
<fromParam param="cpid"/>
</test>
<super/>
<test expected="2">
<fromParam param="cpid"/>
</test>
<super/>
<test expected="3">
<fromParam param="cpid"/>
</test>
<super/>
</namespace>
</library>
<primaryField type="cobolNode.File"/>
</drix>
Clearly, the different RecordType fields could also be split out into their own types, each testing the value of the cpid field and then placed in an or tag under the SubFile type between the cpid field and the newline skip. The OutputSpec would then also need to be changed to output these, and you would probably want to ensure that the different record types are all placed into separate output pins/files as shown below:
<?xml version="1.0" encoding="utf-8"?>
<drox>
<dumper name="CobolDumper" javaClass="com.lavastorm.ldr.converters.cobol.output.CobolOutputDefinitionGenerator"/>
<dump name="CobolDumper">
<include>
<fields>
<pattern pattern="record.rec.recordType1.record"/>
</fields>
</include>
</dump>
<dump name="CobolDumper">
<include>
<fields>
<pattern pattern="record.rec.recordType2.record"/>
</fields>
</include>
</dump>
<dump name="CobolDumper">
<include>
<fields>
<pattern pattern="record.rec.recordType3.record"/>
</fields>
</include>
</dump>
</drox>
Properties
copyLibDir
Optionally specify the directory within which the copylibs referenced by any COPY statements in the COBOL specification are to be found.
If there are no COPY statements, this property has no effect. If this property is set, it is used as the location of all copylibs. Otherwise, if the COBOL copybook is specified in the COBOLCopybookFilename property, the directory containing the COBOL copybook is used as the location of all copylibs. In all other cases, an error is raised.
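For example, if the copybook contains the statement COPY CUSTREC. and this property is set to /data/copylibs, the copylib CUSTREC is looked for in the /data/copylibs directory (the copylib name and directory here are hypothetical, for illustration only).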
strictFormatting
Optionally specify the formatting of the COBOL specification. If the specification is strictly formatted, the specification can only exist between columns 7 and 72; column 6 is reserved for control characters, and all data outside these margins is ignored. If this is not set, a free-format mode is used and no columns are ignored; in this mode, if the first non-whitespace character in the document is a control character, it is used as a control character.
encoding
Optionally specify the encoding used by the data specified in a COBOL copybook for all non-binary types. Choose from:
- ASCII - all non-binary items are ASCII encoded.
- EBCDIC - all non-binary items are EBCDIC encoded.
The default value is ASCII.
Filename
Optionally click the folder icon to browse to the file that you want to import.
Use this property when you are only importing a single file. If you use this property, do not also use the FilenameFieldExpr property.
FilenameFieldExpr
Optionally specify the name of the input field that contains the absolute filenames to import.
Use this property when importing multiple files. The data files will all be read with the same input and output specifications. The output metadata for each of the output files must be the same (no extra fields in arrays etc).
If you use this property, do not also use the Filename property.
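For example, an upstream node might emit one record per file, with a field named fileName holding the absolute path of each file; setting this property to fileName (a hypothetical field name) causes each of those files to be imported in turn.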
InputSpec
Optionally specify the structure of the file that is to be read.
This must conform to the LDR Input Specification xsd, and is validated against that xsd. If a DRIX file has been constructed for this data format and stored outside the node on disk, this DRIX field will simply contain a primaryField that references a type from a library pulled in via an include tag.
For example:
<drix>
<include name="MySpec" minimumVersion="1"/>
<primaryField name="field" type="FileType"/>
</drix>
This would look for the MySpec library, with a version >= 1, in the configured library search paths. Once the library is found, the specification is loaded, and the FileType type in the library is used as the primary field. This means that the FileType type declaration must contain sufficient information to parse the input file.
OutputSpec
Optionally specify the structure of the output.
This must conform to the LDR Output Specification xsd, and is validated against the xsd. In general, this DROX field will simply contain details of the outputs of this node, and the fields to include in these outputs.
A simple example output would be:
<drox>
<output name="Output1" mapping="Mapping1"/>
<mapping name="Mapping1">
<includes>
<fields>
<pattern pattern="file.record.#.*"/>
</fields>
</includes>
<excludes>
<fields>
<pattern pattern="file.record.field1.*"/>
</fields>
</excludes>
</mapping>
</drox>
In this example, we are saying that we want to output all of the emittable fields lying under constructed types under the file.record field, except for all of the emittable fields under the file.record.field1 field.
When using multiple input files, it is generally recommended that the auto-generated field be included in each of the outputs. A number of different properties affect how the outputs defined in the output specification are mapped to the node output pins. In general, the name specified in the output specification must match a node output pin name for it to be written to that pin.
If, however, there is no output pin called with the same name as a DROX output, then there are a number of options:
1. If MetadataPinName is set, then the associated Metadata output pin will be used and will contain the names of all outputs that are not written to a pin.
a. If, in addition to this, WriteUnpinnedOutputs is set to True, then the unpinned outputs will be written to file, and referenced in the output specified by the MetadataPinName.
b. If, however, WriteUnpinnedOutputs is set to False, then the output is not written to file, but the output name is written to the Metadata pin.
2. If MetadataPinName is not set, then no Metadata output pin is used, and all unpinned outputs will be ignored, with no indication in the output that they have been ignored.
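For example, suppose the DROX defines outputs named Out1 and Out2 (hypothetical names), but the node has only an output pin named Out1. Out1 is written to its pin. If MetadataPinName is set and WriteUnpinnedOutputs is True, Out2 is written to a file that is referenced from the metadata output; if WriteUnpinnedOutputs is False, only the name Out2 is recorded in the metadata output; and if MetadataPinName is not set, Out2 is silently ignored.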
WriteUnpinnedOutputs
Optionally specify how to handle outputs which do not have associated output pins.
If a MetadataPinName is set, and WriteUnpinnedOutputs is set to True, then these outputs will be written to file and the filename referenced in the metadata output.
If a MetadataPinName is set, and WriteUnpinnedOutputs is set to False, then these outputs will be ignored; however, the fact that they are ignored is noted in the metadata output.
If a MetadataPinName is not set, WriteUnpinnedOutputs has no effect.
The default value is True.
PadByte
Optionally specify the padding (uninitialized) byte that is used.
Often, COBOL records are initialized using a statement such as MOVE SPACES or MOVE ZEROES to the entire record. Alternatively, the entire record might not be initialized with any values. In such cases, if individual fields are not set, and the record is written, then the output will simply contain the default uninitialized values, or the values that were initially moved to the entire record.
In some cases, this cannot be distinguished from correct data (e.g. if " " is used, and we are trying to read an X field, or for any binary integer cases).
However, in other cases, this will lead to an invalid encoding which will throw errors. In such cases, the padByte should be set to the bytes used as initializers over the entire record. Then, this will be used while decoding the record to ensure that unset items are not recognized as bad data. If this padByte is not set, then no such checking is performed.
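For example, a record initialized with MOVE SPACES would use a pad byte of 0x40 for EBCDIC-encoded data or 0x20 for ASCII-encoded data, while a record initialized with LOW-VALUES would use a pad byte of 0x00.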
FixedOrVariable
Optionally specify whether the COBOL records are of a fixed or variable length.
Even if the COBOL records are of a variable length, the variable option should only be set if the output files have used a file organization or recording mode specifying variable length records.
In all other cases, fixed should be used, as variable length records in such cases will always be output padded up to the maximum allowable record length.
The default value is fixed.
storage
Optionally specify the storage alignment for all of the data specified in a COBOL copybook. Choose from:
- ST_1_8 - allows for all storage sizes (1, 2, 3, ... 8, etc.)
- ST_1_2_4_8 - allows for storage sizes 1, 2, 4, 8, etc.
- ST_2_4_8 - allows for storage sizes 2, 4, 8, etc.
The default value is ST_2_4_8.
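For example (a sketch of typical compiler behavior, not a statement about any specific compiler), a PIC S9(5) COMP item needs at least 3 bytes of storage: under ST_2_4_8, the usual IBM mainframe convention, it would be allocated 4 bytes, whereas under ST_1_8 a compiler could allocate exactly 3 bytes.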
endian
Optionally specify the endian-ness of all of the data specified by a COBOL copybook. Endian-ness is the ordering of individually addressable sub-units (words, bytes, or even bits) within a longer data word stored in external memory.
The DEFAULT option specifies that the endian value to use will be taken from the defaultByteOrder property which sets the endian-ness for the entire file (and not just the copybook specified section of the file). Choose from:
- DEFAULT
- LITTLE_ENDIAN
- BIG_ENDIAN
- NATIVE
The default value is DEFAULT.
comp5Endian
Optionally specify the endian-ness of all the COMP-5 data specified by a COBOL copybook.
Endian-ness is the ordering of individually addressable sub-units (words, bytes, or even bits) within a longer data word stored in external memory.
The DEFAULT option specifies that the endian value to use will be taken from the general endian property. Choose from:
- DEFAULT
- LITTLE_ENDIAN
- BIG_ENDIAN
- NATIVE
The default value is DEFAULT.
compEndian
Optionally specify the endian-ness of all the COMP-4, COMP and BINARY data specified by a COBOL copybook. Endian-ness is the ordering of individually addressable sub-units (words, bytes, or even bits) within a longer data word stored in external memory.
The DEFAULT option specifies that the endian value to use will be taken from the general endian property. Choose from:
- DEFAULT
- LITTLE_ENDIAN
- BIG_ENDIAN
- NATIVE
The default value is DEFAULT.
binarySignMode
Optionally specify the encoding mechanism used for COMP-4 and COMP-5 binary fields in data specified in a COBOL copybook. Most pure binary integers use 2's-complement, but each vendor is free to choose their own method for all types. Choose from:
- ONES_COMP
- TWOS_COMP
- SIGN_MAG
- UNSIGNED
The default value is TWOS_COMP.
floatEncoding
Optionally specify the floating point representation used by COMP-1 and COMP-2 fields. This will generally be IEEE-754, except for the IBM floating point cases (where an IBM COBOL compiler/hardware is used).
Choose from:
- IEEE - standard IEEE-754 Floating point encoding.
- IBM - IBM float encoding.
The default value is IEEE.
sync
Optionally specify the alignment/synchronization applied to all elementary binary data when synchronization is turned on by compiler directive or command line. Choose from:
- NO_SYNC - SYNC clause where present has no effect
- SYNC_16 - where a SYNC clause is present, all data items are sync'd on 16 bit boundaries regardless of the size of the data item
- SYNC_32 - where a SYNC clause is present, all data items are sync'd on 32 bit boundaries regardless of the size of the data item
- SYNC_2_4 - where a SYNC clause is present, all data items are sync'd according to the storageSize property, and the size of the data item - data items greater than 4 bytes will be sync'd to 4 bytes
- SYNC_VARY - where a SYNC clause is present, all data items are sync'd according to the storageSize property and the size of the data item; there is no maximum byte size to which the sync will be applied.
The default value is NO_SYNC.
nationalCharset
Optionally specify the character set to use for national PIC clauses specified in the COBOL copybook.
The default value is UTF-16.
display1Charset
Optionally specify the character set to use for characters specified in the COBOL copybook with USAGE DISPLAY-1 in the PIC clause.
The default value is UTF-16.
DefaultByteOrder
Optionally specify the byte order used to read the data file. This is then the default byte order for all types in the LDR. For types where byteOrder is important, this can generally be overridden via the bigEndian property on the corresponding type. Choose from:
- BIG_ENDIAN
- LITTLE_ENDIAN
- NATIVE (based on the native byte order of the machine on which the Data360 Analyze Server is running)
The default value is BIG_ENDIAN.
LibrarySearchPaths
Optionally specify the absolute path of the directories containing any referenced DRIX files. Each path entry must be on a new line.
All search paths must exist at a location on the file system where the Data360 Analyze Server is located.
Example: {{nil.brain.home}}/lib/ldr/commonTypes
{{nil.brain.home}}/lib/ldr/converters/asn1
MetadataPinName
Optionally specify the name of the output pin to be used for Metadata output.
The metadata contains a list of all of the different files and file types that were read, a list of the outputs and files that were created, and a list of the outputs that were ignored.
If this property is not set, then no Metadata output is created.
LdrClasspaths
Optionally specify entries that are to be added to the classpath used by the node. Any external jars required by the node should be referenced in this property. Each entry must be placed on a new line.
BigToString
Optionally specify whether java.lang.BigInteger and java.lang.BigDecimal fields should be converted to Strings and output as such. The internal BRD format's numeric types (double, long) cannot handle numeric values greater than 8 bytes long. If these types are not converted to String, then they must be output as byte array data.
The default value is True.
StringToUnicode
Optionally specify whether or not all java Strings should be output as Unicode in the output records.
The default value is False.
ByteArrayToBase64
Optionally specify whether or not byte arrays can be output as Base64 encoded strings. When this is set to False, byte arrays cannot be output by this node.
The default value is False.
RedefinesErrorLevel
Optionally specify the error level to use for error filtering on REDEFINES clauses.
REDEFINES clauses can cause a set of bytes in the data file to be read into multiple different types. The idea is that only one of these types will be valid. However, there is not enough information in the copybook or the data to determine which of the types is valid. Therefore, these cannot simply be placed into an "or" tag. Instead, each of these fields needs to be read independently. In such cases, it is possible for one of the REDEFINES fields to have encoding errors.
Error filtering is then applied to these errors based on the specified RedefinesErrorLevel.
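For example, in the following hypothetical copybook fragment, the same 8 bytes are read both as a character field and as a numeric field; if the bytes contain spaces, the character interpretation succeeds but the numeric interpretation produces an encoding error:
05 PAYMENT-DATE PIC X(8).
05 PAYMENT-DATE-NUM REDEFINES PAYMENT-DATE PIC 9(8).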
Choose from:
- ignore
- info
- warning
- recoverableError
- fatalError
- none
Where each of these options is handled according to the corresponding LDRException.ErrorLevel value, except for none. When none is selected, no error filtering is performed on redefines errors.
The default value is ignore.
ErrorPinName
Optionally specify the name of the output to which error and warning details are sent.
MaxNumberRecoverableErrors
Optionally specify the number of errors that can be encountered on each file, before the file is considered corrupt/not matching the specification, and processing on this file stops.
The function of the node after reaching the error threshold is dependent on the FailOnErroredFile property.
When this property is set to -1, there is no maximum threshold, and the node can continue encountering recoverable errors until the file is read.
The threshold settings are used to determine what is classified as a recoverable error to contribute to the count, and what can be ignored.
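For example (values hypothetical), if MaxNumberRecoverableErrors is set to 10 and RecoverableErrorThreshold is left at its default of recoverableError, the file is considered corrupt once an eleventh error of level recoverableError or above is encountered; warnings are still logged (subject to LoggableErrorThreshold) but do not count towards the total.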
RecoverableErrorThreshold
Optionally specify the threshold for which errors contribute to the recoverable error count.
For all errors which have an ErrorLevel greater than or equal to the error threshold, the running count of recoverable errors encountered is incremented.
The default value is recoverableError.
LoggableErrorThreshold
Optionally specify the threshold for which errors are logged.
For all errors which have an ErrorLevel greater than or equal to the log threshold, a new entry will be added to the error log.
The default value is warning.
SignAndOverflowReportingLevel
Optionally specify the error level to use when there is a *possibility* of Sign and Overflow errors occurring. Choose from:
- ignore
- info
- warning
- recoverableError
- fatalError
When set to ignore, no sign and overflow checking is performed, therefore no sign and overflow errors will be logged.
When set to all other error levels, sign and overflow checking is performed. If there is a *possibility* of a sign or overflow error (due to for example a short argument being passed to a property expecting a byte), then the message will be logged with the specified level.
If an actual error occurs (for example where an unsigned short value of 65535 is passed to a property expecting an unsigned byte), then the SignAndOverflowErrorLevel property is used to determine how the error should be logged.
If not set, the default for this property is based on the value of the ls.brain.node.(node).mathOverflowBehavior property.
Where the following ls.brain.node.(node).mathOverflowBehavior -> signAndOverflowErrorReporting mapping is performed:
ignore -> ignore
log -> warning
error -> warning
SignAndOverflowErrorLevel
Optionally specify the error level to use when a sign or overflow error occurs in the data. Choose from:
- ignore
- info
- warning
- recoverableError
- fatalError
When set to ignore, no sign and overflow errors will be logged in the case of actual sign and overflow conversions. Note that the SignAndOverflowReportingLevel property determines whether or not sign and overflow checks are performed, and how to log the *possibility* of sign and overflow errors.
Therefore, even if this property is set to fatalError, no sign and overflow errors will be logged if the SignAndOverflowReportingLevel property is set to ignore. Further, even if this property is set to ignore, if the SignAndOverflowReportingLevel property is set to fatalError and there is the *possibility* of a sign or overflow error, this *possibility* will force a fatal error.
When this property and the SignAndOverflowReportingLevel property are both set to something other than ignore, and an actual error occurs (for example, where an unsigned short value of 65535 is passed to a property expecting an unsigned byte), this property determines how the error is logged.
If not set, the default for this property is based on the value of the ls.brain.node.(node).mathOverflowBehavior property.
Where the following ls.brain.node.(node).mathOverflowBehavior -> signAndOverflowErrorReporting mapping is performed:
ignore -> ignore
log -> warning
error -> fatalError
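For example (a hypothetical configuration), with SignAndOverflowReportingLevel set to warning and SignAndOverflowErrorLevel set to recoverableError: passing a short argument to a property expecting a byte logs a warning, because there is only a *possibility* of overflow, whereas passing an unsigned short value of 65535 to a property expecting an unsigned byte logs a recoverable error, because an actual overflow occurs.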
DeprecatedErrorLevel
Optionally specify the error level with which deprecation errors will be raised.
The default value is warning.
FailOnErroredFile
Optionally specify the operation of the node after encountering a file which is corrupt or does not match the specification.
This can happen in two cases:
1. The number of recoverable errors encountered in the file reaches the threshold set in the MaxNumberRecoverableErrors property.
2. A non-recoverable error is encountered in the file (file cannot be correctly scanned).
This property is ignored when the node is used in single file mode; in that case, the node will error after reaching the recoverable error threshold, or after encountering a non-recoverable error.
Single file mode is any case where the data file is specified through the Filename property.
If there is an input file list, and the FilenameFieldExpr property is used, this is considered the multi-file case, even if only one file is specified in the input.
In the case of multiple files:
When this property is set to True, the node will error.
When this property is set to False, the file is skipped, a note is written to the error output pin (if this exists), and reading progresses to the next file.
CreateParseTrace
Optionally specify whether or not the LDR is to be used in debug mode.
If this is set to True, then an associated ParseOutputSpecification must be provided to specify what is being debugged.
Using the LDR in debug mode can greatly impact performance, and lead to extremely large debug parse log files being written to disk if not used correctly.
The parse trace provides an account of all of the fields that the LDR is attempting to parse, where it is successful, and where parsing fails.
The default value is False.
ParseOutputSpecification
Optionally specify what to provide debug parse information for.
Allows for specific field investigation, investigation between specific byte locations, and can be restricted to a maximum number of records output.
Using the LDR in debug mode can greatly impact performance, and lead to extremely large debug parse log files being written to disk if not used correctly.
This parse trace specification is ignored if CreateParseTrace is not set to True.
The parse trace specification has the following structure:
<debug>
Top level container of all configuration.
Must contain one (and only one) of the elements <filename> and <output>.
May contain one element <filePositions> - must occur after the <filename> or <output> element
May contain one element <maxOutput> - must occur after the <filename> or <output> element
May contain any number of <fieldRestriction> elements - must occur after the <filename> or <output> element.
<filename>
Specifies a fully qualified filename to which the debug output is to be written.
Either this, or the <output> element must exist, but not both.
e.g. <filename>/home/tmp/debugOutput.brd</filename>
<output>
Specifies the name of a node output pin to which the debug output is to be written.
Either this, or the <filename> element must exist, but not both.
e.g. <output>debug</output>
<filePositions>
Specifies for which bytes of the file debug output is to be produced.
If not present, then debug information will be produced for all bytes in the file.
Attributes:
byteStart - Specifies the first byte in the file for which debug information is to be produced (optional, defaults to first byte in file)
byteEnd - Specifies the last byte in the file for which debug information is to be produced (optional, defaults to last byte in file)
e.g. <filePositions byteStart="1000" byteEnd="1100"/>
<maxOutput>
Specifies the maximum number of output records produced.
e.g. <maxOutput>1000</maxOutput>
<fieldRestriction>
Specifies for which fields debug output is to be produced.
If not present, debug output will be produced for all fields.
Attributes:
fieldName - The name of the field which is to be output (required)
matchOption
How the fieldName attribute is to be matched against the field names in the DRIX (required).
Allowable options:
- "exact" - the field name in the DRIX must exactly match the fieldName attribute
- "contains" - the field name in the DRIX must contain the string in the fieldName attribute
- "regex" - the field name in the DRIX must match the regular expression in the fieldName attribute
nestLevel
For each field that matches a fieldRestriction, specifies for how many nested levels of subFields under this field debug output will be produced (optional).
If not provided, all subFields under a matching field will produce debug output.
When multiple fieldRestrictions are present, output will be produced when any of the fieldRestrictions are satisfied.
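For example, the following specification (the pin name, byte range, and field name are hypothetical) writes debug output to a node output pin named debug, covers only bytes 1000 to 1100 of the file, produces at most 1000 output records, and restricts output to fields whose names contain the string "record", to a depth of two nested levels of subFields:
<debug>
<output>debug</output>
<filePositions byteStart="1000" byteEnd="1100"/>
<maxOutput>1000</maxOutput>
<fieldRestriction fieldName="record" matchOption="contains" nestLevel="2"/>
</debug>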
Inputs and outputs
Inputs: multiple optional.
Outputs: multiple optional.