Schema Settings for COBOL Copybooks - 17.1

Inline Quality and Discovery

Version: 17.1
Language: English
Product name: Trillium Quality and Discovery
Title: Inline Quality and Discovery

The following table explains the schema settings for COBOL Copybook data sources.

Option Description
Byte order

Controls the order of bytes in multi-byte binary values, such as 16-bit words.

  • Mainframe systems (EBCDIC) tend to be Big Endian: the first byte (8 bits) of a word is the high-order byte and the second byte is the low-order byte.
  • Intel-based platforms such as PCs are Little Endian: the low-order byte comes first, so the bytes are swapped relative to Big Endian.
  • UNIX platforms can be either; generally, Intel-based platforms are Little Endian and other platforms are Big Endian. If you are loading data from a platform that is not Big Endian (such as binary PC data), you must select Little Endian for the data to be interpreted correctly.
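To illustrate why this setting matters, the following Python sketch decodes the same two bytes both ways with the standard `int.from_bytes` function. The `read_comp` helper is hypothetical and only models the byte-order choice; it is not part of the product.

```python
def read_comp(raw: bytes, little_endian: bool, signed: bool = True) -> int:
    """Interpret a binary (COMP) field using the selected byte order."""
    order = "little" if little_endian else "big"
    return int.from_bytes(raw, byteorder=order, signed=signed)


field = bytes([0x01, 0x02])  # the same two bytes as stored in the file
print(read_comp(field, little_endian=False))  # 258 (0x0102, Big Endian)
print(read_comp(field, little_endian=True))   # 513 (0x0201, Little Endian)
```

Choosing the wrong byte order does not raise an error; it silently produces a different number, which is why previewing the data is worthwhile.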
Data alignment

Specifies how binary (COMP) data is stored. The correct setting depends solely on the compiler that built the application that created the data.

In general, IBM mainframe (MVS) data is stored using the two-byte method, and ICL/PC (Micro Focus compiler) data is stored using the one-byte method.

IBM MVS half-word alignment (two-byte method):

PIC digits    Byte length
0             0
1-4           2
5-9           4
10-18         8

Intel, ICL single-byte alignment (one-byte method):

PIC digits    Byte length
0             0
1-2           1
3-4           2
5-6           3
7-9           4
10-12         5
13-14         6
15-16         7
17-18         8
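The alignment tables above can be expressed as a simple lookup. This is a minimal Python sketch: the byte lengths are transcribed from the tables, while the method labels "two-byte" and "one-byte" and the `comp_byte_length` function are hypothetical names used only for illustration.

```python
# (highest PIC digit count, byte length) pairs transcribed from the alignment tables.
# The keys "two-byte" and "one-byte" are illustrative labels, not product settings.
ALIGNMENT_TABLES = {
    "two-byte": [(4, 2), (9, 4), (18, 8)],  # IBM MVS half-word method
    "one-byte": [(2, 1), (4, 2), (6, 3), (9, 4), (12, 5), (14, 6), (16, 7), (18, 8)],  # Intel, ICL
}


def comp_byte_length(digits: int, method: str) -> int:
    """Return the storage size in bytes of a COMP field with the given number of PIC digits."""
    if not 0 <= digits <= 18:
        raise ValueError("PIC digits must be between 0 and 18")
    if digits == 0:
        return 0
    for max_digits, length in ALIGNMENT_TABLES[method]:
        if digits <= max_digits:
            return length
    raise AssertionError("unreachable: the tables cover 1 to 18 digits")


print(comp_byte_length(5, "two-byte"))  # 4
print(comp_byte_length(5, "one-byte"))  # 3
```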

There are several ways to ascertain which method is used:

  • Consult the documentation for the COBOL compiler that was used to create the application where the data originated.
  • Ask someone who knows how the original data was created and by which system (generally, systems programmers will know this).
  • Assume IBM mainframe data is two-byte aligned, and everything else is one-byte aligned.

Use the Preview option to see which setting works best for the file. If you select the wrong setting, the record contents after the first COMP field are generally misaligned.
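The following Python sketch shows where that misalignment comes from by computing field offsets for a hypothetical three-field record under both methods. The field names CUST-ID, AMOUNT, and CUST-NAME are invented; only the 4-byte versus 3-byte length of a 5-digit COMP field comes from the alignment tables.

```python
from typing import Dict

# Hypothetical record layout. AMOUNT is PIC 9(5) COMP, which the alignment
# tables give as 4 bytes (two-byte method) or 3 bytes (one-byte method).
LAYOUT = [
    ("CUST-ID", {"two-byte": 3, "one-byte": 3}),      # PIC X(3)
    ("AMOUNT", {"two-byte": 4, "one-byte": 3}),       # PIC 9(5) COMP
    ("CUST-NAME", {"two-byte": 10, "one-byte": 10}),  # PIC X(10)
]


def field_offsets(method: str) -> Dict[str, int]:
    """Return the starting byte offset of each field under an alignment method."""
    offsets = {}
    position = 0
    for name, lengths in LAYOUT:
        offsets[name] = position
        position += lengths[method]
    return offsets


print(field_offsets("two-byte"))  # {'CUST-ID': 0, 'AMOUNT': 3, 'CUST-NAME': 7}
print(field_offsets("one-byte"))  # {'CUST-ID': 0, 'AMOUNT': 3, 'CUST-NAME': 6}
```

Every field after AMOUNT starts one byte earlier under the one-byte method, so reading the file with the other setting shifts all later fields by one byte.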

Record delimiter

Specifies how records are delimited in your data file.

Typically, COBOL data files are fixed length and have no record delimiter. However, if COBOL data is exported from the original application and transferred to other file systems (especially UNIX), the files can contain record delimiters added by the export or transfer process.

If the file originated from

  • Windows—select CR/LF.
  • UNIX—select LF.
  • IBM mainframe—select None.
Note: The preview pane attempts to display end-of-line characters. How these characters appear depends on the selected font.

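As a rough illustration of the three choices, the following Python sketch splits raw file contents into records. The `split_records` function and its `record_length` parameter are hypothetical; the delimiter labels simply mirror the options listed above.

```python
from typing import List, Optional

# Byte sequences for the delimited choices (labels mirror the Record delimiter options).
DELIMITERS = {"CR/LF": b"\r\n", "LF": b"\n"}


def split_records(data: bytes, delimiter: str, record_length: Optional[int] = None) -> List[bytes]:
    """Split raw file contents into records according to the Record delimiter choice."""
    if delimiter == "None":
        # Fixed-length records: no delimiter, so the record length from the copybook decides.
        if record_length is None or record_length < 1:
            raise ValueError("a positive record_length is required when there is no delimiter")
        return [data[i:i + record_length] for i in range(0, len(data), record_length)]
    records = data.split(DELIMITERS[delimiter])
    # A delimiter after the last record leaves an empty trailing element; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


print(split_records(b"ABC123DEF456", "None", record_length=6))  # [b'ABC123', b'DEF456']
print(split_records(b"ABC\r\nDEF\r\n", "CR/LF"))                # [b'ABC', b'DEF']
```

This naive split is only safe for text records: a binary COMP or packed field can contain the byte 0x0A, which a delimiter-based split would wrongly treat as a record boundary.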
Redefines

Select one of the following:

  • All to account for all REDEFINES clauses in a copybook. When the system encounters a REDEFINES clause, it removes the clause and keeps both representations of the data in the copybook. The loaded data is populated to match the resulting copybook.
  • First to ignore all REDEFINES clauses in the copybook, so that only the first (original) definition of each data area is used.

Note: You cannot selectively choose which REDEFINES clauses describe data areas. If you need only some of them, modify your copybook before importing data into a repository.
Note: Nested REDEFINES clauses are supported.
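The difference between the two choices can be modeled roughly as follows. This Python sketch is an assumption about the observable effect, not the product's implementation: the copybook is flattened into hypothetical `Field` records, "All" keeps every representation of the redefined bytes, and "First" keeps only fields that do not redefine another area.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Field:
    name: str
    offset: int
    length: int
    redefines: Optional[str] = None  # data area this field redefines, if any


# Hypothetical copybook, flattened to elementary fields:
#   05 ORDER-DATE  PIC X(8).
#   05 DATE-PARTS REDEFINES ORDER-DATE.
#      10 ORD-YEAR PIC 9(4).  10 ORD-MONTH PIC 9(2).  10 ORD-DAY PIC 9(2).
FIELDS = [
    Field("ORDER-DATE", 0, 8),
    Field("ORD-YEAR", 0, 4, redefines="ORDER-DATE"),
    Field("ORD-MONTH", 4, 2, redefines="ORDER-DATE"),
    Field("ORD-DAY", 6, 2, redefines="ORDER-DATE"),
]


def select_fields(fields: List[Field], mode: str) -> List[Field]:
    """Keep every representation ("All") or only the original definitions ("First")."""
    if mode == "All":
        return list(fields)
    if mode == "First":
        return [f for f in fields if f.redefines is None]
    raise ValueError(f"unknown mode: {mode}")


def read_record(record: bytes, fields: List[Field]) -> Dict[str, str]:
    """Read each selected field from its byte range in an ASCII record."""
    return {f.name: record[f.offset:f.offset + f.length].decode("ascii") for f in fields}


record = b"20240115"
print(read_record(record, select_fields(FIELDS, "All")))
# {'ORDER-DATE': '20240115', 'ORD-YEAR': '2024', 'ORD-MONTH': '01', 'ORD-DAY': '15'}
print(read_record(record, select_fields(FIELDS, "First")))
# {'ORDER-DATE': '20240115'}
```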

Treat unsigned comp-3 fields as comp-6

Select this check box only if your COBOL compiler supports COMP-3 fields without an embedded sign.
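The following Python sketch shows what the embedded sign means at the byte level. In COMP-3 (packed decimal), each byte holds two 4-bit digits and the last half-byte is a sign; in COMP-6, every half-byte is a digit. The function names are hypothetical, and the accepted sign values (A, C, E, F positive; B, D negative) follow the common IBM convention.

```python
from typing import List, Sequence

POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}
NEGATIVE_SIGNS = {0xB, 0xD}


def nibbles(raw: bytes) -> List[int]:
    """Split each byte into its high and low 4-bit halves."""
    out: List[int] = []
    for byte in raw:
        out.extend((byte >> 4, byte & 0x0F))
    return out


def digits_to_int(digits: Sequence[int]) -> int:
    """Join decimal digits into an integer, rejecting non-decimal nibbles."""
    if any(d > 9 for d in digits):
        raise ValueError("invalid packed-decimal digit")
    return int("".join(str(d) for d in digits) or "0")


def unpack_comp3(raw: bytes) -> int:
    """COMP-3: the last nibble is the sign."""
    if not raw:
        raise ValueError("empty field")
    *digits, sign = nibbles(raw)
    if sign in NEGATIVE_SIGNS:
        return -digits_to_int(digits)
    if sign in POSITIVE_SIGNS:
        return digits_to_int(digits)
    raise ValueError(f"invalid sign nibble 0x{sign:X}")


def unpack_comp6(raw: bytes) -> int:
    """COMP-6: every nibble is a digit; there is no sign nibble."""
    return digits_to_int(nibbles(raw))


print(unpack_comp3(b"\x12\x3C"))  # 123
print(unpack_comp3(b"\x12\x3D"))  # -123
print(unpack_comp6(b"\x12\x34"))  # 1234
```

The bytes 0x12 0x34 are a valid COMP-6 value, but decoding them as COMP-3 fails because 4 is not a sign nibble. This is the kind of mismatch the check box addresses.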
Character Encoding

Controls the character set of the file. EBCDIC data is translated into the correct ASCII representation on load. Generally, UNIX COBOL files are ASCII and IBM mainframe data is EBCDIC.
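For illustration, Python's standard codecs include several EBCDIC code pages. This sketch assumes code page 037 (EBCDIC US/Canada). Your mainframe data may use a different code page, and the `ebcdic_to_text` helper is hypothetical.

```python
def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC character bytes; code page 037 is an assumption."""
    return raw.decode(codepage)


raw = bytes([0xC8, 0x85, 0x93, 0x93, 0x96, 0x40, 0xF1, 0xF2, 0xF3])
print(ebcdic_to_text(raw))  # Hello 123
```

Only character fields should be translated this way. Binary (COMP) and packed (COMP-3) fields are not text, and passing their bytes through a character translation would corrupt their values.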
National Character Encoding

If your COBOL copybooks define national data items that hold Unicode strings (such as items with a PIC clause containing N and USAGE NATIONAL), or if the compiler options used for the data source include NSYMBOL(NATIONAL) or CODEPAGE, specify the national character encoding of the data source.
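As a sketch of why the encoding must be specified, the following Python example decodes a national item. It assumes the data is UTF-16 big-endian, which is common for USAGE NATIONAL data on IBM mainframes; other compilers or settings may use a different encoding or byte order. The `decode_national` helper is hypothetical.

```python
def decode_national(raw: bytes, encoding: str = "utf-16-be") -> str:
    """Decode a USAGE NATIONAL item stored as UTF-16 (2 bytes per code unit)."""
    if len(raw) % 2:
        raise ValueError("UTF-16 national data must have an even number of bytes")
    return raw.decode(encoding)


# A PIC N(3) USAGE NATIONAL item: "Hi" followed by the euro sign (U+20AC), 6 bytes.
raw = bytes.fromhex("0048006920AC")
text = decode_national(raw)
print(len(raw), len(text))  # 6 3
print(ascii(text))          # 'Hi\u20ac'
```

Decoding the same bytes with the wrong byte order or a single-byte code page would produce unrelated characters, so the encoding needs to match the data source.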