Read from Variable Format File reads data from a file containing records of varying layout. Each record is read in as a list field. You can specify the tag that indicates the parent record type, and all other record types will become list fields under the parent.
Variable format files have these characteristics:
- Records in the file may have different fields, and different numbers of fields.
- Each record must contain a tag (usually a number) identifying the type of record.
- Hierarchical relationships are supported.
Example of a Variable Format File
This example shows a variable format file containing information about checking account activity for two customers, Joe Smith and Anne Johnson. In this example, the file is a delimited file that uses a comma as the field delimiter.
001 Joe,Smith,M,100 Main St,555-234-1290
100 CHK12904567,12/2/2007,6/1/2012,CHK
200 1000567,1/5/2012,Fashion Shoes,323.12
001 Anne,Johnson,F,1202 Lake St,555-222-4932
100 CHK238193875,1/21/2001,4/12/2012,CHK
200 1000232,3/5/2012,Blue Goose Grocery,132.11
200 1000232,3/8/2012,Trailway Bikes,540.00
The first field in each record contains the tag which identifies the type of record and therefore the record's format:
- 001: Customer record
- 100: Account record
- 200: Account transaction record
For delimited files it is common for the tag value (001, 100, 200) to be in a fixed number of bytes at the start of the record as shown in the above example.
Each record has its own format:
- 001: FirstName,LastName,Gender,Address,PhoneNumber
- 100: AccountID,DateOpened,ExpirationDate,TypeOfAccount
- 200: TransactionID,DateOfTransaction,Vendor,Amount
Record format 100 (account record) is a child of the previous 001 record, and record format 200 (account transaction record) is a child of the previous record 100 (account record). So in the example file, Joe Smith's account CHK12904567 had a transaction on 1/5/2012 in the amount of 323.12 at Fashion Shoes. Likewise, Anne Johnson's account CHK238193875 had two transactions, one on 3/5/2012 at Blue Goose Grocery and one on 3/8/2012 at Trailway Bikes.
File Properties Tab
Option Name | Description |
---|---|
Server name | Indicates whether the file you select as input is located on the computer running Spectrum Enterprise Designer or on the server. If you select a file on the local computer, the server name will be My Computer. If you select a file on the server the server name will be . |
File name | Specifies the path to the file. Click the ellipses
button (...) to go to the file you
want. You can read multiple files by using a wild card character to read data from multiple files in the directory. The wild card characters * and ? are supported. For example, you could specify *.csv to read in all files with a While reading a file from an HDFS file server, the compression formats supported are:
Note: The extension of the file indicates the compression
format to be used to decompress the file.
Attention: If the server
is running on Linux, remember that file names and paths on these
platforms are case sensitive.
|
Record type | The format of the records in the file. Select one
of:
|
Character encoding | The text file encoding. For more information, see Character Encodings. |
Record length |
For fixed width files, specifies the exact number of characters in each record. |
Field separator | Specifies the character used to separate fields in a delimited
file. For example, this record uses a pipe (|) as a field
separator:
These characters available to define as field separators are:
If the file uses a different character as a field separator, click the ellipses button to select another character as a delimiter. |
Tag separator | Specifies the character placed after the
tag field to demarcate the identifying field for each record in a
delimited file. A tag separator must be a single character. By default, these characters are available to be selected as tag separators:
If the file uses a different character as a tag separator, click the ellipses button to add and select a custom tag separator. Note: By default, the Record
separator character is the same as the selected
Field separator character. To enable
this field and select a different character, uncheck the
Same as Field separator
checkbox.
|
Same as Field separator | Indicates if the tag separator is the
same as the field separator. Uncheck this to select a different
character as the tag separator. Note: By default, this checkbox is
checked and the Tag separator field is
disabled.
|
Text qualifier |
The character used to surround text values in a delimited file. For example, this record uses double quotes (") as a text qualifier.
The characters available to define as text qualifiers are:
If the file uses a different text qualifier, click the ellipses button to select another character as a text qualifier. |
Record separator |
Specifies the character used to separate records in line a sequential or delimited file. This field is not available if you check the Use default EOL check box. The record separator settings available are:
If your file uses a different record separator, click the ellipses button to select another character as a record separator. |
Root tag name |
The tag to use for records that are a parent of other record types. For example if you have three record types 001, 100, and 200, and record types 100 and 200 are children of record type 001, then 001 is the root tag. |
Use fixed-width tags |
Specifies whether to allocate a fixed amount of space at the beginning of each record in which to place the record tag. This example shows a file with the tags 001, 100, and 200 in a fixed-width field:
|
Tag start position |
If you check the Use fixed-width tags box, this option specifies the position in each record where the tag begins. For example, if the tag begins in the fourth character in the record, you would specify 4. |
Tag width |
If you check the Use fixed-width tags box,
this option specifies the number of spaces to allocate for tags
starting from the position specified in the Tag start
position field. For example, if you specify 3 in the
Tag start position field and you specify
7 in the Tag width field, then positions 4
though 10 would be considered the record tag. The value you specify
must be large enough to include all the characters of the longest
tag name. The value in the Tag width field is automatically increased if you lengthen the tag name in the Root tag name field. The maximum tag width is 1024. |
Use default EOL |
Specifies that the file's record separator is the default end of line (EOL) character used on the operating system on which the server is running. Do not select this option if the file uses an EOL character that is different from the default EOL character used on the server's operating system. For example, if the file uses a Windows EOL but the server is running on Linux, do not check this option. Instead, select the Windows option in the Record separator field. |
Treat records with fewer fields than defined as malformed |
If you enable this option, child records that contain fewer
fields than a complete record are considered malformed. When a
malformed record is encountered, processing advances to the next
root tag, ignoring all child tags in between. An exception is
written to the log containing information about the malformed child
records along with a line number. Records are always considered
malformed in these situations, regardless of whether you enable
this option.
|
Fields Tab
The Fields tab specifies the characteristics of each field read in from the file.
Runtime Tab
Field Name | Description |
---|---|
File name |
Displays the file name selected in the first tab. |
Starting record |
If you want to skip records at the beginning of the file when reading records into the dataflow, specify the first record you want to read. For example, if you want to skip the first 50 records, in a file, specify 51. The 51st record will be the first record read into the dataflow. |
All records |
Select this option if you want to read all records starting from the record specified in the Starting record field to the end of the file. |
Max records |
Select this option if you want to only read in a certain number of records starting from the record specified in the Starting record field. For example, if you want to read the first 100 records, select this option and enter 100. |