The Excel Output node allows you to output your Analysis data to a Microsoft Excel .xlsx file. On the cloud version of the product, this file must be pushed to Amazon S3. On the Enterprise version of the product, this file must be pushed to the system's HDFS or file path.
Details tab
Output Path
Specify the path to the Amazon S3 bucket or the HDFS/file system path that the Excel file will be pushed to.Formatting of paths is as follows:
s3:///<bucket_name>/<path>
hdfs:///<path>
file:///<path>
Mapping
The Mapping grid allows you to specify a number of parameters, including:
- Source - Specify an input source from which to pull fields of data. Note that the Excel Output node may accept multiple input sources.
- Array Field - Optionally reference an array field within a Source, that you want to use to select specific fields from the Source. The contents of the array field should be representative of the data you want to pull from the Source and output to the Excel file. Note that when using the Array Field parameter, fields will be pushed to the Excel file in the order in which they appear in the array. Without the Array Field parameter, field order is not guaranteed.
- Default Sheet Name - This is a required parameter. The value entered is used as the default value for sheet names in the Excel file when sheet names are not specified with the Sheet Name Field parameter.
- Sheet Name Field - Optionally create sheet names in the Excel file. When a Sheet Name Field is specified, the node will create 1 sheet for each unique value in the Sheet Name field, containing the records that correspond to that unique value.
Basic Excel Output node example
Input Data, aka ‘
|
field1 |
field2 |
|---|---|
|
bill |
c |
|
jane |
b |
|
bob |
a |
myData':
Output Path: s3://someBucket/somePath
Source: myData
Default Sheet Name: sheet1
Excel Output Data in s3://someBucket/somePath:
Sheet name = sheet1
|
field2 |
field1 |
|---|---|
|
bill |
c |
|
jane |
b |
|
bob |
a |
Excel Output node example using Array Field
As shown in the basic example above, without an Array Field specified, outputted field order is not guaranteed to match what is shown in the Excel Output node UI.
To guarantee outputted field order, you could utilize an Array Field.
Suppose for example, we reconstructed the input data, myData, from the example above into the following Array Field.
field3 = [
{
fieldNames: ["field1", "field2"],
fieldValues: ["c", "bill"]
},
{
fieldNames: ["field1", "field2"],
fieldValues: ["b", "jane"]
},
{
fieldNames: ["field1", "field2"],
fieldValues: ["a", "bob"]
}
]
Were we to specify field3 as our Array Field and use the same Source, Output Path, and Default Sheet Name as the example above, we would produce the following Excel Output, with field order matching the field order of our Array Field.
Excel Output Data s3://someBucket/somePath:
Sheet name = sheet1
|
field1 |
field2 |
|---|---|
|
c |
bill |
|
b |
jane |
|
a |
bob |
Excel Output node example using Sheet Name Field
The Sheet Name Field parameter can be used to create sheet names in the Excel file. When a Sheet Name Field is specified, the node will create 1 sheet for each unique value in the Sheet Name field, containing the records that correspond to that unique value. Note that the values in the Sheet Name field will not be included as record values in the Excel output data.
For example, suppose you had the following data set:
|
field0 |
field1 |
field2 |
|---|---|---|
|
sheet1 |
c |
bill |
|
sheet1 |
b |
jane |
|
sheet2 |
a |
bob |
Were you to specify field0 as the Sheet Name Field, you would end up with the following Excel Output:
Sheet name = sheet1
|
field1 |
field2 |
|---|---|
|
c |
bill |
|
b |
jane |
Sheet name = sheet2
|
field1 |
field2 |
|---|---|
|
a |
bob |
Options tab
The Options tab allows you to further control how files are output to the file system that the Excel Output node pushes data to.
Use traditional folder structure when used for output check box - This check box allows you to output files to a traditional folder structure as opposed to a Hadoop folder structure.
Partition ByThis parameter allows you to select a set of fields to partition the Excel Output node's contents by. Importantly, the field(s) that you select to partition by will not be included in the data file contents; rather, a folder will be created for each unique combination of values within the partitioning field set.As a simple example, consider the following data set.
|
name |
amount |
grade |
|---|---|---|
|
bill |
100 |
A |
|
ann |
200 |
A |
|
carl |
300 |
A |
|
leslie |
400 |
B |
|
tom |
500 |
B |
|
stacey |
600 |
B |
|
ben |
700 |
C |
|
jill |
800 |
C |
|
tony |
900 |
C |
|
quinn |
1000 |
D |
Were you to select grade as the Partition By field, you would end up with four folders, A, B, C, and D; and, each folder would contain the records that had the folder's grade, with only the name and amount fields present.For example, the A folder would look something like this:
Security tab
Configure the settings on the Security tab to enable the node to output data to an external storage account for Amazon or Azure installations.
Security Options Choose from:
- Internal Account - Default option. Select Internal Account if S3 data is stored in the default region.
- External Account - Select External Account if S3 data is stored in other regions apart from default region. The Access Key is the ID of the authenticated user that will connect to the AWS account. The Secret Key is the secret access key of the authenticated user that will connect to AWS.
- Assume Role - Returns a set of temporary security credentials that you can use to access AWS resources that you might not normally have access to. The Role ARN is the Amazon Resource Name (ARN) of the role to assume. The Session Name is the identifier for the assumed role session.
Region - Choose the appropriate region.