Excel Output - Data360_DQ+ - Latest

Data360 DQ+ Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 DQ+
Version
Latest
Language
English
Product name
Data360 DQ+
Title
Data360 DQ+ Help
Copyright
2024
First publish date
2016
ft:lastEdition
2024-07-09
ft:lastPublication
2024-07-09T15:09:58.774265

The Excel Output node allows you to output your Analysis data to a Microsoft Excel .xlsx file. On the cloud version of the product, this file must be pushed to Amazon S3. On the Enterprise version of the product, this file must be pushed to the system's HDFS or file path.

Details tab

Output Path

Specify the path to the Amazon S3 bucket or the HDFS/file system path that the Excel file will be pushed to.Formatting of paths is as follows:

s3:///<bucket_name>/<path>

hdfs:///<path>

file:///<path>

Mapping

The Mapping grid allows you to specify a number of parameters, including:

  • Source - Specify an input source from which to pull fields of data. Note that the Excel Output node may accept multiple input sources.
  • Array Field - Optionally reference an array field within a Source, that you want to use to select specific fields from the Source. The contents of the array field should be representative of the data you want to pull from the Source and output to the Excel file. Note that when using the Array Field parameter, fields will be pushed to the Excel file in the order in which they appear in the array. Without the Array Field parameter, field order is not guaranteed.
  • Default Sheet Name - This is a required parameter. The value entered is used as the default value for sheet names in the Excel file when sheet names are not specified with the Sheet Name Field parameter.
  • Sheet Name Field - Optionally create sheet names in the Excel file. When a Sheet Name Field is specified, the node will create 1 sheet for each unique value in the Sheet Name field, containing the records that correspond to that unique value.

Basic Excel Output node example

Input Data, aka ‘

field1

field2

bill

c

jane

b

bob

a

myData':

Output Path: s3://someBucket/somePath

Source: myData

Default Sheet Name: sheet1

Excel Output Data in s3://someBucket/somePath:

Sheet name = sheet1

field2

field1

bill

c

jane

b

bob

a

Excel Output node example using Array Field

As shown in the basic example above, without an Array Field specified, outputted field order is not guaranteed to match what is shown in the Excel Output node UI.

To guarantee outputted field order, you could utilize an Array Field.

Suppose for example, we reconstructed the input data, myData, from the example above into the following Array Field.


			field3 = [
			{
			fieldNames: ["field1", "field2"], 
			fieldValues: ["c", "bill"]
			},
			{
			fieldNames: ["field1", "field2"], 
			fieldValues: ["b", "jane"]
			},
			{
			fieldNames: ["field1", "field2"],
			fieldValues: ["a", "bob"]
			}
		]

Were we to specify field3 as our Array Field and use the same Source, Output Path, and Default Sheet Name as the example above, we would produce the following Excel Output, with field order matching the field order of our Array Field.

Excel Output Data s3://someBucket/somePath:

Sheet name = sheet1

field1

field2

c

bill

b

jane

a

bob

Excel Output node example using Sheet Name Field

The Sheet Name Field parameter can be used to create sheet names in the Excel file. When a Sheet Name Field is specified, the node will create 1 sheet for each unique value in the Sheet Name field, containing the records that correspond to that unique value. Note that the values in the Sheet Name field will not be included as record values in the Excel output data.

For example, suppose you had the following data set:

field0

field1

field2

sheet1

c

bill

sheet1

b

jane

sheet2

a

bob

Were you to specify field0 as the Sheet Name Field, you would end up with the following Excel Output:

Sheet name = sheet1

field1

field2

c

bill

b

jane

Sheet name = sheet2

field1

field2

a

bob

Options tab

The Options tab allows you to further control how files are output to the file system that the Excel Output node pushes data to.

Use traditional folder structure when used for output check box - This check box allows you to output files to a traditional folder structure as opposed to a Hadoop folder structure.

Partition ByThis parameter allows you to select a set of fields to partition the Excel Output node's contents by. Importantly, the field(s) that you select to partition by will not be included in the data file contents; rather, a folder will be created for each unique combination of values within the partitioning field set.As a simple example, consider the following data set.

name

amount

grade

bill

100

A

ann

200

A

carl

300

A

leslie

400

B

tom

500

B

stacey

600

B

ben

700

C

jill

800

C

tony

900

C

quinn

1000

D

Were you to select grade as the Partition By field, you would end up with four folders, A, B, C, and D; and, each folder would contain the records that had the folder's grade, with only the name and amount fields present.For example, the A folder would look something like this:

 

Security tab

Configure the settings on the Security tab to enable the node to output data to an external storage account for Amazon or Azure installations.

Security Options Choose from:

  • Internal Account - Default option. Select Internal Account if S3 data is stored in the default region.
  • External Account - Select External Account if S3 data is stored in other regions apart from default region. The Access Key is the ID of the authenticated user that will connect to the AWS account. The Secret Key is the secret access key of the authenticated user that will connect to AWS.
  • Assume Role - Returns a set of temporary security credentials that you can use to access AWS resources that you might not normally have access to. The Role ARN is the Amazon Resource Name (ARN) of the role to assume. The Session Name is the identifier for the assumed role session.

Region - Choose the appropriate region.