S3 Get

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Downloads files from a server using the S3 protocol.

The S3 nodes enable you to access data in AWS S3 buckets, so that you can integrate your data flows with Amazon S3. The following sections describe how to download a single file or multiple files.

Downloading a single file from Amazon S3

  1. In the Bucket property, specify the name of an AWS bucket that contains the file that you want to download.
  2. In the Object property, specify the name of the AWS file that you want to download.
  3. In the Directory property, specify where to store the file that is downloaded.
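
For comparison, the same three inputs (bucket, object, directory) can be expressed in a few lines of Python. This is a minimal sketch and not how the S3 Get node is implemented: `download_object` is a hypothetical helper, and the `client` argument is assumed to be any object with a boto3-style `download_file(Bucket, Key, Filename)` method, such as the one returned by `boto3.client("s3")`.

```python
from pathlib import Path


def download_object(client, bucket: str, key: str, directory: str) -> Path:
    """Download one object into `directory`, named after the object key.

    `client` is assumed to expose a boto3-style
    download_file(Bucket, Key, Filename) method.
    """
    target = Path(directory) / key
    # Object keys may contain "/" separators, so create any parent folders.
    target.parent.mkdir(parents=True, exist_ok=True)
    client.download_file(bucket, key, str(target))
    return target
```

Passing the client in as an argument keeps the helper independent of how credentials are resolved, which mirrors the way the node separates the Bucket, Object and Directory properties from the authentication properties.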

Downloading multiple files from Amazon S3

  1. Drag an S3 List node onto the canvas and connect it to an S3 Get node.
  2. In the Bucket property of the S3 List node, type the name of the AWS bucket which contains the files that you want to download. Run the S3 List node to generate a list of files in the specified bucket.
  3. On the S3 Get node, select the (from Field) variant of the Bucket property and specify the name of the input field that contains the AWS bucket name.
  4. Select the (from Field) variant of the Object property, then specify the name of the input field that references the files that you want to download.
  5. In the Directory property, specify where to store the downloaded files.

Example

You want to download all files with the "Product" prefix from an AWS bucket:

  1. Drag an S3 List node onto the canvas.
  2. In the Bucket property, type the name of the AWS bucket for which you want to list files, e.g. my.example.bucket
  3. In the ObjectPrefix property, type Product.
  4. Run the S3 List node and connect it to an S3 Get node.

    The output of the S3 List node is as follows:

    Bucket (unicode)   Object (unicode)  Owner (unicode)  Size (long)  Modified (datetime)       StorageClass (unicode)
    my.example.bucket  Product_A.txt     IT               25           2017-10-27T08:04:58-0400  STANDARD
    my.example.bucket  Product_B.txt     IT               25           2017-10-27T08:04:58-0400  STANDARD
    my.example.bucket  Product_C.txt     IT               25           2017-10-27T08:04:58-0400  STANDARD
    my.example.bucket  Product_D.txt     IT               25           2017-10-27T08:04:58-0400  STANDARD

  5. On the S3 Get node, select the (from Field) variant of the Bucket property and type the name of the input field that contains the AWS bucket name, in this case Bucket.
  6. Select the (from Field) variant of the Object property, then specify the name of the input field that references the files that you want to download, in this case Object.
  7. In the Directory property, specify where to store the downloaded files, e.g. C:\Users\<username>\Documents\S3 download.
  8. Run the S3 Get node to download the files from AWS.
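
The field mapping in this example can be sketched in Python. The rows mirror the S3 List output from step 4, reduced to the two fields that S3 Get reads. `plan_downloads` is a hypothetical helper, and placing each object at Directory/Object assumes that the FileName property is left at its default (the name of the object).

```python
from pathlib import Path

# Rows as listed by S3 List in the example (only the fields S3 Get reads).
rows = [
    {"Bucket": "my.example.bucket", "Object": "Product_A.txt"},
    {"Bucket": "my.example.bucket", "Object": "Product_B.txt"},
    {"Bucket": "my.example.bucket", "Object": "Product_C.txt"},
    {"Bucket": "my.example.bucket", "Object": "Product_D.txt"},
]


def plan_downloads(rows, directory, bucket_field="Bucket", object_field="Object"):
    """Return (bucket, object, local_path) for every input row.

    bucket_field and object_field play the role of the (from Field)
    variants of the Bucket and Object properties.
    """
    root = Path(directory)
    return [
        (row[bucket_field], row[object_field], root / row[object_field])
        for row in rows
    ]


for bucket, key, path in plan_downloads(rows, "S3 download"):
    print(f"{bucket}/{key} -> {path}")
```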

By default, authentication is based on the AWS Default Credential Provider Chain. Alternatively, you can specify an AccessKey and a SecretKey. For cross-account access (for example, when your S3 bucket is in a different account from your Data360 Analyze instance), you can use instance profiles by configuring the RoleARN and RoleSessionName properties. For more information, see https://aws.amazon.com/premiumsupport/knowledge-center/cross-account-access-s3/

Tip: For additional information on interacting with AWS, see the AWS Documentation.

Properties

Bucket

Specify the name of the AWS Bucket.

Choose the (from Field) variant of this property to look up the value from an input field with the name specified. A value is required for this property.

Object

Specify the name of the AWS Object.

Choose the (from Field) variant of this property to look up the value from an input field with the name specified. A value is required for this property.

AccessKey

Optionally specify the AWS Access Key.

Note: If the Access Key is not set, authentication falls back to the AWS Default Credential Provider Chain. For more information, see https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html

SecretKey

Optionally specify the AWS Secret Key.

Note: The Access Key and the Secret Key must be set together on the node. If either one is set, the other must also be set.

RoleARN

Optionally specify the Amazon Resource Name (ARN) of the role to assume for cross-account access with instance profiles.

If neither the RoleARN nor the AccessKey is set, authentication falls back to the AWS Default Credential Provider Chain. For more information, see https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html.

Note: This property cannot be set at the same time as the AccessKey and SecretKey properties.

RoleSessionName

Optionally specify the role session name used to uniquely identify a session when the same role is assumed by different principals or for different reasons.

In cross-account scenarios, the role session name is visible to, and can be logged by, the account that owns the role.

Note: This property must be set if using the RoleARN property.
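
The rules stated in the notes for AccessKey, SecretKey, RoleARN and RoleSessionName can be summarized as a small validation routine. This is only a sketch of those documented constraints: `resolve_auth_mode` and its return values are hypothetical names, and the node's actual validation order and error messages are not described in this topic.

```python
def resolve_auth_mode(access_key=None, secret_key=None,
                      role_arn=None, role_session_name=None) -> str:
    """Return which authentication path a property combination selects.

    Returns "keys", "role" or "default-chain"; raises ValueError for
    combinations that the property notes rule out.
    """
    # AccessKey and SecretKey must be set together.
    if bool(access_key) != bool(secret_key):
        raise ValueError("AccessKey and SecretKey must be set together")
    # RoleARN cannot be combined with AccessKey/SecretKey.
    if role_arn and access_key:
        raise ValueError("RoleARN cannot be set with AccessKey/SecretKey")
    # RoleSessionName is mandatory when RoleARN is used.
    if role_arn and not role_session_name:
        raise ValueError("RoleSessionName must be set when RoleARN is set")

    if role_arn:
        return "role"
    if access_key:
        return "keys"
    # Nothing set: fall back to the AWS Default Credential Provider Chain.
    return "default-chain"
```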

Region

Optionally specify the AWS region.

Note: The value should be of the form "us-east-1" or similar.

If the Region is not set then the AWS default strategy will be used - see AWS documentation on DefaultAWSCredentialsProviderChain.

Endpoint

Optionally specify the URL endpoint used to connect to other system services. The default is the amazonaws.com endpoint.

The URL should contain the protocol (http or https), the host, and the port. The default value depends on the Region property; for example, for region us-east-2 the default is s3.us-east-2.amazonaws.com.

FileName

Optionally specify the location of the file to store the retrieved object.

Choose the (from Field) variant of this property to look up the value from an input field with the name specified. The default value is the name of the object.

Directory

Specify the location of a directory as the root to store all retrieved objects.

A value is required for this property.

ProxyURL

Optionally specify the URL of the proxy server to use for requests. The URL should contain the protocol (http or https), the host, and the port. All other parts of the URL will be ignored.

The URL should be in a format similar to the following:

HTTP proxy: http://proxy.example.com:1234

HTTPS proxy: https://proxy.example.com:443
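
To illustrate which parts of a proxy URL matter, the following Python sketch extracts the protocol, host and port and discards everything else. `parse_proxy_url` is a hypothetical helper, and rejecting a URL without an explicit port is an assumption based on a strict reading of the description above; the node itself may be more lenient.

```python
from urllib.parse import urlsplit


def parse_proxy_url(url: str):
    """Return (protocol, host, port); path, query and fragment are ignored."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("proxy URL must start with http:// or https://")
    # parts.port is None when no port is given (assumed invalid here).
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy URL must contain a host and a port")
    return parts.scheme, parts.hostname, parts.port


print(parse_proxy_url("http://proxy.example.com:1234"))
print(parse_proxy_url("https://proxy.example.com:443/ignored?x=1"))
```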

ProxyUsername

Optionally specify the proxy username, where the proxy requires authentication.

This property should be left blank if ProxyURL is not set.

ProxyPassword

Optionally specify the proxy password, where the proxy requires authentication.

This property should be left blank if ProxyURL is not set.

IgnoreSystemProxy

If no proxy settings are defined, the system proxy is used automatically. Set this optional property to True to make Data360 Analyze ignore the system proxy.

The default value is False.

RangeStart

Optionally specify the initial byte offset of the contents to be retrieved. If RangeStart is set and RangeEnd is not, the contents are retrieved from this offset to the end of the file.

RangeEnd

Optionally specify the last byte offset of the contents to be retrieved.
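
Byte ranges of this kind are commonly expressed as an HTTP Range header. The sketch below shows how the two offsets could map to such a header. It makes two assumptions that this topic does not confirm: that RangeEnd is inclusive, as in HTTP range requests, and that a missing RangeStart means offset 0. `build_range_header` is a hypothetical helper.

```python
def build_range_header(range_start=None, range_end=None):
    """Return an HTTP Range header value, or None to fetch the whole object."""
    if range_start is None and range_end is None:
        return None
    # Assumption: if only RangeEnd is given, start from the first byte.
    start = 0 if range_start is None else range_start
    if start < 0:
        raise ValueError("RangeStart must not be negative")
    if range_end is None:
        # Open-ended range: from the offset to the end of the object.
        return f"bytes={start}-"
    if range_end < start:
        raise ValueError("RangeEnd must not be smaller than RangeStart")
    # Assumption: RangeEnd is inclusive, as in HTTP range requests.
    return f"bytes={start}-{range_end}"
```

Under these assumptions, a RangeStart of 0 with a RangeEnd of 9 would request the first ten bytes.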

FailureBehavior

Optionally specify what to do when a file fails to download. Choose from:

  • Error - Report error and stop further processing.
  • Log - Log a warning message and skip the file.
  • Ignore - Skip the file.

The default value is Log.
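
The three options can be read as three ways of handling an exception inside a download loop. The following Python sketch illustrates that reading. `download_all` and the `download` callback are hypothetical, and the node's real logging and error reporting are not shown in this topic.

```python
import logging

logger = logging.getLogger("s3_get_sketch")


def download_all(keys, download, failure_behavior="Log"):
    """Call download(key) for each key and return the keys that succeeded."""
    if failure_behavior not in ("Error", "Log", "Ignore"):
        raise ValueError(f"unknown FailureBehavior: {failure_behavior}")
    downloaded = []
    for key in keys:
        try:
            download(key)
        except Exception as exc:
            if failure_behavior == "Error":
                # Report the error and stop further processing.
                raise RuntimeError(f"download failed for {key}") from exc
            if failure_behavior == "Log":
                # Log a warning, then skip the file.
                logger.warning("skipping %s: %s", key, exc)
            # "Ignore": skip the file silently.
            continue
        downloaded.append(key)
    return downloaded
```

With the default of Log, a single missing object produces a warning and the remaining objects are still processed; with Error, the first failure ends the run.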

Inputs and outputs

Inputs: 1 optional.

Outputs: downloaded files, errors.