Azure Datalake Storage List - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Product name: Data360 Analyze
Title: Data360 Analyze Server Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Lists files on an Azure Datalake Storage server.

Azure Datalake Storage nodes enable you to access data lakes on Azure storage, so that you can integrate them into your data flows.

To create a list of files in an Azure Datalake Storage location:

  1. Enter the RemotePath to the location you want to interrogate.

    To list files from multiple Azure Datalake Storage locations, use the (from Field) variant of the RemotePath property to point to an input field that contains the Azure locations.

  2. Provide your Azure AccountName, together with one of the following, in the relevant fields:
    • The AccountKey.
    • The ClientID, ClientSecret and TenantID combined.

Tip: For additional information on Azure Datalake Storage, see the Microsoft Azure online documentation.

In addition to providing a simple list of the files in your Azure Data Lake, the Azure Datalake Storage List node lets you inspect file contents by using the MetadataMode property.

Properties

FileSystem

Specify the file system of the Azure Datalake Storage.

A value is required for this property.

RemotePath

Specify the path to the Azure Datalake Storage objects.

A value is required for this property.

Recurse

Optionally specify whether to recursively enumerate the files under RemotePath.

The default value is False.

AccountName

Specify the Azure Account Name.

A value is required for this property.

One of the following should be entered:

  • AccountKey

    The Azure Secret Key.

Or the combination of:

  • ClientID

    The Client ID for the registered app.

  • ClientSecret

    The Client Secret for the registered app.

  • TenantID

    The Tenant ID (directory) for the registered app.
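
The credential rule above (AccountName plus either AccountKey, or ClientID, ClientSecret and TenantID together) can be sketched as a small check. This is an illustration only, not the node's implementation: the function name pick_auth_mode and its return values are hypothetical, and preferring AccountKey when both sets are supplied is an assumption.

```python
from typing import Optional


def pick_auth_mode(
    account_name: str,
    account_key: Optional[str] = None,
    client_id: Optional[str] = None,
    client_secret: Optional[str] = None,
    tenant_id: Optional[str] = None,
) -> str:
    """Return which credential set is complete, or raise ValueError.

    Hypothetical helper: mirrors the documented rule, not the node's code.
    """
    if not account_name:
        raise ValueError("AccountName is required")

    # Assumption: if an AccountKey is present it is used, even when the
    # registered-app properties are also filled in.
    if account_key:
        return "account_key"

    app_fields = {
        "ClientID": client_id,
        "ClientSecret": client_secret,
        "TenantID": tenant_id,
    }
    missing = [name for name, value in app_fields.items() if not value]
    if not missing:
        return "client_secret"

    raise ValueError(
        "Provide AccountKey, or all of ClientID, ClientSecret and TenantID; "
        "missing: " + ", ".join(missing)
    )
```

A partially completed registered-app set (for example, only ClientID) is rejected, because the three properties are only valid in combination.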

ListSubdirectories

Optionally specify a list of patterns matching subdirectories which are to be included in the output.

Each subdirectory pattern must be entered on a separate line.

This property only has an effect when the Recurse property is set to True.

By default, all files are listed.

ExcludeSubdirectories

Optionally specify a list of patterns matching subdirectories which are to be excluded from the output.

Each subdirectory pattern must be entered on a separate line.

This property only has an effect when the Recurse property is set to True.

By default, no directories are excluded.
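
As a rough illustration of how Recurse, ListSubdirectories and ExcludeSubdirectories might interact, the following sketch filters an in-memory list of paths. It rests on several assumptions that this page does not confirm: patterns are glob patterns matched against the subdirectory path relative to RemotePath, files directly under RemotePath are always listed, and the function name select_paths is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_paths(paths, remote_path, recurse=False,
                 list_subdirs=(), exclude_subdirs=()):
    """Return the paths under remote_path that pass the subdirectory filters."""
    root = PurePosixPath(remote_path)
    selected = []
    for raw in paths:
        try:
            relative = PurePosixPath(raw).relative_to(root)
        except ValueError:
            continue  # not under RemotePath at all

        subdir = relative.parent.as_posix()  # "." for files directly under root
        if subdir == ".":
            selected.append(raw)
            continue

        # Subdirectory patterns only matter when recursing.
        if not recurse:
            continue
        if list_subdirs and not any(fnmatchcase(subdir, p) for p in list_subdirs):
            continue
        if any(fnmatchcase(subdir, p) for p in exclude_subdirs):
            continue
        selected.append(raw)
    return selected


paths = ["data/a.csv", "data/2023/b.csv", "data/tmp/d.csv", "other/e.csv"]
print(select_paths(paths, "data"))                 # ['data/a.csv']
print(select_paths(paths, "data", recurse=True,
                   exclude_subdirs=["tmp"]))       # ['data/a.csv', 'data/2023/b.csv']
```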

IncludeFilePatterns

Optionally specify a list of patterns matching filenames which are to be included in the output.

The Azure Datalake Storage List node only reports files that match a pattern entered here and do not match any of the ExcludeFilePatterns patterns.

Each filename pattern must be entered on a separate line.

By default, all files are included.

ExcludeFilePatterns

Optionally specify a list of patterns matching filenames which are to be excluded from the output.

Each filename pattern must be entered on a separate line.

By default, no files are excluded.
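
The combined effect of IncludeFilePatterns and ExcludeFilePatterns can be sketched as follows. This is a minimal illustration using Python's standard fnmatch module, not the node's own matcher; it assumes glob patterns, case-sensitive matching, and that a file needs to match only one of the include patterns. The function name is_reported is hypothetical.

```python
from fnmatch import fnmatchcase


def is_reported(filename, include_patterns=(), exclude_patterns=()):
    """True if filename passes the include list and no exclude pattern."""
    # An empty include list means "all files are included" (the default).
    included = not include_patterns or any(
        fnmatchcase(filename, p) for p in include_patterns
    )
    # An empty exclude list means "no files are excluded" (the default).
    excluded = any(fnmatchcase(filename, p) for p in exclude_patterns)
    return included and not excluded


names = ["sales.csv", "sales_backup.csv", "notes.txt"]
kept = [n for n in names if is_reported(n, ["*.csv"], ["*_backup*"])]
print(kept)  # ['sales.csv']
```

An exclude pattern always wins: sales_backup.csv matches the include pattern but is still dropped.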

FilterPatternType

Optionally specify how to interpret the patterns entered in the ListSubdirectories, ExcludeSubdirectories, IncludeFilePatterns and ExcludeFilePatterns properties in the Filtering group.

Choose from:

  • Glob Pattern - A glob pattern to match against the field(s) specified in Filtering group properties.
  • Regular Expression - A regular expression to match against the field(s) specified in Filtering group properties.

The default value is Glob Pattern.

CaseInsensitivePatterns

Optionally specify whether to use case-insensitive or case-sensitive pattern matching.

The default value is False.
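
To show the difference between the two FilterPatternType choices and the effect of CaseInsensitivePatterns, here is a small sketch built on Python's re and fnmatch modules. It is not the node's matcher: the helper name matches is hypothetical, and whole-name matching for regular expressions is an assumption. Note also that Python's glob translation lets * match a / character, which may differ from the node's behavior.

```python
import re
from fnmatch import translate


def matches(name, pattern, pattern_type="Glob Pattern", case_insensitive=False):
    """Test name against a glob or regular-expression pattern."""
    if pattern_type == "Glob Pattern":
        regex = translate(pattern)  # convert the glob into an equivalent regex
    elif pattern_type == "Regular Expression":
        regex = pattern
    else:
        raise ValueError(f"unknown FilterPatternType: {pattern_type}")

    flags = re.IGNORECASE if case_insensitive else 0
    # Assumption: the pattern must match the whole name, not just a prefix.
    return re.fullmatch(regex, name, flags) is not None


print(matches("report.csv", "*.csv"))                         # True
print(matches("REPORT.CSV", "*.csv"))                         # False
print(matches("REPORT.CSV", "*.csv", case_insensitive=True))  # True
print(matches("report_2024.csv", r"report_\d{4}\.csv",
              "Regular Expression"))                          # True
```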

MetadataMode

Optionally specify to what extent additional file information will be retrieved.

Choose from:

  • Minimal - Standard file metadata excluding creation date and user defined metadata.
  • Basic - Standard file metadata.
  • Identification - Also outputs a media type describing each file.
  • Structure - Detailed structure and other file metadata output on the "file metadata" pin.

The default value is Basic.

Because the Minimal choice requires less communication with the Azure Data Lake Storage system, selecting it for the MetadataMode results in the node executing significantly faster.

Note that for the Identification and Structure choices, the node needs to process the actual file content to determine, for example, CSV headers and Parquet metadata.

FailureBehavior

Optionally specify what to do when the request fails. Choose from:

  • Error - Report error and stop further processing.
  • Log - Log a warning message and skip the file.
  • Ignore - Skip the file.

The default value is Log.
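
The three FailureBehavior choices correspond to a common error-handling pattern, sketched below. This is an illustration of the pattern only, under the assumption that a failure affects a single file; the names process_all and operation are hypothetical and are not part of Data360 Analyze.

```python
import logging

logger = logging.getLogger("adls_list_sketch")


def process_all(items, operation, failure_behavior="Log"):
    """Apply operation to each item, handling failures per failure_behavior."""
    if failure_behavior not in ("Error", "Log", "Ignore"):
        raise ValueError(f"unknown FailureBehavior: {failure_behavior}")

    results = []
    for item in items:
        try:
            results.append(operation(item))
        except Exception as exc:
            if failure_behavior == "Error":
                raise  # report the error and stop further processing
            if failure_behavior == "Log":
                logger.warning("Skipping %s: %s", item, exc)
            # "Ignore": skip the file silently
    return results
```

With Log or Ignore, the remaining files are still processed; with Error, the first failure ends the run.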

Inputs and outputs

Inputs: 1 optional.

Outputs: listed files, errors.