Lists files on an Azure Datalake Storage server.
Azure Datalake Storage nodes enable you to access data lakes on Azure storage, so that you can integrate your data flows accordingly.
To create a list of files in an Azure Datalake Storage location:
- Enter the RemotePath to the location you want to interrogate.
If you want to list files from multiple Azure Datalake Storage locations, use the (from Field) variant of the RemotePath property to point to an input field that references the Azure locations.
- Provide your Azure AccountName, together with either the AccountKey or the following properties combined, in the relevant fields:
- ClientID, ClientSecret and TenantID.
As well as providing a simple list of the files in your Azure Datalake Storage location, the Azure Datalake Storage List node lets you inspect file contents using the MetadataMode property.
Properties
FileSystem
Specify the file system of the Azure Datalake Storage.
A value is required for this property.
RemotePath
Specify the path to the Azure Datalake Storage objects.
A value is required for this property.
Recurse
Optionally specify whether to recursively enumerate the files under RemotePath.
The default value is False.
AccountName
Specify the Azure Account Name.
A value is required for this property.
One of the following should be entered:
- AccountKey - The Azure Secret Key.
Or the combination of:
- ClientID - The Client ID for the registered app.
- ClientSecret - The Client Secret for the registered app.
- TenantID - The Tenant ID (directory) for the registered app.
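The credential rule above (AccountName plus either AccountKey, or ClientID, ClientSecret and TenantID together) can be sketched as a small check. This is an illustrative sketch only, not the node's implementation: the function name `auth_mode`, the returned labels and the dictionary layout are assumptions made for the example.

```python
def auth_mode(props):
    """Return which credential set a property dictionary satisfies.

    `props` is a hypothetical dict keyed by the node's property names.
    """
    def has(name):
        # Treat missing, None and blank values as "not provided".
        return bool(str(props.get(name) or "").strip())

    if not has("AccountName"):
        raise ValueError("AccountName is required")
    # Assumption: if both credential sets are present, the account key is
    # reported first. The documentation does not state a precedence.
    if has("AccountKey"):
        return "account_key"
    if all(has(name) for name in ("ClientID", "ClientSecret", "TenantID")):
        return "client_credentials"
    raise ValueError(
        "Provide AccountKey, or ClientID, ClientSecret and TenantID together"
    )
```

For example, a dictionary containing only AccountName and ClientID is rejected, because the registered-app credentials are only valid as a complete set of three.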
ListSubdirectories
Optionally specify a list of patterns matching subdirectories which are to be included in the output.
Each subdirectory pattern must be entered on a separate line.
This property only has an effect when the Recurse property is set to True.
By default, all subdirectories are included.
ExcludeSubdirectories
Optionally specify a list of patterns matching subdirectories which are to be excluded from the output.
Each subdirectory pattern must be entered on a separate line.
This property only has an effect when the Recurse property is set to True.
By default, no directories are excluded.
IncludeFilePatterns
Optionally specify a list of patterns matching filenames which are to be included in the output.
The Azure Datalake Storage List node only reports files that match a pattern entered here and do not match any of the ExcludeFilePatterns patterns.
Each filename pattern must be entered on a separate line.
By default, all files are included.
ExcludeFilePatterns
Optionally specify a list of patterns matching filenames which are to be excluded from the output.
Each filename pattern must be entered on a separate line.
By default, no files are excluded.
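The way IncludeFilePatterns and ExcludeFilePatterns combine can be sketched as follows. This is a minimal illustration, assuming Python's `fnmatch` glob semantics (which may differ from the node's glob dialect) and assuming patterns are tested against the file name rather than the full path. The helper name `is_listed` is hypothetical.

```python
from fnmatch import fnmatchcase


def is_listed(filename, include_patterns=(), exclude_patterns=()):
    """Return True if a file would be reported under the stated rule."""
    # An empty include list means "include everything" (the documented default).
    included = (not include_patterns
                or any(fnmatchcase(filename, p) for p in include_patterns))
    # An empty exclude list excludes nothing (the documented default).
    excluded = any(fnmatchcase(filename, p) for p in exclude_patterns)
    return included and not excluded


names = ["sales.csv", "sales_backup.csv", "notes.txt"]
listed = [n for n in names if is_listed(n, ["*.csv"], ["*_backup*"])]
```

Here `listed` contains only `sales.csv`: `notes.txt` fails the include pattern, and `sales_backup.csv` matches an exclude pattern, which takes priority.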
FilterPatternType
Optionally specify how to interpret the patterns entered in the Filtering group properties ListSubdirectories, ExcludeSubdirectories, IncludeFilePatterns and ExcludeFilePatterns.
Choose from:
- Glob Pattern - A glob pattern to match against the field(s) specified in Filtering group properties.
- Regular Expression - A regular expression to match against the field(s) specified in Filtering group properties.
The default value is Glob Pattern.
CaseInsensitivePatterns
Optionally specify whether pattern matching is case-insensitive (True) or case-sensitive (False).
The default value is False.
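The interaction between FilterPatternType and CaseInsensitivePatterns can be sketched as below. This is an illustration under stated assumptions, not the node's implementation: it uses Python's `fnmatch.translate` for globs and Python's `re` syntax for regular expressions, and it assumes a pattern must match the whole name. The function names `compile_pattern` and `matches` are hypothetical.

```python
import fnmatch
import re


def compile_pattern(pattern, pattern_type="Glob Pattern", case_insensitive=False):
    """Compile a filter pattern according to the two property values."""
    if pattern_type == "Glob Pattern":
        # Convert the glob into an equivalent regular expression.
        regex = fnmatch.translate(pattern)
    elif pattern_type == "Regular Expression":
        regex = pattern
    else:
        raise ValueError(f"Unknown FilterPatternType: {pattern_type!r}")
    flags = re.IGNORECASE if case_insensitive else 0
    return re.compile(regex, flags)


def matches(name, pattern, pattern_type="Glob Pattern", case_insensitive=False):
    # Assumption: the pattern must match the entire name, not a substring.
    compiled = compile_pattern(pattern, pattern_type, case_insensitive)
    return compiled.fullmatch(name) is not None
```

With the defaults, `matches("Data.CSV", "*.csv")` is false because matching is case-sensitive; passing `case_insensitive=True` makes the same call succeed.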
MetadataMode
Optionally specify to what extent additional file information will be retrieved.
Choose from:
- Minimal - Standard file metadata excluding creation date and user defined metadata.
- Basic - Standard file metadata.
- Identification - Additionally outputs a media type describing each file.
- Structure - Detailed structure and other file metadata output on the "file metadata" pin.
The default value is Basic.
Minimal requires the least communication with the Azure Datalake Storage system, so choosing it for the MetadataMode results in the node executing significantly faster.
Note that for the Identification and Structure choices, the node needs to process the actual file content to determine, for example, CSV headers and Parquet metadata.
FailureBehavior
Optionally specify what to do when the request fails. Choose from:
- Error - Report error and stop further processing.
- Log - Log a warning message and skip the file.
- Ignore - Skip the file.
The default value is Log.
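The three FailureBehavior options can be sketched as a per-file error-handling loop. This is a minimal sketch, not the node's implementation: `inspect_all`, the `inspect` callback and the logger name are hypothetical names introduced for the example.

```python
import logging

logger = logging.getLogger("adls_list_sketch")  # hypothetical logger name


def inspect_all(paths, inspect, failure_behavior="Log"):
    """Apply `inspect` to each path, handling failures per FailureBehavior."""
    if failure_behavior not in ("Error", "Log", "Ignore"):
        raise ValueError(f"Unknown FailureBehavior: {failure_behavior!r}")
    results = []
    for path in paths:
        try:
            results.append(inspect(path))
        except Exception as exc:
            if failure_behavior == "Error":
                # Report the error and stop further processing.
                raise
            if failure_behavior == "Log":
                # Log a warning, then skip the file.
                logger.warning("Skipping %s: %s", path, exc)
            # "Ignore": skip the file silently.
    return results
```

With Log or Ignore, a failing file is left out of the results and processing continues with the next file. With Error, the first failure propagates to the caller.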
Inputs and outputs
Inputs: 1 optional.
Outputs: listed files, errors.