Before you use CloudFSUtil to transfer your files, create a configuration file on your host storage system and set the properties specific to your environment, including remote connection and authentication information.
The configuration file location must then be specified in the following environment variable: DMX_REMOTEFILE_INI_FILE=config_file_path
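For example, on a host with a POSIX shell, the variable might be set as shown in this minimal sketch. The path is a hypothetical location chosen for illustration; substitute the actual location of your configuration file.

```shell
# Hypothetical path; replace it with the location of your own configuration file.
export DMX_REMOTEFILE_INI_FILE="$HOME/cloudfsutil/remotefile.ini"
```

Setting the variable with export makes it visible to processes started from the same shell session.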
The configuration file contains a section for each remote file system, such as Amazon S3 or Databricks. The first line of each section is the remote system acronym, enclosed in brackets, followed by the key=value pairs specific to that remote system, one pair per line. For example:
[remote_system_acronym]
key1=value1
key2=value2
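As a rough sketch of this layout, the following commands write a configuration file with two sections from a POSIX shell. The file location is hypothetical, the values are the placeholders used in this topic rather than working credentials, and the permission change is a general precaution for files that hold credentials, not a documented CloudFSUtil requirement.

```shell
# Hypothetical location; any path readable on the host works for this sketch.
config_file="${TMPDIR:-/tmp}/remotefile_example.ini"

# One section per remote system: the acronym in brackets, then key=value lines.
# The values are placeholders, not working credentials.
cat > "$config_file" <<'EOF'
[s3]
AWSACCESSKEYID=AWS_access_key_ID
AWSACCESSKEY=AWS_secret_access_key
[dbfs]
DBFSHOST=host_url
DBFSTOKEN=token
EOF

# Limit access to the file owner (a general precaution, assumed here).
chmod 600 "$config_file"
```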
Amazon S3 file system
Specify one or more sets of key=value pairs in the configuration file to authenticate with Amazon S3.
[s3]
- AWSACCESSKEYID=AWS_access_key_ID
- AWSACCESSKEY=AWS_secret_access_key
- AWSACCESSKEY_REPO=alias_to_access_key_stored_in_Connect_repository
Example:
AWSACCESSKEYID=BSTSBEGQ111JLOKC2C
AWSACCESSKEY=XXUhlABCnU5JmNo05GszQZpjxxxxxxxxxxxxxxxxxx
Note the following:
- When multiple authentication types are specified, the AWS session token takes precedence, and access key authentication takes precedence over IAM role authentication.
- If no authentication is specified, the process assumes that the utility is running in an EC2 instance and retrieves access key information from the metadata service.
- If you do not want to provide sensitive information (AWSTOKEN or AWSACCESSKEY) in clear text, add the information to the Connect repository and specify the repository alias using the corresponding repository variable (AWSTOKEN_REPO or AWSACCESSKEY_REPO). See the Connect help topic “The DMExpress Repository Manager” for details on adding sensitive strings to the Connect repository.
- AWS storage credentials are optional when CloudFSUtil runs inside an EC2 instance.
As an alternative, to authenticate with a temporary AWS session token, specify the following:
- AWSTOKEN=AWS_session_token
- AWSACCESSKEYID=AWS_access_key_ID
- AWSACCESSKEY=AWS_secret_access_key
- AWSTOKEN_REPO=alias_to_temporary_token_stored_in_Connect_repository
For authentication with IAM roles, specify the following:
- AWSSAMLIDPPLUGIN=AWS_SAML_Identity_Provider_Plugin
- AWSIAMROLE=AWS_IAM_Role (optional)
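The precedence rules noted above can be illustrated with a small shell function. This is only a sketch under stated assumptions: `s3_auth_type` is a hypothetical helper, not part of CloudFSUtil, and it assumes that the presence of a token or access key entry (or its repository alias) is what selects each authentication type.

```shell
# Illustrative only: reports which S3 authentication type would win under the
# precedence rules described in this topic. Not part of CloudFSUtil.
s3_auth_type() {
    # $1: path to the configuration file
    # Collect the key names that appear in the [s3] section.
    keys=$(awk -F= '
        /^\[/ { in_s3 = ($0 == "[s3]"); next }   # track the current section
        in_s3 && NF >= 2 { print $1 }            # emit key names only
    ' "$1")

    # True when the given key name is present in the [s3] section.
    has_key() { printf '%s\n' "$keys" | grep -qxF "$1"; }

    if has_key AWSTOKEN || has_key AWSTOKEN_REPO; then
        echo "session token"
    elif has_key AWSACCESSKEY || has_key AWSACCESSKEY_REPO; then
        echo "access key"
    elif has_key AWSSAMLIDPPLUGIN; then
        echo "IAM role"
    else
        # No authentication specified: EC2 metadata service is assumed.
        echo "EC2 instance metadata"
    fi
}
```

A call such as `s3_auth_type "$DMX_REMOTEFILE_INI_FILE"` prints the name of the authentication type that the sketch expects to take effect for the current configuration file.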
Databricks file system
[dbfs]
- DBFSHOST=host_url
- DBFSTOKEN=token
Example:
DBFSHOST=https://xxx-123456798xxxxxxxxxxx.net
DBFSTOKEN=123456789xxxxxxxxxxxxxx
Azure Data Lake Storage (ADLS)
- AZURECLIENTID=azure_client_id
- AZURECLIENTSECRET=azure_client_secret
- AZURETENANTID=azure_tenant_id
Example:
AZURECLIENTID="8437g845-.........-qpo024mn87we"
AZURECLIENTSECRET="secretAlias"
AZURETENANTID="p0ij458s-.........-w344ml09hnb5"