Work with the Connect Execution Profile File - Connect ETL 9.13

Connect ETL Installation Guide

To run Connect jobs in a distributed cluster, you need to specify Databricks configuration information in a Connect execution profile file. This JSON file allows both JDBC (table) and DBFS (remote file system) connections to access the Databricks cluster.

To apply job or task options defined in a specific execution profile file (rather than a global or user setting), provide the filename as a parameter when running a Connect job from the command prompt.
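
For example, a job run from the Windows command prompt might name the profile file as in the following sketch. The /EXECUTIONPROFILE option shown here is a placeholder, not a confirmed option name; consult the dmxjob command line reference for your release for the exact syntax.

rem /EXECUTIONPROFILE is a placeholder option name; check the dmxjob reference.
dmxjob /RUN myjob.dxj /EXECUTIONPROFILE C:\profiles\databricks_profile.json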

The execution profile file has the following case-sensitive JSON format for Databricks:

{
  "databricks": {
    "deploymentConfigurations": [
      {
        "name": <alias>,
        "token": <token_string>,
        "tokenAlias": <alias>,
        "tokenRepository": <repository>,
        "host": <url>,
        "workDirectory": <dbfs_path>,
        "clusterID": <unique_cluster_ID>,
        "clusterConfig": <filename>,
        "extraOptions": [
          {
            "option": <option_name>,
            "arguments": <text_array>
          }
        ]
      }
    ]
  }
}

Where:

name
Required. An alias for this definition, used in connection definitions and in command line execution. This alias must be included in the Spark Master URL to use a Databricks File System (DBFS) connection.

token
Required if tokenAlias is not set. A unique user token string used for TLS authentication. There is no default value; leave token unset when you set tokenAlias.

tokenAlias
Required if token is not set. An alias for the token in the token repository. There is no default value; leave tokenAlias unset when you set token.

tokenRepository
Optional. Either Built-in or CyberArk. The default is Built-in.

host
Required. The URL of the Databricks endpoint to which Connect connects and sends API requests.

workDirectory
Required. The DBFS path where Connect saves and retrieves jobs, tasks, and configuration material. workDirectory is independent of the WORKTABLEDIRECTORY parameter set in the Connect JDBC configuration file; the two values can be the same or different.

clusterID
Required if clusterConfig is not set. The unique ID of the Databricks cluster. There is no default value; leave clusterID unset when you set clusterConfig.

clusterConfig
Required if clusterID is not set. The path and name of a file containing a Databricks cluster definition in the format used by the Databricks Jobs API. Refer to the Databricks API reference for cluster definition details, and see the sketch following this table. There is no default value; leave clusterConfig unset when you set clusterID.

option
Optional. The name of an option in an extraOptions entry, such as a Spark configuration option.

arguments
An array of text values to assign to the option.
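
For reference, a file named by clusterConfig might contain a minimal Databricks cluster definition such as the following sketch. The field values are illustrative assumptions; the fields required depend on your cloud provider and Databricks release, so follow the Databricks API reference.

{
  "cluster_name": "connect-etl-cluster",
  "spark_version": "13.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "autotermination_minutes": 60
}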

Example:

In the following Databricks example, the first and third deployment configurations define interactive clusters, and the second defines a job cluster.
{
  "databricks": {
    "deploymentConfigurations": [
      {
        "name": "DatabricksInteractiveClusterConfig",
        "tokenAlias": "DatabricksTokenAlias",
        "host": "https://westus.azuredatabricks.net",
        "workDirectory": "dbfs:/azureblob/",
        "clusterID": "1234-567890-rood123"
      },
      {
        "name": "DatabricksJobClusterConfig",
        "tokenAlias": "DatabricksCyberArkTokenAlias",
        "tokenRepository": "CyberArk",
        "host": "https://westus.azuredatabricks.net",
        "workDirectory": "dbfs:/azureblob/",
        "clusterConfig": "c:\\tmp\\create_cluster.json"
      },
      {
        "name": "AnotherDatabricksInteractiveCluster",
        "host": "https://adb-123456789012345.71.azuredatabricks.net",
        "token": "dapi1123bg4567bb8901f23456c7890h12a3",
        "workDirectory": "dbfs:/azureblob/",
        "clusterID": "0111-311131-oka45",
        "extraOptions": []
      }
    ]
  }
}
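
The third configuration above leaves extraOptions as an empty array. When options are needed, each entry pairs an option name with an array of argument strings. The following sketch shows the shape only; the option name and argument values are hypothetical, not documented Connect options.

"extraOptions": [
  {
    "option": "spark_conf",
    "arguments": ["spark.executor.memory=4g"]
  }
]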