To run Connect jobs in a distributed cluster, you must specify Databricks configuration information in a Connect execution profile file. This JSON file gives both JDBC (table) and DBFS (remote file system) connections access to the Databricks cluster.
To use the job or task options defined in a specific execution profile file, rather than in a global or user setting, provide the filename as a parameter when you run the Connect job from the command prompt.
},
"databricks":{
  "deploymentConfigurations": [
    {
      "name":<alias>,
      "token":<token_string>,
      "tokenAlias":<alias>,
      "tokenRepository":<repository>,
      "host":<url>,
      "workDirectory":<dbfs_path>,
      "clusterID":<unique_cluster_ID>,
      "clusterConfig":<filename>,
      "extraOptions": [
        {
          "option":<option_name>,
          "arguments":<text_array>
        },
Where:
Parameter | Description |
---|---|
name | Required. An alias for this definition, used in connection definitions or on the command line. To use a Databricks File System (DBFS) connection, include this alias in the Spark Master URL. |
token | Required if tokenAlias has no value. A unique user token string used for TLS authentication. There is no default value. Do not set token if tokenAlias is set. |
tokenAlias | Required if token has no value. An alias for the token in the token repository. There is no default value. Do not set tokenAlias if token is set. |
tokenRepository | Optional. Either Built-in or CyberArk. The default is Built-in. |
host | Required. The URL of the Databricks endpoint to which Connect connects and sends API requests. |
workDirectory | Required. The DBFS path where Connect saves and retrieves jobs, tasks, and configuration material. workDirectory is independent of the WORKTABLEDIRECTORY parameter set in the Connect JDBC configuration file; the two values can be the same or different. |
clusterID | Required if clusterConfig has no value. A unique ID for the Databricks cluster. There is no default value. Do not set clusterID if clusterConfig is set. |
clusterConfig | Required if clusterID has no value. The path and name of a file containing a Databricks cluster definition in the format used for the Databricks Jobs API. Refer to the Databricks API reference for cluster definition details. There is no default value. Do not set clusterConfig if clusterID is set. |
option | Optional. The name of an option, such as a Spark configuration option. |
arguments | An array of text values to assign to the option. |
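The pairing rules in the table (token or tokenAlias, clusterID or clusterConfig) can be checked before a profile is used. The following is a minimal Python sketch, not part of Connect: validate_deployment is a hypothetical helper, and it assumes the rules are exactly as described in the table above.

```python
from typing import Any

# Values the table lists for tokenRepository.
ALLOWED_REPOSITORIES = {"Built-in", "CyberArk"}


def validate_deployment(entry: dict[str, Any]) -> list[str]:
    """Return the problems found in one deploymentConfigurations entry.

    Hypothetical helper for illustration; Connect does not ship this function.
    """
    problems = []

    # name, host, and workDirectory are always required.
    for key in ("name", "host", "workDirectory"):
        if not entry.get(key):
            problems.append(f"{key} is required")

    # Each pair is "exactly one of": setting both or neither is an error.
    for first, second in (("token", "tokenAlias"), ("clusterID", "clusterConfig")):
        has_first = bool(entry.get(first))
        has_second = bool(entry.get(second))
        if has_first == has_second:
            problems.append(f"set exactly one of {first} or {second}")

    # tokenRepository is optional; Built-in is the documented default.
    repository = entry.get("tokenRepository", "Built-in")
    if repository not in ALLOWED_REPOSITORIES:
        problems.append(
            f"tokenRepository must be one of {sorted(ALLOWED_REPOSITORIES)}"
        )

    return problems
```

Calling validate_deployment on each element of the deploymentConfigurations array returns an empty list when the entry satisfies the rules, and one message per violated rule otherwise.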
Example:
}
"databricks": {
  "deploymentConfigurations": [
    {
      "name": "DatabricksInteractiveClusterConfig",
      "tokenAlias": "DatabricksTokenAlias",
      "host": "https://westus.azuredatabricks.net",
      "workDirectory": "dbfs:/azureblob/",
      "clusterID": "1234-567890-rood123"
    },
    {
      "name": "DatabricksJobClusterConfig",
      "tokenAlias": "DatabricksCyberArkTokenAlias",
      "tokenRepository": "CyberArk",
      "host": "https://westus.azuredatabricks.net",
      "workDirectory": "dbfs:/azureblob/",
      "clusterConfig": "c:\\tmp\\create_cluster.json"
    },
    {
      "name": "AnotherDatabricksInteractiveCluster",
      "host": "https://adb-123456789012345.71.azuredatabricks.net",
      "token": "dapi1123bg4567bb8901f23456c7890h12a3",
      "workDirectory": "dbfs:/azureblob/",
      "clusterID": "0111-311131-oka45",
      "extraOptions":
    }
  ]
}
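Windows paths, such as a clusterConfig value, contain backslashes, which a strict JSON parser treats as the start of an escape sequence. The following Python sketch shows the difference between an unescaped and an escaped path. It assumes the profile is read by a standard JSON parser; whether Connect's own parser is this strict is not confirmed here.

```python
import json

# Raw strings keep the backslashes exactly as they would appear in the file.
unescaped = r'{"clusterConfig": "c:\tmp\create_cluster.json"}'
escaped = r'{"clusterConfig": "c:\\tmp\\create_cluster.json"}'

# "\t" is read as a tab and "\c" is not a valid JSON escape, so a strict
# parser rejects the unescaped form.
try:
    json.loads(unescaped)
    unescaped_ok = True
except json.JSONDecodeError:
    unescaped_ok = False

# Doubling each backslash yields the intended Windows path.
path = json.loads(escaped)["clusterConfig"]
```

Forward slashes are another common way to avoid escaping in JSON, but whether Connect accepts them for this parameter is not stated in this section.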