Configuration parameters are specified in a file named sqdata_cloud.conf
located in the execution working directory.
Example Configuration
[cloudfs]
staging_dir = /tmp/azure
bucket = my-container@my-account.blob.core.windows.net
nb_workers = 2
Keyword and Parameter Descriptions
Keyword | Description |
---|---|
staging_dir | Temporary local directory for staging files before upload. Default: the execution working directory. Note: the directory must exist and be writable. |
bucket | Default Azure Blob container and account, used when they are not specified in the datastore URL. Format: container@account.blob.core.windows.net. Default: None |
nb_workers | Number of worker threads for uploading data to Azure Blob Storage. Default: 1 |
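
To make the defaults in the table concrete, the following is a minimal Python sketch of how a [cloudfs] section could be read and validated. It is an illustration only, not the product's own loader: the function name load_cloudfs_settings and the returned dictionary layout are hypothetical, and only the keywords and defaults come from the table above.

```python
import configparser
import os
from pathlib import Path


def load_cloudfs_settings(conf_path="sqdata_cloud.conf"):
    """Hypothetical helper: read [cloudfs] and apply the documented defaults."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(conf_path)  # a missing file simply yields no sections
    section = parser["cloudfs"] if parser.has_section("cloudfs") else {}

    settings = {
        # Default: the execution working directory
        "staging_dir": Path(section.get("staging_dir", os.getcwd())),
        # Default: None (the datastore URL must then name the container)
        "bucket": section.get("bucket"),
        # Default: 1 worker thread
        "nb_workers": int(section.get("nb_workers", 1)),
    }

    # The staging directory must exist and be writable.
    staging_dir = settings["staging_dir"]
    if not staging_dir.is_dir() or not os.access(staging_dir, os.W_OK):
        raise ValueError(f"staging_dir is missing or not writable: {staging_dir}")
    if settings["nb_workers"] < 1:
        raise ValueError("nb_workers must be at least 1")
    return settings
```

Checking the staging directory up front, as sketched here, surfaces a misconfiguration before any data is staged rather than at the first upload.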
Syntax
DATASTORE azure://[<container@account.blob.core.windows.net>]/<path>
OF JSON | AVRO
AS <alias_name>
DESCRIBED BY GROUP <group_name>
STAGING SIZE <n>G
STAGING DELAY <mmm>
Keyword and Parameter Descriptions
Keyword | Description |
---|---|
<container@account.blob.core.windows.net> | Optional. If omitted, the value from sqdata_cloud.conf is used. |
<path> | The Azure Blob path (file name). |
OF JSON \| AVRO | Specifies the format of the blob object. |
DESCRIBED BY GROUP <group_name> | Associates the datastore with a data group. |
STAGING SIZE <n>G | Rotates the file after n gigabytes have been written. For example, STAGING SIZE 4G rotates the file after 4 gigabytes. |
STAGING DELAY <mmm> | Rotates the file after mmm minutes. For example, STAGING DELAY 60 rotates the file every hour. |
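
The fallback from the datastore URL to the bucket value in sqdata_cloud.conf can be sketched in Python as follows. This is an illustration under assumptions, not the product's parser: the function name resolve_datastore_url is hypothetical, and the sketch assumes that omitting the optional part leaves a URL of the form azure:///<path>.

```python
from urllib.parse import urlsplit


def resolve_datastore_url(url, default_bucket=None):
    """Hypothetical helper: return (container@account, blob path) for a URL."""
    parts = urlsplit(url)
    if parts.scheme != "azure":
        raise ValueError(f"not an azure:// URL: {url!r}")

    # Use the container@account from the URL when present,
    # otherwise fall back to the configured bucket value.
    bucket = parts.netloc or default_bucket
    if not bucket:
        raise ValueError("no container@account in the URL and no bucket configured")

    return bucket, parts.path.lstrip("/")
```

For example, resolve_datastore_url("azure:///foo/bar.json", "my-container@my-account.blob.core.windows.net") would take the container and account from the configured default.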
If no STAGING keyword is specified, a single target file is continuously updated.
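
The rotation rules for STAGING SIZE and STAGING DELAY can be pictured with a small Python sketch. It is not the product's implementation: the class RotationPolicy is hypothetical, the sketch assumes that G means 2**30 bytes, and it assumes that when both keywords are given, whichever limit is reached first triggers the rotation.

```python
import time

GIB = 1024 ** 3  # assumption: the G suffix means 2**30 bytes


class RotationPolicy:
    """Hypothetical model of STAGING SIZE <n>G and STAGING DELAY <mmm>."""

    def __init__(self, staging_size_gb=None, staging_delay_min=None,
                 clock=time.monotonic):
        self.size_limit = None if staging_size_gb is None else staging_size_gb * GIB
        self.delay_limit = None if staging_delay_min is None else staging_delay_min * 60
        self._clock = clock
        self.reset()

    def reset(self):
        """Call when a new target file is opened."""
        self.opened_at = self._clock()
        self.bytes_written = 0

    def record_write(self, nbytes):
        self.bytes_written += nbytes

    def should_rotate(self):
        # With no STAGING keyword both limits are None, so the single
        # target file is never rotated.
        if self.size_limit is not None and self.bytes_written >= self.size_limit:
            return True
        if self.delay_limit is not None:
            return self._clock() - self.opened_at >= self.delay_limit
        return False
```

A policy built with RotationPolicy(staging_size_gb=4) corresponds to STAGING SIZE 4G, and RotationPolicy(staging_delay_min=60) to STAGING DELAY 60.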
When file rotation is enabled, whether by size or by time, Azure Blob file names are generated dynamically so that each name is unique and no existing blob is overwritten.
DATASTORE azure://<container@account.blob.core.windows.net>/foo/bar.ext -> azure://<container@account.blob.core.windows.net>/foo/bar_<unique>.ext
DATASTORE azure://<container@account.blob.core.windows.net>/foo/bar -> azure://<container@account.blob.core.windows.net>/foo/bar.<unique>
The <unique> suffix is derived from the CDC capture LSN/SCN/RBA at the time the file is opened. If this information is unavailable, a timestamp in ISO format (e.g., 20200501184559) is used.
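
The two naming patterns shown above can be reproduced with a short Python sketch. It is only an illustration: the function names unique_suffix and rotated_url are hypothetical, the way an LSN/SCN/RBA is rendered as text is not specified here and is passed through unchanged, and the use of UTC for the timestamp fallback is an assumption.

```python
import posixpath
from datetime import datetime, timezone


def unique_suffix(capture_position=None, now=None):
    """Hypothetical helper: capture position if known, else a timestamp."""
    if capture_position:
        return str(capture_position)
    now = now or datetime.now(timezone.utc)  # assumption: UTC
    return now.strftime("%Y%m%d%H%M%S")


def rotated_url(url, unique):
    """Insert the unique suffix into the last component of a datastore URL."""
    head, sep, name = url.rpartition("/")
    stem, ext = posixpath.splitext(name)
    if ext:
        # foo/bar.ext -> foo/bar_<unique>.ext
        new_name = f"{stem}_{unique}{ext}"
    else:
        # foo/bar -> foo/bar.<unique>
        new_name = f"{name}.{unique}"
    return f"{head}{sep}{new_name}"
```

Splitting on the last slash before looking for an extension keeps the dots in the account host name (blob.core.windows.net) from being mistaken for a file extension.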