Configuration parameters are specified in a file named sqdata_cloud.conf
located in the execution working directory.
Example Configuration
[cloudfs]
staging_dir = /tmp/s3
bucket = my-amazon-bucket
nb_workers = 2
Keyword and Parameter Descriptions
Keyword | Description |
---|---|
staging_dir | Temporary local directory for staging files before upload. Required due to the non-streaming nature of cloud storage. Default: execution working directory. Note: the directory must exist and be writable. |
bucket | Default S3 bucket name if not specified in the datastore URL. Default: None |
nb_workers | Number of worker threads for uploading data to S3. Default: 1 |
hash_method | Hash algorithm for object integrity verification. Valid values: Default: None. For more information, see Checking object integrity in Amazon S3. |
endpoint | Custom S3-compatible endpoint URL (e.g., for third-party storage services). Default: None |
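As an illustration of how the defaults in the table above combine with the example file, here is a minimal Python sketch. It is not the product's own loader: the function names `load_cloudfs` and `resolve_bucket` are hypothetical, and the `s3:///<key>` form for an omitted bucket is an assumption read off the optional `[<bucket>]` in the datastore URL syntax.

```python
import configparser
from urllib.parse import urlparse


def load_cloudfs(text, cwd="."):
    """Parse the [cloudfs] section and apply the documented defaults.

    `cwd` stands in for the execution working directory (hypothetical
    parameter, used as the staging_dir default).
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["cloudfs"] if parser.has_section("cloudfs") else {}
    return {
        "staging_dir": section.get("staging_dir", cwd),
        "bucket": section.get("bucket", None),
        "nb_workers": int(section.get("nb_workers", 1)),
        "hash_method": section.get("hash_method", None),
        "endpoint": section.get("endpoint", None),
    }


def resolve_bucket(url, cfg):
    """Return (bucket, key), falling back to the configured bucket."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError("expected an s3:// URL")
    bucket = parsed.netloc or cfg["bucket"]
    if not bucket:
        raise ValueError("no bucket in the URL and none configured")
    return bucket, parsed.path.lstrip("/")


EXAMPLE = """
[cloudfs]
staging_dir = /tmp/s3
bucket = my-amazon-bucket
nb_workers = 2
"""
cfg = load_cloudfs(EXAMPLE)
```

With the example file, `resolve_bucket("s3:///foo/bar.json", cfg)` falls back to `my-amazon-bucket`, while a URL that names its own bucket takes precedence over the configured one.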
Syntax
DATASTORE s3://[<bucket>]/<key>
OF JSON | AVRO | DELIMITED | TEXT
AS <alias_name>
DESCRIBED BY GROUP <group_name>
STAGING SIZE <n>G
STAGING DELAY <mmm>
Keyword and Parameter Descriptions
Keyword | Description |
---|---|
<bucket> | Optional. If omitted, the value of bucket from sqdata_cloud.conf is used. |
<key> | Refers to the object key, which is a unique identifier for an object within a bucket. |
OF JSON | AVRO | DELIMITED | TEXT | Specifies the format of the S3 object. |
DESCRIBED BY GROUP <group_name> | Associates the datastore with a data group. |
STAGING SIZE <n>G | Rotates the file after n gigabytes have been written. For example, STAGING SIZE 4G rotates the file after 4 gigabytes. |
STAGING DELAY <mmm> | Rotates the file after mmm minutes. For example, STAGING DELAY 60 rotates the file every hour. |
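The two STAGING thresholds can be read as a simple rotation check. The sketch below is only an illustration under stated assumptions: the class name `StagingPolicy` is hypothetical, "G" is taken to mean 2**30 bytes, and when both keywords are present either threshold is assumed to trigger rotation. The source does not confirm any of these points.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3  # assumption: "G" means 2**30 bytes


@dataclass
class StagingPolicy:
    size_gb: Optional[int] = None    # n from STAGING SIZE <n>G
    delay_min: Optional[int] = None  # mmm from STAGING DELAY <mmm>

    def should_rotate(self, bytes_written, opened_at, now):
        """Return True when the current staging file should be rotated.

        `opened_at` and `now` are timestamps in seconds.
        """
        if self.size_gb is not None and bytes_written >= self.size_gb * GIB:
            return True
        if self.delay_min is not None and now - opened_at >= self.delay_min * 60:
            return True
        # No STAGING keyword: the single target file is never rotated.
        return False


policy = StagingPolicy(size_gb=4, delay_min=60)
```

A policy with neither field set models the case where no STAGING keyword is given and one target file is updated continuously.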
If no STAGING keyword is specified, a single target file is continuously updated. When file rotation is enabled, based on size or time, Amazon S3 object filenames are dynamically generated to ensure uniqueness and prevent overwriting.
DATASTORE s3://<bucket>/foo/bar.ext -> s3://<bucket>/foo/bar_<unique>.ext
DATASTORE s3://<bucket>/foo/bar -> s3://<bucket>/foo/bar.<unique>
The <unique> suffix is derived from the CDC capture LSN/SCN/RBA at the time the file is opened. If this information is unavailable, a timestamp in ISO format (e.g., 20200501184559) is used.
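The naming rule can be sketched in a few lines of Python. This is a hedged illustration, not the product's implementation: the function `rotated_key` and its parameters are hypothetical, the capture position is assumed to arrive already rendered as a string, and the fallback timestamp is assumed to be in UTC (the source does not state a time zone).

```python
import posixpath
from datetime import datetime, timezone


def rotated_key(key, position=None, now=None):
    """Build a unique object key for a rotated file.

    `position` stands in for the LSN/SCN/RBA text; when it is missing,
    a YYYYMMDDHHMMSS timestamp is used instead.
    """
    if position is not None:
        unique = str(position)
    else:
        now = now or datetime.now(timezone.utc)
        unique = now.strftime("%Y%m%d%H%M%S")
    head, name = posixpath.split(key)
    stem, ext = posixpath.splitext(name)
    # bar.ext -> bar_<unique>.ext ; bar -> bar.<unique>
    new_name = f"{stem}_{unique}{ext}" if ext else f"{name}.{unique}"
    return posixpath.join(head, new_name)


stamp = datetime(2020, 5, 1, 18, 45, 59)
with_ext = rotated_key("foo/bar.ext", now=stamp)
without_ext = rotated_key("foo/bar", now=stamp)
```

Here `with_ext` is `foo/bar_20200501184559.ext` and `without_ext` is `foo/bar.20200501184559`, matching the two mapping patterns shown above.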