Job initiator scripts - Connect_ETL - 9.13

Connect ETL Vertica Load Guide

Product type: Software
Portfolio: Integrate
Product family: Connect
Product: Connect > Connect (ETL, Sort, AppMod, Big Data)
Version: 9.13
Language: English
Product name: Connect ETL
Title: Connect ETL Vertica Load Guide
Copyright: 2023
First publish date: 2003
ft:lastEdition: 2023-09-11
ft:lastPublication: 2023-09-11T19:09:11.694437

The job initiator script is typically initiated as a command-line executable from a scheduling subsystem, such as Control-M. You can also initiate the script from a Connect ETL custom task.

Vertica recommends partitioning a large input file into multiple input files, which can be loaded in parallel across multiple Vertica initiator nodes. For multi-terabyte tables, for example, split the full load into files of 250-500 gigabytes each. At those sizes, a 10-terabyte fact table requires 20-40 load files to maintain performance.
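The 20-40 figure follows from dividing the table size by the recommended file size. A minimal sketch of that arithmetic is shown here; the function name load_file_count is hypothetical, and decimal units (1 terabyte = 1000 gigabytes) are assumed.

```shell
#!/bin/sh
# Hypothetical helper: number of load files needed for a table of a given
# size, using ceiling division. Both sizes are in gigabytes, and
# 1 TB = 1000 GB is an assumption of this sketch.
load_file_count() {
    table_gb=$1
    file_gb=$2
    echo $(( (table_gb + file_gb - 1) / file_gb ))
}

# A 10 TB fact table split into 500 GB files, then into 250 GB files.
echo "500 GB files: $(load_file_count 10000 500)"
echo "250 GB files: $(load_file_count 10000 250)"
```

Larger files give the lower bound of 20, and smaller files give the upper bound of 40.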

Two job initiator scripts are available. Which one to use depends on the number and size of the input data files to be loaded:

  • runLargeFileLoader.sh
  • runMultiFileLoader.sh
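As a rough illustration of that choice, the sketch below picks a script from the number of input files alone. The function choose_initiator and the file names are hypothetical, and the guide does not state a size threshold, so file size is not considered here.

```shell
#!/bin/sh
# Hypothetical helper: pick a job initiator script from the number of input
# files passed as arguments. Only the two script names come from the guide.
choose_initiator() {
    if [ "$#" -eq 1 ]; then
        echo "runLargeFileLoader.sh"    # one large input file
    elif [ "$#" -gt 1 ]; then
        echo "runMultiFileLoader.sh"    # multiple smaller input files
    else
        echo "no input files given" >&2
        return 1
    fi
}

choose_initiator sales_fact.dat
choose_initiator part1.dat part2.dat part3.dat
```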

RunLargeFileLoader Job Initiator Script

When the extract, transform, and load (ETL) project consists of one large input file that must be loaded into a Vertica database, initiate the Connect ETL runLargeFileLoader job initiator script.

The runLargeFileLoader job initiator script runs the configuration file, LargeFileLoader.env, and submits a Connect ETL partition job, which partitions the input and places the partitions on named pipes.

For each partition, the script submits an instance of the Connect ETL transformation and load job. The instances run in parallel, and the following tasks are called for each one:

  • The Connect ETL transformation task, which converts and reformats the data on the named pipes into standard input.
  • The custom load task, which initiates the database load script, verticaLoad.sh.
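The partition, named pipe, and parallel load flow described above can be illustrated with standard shell utilities. This is only a sketch of the pattern: the functions transform and load_partition, the sample data, and the round-robin split are assumptions standing in for the Connect ETL tasks and verticaLoad.sh, and none of it reflects the actual Connect ETL or Vertica interfaces.

```shell
#!/bin/sh
# Sketch of the partition -> named pipe -> parallel transform/load pattern.
# Everything here is a stand-in built from standard utilities.
PARTITIONS=3
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Sample input standing in for the one large file (7 rows).
printf 'row%s\n' 1 2 3 4 5 6 7 > "$workdir/input.txt"

# Stand-in for the transformation task: reformats data to standard output.
transform() { tr 'a-z' 'A-Z'; }

# Stand-in for verticaLoad.sh: consumes standard input for one partition.
load_partition() { cat > "$workdir/loaded.$1"; }

# One named pipe per partition, each with a reader running in parallel.
i=0
while [ "$i" -lt "$PARTITIONS" ]; do
    mkfifo "$workdir/part$i.fifo"
    ( transform < "$workdir/part$i.fifo" | load_partition "$i" ) &
    i=$((i + 1))
done

# Stand-in for the partition job: deal rows round-robin onto the pipes.
# The BEGIN block opens every pipe so that no reader is left waiting.
awk -v n="$PARTITIONS" -v dir="$workdir" '
    BEGIN { for (p = 0; p < n; p++) printf "" > (dir "/part" p ".fifo") }
    { print > (dir "/part" (NR % n) ".fifo") }
' "$workdir/input.txt"

wait   # block until every parallel load has finished
loaded_rows=$(cat "$workdir"/loaded.* | wc -l | tr -d ' ')
echo "rows loaded: $loaded_rows"
```

The readers are started before the writer because opening a named pipe blocks until the other end is also opened.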

RunMultiFileLoader Job Initiator Script

When the ETL project consists of multiple small input files that must be loaded to a Vertica database, initiate the Connect ETL runMultiFileLoader job initiator script.

The runMultiFileLoader job initiator script submits multiple copies of the Connect ETL transformation and load job, which read multiple input files and copy them to named pipes. Each named pipe is read, transformed, and loaded by calling the custom task, which initiates the database load script.
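The multi-file flow can be sketched in the same spirit: each input file is copied to its own named pipe, and each pipe is read, transformed, and loaded in parallel. The functions transform and load_file and the sample files are hypothetical stand-ins, not the Connect ETL job or the database load script.

```shell
#!/bin/sh
# Sketch of the multi-file pattern using standard utilities only.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Three small sample input files, two rows each.
for n in 1 2 3; do
    printf 'file%s-a\nfile%s-b\n' "$n" "$n" > "$workdir/in$n.txt"
done

transform() { tr 'a-z' 'A-Z'; }                # stand-in transformation task
load_file() { cat > "$workdir/loaded.$1"; }    # stand-in database load script

for n in 1 2 3; do
    pipe="$workdir/in$n.fifo"
    mkfifo "$pipe"
    cat "$workdir/in$n.txt" > "$pipe" &              # copy the file to its pipe
    ( transform < "$pipe" | load_file "$n" ) &       # read, transform, load
done

wait   # block until every copy and load has finished
multi_rows=$(cat "$workdir"/loaded.* | wc -l | tr -d ' ')
echo "rows loaded: $multi_rows"
```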