Connect ETL Release Notes

New functionality

The following functionality has been added since Release 9.0 of Connect. For the most complete information on the product, refer to the Connect Help, available from the Task Editor or Job Editor or directly from the Connect ETL/Connect for Big Data programs folder.

Release New Feature
9.13.19 Initial support for additional code pages in DMXMFSRT/DMXMMSRT. In addition to the existing MFCODESET support for code pages 0037 and 0500, the following code pages are now also supported for source/target data and JCL text constants: 0273, 0277, 0278, 0280, 0284, 0285, 0297, and 0871. If MFCODESET is set to 1145 and the additional environment variable DMX_MF_FORCE_0284_FOR_1145 is set to any value, code page 0284 will be used by DMXMFSRT, while any fallback to MFJSORT will continue to use 1145. The only difference between code pages 0284 and 1145 is the replacement of the generic currency character ¤ with the euro currency character €.
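For example, a job wrapper could set both variables before invoking the sort step. A minimal Python sketch; the launcher script name is a placeholder, not an actual Connect binary:

    import os
    import subprocess

    env = os.environ.copy()
    env["MFCODESET"] = "1145"
    env["DMX_MF_FORCE_0284_FOR_1145"] = "1"  # any value enables the 0284 substitution

    # "run_sort_step.sh" is a hypothetical wrapper around the DMXMFSRT invocation.
    subprocess.run(["./run_sort_step.sh"], env=env, check=True)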
9.13.18

Support for higher precision arithmetic expressions with only constants. When the environment variable DMX_ARITHMETIC_HIGH_PRECISION is set to any value other than AUTO, a quad precision floating point number will also be used as the intermediate data type, instead of a double precision floating point number, for arithmetic binary operations (+-*/) with constant operands. Previously, this only affected arithmetic expressions with fields or a combination of fields and constants. This provides 33 significant digits of precision instead of 15, but there is an impact on performance. If DMX_ARITHMETIC_HIGH_PRECISION is set to AUTO, the performance of those binary arithmetic operations is improved by using a 64-bit or 128-bit integer instead of a floating point number as the intermediate data type for decimal/integer operands when possible.
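The difference between 15 and 33 significant digits is easy to reproduce outside Connect. A minimal Python illustration (not Connect's implementation), using the decimal module as a stand-in for quad precision:

    from decimal import Decimal, getcontext

    a, b = 12345678901234567890, 3  # constant operands

    # Double precision keeps only ~15 significant digits; the tail is inexact.
    print(a / b)

    # 33 significant digits, comparable to quad precision floating point.
    getcontext().prec = 33
    print(Decimal(a) / Decimal(b))  # 4115226300411522630 exactly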

9.13.17 Verify sequence for join task pre-sorted sources. Join tasks with pre-sorted sources can now specify the /CHECKSEQUENCE option in SDK syntax, or the VERIFYORDER keyword in the /TASKTYPE JOIN PRESORTED option in DTL syntax. The source file record sequence will be verified upon input to the join task. For more information, refer to /TASKTYPE.
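Conceptually, the verification is a single non-descending check on the join key as records are read. A Python sketch of the idea (Connect's actual options and diagnostics differ):

    def verify_order(records, key):
        # Raise on the first record that breaks the presorted sequence.
        prev = None
        for n, rec in enumerate(records, start=1):
            k = key(rec)
            if prev is not None and k < prev:
                raise ValueError(f"record {n} out of sequence: {k!r} < {prev!r}")
            prev = k

    verify_order([{"id": 1}, {"id": 2}, {"id": 2}], key=lambda r: r["id"])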
Improved performance for high precision arithmetic operations (int64 conversions). Additional optimizations for converting 64-bit integers to decimal numbers should improve the performance of binary arithmetic operations with integer and decimal operands when DMX_ARITHMETIC_HIGH_PRECISION is set to AUTO.
9.13.6 Improved performance for high precision arithmetic operations (int64 for small numbers). When the environment variable DMX_ARITHMETIC_HIGH_PRECISION is set to AUTO, a 64-bit integer will be used as the intermediate data type for arithmetic operations with smaller integer and decimal operands when possible for improved performance.
Improved performance for high precision arithmetic operations (fallback triggers). When the environment variable DMX_ARITHMETIC_HIGH_PRECISION is set to AUTO, the detection for using a faster 64-bit integer as the intermediate data type for arithmetic operations has been improved slightly for multiplication, and significantly for integer division when the environment variable DMX_ARITHMETIC_HIGH_PRECISION_DIV_INT is also set to any value. The detection should also be less sensitive to locale. In addition, a performance issue with the addition of decimal operands when DTL syntax is used has been resolved.
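The intermediate-type selection in AUTO mode can be pictured as choosing the cheapest type that cannot overflow for the operands at hand. An illustrative Python sketch, not Connect's actual rules:

    def pick_intermediate(a_digits, b_digits, op):
        # Worst-case significant digits the result can need.
        if op in "+-":
            needed = max(a_digits, b_digits) + 1
        elif op == "*":
            needed = a_digits + b_digits
        else:  # integer division: the quotient never outgrows the dividend
            needed = a_digits
        if needed <= 18:   # always fits in a 64-bit integer
            return "int64"
        if needed <= 38:   # always fits in a 128-bit integer
            return "int128"
        return "quad precision float"

    print(pick_intermediate(9, 9, "*"))    # int64
    print(pick_intermediate(15, 15, "*"))  # int128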
9.13.15

Task Editor support for variable record length with little-endian prefix. The Task Editor now supports a new file type "Variable record length with 2-byte record prefix (little endian)" and record format "Variable length with 2-byte prefix (little endian)" in the corresponding source and target dialogs.

Note: The existing file type "Variable record length with 2-byte record prefix" and record format "Variable length with 2-byte prefix" have been renamed to be more specific, to "Variable record length with 2-byte record prefix (big endian)" and "Variable length with 2-byte prefix (big endian)".
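The two prefix encodings differ only in byte order. A minimal Python reader, assuming well-formed input and that the 2-byte prefix holds the length of the record data that follows it (consult the file-type help for the exact prefix semantics):

    import struct

    def read_records(path, little_endian=True):
        # "<H" = little-endian 2-byte prefix, ">H" = big-endian (the pre-existing format).
        fmt = "<H" if little_endian else ">H"
        with open(path, "rb") as f:
            while (prefix := f.read(2)):
                (length,) = struct.unpack(fmt, prefix)
                yield f.read(length)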

Mainframe variable maximum record length increased. The maximum record length for the "Mainframe variable record length" file type (MAINFRAME VARIABLE in DTL, MAINFRAMEUNBLOCKEDVARIABLE in SDK) has been increased from 32756 to 32760 bytes. Tasks created using the Task Editor will need to be resaved in order for the new limit to take effect.

9.13.13 Connect Server GUI password repository support. Passwords stored in a repository (e.g. CyberArk) can be used to connect to Connect ETL servers running on Windows or *nix. This requires some local configuration (execution profile, client certificates) on the Windows machine where the GUI is run. For details, see the Connect online help topics Connect Server Connection dialog, Execution profile file, and The Connect Repository Manager.

Improved performance for high precision arithmetic operations. When the environment variable DMX_ARITHMETIC_HIGH_PRECISION is set to AUTO, high precision integers will be used instead of high precision floating point as the intermediate data type when possible. This mainly affects addition, subtraction, and multiplication arithmetic operations with decimal and binary integer data type operands, including integer constants.

9.13.12 Improvements to mainframe IAM file detection.
Improvements for Windows 11 support.
9.13.10 Salesforce QueryAll support. Data extracted from Salesforce.com objects can now include deleted records. See the Connect ETL help topic, "Source Salesforce.com Object dialog", or the Connect ETL Data Transformation Language (DTL) Guide command options "/SFDCINPUT, /SFDCDATA", for more information.
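Connect issues the Salesforce calls itself; for reference, the underlying distinction in the Salesforce REST API is the queryAll resource, which returns deleted and archived rows that the plain query resource excludes. A Python sketch with a hypothetical org URL and token:

    import requests

    INSTANCE = "https://example.my.salesforce.com"  # hypothetical org
    TOKEN = "..."                                   # OAuth access token

    resp = requests.get(
        f"{INSTANCE}/services/data/v58.0/queryAll/",
        params={"q": "SELECT Id, IsDeleted FROM Account"},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    print(resp.json()["totalSize"])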
9.13.8 Remote VSAM file enhancements. IAM file sizes can now be properly detected. Also, additional parameters TARGET_DD_UNIT, JOB_CLEANUP, and LOG_CLEANUP are now available for the DMX_VSAM_PARAMETERS environment variable to customize the JCL and help with debugging issues.
9.13.7 Initial support for higher precision arithmetic expressions. When the environment variable DMX_ARITHMETIC_HIGH_PRECISION is set to any value, Connect ETL will use a quad precision floating point instead of a double precision floating point as the internal intermediate format for arithmetic binary operations (+-*/) and select functions (Abs, Mod). This provides 33 significant digits of precision instead of 15, but there is some impact on performance.
9.13.6 Target header enhancement. Target headers can now be generated even when the source is empty. See the Connect ETL Data Transformation Language (DTL) Guide /OUTFILE HEADER ALWAYS keyword or the Target File dialog help topic for more information.
9.13.6 Optional warning for possible precision loss. When the environment variable DMX_ARITHMETIC_PRECISION_WARNINGS is set to any value, Connect ETL will issue an ARITHOPPLOSS warning for operands in arithmetic expressions, or a FUNCARGPLOSS warning for arguments in certain functions, that may lose precision when converted internally to a double, which supports a maximum of 15 significant digits.
9.13.2

Additional SFTP key exchange support. Connect ETL now supports the following key exchanges for SFTP: diffie-hellman-group14-sha256, diffie-hellman-group16-sha512, and diffie-hellman-group18-sha512.

9.13.1 Access Databricks clusters using Azure Active Directory (AD). Precisely Connect ETL now supports Databricks cluster access using Azure AD tokens when deploying jobs from an Azure VM.
9.12.3

Launch of Connect ETL on Databricks for Azure Cloud. Precisely Connect ETL now supports customers who want to leverage the analytics capabilities of Databricks for Azure Cloud without recalibrating their ETL pipelines. With this release, Connect ETL customers can easily move their existing non-Databricks ETL workloads to Databricks clusters, with data processing running on Databricks under the Intelligent Execution (IX) layer.

9.12.3

Improved integration with Azure Cloud services. Precisely Connect continues to improve integration with the Azure cloud platform. This release provides integration with Azure Key Vault: customers can manage their secrets in Azure multi-tenant environments, and Connect ETL can securely access secrets from the Vault on the fly to establish data connections. This release also adds support for Azure Active Directory (AD) authentication for Connect ETL jobs.

Connect ETL now provides direct support for Azure storage, which enables customers to read from and write to Azure storage from on-premises, hybrid, and on-cloud environments.
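Connect performs the Key Vault lookup itself at connection time; for reference, the equivalent direct retrieval with the Azure SDK for Python looks like the following (vault and secret names are hypothetical):

    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url="https://my-vault.vault.azure.net",  # hypothetical vault
        credential=DefaultAzureCredential(),           # resolves an Azure AD identity
    )
    password = client.get_secret("db-password").value  # hypothetical secret name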

9.12.3

Connect ETL/Connect for Big Data now supports configuring the degree of parallelism passed to the Oracle merge query via a PARALLEL hint when using the update or update-and-insert target table dispositions. By default, Connect maintains existing behavior and constructs the PARALLEL hint such that the Oracle database server determines the degree of parallelism used to execute the query. To configure the PARALLEL hint syntax generated by Connect, and therefore the parallelism used by Oracle to execute the merge query, set DMX_ORACLE_PARALLEL_HINT_VALUE=n, where n can have the following values (see the sketch after this list):

  • If n > 0, merge queries for Oracle target tables are generated with the syntax for a PARALLEL hint with n degrees of parallelism.

  • If n = 0, merge queries for Oracle target tables are generated with the syntax for the NOPARALLEL hint.

  • If n < 0, merge queries for Oracle target tables are generated without any Oracle hints.
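A Python sketch mirroring the documented rules; the exact hint text Connect emits is illustrative here:

    def merge_hint(n):
        # n=None stands for the default (variable unset): Oracle decides the degree.
        if n is None:
            return "/*+ PARALLEL */"
        if n > 0:
            return f"/*+ PARALLEL({n}) */"
        if n == 0:
            return "/*+ NOPARALLEL */"
        return ""  # n < 0: no Oracle hint at all

    print(f"MERGE {merge_hint(8)} INTO target t USING source s ON (...)")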

9.12.1 Generate target header layout when using composite fields in target reformat.
9.10.33 DataFunnel support for PostgreSQL. DataFunnel supports PostgreSQL as a source via ODBC. See the Connect help topics, "Connect DataFunnel" and "DMXDFNL configuration file."
9.9.27

Protegrity Data Security Gateway. Connect ETL/Connect for Big Data now supports protecting data using API calls to Protegrity Data Security Gateway. See the Connect ETL help topic, "Protect function", or Install Guide topic, "Connecting to Protegrity Data Security Gateway".

9.9.27

Db2 for Linux, UNIX, and Windows data source support in Precisely Connect Portal. Connect Portal supports Db2 for Linux, UNIX, and Windows (LUW) JDBC data source connections to replicate your captured data changes to Apache Kafka targets. For more information, see the Connect Portal help topic, "Adding New Data Connections."

9.9.27 Enhanced connectivity to Amazon S3. Connect ETL retries failed REST API calls to Amazon S3 using an exponential back-off algorithm.
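Exponential back-off retries a failed call with an exponentially growing delay, typically plus jitter. A generic Python sketch of the strategy, not Connect's internal code:

    import random
    import time

    def with_backoff(call, max_attempts=5, base=0.5):
        for attempt in range(max_attempts):
            try:
                return call()
            except Exception:  # narrow to transient S3/REST errors in practice
                if attempt == max_attempts - 1:
                    raise
                # Delay doubles each attempt: 0.5s, 1s, 2s, ... plus jitter.
                time.sleep(base * 2**attempt + random.uniform(0, 0.1))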
9.9.24

Databricks databases. Through JDBC connectivity, Connect for Big Data supports Databricks databases as sources and targets for all dispositions. See the Connect ETL help topic, "Connecting to Databricks."

9.9.20

Db2 for IBM i data source support in Precisely Connect Portal. Connect Portal supports Db2 for IBM i JDBC data source connections to replicate your captured data changes to Apache Kafka targets. For more information, see the Connect Portal help topic, "Adding New Data Connections."

9.9.20 Change Data Capture on z/OS Installation. The process for installing Connect CDC on z/OS platforms has been enhanced to improve both usability and reliability.
9.9.14

Abort on Rejected Records and ENFORCELENGTH parameter for Vertica targets. The "Abort if any record is rejected" parameter configures Connect ETL to abort tasks that update Vertica targets upon rejecting a record. The default behavior to process remaining records remains unchanged. A new Target Database Table parameter, Vertica COPY ENFORCELENGTH, enables the Vertica COPY command ENFORCELENGTH option when loading Vertica targets. See the Connect ETL online help topic "Connecting to Vertica Databases" for details.

9.9.14

CyberArk Enterprise Password Vault. Connect ETL/Connect for Big Data now supports retrieving passwords stored in a CyberArk vault to authenticate connections to servers, databases, message queues, and remote files. See the Connect ETL help topic, "Task Editor", or Install Guide topic, "Connecting to CyberArk Enterprise Password Vault".

9.9.14

Azure Synapse Analytics (formerly SQL Data Warehouse). Through JDBC connectivity, Connect for Big Data supports Azure Synapse Analytics as sources and targets for all dispositions. See the Connect ETL help topic, "Connecting to Azure Synapse Analytics."

9.9.5 Connect ETL now supports additional SFTP integrity (MAC) algorithms, hmac-sha2-512 and hmac-sha2-256.
9.9.4

Snowflake databases. Through JDBC connectivity, Connect for Big Data supports Snowflake databases as sources and targets for all dispositions. See the Connect ETL help topic, "Connecting to Snowflake."

9.8.7
Precisely Connect Portal. Precisely Connect Portal (formerly Connect Portal UI) is a secure, browser-based application that features change data capture (CDC), replication, and bulk data copy functionality. Connect Portal supports your data integration strategies by:
  • Using log-based capture to determine when data has changed in your Oracle, Db2 for z/OS, and SQL Server data sources, automatically capturing the data changes in real time, then replicating (moving) the changed data downstream to the Apache Kafka distributed messaging layer. This helps keep your data set fresh without losing data while your enterprise data processes continue uninterrupted. SQL Server databases also support trigger-based data capture.

  • Collectively extracting, filtering, and copying large volumes of data from database sources to target database management systems (DBMS), file systems on premises and in the cloud (including HDFS, Amazon S3, and local), and Apache Kafka distributed streams. Run data transfer jobs locally or on a Hadoop-determined node in the cluster.

    Use the Connect Portal to:
    • Share application data across your enterprise "from mainframes to the cloud." Transfer all source data (hundreds of tables at a time) or filter the data to exclude or include a subset of table columns and rows.

    • Simplify bulk data transfers and populate the data lake from hundreds of tables by automatically creating the target schemas and mapping fields.
    • Add customized data copy (ETL) or replicate (CDC) tasks, known as data flows.
    • Manage the data connections, data flows, metabases, and servers used in your CDC and ETL projects.
    • Monitor the status of replication activities and copy jobs.
    • Export a JSON configuration file to run data flows from the command line. This automatically generates corresponding Connect jobs and tasks that can be run in the Connect ETL and Connect for Big Data applications.

Connect Portal supports connecting to a variety of databases and environments, including Hive, Oracle, Apache Kafka, IBM Db2 for z/OS, Netezza, Microsoft SQL Server, Teradata, HDFS, S3, Amazon Redshift, IBM Db2 for Linux, UNIX, and Windows, and local. Connect Portal supports the following browsers:
  • Firefox
  • Google Chrome
  • Internet Explorer versions 10 and 11
  • Microsoft Edge

For more information about how to use Connect Portal, see the Connect Portal help available from the application. To get started with Precisely Connect Portal, you need to install Connect ETL and MIMIX Share (CDC).

9.8.7
Connect ETL Data Lineage REST API (first customer shipment): The Connect ETL REST API provides web-based request support for Data Lineage metadata published to the Connect ETL management server. REST resources provide access over HTTP to the following data lineage and related resources in JSON format (see the sketch after this list):
  • Job Status
  • Logging
  • Lineage data
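For reference, a request against such a resource might look like the following Python sketch; the server URL and route are placeholders, not the published API paths:

    import requests

    BASE = "https://mgmt-server.example.com/api"  # hypothetical management server URL
    jobs = requests.get(f"{BASE}/jobs", timeout=30).json()  # hypothetical route
    for job in jobs:
        print(job.get("name"), job.get("status"))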
9.8.7

Oracle and SQL Server targets through JDBC connections supported in Hadoop Single Node Execution. Connect ETL offers the ability to move data from Hadoop distributed file system and database sources to Oracle and SQL Server targets in a Single Node Execution Connect ETL job. See the online help topics, Defining Connect for Big Data ETL Sources and Targets, Connecting to JDBC Sources and Targets, and Running a job from the command prompt for more information.

9.8 Add support for brokers in the Kafka connection. Please refer to the section "Kafka connections" of the "Apache Kafka" documentation.
9.8 Custom installation location for Ambari packages. Connect for Big Data can be installed under a custom directory prefix when installing via an Ambari package. Refer to the Installation Guide.
9.8 Apply license keys while Connect is running. It is no longer necessary to stop any running Connect ETL/Connect for Big Data jobs or services to apply a new license key.
9.8

Separate software and license packages for Connect for Big Data. Previously, customer-specific Connect for Big Data installation packages with embedded license keys were generated for RPM, Cloudera parcel, and Ambari package installations. Now, the software and your license keys are provided in separate packages. This allows you to apply a new license key to all cluster nodes without reinstalling Connect for Big Data. Refer to the Installation Guide.

9.7.32

Abort on Rejected Records to Kafka and MapR Streams. The new "Abort if any record is rejected" parameter configures Connect ETL to abort tasks that update Kafka and MapR Streams targets upon rejecting a record. The default behavior to process remaining records remains unchanged.

9.7.32

Distributed Parallel Data Ingestion (First customer shipment). Connect Portal offers distributed parallel ingestion of large database sources into any number of Hadoop targets (HDFS, Hive, or HCatalog) through parallelized map jobs. With distributed parallel ingestion, DMX automatically splits tables across mappers to optimize load performance based on table and cluster size. For more details, see the DMX help topic, "High-speed parallel ingestion of large database sources to Hadoop."
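The split strategy can be pictured as dividing a table's rows into near-equal ranges, one per mapper. A simplified Python sketch of the idea, not the actual DMX algorithm:

    def split_ranges(row_count, mappers):
        # Near-equal half-open row ranges, one per map task.
        size, extra = divmod(row_count, mappers)
        start = 0
        for i in range(mappers):
            end = start + size + (1 if i < extra else 0)
            yield (start, end)
            start = end

    print(list(split_ranges(10_000_000, 3)))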

9.7.23

Connect ETL Change Data Capture for DB2 z/OS. Added READ authority information and missing SYSIBM table names for DB2 Catalog Tables in the DB2GRANT JCL. Refer to the "Identify/Authorize z/OS Users and Started Tasks" section in the DB2 z/OS Change Data Capture Reference.

9.7.23

Support for COBOL-IT line sequential files on SunOS SPARC 64-bit. Connect ETL/Connect for Big Data provides support for COBOL-IT line sequential files. See the Connect ETL/Connect for Big Data help topic, "Installing support for COBOL-IT."

9.7.23 Connect Portal enhancements. The Connect Portal configuration file now allows the following:
  • Checking whether a column is the first or last one in the source table, in the predicate of expression_construction transformation rules
  • Creating target files in subdirectories named for each source table
  • Customizing the file extension of target files
  • Customizing the file extension of exported DTL task files

See the Connect ETL/Connect for Big Data help topic, "DMXDFNL configuration file."

9.7.19

Runtime field-level data lineage. Connect ETL/Connect for Big Data offers a runtime data lineage solution that enables you to track field-level metadata from source to target through the Connect ETL/Connect for Big Data task and job processes that transform it. As part of your overall data governance strategy, you can leverage the generated lineage for debugging, auditing, and regulatory reporting. You can further extend these capabilities through end-to-end visualization by securely publishing the lineage to Cloudera Navigator as a compliance-ready solution. See the Connect ETL help topic, "Connect ETL/Connect for Big Data runtime data lineage." Connect ETL/Connect for Big Data lineage is connected to all other data lineage in the cluster, which offers a full view of what is happening to the data both within and outside Connect ETL/Connect for Big Data.

9.7.3
Open System Support for Connect Change Data Capture (CDC) for z/OS. Connect CDC for z/OS synchronizes data changes between data sources and targets while your enterprise data processes continue with no impact on performance. Connect CDC for z/OS open system components add support for the following database sources for CDC synchronization:
  • Oracle and Oracle Real Application Clusters (RACs)
  • Microsoft SQL Server
  • IBM DB2 for i
  • IBM DB2 for Linux/UNIX/Windows (LUW)
  • IBM Informix
  • Sybase
9.7.3 GUID database datatypes. Connect ETL/Connect for Big Data supports GUID database datatypes, such as SQL Server uniqueidentifier and Access guid datatypes.
9.7

VSAM source support for Connect CDC for z/OS. In addition to DB2 for z/OS sources, Connect CDC for z/OS supports VSAM data sets as sources. Changes written to VSAM data sets can be captured and written to target files and applied to Hive target database tables through generated downstream Connect for Big Data tasks. See the Connect ETL/Connect for Big Data help topic, "Connect Change Data Capture."

9.6.22 Enhanced target support for Hive complex types. Hive target structure and array support is enhanced to include the following:
  • Additional array selection option, which enables mapping all elements of any array field within a chain of fields into separate fields.
  • Ability to create tables in which the hierarchy of composite fields is either flattened or maintained.

See the Connect help topics, "Array Element Selection dialog" and "Target Database Table dialog."

9.6.22

Apache Impala databases. Through JDBC connectivity, Connect for Big Data supports Impala databases as sources and targets when running on the ETL server/edge node, in the cluster, and on a framework-determined node in the cluster. See the Connect help topic, "Connecting to Impala."

9.6.15

Connect Enterprise architecture (First customer shipment). In addition to the classic Connect client-server architecture in which Connect ETL/Connect for Big Data jobs and tasks are run, Connect ETL also supports an enterprise web-server architecture, which extends Connect ETL processing power and currently supports running Connect Portal projects through a browser-based user interface (UI).

At the core of the enterprise architecture is the Connect ETL central management server, which houses the management service, serves REST APIs, and coordinates all activities among client workstations, job run servers, and central file and database repositories:
  • Client workstations use your client web browser to submit Connect Portal projects through the Connect Portal UI.
  • Job run servers include the service for the management service and the Connect ETL engine, through which Connect Portal projects are run.
  • The central file repository stores Connect Portal job logs and operational metadata; the central database repository stores Connect Portal job definitions and runtime connections.

See the Connect ETL help topic, "Connect ETL Enterprise architecture."

9.6.15

Connect Portal UI. Connect Portal UI is a secure, high-speed, browser-based application from which you collectively extract, filter, transfer, and copy large volumes of data from database sources to target database management systems (DBMS) or file systems (including HDFS, S3, and local).

As an extension of Connect ETL, Connect Portal UI enables you to run data transfers in bulk automatically rather than manually creating dozens, hundreds, or even thousands of individual transfer tasks, one task for each table in the schema you want to transfer. Instead, Connect Portal UI gives you the tools to build collections of data transfer tasks called data flows. You group data flows in projects. When a project is executed, each customized data flow automatically transfers your data in bulk to a specified target database or file system.

Use Connect Portal UI to:
  • Transfer all source data (hundreds of tables at a time), or filter the data to exclude or include a subset of table columns and rows. Tables are automatically transferred in parallel at an appropriate level of parallelism.

  • Automatically create new Hive target tables based on equivalent source tables.

  • Build, manage, and maintain a centralized repository of reusable data connections, data flows, and projects.
  • Search and examine summary information about Connect Portal objects.
  • Export a JSON configuration file to execute data flows from the command line.
  • Connect Portal UI includes wizards that guide you step-by-step to:
    • Build unique, secure data connections to DBMS sources and target databases and file systems.
    • Configure how data is copied to target databases and file systems.
    • Run one or more projects at a time and enable job logging.
Connect Portal UI supports connecting to a variety of databases and environments, including Hive, Oracle, IBM Db2 for Linux, UNIX, and Windows, IBM Db2 for z/OS, Netezza, Microsoft SQL Server, Teradata, HDFS, S3, Amazon Redshift, and local. Connect Portal UI supports the following browsers:
  • Internet Explorer versions 10 and 11
  • Google Chrome
  • Microsoft Edge

For more information about how to use Connect Portal UI, access the online help, which is available from the Help menu in Connect Portal UI.

9.6.15

Enhanced Connect ETL/Connect for Big Data installation. The Connect ETL/Connect for Big Data installation process includes component-based options, which enable standard, full, classic, and custom installations. From among these options, you can tailor the installation to meet your requirements; for example, you can install a client-server configuration for running Connect ETL/Connect for Big Data jobs and tasks through the Job and Task Editors, or you can install a web-server configuration for running Connect Portal projects through the Connect Portal UI, or you can install both. See the Connect ETL/Connect for Big Data help topics, "Connect ETL/Connect for Big Data installation component options" and "Connect ETL/Connect for Big Data Enterprise architecture."

9.6.11

Target support for Hive complex types. Connect for Big Data supports writing elementary, composite, and array fields from mainframe and open system database table and file sources to Hive target structures and arrays. See the Connect ETL/Connect for Big Data help topic, "Target Database Table dialog."

9.6.7

Hive merge statement support. Connect for Big Data provides Hive merge statement support through JDBC for Hive target database tables that are configured to support ACID and that are specified as having the upsert or update disposition. See the Connect ETL/Connect for Big Data help topic, "Hive merge statement support for upsert and update dispositions."

9.6.4

User-defined SQL as lookup source. In addition to supporting entire database tables as lookup sources, Connect ETL/Connect for Big Data supports user-defined SQL statements as lookup sources. See the Connect ETL/Connect for Big Data help topic, "Lookup Source Database Table dialog."

9.5

DMX Change Data Capture. DMX Change Data Capture (CDC) is a new add-on product that keeps data on a mainframe source system in sync with Hadoop without overloading networks and incurring a high MIPS cost on the mainframe. Connect CDC for z/OS detects changes in the source system and either applies or writes the changed data to a specified Hadoop target. Currently, DB2 for z/OS is supported as a source, and Hive and HDFS are supported as targets. See the Connect ETL/Connect for Big Data help topic, "DMX Change Data Capture."

9.4.28
Custom task enhancement. Along with the current extended task support, Connect for Big Data supports running custom tasks or groups of directly connected custom tasks on the cluster with all Connect for Big Data tasks of an IX job when the following conditions are met:
  • All Connect for Big Data tasks are cluster eligible.
  • Custom tasks have piped inputs from previous Connect for Big Data tasks and piped outputs to following Connect for Big Data tasks.

See the Connect ETL/Connect for Big Data help topic, "Developing Intelligent Execution Jobs."

9.4.28

Connect Portal support for Amazon S3 and Amazon Redshift. Connect Portal supports Amazon S3 as target and Amazon Redshift as source and target. See the Connect ETL/Connect for Big Data help topics, "Connect Portal," "DMXDFNL configuration file," and "Connections object: amazon_s3 member."

9.4.26

Enhanced Hive support. Connect for Big Data supports Hive database table lookup sources, the Boolean data type for Hive JDBC source and target connections, and updates applicable to the integration with Apache Ranger. See the Connect for Big Data help topics, "Connecting to Hive data warehouses," "Conversion between Hive data types and Connect for Big Data data types," and "Apache Sentry and Apache Ranger authorization."

9.4.26

Connect ETL/Connect for Big Data job progress monitoring and reporting. Connect ETL/Connect for Big Data now supports real-time progress monitoring of jobs and their subjobs and tasks, along with report generation on this progress, for jobs run both in Connect ETL and in MapReduce/Spark. See the Connect ETL/Connect for Big Data help topic, "Connect ETL/Connect for Big Data progress monitor."

9.4.26

Key break processing functions. Connect ETL/Connect for Big Data supports key break processing functions for sort, merge, and aggregate tasks. While cross-record processing can be achieved with user-defined values, key break processing functions provide a simplified means of enabling cross-record processing with sort keys, merge keys, or group-by fields, which are specified as arguments in these functions. Depending on the executed function, the returned value can represent changes in argument values or can be counters or running totals. See the Connect ETL/Connect for Big Data help topic, "Key break processing functions."
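Key break processing amounts to detecting where the key value changes in sorted input and resetting counters or running totals there. A conceptual Python analogue using itertools.groupby (field names are made up for the example):

    from itertools import groupby

    rows = [("east", 10), ("east", 5), ("west", 7), ("west", 1)]  # sorted by key

    for region, group in groupby(rows, key=lambda r: r[0]):
        running = 0  # reset at each key break
        for _, amount in group:
            running += amount
            print(region, amount, running)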

9.4.6

Support for Spark cluster deploy mode. Connect for Big Data supports running jobs on Spark in cluster deploy mode as well as in client deploy mode. See the Connect ETL/Connect for Big Data help topic, "Connect for Big Data Configuration Settings."

9.3.16

Access method support in Connect Portal, including Hive JDBC. Users can now specify the access method for each database connection in Connect Portal. This includes the JDBC access method for Hive connections, which is now recommended. See the definition of the "access_method" parameter in the Help topic "Connections object: database member".

9.3.16 External metadata support through Connect:Direct. You can now link to remote COBOL copybooks or other external metadata through Connect:Direct when designing a task in the Task Editor.
9.3.5

Hive target statistics. Upon loading to Hive target tables, Connect for Big Data can now analyze table-level and column-level statistics for the target table, which optimizes subsequent Hive query performance. See the Connect ETL/Connect for Big Data help topic, "Connecting to Hive data warehouses."

9.3.3

DTL metadata export. Connect ETL/Connect for Big Data supports the export of metadata, which includes values, conditions, layouts, connections, and collating sequences, to DTL command options files. See the Connect ETL/Connect for Big Data help topic, "Export Connect ETL/Connect for Big Data job and task files into DTL command options files."

9.3.1 Support for Spark 2.0. Connect for Big Data supports Spark 2.0 on YARN, Mesos, and standalone. For details, see the Connect ETL/Connect for Big Data help topic, "Connect for Big Data."
9.3.1

Google Cloud Storage sources and targets. Connect ETL/Connect for Big Data supports direct connections to Google Cloud Storage buckets for reading and writing. See the Connect ETL/Connect for Big Data help topic, "Connecting to Google Cloud Storage from Connect ETL/Connect for Big Data."

9.2.8

Ability to create target tables in Hive at design-time. It is now possible to create target tables in Hive at design-time for access via JDBC. This can be achieved using the "Create Database Table" dialog. See the Connect ETL/Connect for Big Data help topic, "Create Database Table dialog."

9.2.5

Record number in warning message. When a data issue is encountered in processing records, Connect ETL/Connect for Big Data provides you with the ability to view the corresponding record number, source number, and join side (if applicable) in the generated warning message. See the Connect ETL/Connect for Big Data help topic, "Connect ETL/Connect for Big Data debugging."

9.2.5

Distributed Hive sources and targets. Through JDBC connectivity, Connect for Big Data supports reading from Hive sources and writing to Hive targets in the cluster. In particular, this allows Parquet-backed Hive tables, which previously could only be read and written on the edge node, to be read and written by jobs run in the cluster. See the Connect ETL/Connect for Big Data help topic, "Connecting to Hive data warehouses."

9.2.2

Integrated workflow. Users can now specify where to run individual task and subjob components of a Connect for Big Data job: on a cluster, a single node of a cluster, or a single server. This orchestration capability allows a single job to perform both data ingestion from the edge node and distributed processing in Spark and MapReduce. See "Integrated workflow" in the Connect ETL/Connect for Big Data help.

9.2.1

Support for COBOL-IT line sequential files. Connect ETL/Connect for Big Data provides support for COBOL-IT line sequential files. See the Connect ETL/Connect for Big Data help topic, "Installing support for COBOL-IT."

9.1.15

Connect Portal (DMXDFNL) table creation support. Leveraging customizable metadata translation, DMXDFNL can now automatically create tables for Hive/HCatalog targets based on source tables from the following databases: Oracle, DB2 for z/OS, Netezza, Microsoft SQL Server, and Teradata. Generated table definitions (DDL/CREATE TABLE statements) can be exported to files on disk using the ddl_export_dir configuration parameter. See the Connect ETL/Connect for Big Data help topics: "Connect Portal" and "Subunit alias object: table_creation member."

9.1.15

DMXDFNL inflight transformation support. DMXDFNL can now map sources to targets, transform columns based upon their properties, and map source expressions to target fields dynamically. Transformation rules, which are referenced in the data transfer task, are programmatically defined at the root level of the DMXDFNL configuration file. See the Connect ETL/Connect for Big Data help topics: "Connect Portal," "DMXDFNL configuration file," and "Subunit alias object: data_transfer member."

9.1.15 DMXDFNL enhancements. DMXDFNL enhancements include the following:
  • When output to a Linux/UNIX terminal, log messages now display in color: errors display in red and warnings display in yellow.
  • Tasks no longer time out by default. A timeout can be specified globally with the root-level default_timeout_minutes configuration parameter or at the subunit_order level with the timeout_minutes parameter.

  • The no_execute configuration parameter now supports disabling the execution of one or more task types.
  • DMXDFNL configuration file updates include the following:
    • Tasks are separated into table_creation, data_transfer, and existing_dtl categories in each subunit.
    • Sources and targets are defined at the root level.

The previous configuration styles are deprecated, but remain supported. See the Connect ETL/Connect for Big Data help topic, "DMXDFNL configuration file."
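For example, the root-level parameters named above might appear in the configuration file as follows; only the parameter names are taken from these notes, and the surrounding structure is an assumption (see the "DMXDFNL configuration file" help topic for the real schema). Written from Python to keep the sketch self-contained:

    import json

    config = {
        "default_timeout_minutes": 120,    # root-level global timeout (no timeout if omitted)
        "no_execute": ["table_creation"],  # assumed shape: task types to skip
    }
    with open("dmxdfnl_config.json", "w") as f:
        json.dump(config, f, indent=2)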

9.1.4

Google Cloud Storage sources and targets. Connect ETL/Connect for Big Data supports reading from and writing to Google Cloud Storage buckets in and out of Hadoop. See the Connect ETL/Connect for Big Data help topic "Connecting to Google Cloud Storage from Connect ETL/Connect for Big Data."

9.0.10

MapR Streams sources and targets (GA). Connect for Big Data supports MapR Streams as a message queue source when running from an edge or single cluster node, and as a message queue target when running from an edge node, single cluster node, or MapR cluster. Kerberos is supported for connectivity to MapR Streams. See "MapR Streams" in the Connect ETL/Connect for Big Data help.

9.0.10

DB2 ODBC driver support for Windows 32-bit. Connect ETL/Connect for Big Data now provides DB2 ODBC driver support for Windows 32-bit installations in addition to the support for UNIX/Linux installations. See the Connect ETL/Connect for Big Data help topic, "Connecting to DB2 databases."

9.0.7

Connect ETL/Connect for Big Data Report enhancements. Connect ETL/Connect for Big Data supports the generation of Connect ETL/Connect for Big Data job and task reports in machine-processable, comma-separated values (CSV) format. Through command-line dmxreport invocation and wildcard support for input job and task files, you can generate data from an unlimited number of job definitions, interpret the data with data analysis tools, and load the data into databases and other metadata repositories. See the Connect ETL/Connect for Big Data help topic, "dmxreport: Generating Connect ETL/Connect for Big Data job and task reports."

9.0.7

Hive targets via JDBC. Hive targets can now be written via a JDBC connection, allowing for easier configuration, particularly when running on a single cluster node. See "Connecting to Hive data warehouses" in the Connect ETL/Connect for Big Data help.

9.0.7

Hive targets support truncate and insert. Hive targets now support the "Truncate table and insert rows" disposition. See "Connecting to Hive data warehouses" in the Connect ETL/Connect for Big Data help.

9.0.7

Repository passwords. Connect ETL/Connect for Big Data now supports storage of password and secret key values in a secure repository under a user-provided name. This name can subsequently be referenced while creating Connect ETL/Connect for Big Data tasks or Connect Portal jobs. See "Repository passwords" in the Connect ETL/Connect for Big Data help.

9.0.6

MapR Streams sources and targets (Pre-release). Connect ETL/Connect for Big Data supports MapR Streams as a message queue source when running from an edge or single cluster node, and as a message queue target when running from an edge node, single cluster node, or MapR cluster. Kerberos support is not yet available for connectivity to MapR Streams. See "MapR Streams" in the Connect ETL/Connect for Big Data help.

9.0.4

Support for international character sets for data and metadata. Connect ETL/Connect for Big Data now enhances support for globalization through ICU character sets and multi-byte COBOL copybooks. See the Connect ETL/Connect for Big Data help topics "Data encoding" and "External metadata in a COBOL copybook."

9.0.2

DMXMMSRT supports NULL decimal fields. The DMXMMSRT utility now supports an option to treat empty or whitespace-only decimal fields (except Packed Decimal) as NULL. See "Mainframe migration sort component DMXMMSRT" in the Connect ETL/Connect for Big Data help.
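The option's effect can be pictured as a small conversion rule. A Python sketch of the described behavior, not the DMXMMSRT implementation:

    def decimal_or_null(field):
        # Empty or whitespace-only decimal fields become NULL instead of errors
        # (Packed Decimal fields are excluded from this treatment).
        if field.strip() == "":
            return None
        return float(field)

    print(decimal_or_null("  12.5 "), decimal_or_null("   "))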