Connect for Big Data writes supported Hive data types to Hive targets using different methods depending on whether the connection is via JDBC or ODBC. JDBC is recommended over ODBC. For jobs run in the cluster, Connect for Big Data supports reading from Hive sources using JDBC only. Consider the following:
- JDBC - When Connect for Big Data writes to a Hive table via JDBC, the data is generally loaded directly into the target table. Data is temporarily staged, in compressed or non-compressed format, to a text-backed Hive table only when one of the following conditions prevents a direct load:
- The target table is an ACID table
- The target table has one or more partitions
- The target table has one or more complex type columns
- The target table disposition is Truncate, Upsert, or Upsert and Apply change (CDC)
- The job runs on localnode or singleclusternode
- A user forces Connect for Big Data to stage data by setting the environment variable DMX_HIVE_TARGET_FORCE_STAGING to 1, which reverts to the two-step process implemented in previous versions of Connect (see the sketch after this list)
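As a minimal sketch of the last point, the following launches a job with DMX_HIVE_TARGET_FORCE_STAGING exported in its environment. Only the variable name and value come from this documentation; the wrapper approach and the job command line ("dmxjob", "/RUN", "job.dxj") are hypothetical placeholders for whatever invocation is used at your site.

```java
import java.util.Map;

// Hedged sketch: force staged Hive writes by setting
// DMX_HIVE_TARGET_FORCE_STAGING=1 in the environment of the process
// that launches the job. The command line below is a placeholder.
public class ForceStagingLauncher {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("dmxjob", "/RUN", "job.dxj");
        Map<String, String> env = pb.environment();
        env.put("DMX_HIVE_TARGET_FORCE_STAGING", "1"); // force the two-step staged load
        pb.inheritIO(); // surface the job's output in this console
        System.exit(pb.start().waitFor());
    }
}
```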
- ODBC - Based on the file format and whether the Hive table is partitioned, one of the following methods is used to write to Hive:
- Method 1 - Connect for Big Data temporarily stages the data in parallel streams, in compressed or non-compressed format, to a text-backed Hive table.
- Method 2 - Connect for Big Data loads the data in parallel streams directly to the Hadoop file system for optimal performance.
| File Format | Partitioned | Non-partitioned |
| --- | --- | --- |
| Apache Avro, Apache Parquet, or delimited text files | Method 1 | Method 2 |
| Other file formats | Method 1 | Method 1 |
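For readers unfamiliar with the JDBC access path described above, the following is a minimal sketch of a plain HiveServer2 JDBC session. The URL, credentials, and table names are placeholders, and the sketch illustrates the protocol itself, not Connect for Big Data's internal code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged sketch of a HiveServer2 JDBC session. Host, port, database,
// credentials, and table names are assumptions for illustration only.
public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // Hive JDBC driver
        String url = "jdbc:hive2://hive-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
             Statement stmt = conn.createStatement()) {
            // A direct (non-staged) write is an ordinary INSERT over JDBC.
            stmt.execute("INSERT INTO sales_copy SELECT * FROM sales");
            // Reading from a Hive source uses the same connection.
            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sales_copy")) {
                while (rs.next()) {
                    System.out.println("rows: " + rs.getLong(1));
                }
            }
        }
    }
}
```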