Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, query, and analysis of large datasets stored in the Hadoop Distributed File System (HDFS) and other compatible file systems. Hive includes HiveQL, a SQL-like query language for querying and analyzing data stored in Hadoop.
Connect for Big Data can connect to Hive data warehouses as:
- sources when running on the ETL server/edge node or in the cluster
- targets when running on the ETL server/edge node or in the cluster
Hive tables can also be accessed as HCatalog sources and targets.