HDFS is a distributed, Java-based file system typically used to store large volumes of data. It is one of several components of the Hadoop framework, which consists of subsystems that provide parallel and distributed computation on large datasets. Like Kafka, HDFS typically runs on commodity Linux-based hardware. Connect CDC (SQData) bridges the gap between z/OS and HDFS with its DB2 Change Data Capture running on z/OS and its Replicator Engine running on Linux. This hybrid solution effectively supports HDFS, which uses a cluster of machines to provide high-throughput access for Big Data applications.
Connect CDC (SQData) high-performance Change Data Capture can be used to detect business events that change data in real time and/or to capture all changes that occur in a DB2 source database. The resulting CDC data is processed by the Replicator Engine running on Linux, which automatically generates JSON schemas, translates all source data to UTF-8, and writes it to HDFS using native APIs.
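The Replicator Engine's internal write path is not shown here, but the following minimal sketch illustrates the kind of native HDFS API call involved: it uses the standard Hadoop FileSystem client to write a UTF-8 JSON change record to a file. The NameNode address, target path, and record layout are all hypothetical placeholders, not Connect CDC (SQData) output.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsJsonWriter {
    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode address; replace with your cluster's fs.defaultFS.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        // A captured change record, already translated to UTF-8 JSON.
        String changeRecord = "{\"table\":\"ORDERS\",\"op\":\"INSERT\",\"ORDER_ID\":1001}";

        // Create the target file (overwriting any existing one) and write the record.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/cdc/orders/changes.json"))) {
            out.write(changeRecord.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```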
Connect CDC (SQData) also seamlessly supports Avro-formatted data, automatically generating Avro schemas for new tables and updating those schemas when the tables change in the relational source database.
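For illustration only, the sketch below uses the Apache Avro SchemaBuilder API to construct a record schema for a hypothetical ORDERS table. The optional DISCOUNT field shows how a newly added nullable column can be represented so that data written under the earlier schema can still be read. The table name, namespace, and field names are assumptions, not schemas generated by Connect CDC (SQData).

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class OrdersAvroSchema {
    public static void main(String[] args) {
        // Record schema mirroring a hypothetical DB2 ORDERS table.
        Schema orders = SchemaBuilder.record("ORDERS")
                .namespace("com.example.cdc") // hypothetical namespace
                .fields()
                .requiredInt("ORDER_ID")
                .requiredString("CUSTOMER")
                // Newly added column, modeled as optional (a union with null,
                // defaulting to null) so the evolved schema stays compatible
                // with records written before the column existed.
                .optionalDouble("DISCOUNT")
                .endRecord();

        System.out.println(orders.toString(true)); // pretty-print the schema as JSON
    }
}
```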