Brief Hadoop Overview

Connect ETL for Big Data Sort User Guide

Product type: Software
Portfolio: Integrate
Product family: Connect
Product: Connect > Connect (ETL, Sort, AppMod, Big Data)
Version: 9.13
Language: English
Product name: Connect ETL
Title: Connect ETL for Big Data Sort User Guide
Copyright: 2023
First publish date: 2003
Last updated: 2023-09-11
Published on: 2023-09-11T19:03:59.237517

Apache Hadoop is an open-source software framework that supports distributed storage and processing of large amounts of data on scalable clusters of commodity hardware. It consists of two primary components:

  • HDFS – the Hadoop Distributed File System, which splits data files into large blocks (128 MB by default in Hadoop 2 and later) that are stored, and by default replicated, on multiple nodes in the cluster
  • MapReduce – a programming model that divides data processing among many nodes in the cluster and then combines the partial results, achieving higher throughput through parallelization. A MapReduce job is built around two main programmer-written functions, map and reduce.

The Hadoop framework handles data distribution, job execution, and job/node failures, so that programmers writing MapReduce jobs only need to be concerned with the actual processing of the data.
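
The division of labor described above can be illustrated with a minimal word-count sketch in Python. It is not Hadoop or Connect ETL code: `map_words` and `reduce_counts` stand for the two programmer-written functions, while `run_job` is a hypothetical single-process stand-in for the work the framework would do (grouping the mapped values by key and calling the reducer once per key). Data distribution, parallel execution, and failure handling are deliberately left out.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: combine all values emitted for one key into a single total."""
    return word, sum(counts)


def run_job(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    """Hypothetical stand-in for the framework (assumption, not a Hadoop API).

    Runs the mapper over every record, groups the emitted values by key
    (the "shuffle" step), then calls the reducer once per key.
    """
    grouped: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    # Process keys in sorted order so the result is deterministic.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    lines = ["big data big clusters", "data on clusters"]
    print(run_job(lines, map_words, reduce_counts))
    # prints {'big': 2, 'clusters': 2, 'data': 2, 'on': 1}
```

Only the two small functions at the top contain application logic; everything in `run_job` corresponds to responsibilities that the paragraph above assigns to the Hadoop framework.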