HADOOP – Big Data – Content
Hadoop
- What is Big Data
- What is Hadoop
- Relation between Big Data and Hadoop
- Need for Hadoop
- Challenges with Big Data
- Storage
- Processing
- Comparison with other Technologies
- RDBMS
- DATA WAREHOUSE
- TERADATA
Components of Hadoop Ecosystem
- Storage Components
- Processing Components
HDFS (Hadoop Distributed File System)
- What is a cluster environment
- Cluster Vs Hadoop cluster
- Features of HDFS
- Storage aspects of HDFS
- Block
- Configuring the Block size
- Why HDFS Block size is so large
- Design Principles of Block size
HDFS Architecture – 5 Daemons of Hadoop
- Name Node
- Data Node
- Secondary Name Node
- Job Tracker
- Task Tracker
Replication in Hadoop – Failover Mechanism
- Data Storage in Data Nodes
- Replication
- Custom Replication
MapReduce
- Why Map Reduce is essential in Hadoop
- Processing Daemons of Hadoop
Job Tracker
- Roles of Job Tracker
- How to configure Job Tracker in Hadoop
Task Tracker
- Roles of Task Tracker
- Drawbacks with respect to failures in the cluster
Input Split
- Need for Input Split
- Input Split Size
- Input split size Vs block size
- Input Split Vs Mappers
Map Reduce Programming Model
- Different phases of Map Reduce Algorithm
- Data Types in Map Reduce
- Driver code
- Mapper Code
- Reducer Code
Basic Map Reduce program
Combiner in Map Reduce
Partitioner in Map Reduce
Joins in Map Reduce
- Map side join
- Reduce side join
- Performance trade-off
Map Reduce Streaming
Apache PIG
- Introduction to PIG
- Map Reduce Vs PIG
- SQL Vs PIG
- Data Types in PIG
- Execution Modes of PIG (Local / Distributed)
- Execution Mechanism (Grunt Shell, Script)
- Writing a simple PIG script
- Bags, Tuples, and Fields in PIG
- UDFs in PIG
HIVE
- Need for Apache Hive
- HIVE Architecture [ Driver, Compiler, Executor ]
- HIVE Query language
- SQL Vs HIVE QL
- Collection Data types in Hive [ Array, Struct, Map ]
- UDFs in HIVE
- UDAFs
- UDTFs
- SerDe [ Hive Serializer / Deserializer ]
SQOOP
- Introduction
- MySQL Initialization
- Connecting RDBMS using SQOOP
- Sqoop Commands
HBASE
- Introduction
- HDFS Vs HBase
- HBase Architecture
- MapReduce over HBase
Pre Requisites: Core JAVA + Linux Commands
Time Duration: 5 Weeks [30 Hrs + lab]
MapReduce