Hadoop - Big Data

Hadoop

  1. What is Big Data
  2. What is Hadoop
  3. Relation between Big Data and Hadoop
  4. Need for Hadoop
  5. Challenges with Big Data
    1. Storage
    2. Processing
  6. Comparison with other technologies
    1. RDBMS
    2. Data warehouse
    3. Teradata

Components of the Hadoop Ecosystem

  1. Storage Components
  2. Processing Components

HDFS (Hadoop Distributed File System)

  1. What is a cluster environment
  2. Cluster vs. Hadoop cluster
  3. Features of HDFS
  4. Storage aspects of HDFS
    1. Block

Configuring the Block Size

  1. Why the HDFS block size is so large
  2. Design principles of block size

HDFS Architecture – 5 Daemons of Hadoop

  1. NameNode
  2. DataNode
  3. Secondary NameNode
  4. JobTracker
  5. TaskTracker

Replication in Hadoop – Failover Mechanism

  1. Data storage in DataNodes
  2. Replication
  3. Custom Replication

MapReduce

  1. Why MapReduce is essential in Hadoop
  2. Processing daemons of Hadoop
  3. JobTracker
    1. Roles of the JobTracker
    2. How to configure the JobTracker in Hadoop
  4. TaskTracker
    1. Roles of the TaskTracker
    2. Drawbacks with respect to failures in the cluster
  5. Input Split
    1. Need for input splits
    2. Input split size
    3. Input split size vs. block size
    4. Input splits vs. mappers
  6. MapReduce programming model
    1. Different phases of the MapReduce algorithm
    2. Data types in MapReduce
  7. Basic MapReduce program
    1. Driver code
    2. Mapper Code
    3. Reducer Code
  8. Combiner in MapReduce
  9. Partitioner in MapReduce
  10. Joins in MapReduce
    1. Map-side join
    2. Reduce-side join
    3. Performance trade-off

Apache Pig

  1. Introduction to Pig
  2. MapReduce vs. Pig
  3. SQL vs. Pig
  4. Data types in Pig
  5. Execution modes of Pig (Local/Distributed)
  6. Execution mechanisms (Grunt shell, script)
  7. Writing a simple Pig script
  8. Bags, tuples, and fields in Pig
  9. UDFs in Pig

Hive

  1. Need for Apache Hive
  2. Hive architecture [Driver, Compiler, Executor]
  3. Hive Query Language (HiveQL)
  4. SQL vs. HiveQL
  5. Collection data types in Hive [Array, Struct, Map]
  6. UDFs in Hive
  7. UDAFs
  8. UDTFs
  9. SerDe [Hive Serializer/Deserializer]

Sqoop

  1. Introduction
  2. MySQL initialization
  3. Connecting to an RDBMS using Sqoop
  4. Sqoop commands

HBase

  1. Introduction
  2. HDFS vs. HBase
  3. HBase Architecture
  4. MapReduce over HBase

> Prerequisites: Core Java + Linux commands

> Duration: 5 weeks [30 hrs + lab]