Hadoop Online Training

World-class Big Data and Hadoop training, delivered online. The course covers all of the topics listed below, including a project discussion. Topics include MapReduce, HDFS, Pig, Hive, Sqoop, Spark, YARN and much more.


Contents

 

  1. Introduction to Big Data
    1. What is Big data
    2. Importance of Data
    3. Structured vs Unstructured data
    4. Data Processing
    5. Distributed Data Processing
    6. Big Data Users & Scenarios
    7. Big Data opportunities
    8. Big Data Challenges
  2. Hadoop
    1. What is Hadoop?
    2. History
    3. Comparing Hadoop & SQL
    4. Hadoop Ecosystem
    5. Hadoop Distributed File System
    6. MapReduce & HDFS
    7. Data Locality
    8. Hadoop Architecture
  3. HDFS
    1. What is the Hadoop Distributed File System?
    2. Importance of HDFS
    3. HDFS Design & Concepts
    4. Blocks, NameNodes and DataNodes
    5. HDFS Command-Line Interface
    6. Hadoop Configuration files
    7. Understanding Hadoop Cluster configuration
    8. NameNode & Safe Mode
    9. Adding a New DataNode Dynamically
    10. Releasing a DataNode Dynamically
    11. FSCK Utility
    12. HDFS Federation
    13. Data Ingestion to HDFS
  4. MapReduce
    1. Introduction to MapReduce
    2. WordCount Algorithm
    3. Traditional approach – Drawbacks
    4. Traditional approach on a Distributed system
    5. Traditional approach on a Distributed system – Drawbacks
    6. Input & Output Forms of a MapReduce program
    7. Workflow & Transformation of Data
    8. Map, Shuffle & Sort, Reduce Phases
    9. Input Split & HDFS Block
    10. Relation between Split & Block
    11. MR Flow with Single Reduce Task
    12. MR flow with multiple Reducers
    13. Data locality Optimization
    14. Speculative Execution
  5. Advanced MapReduce
    1. Combiner
    2. Partitioner
    3. Counters
    4. Hadoop Data Types
    5. Custom Data Types
    6. Input Format & Hierarchy
    7. Output Format & Hierarchy
    8. Side Data distribution – Distributed cache
    9. Joins
    10. Map side Join using Distributed cache
    11. Reduce side Join
    12. MRUnit – A Unit testing framework
  6. Pig
    1. What is Pig?
    2. Pig vs SQL
    3. Execution Types or Modes
    4. Running Pig
    5. Pig Data types
    6. Pig Latin relational Operators
    7. Multi Query execution
    8. Pig Latin Diagnostic Operators
    9. Pig Latin Macro & UDF statements
    10. Pig Latin Commands
    11. Pig Latin Expressions
    12. Schemas
    13. Pig Functions
    14. Pig Latin File Loaders
    15. Pig UDF & executing a Pig UDF
  7. Hive
    1. Introduction to Hive
    2. Pig vs Hive
    3. Hive Limitations & Possibilities
    4. Hive Architecture
    5. Metastore
    6. Hive Data Organization
    7. HiveQL
    8. SQL vs HiveQL
    9. Hive Data types
    10. Data Storage
    11. Managed & External Tables
    12. Partitions & Buckets
    13. Storage Formats
    14. Built-in SerDes
    15. Importing Data
    16. Alter & Drop Commands
    17. Data Querying
    18. Using MR Scripts
    19. Hive Joins
    20. Sub Queries
    21. Views
    22. UDFs
  8. HBase
    1. Introduction to NoSQL & HBase
    2. Row & Column oriented storage
    3. Characteristics of a huge DB
    4. What is HBase?
    5. HBase Data-Model
    6. HBase vs RDBMS
    7. HBase architecture
    8. HBase in operation
    9. Loading Data into HBase
    10. HBase shell commands
    11. HBase operations through Java
    12. HBase operations through MR
  9. ZooKeeper & Oozie
    1. Introduction to ZooKeeper
    2. Distributed Coordination
    3. ZooKeeper Data Model
    4. ZooKeeper Service
    5. ZooKeeper in HBase
    6. Introduction to Oozie
    7. Oozie workflow
  10. Sqoop
    1. Introduction to Sqoop
    2. Sqoop design
    3. Sqoop Commands
    4. Sqoop Import & Export Commands
    5. Sqoop Incremental load Commands
  11. Hadoop 2.0 & YARN
    1. Hadoop 1 Limitations
    2. HDFS Federation
    3. NameNode High Availability
    4. Introduction to YARN
    5. YARN Applications
    6. YARN Architecture
    7. Anatomy of a YARN application
  12. Project Discussion
    1. Java to MapReduce Conversion
    2. MapReduce Project