Course Code: hadoopdev
Duration: 28 hours
Prerequisites:
  • comfortable with the Java programming language (most programming exercises are in Java)
  • comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi or nano)

Lab environment

Zero Install : There is no need to install Hadoop software on students’ machines! A working Hadoop cluster will be provided for students.

Students will need the following:

  • an SSH client (Linux and Mac already include SSH clients; for Windows, PuTTY is recommended)
  • a browser to access the cluster (Firefox recommended)

Overview:

Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course introduces developers to the main components of the Hadoop ecosystem (HDFS, MapReduce, Pig, Hive and HBase).

Course Outline:

Section 1: Introduction to Hadoop

  • Hadoop history, concepts
  • ecosystem
  • distributions
  • high level architecture
  • Hadoop myths
  • Hadoop challenges
  • hardware / software
  • lab : first look at Hadoop

Section 2: HDFS

  • Design and architecture
  • concepts (horizontal scaling, replication, data locality, rack awareness)
  • Daemons : NameNode, Secondary NameNode, DataNode
  • communications / heart-beats
  • data integrity
  • read / write path
  • NameNode High Availability (HA), Federation
  • labs : Interacting with HDFS

Section 3: MapReduce

  • concepts and architecture
  • daemons (MRv1) : JobTracker / TaskTracker
  • phases : driver, mapper, shuffle/sort, reducer
  • MapReduce Version 1 and Version 2 (YARN)
  • Internals of MapReduce
  • Introduction to Java MapReduce programs
  • labs : Running a sample MapReduce program

Section 4: Pig

  • Pig vs Java MapReduce
  • Pig job flow
  • Pig Latin language
  • ETL with Pig
  • Transformations & Joins
  • User defined functions (UDF)
  • labs : writing Pig scripts to analyze data

Section 5: Hive

  • architecture and design
  • data types
  • SQL support in Hive
  • Creating Hive tables and querying
  • partitions
  • joins
  • text processing
  • labs : various labs on processing data with Hive

Section 6: HBase

  • concepts and architecture
  • HBase vs RDBMS vs Cassandra
  • HBase Java API
  • Time series data on HBase
  • schema design
  • labs : interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
