- comfortable with the Java programming language (most programming exercises are in Java)
- comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi / nano)
Lab environment
Zero Install : There is no need to install Hadoop software on students’ machines! A working Hadoop cluster will be provided for students.
Students will need the following:
- an SSH client (Linux and Mac already have SSH clients; for Windows, PuTTY is recommended)
- a browser to access the cluster (Firefox recommended)
Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course introduces developers to the main components of the Hadoop ecosystem: HDFS, MapReduce, Pig, Hive and HBase.
Section 1: Introduction to Hadoop
- Hadoop history, concepts
- ecosystem
- distributions
- high level architecture
- Hadoop myths
- Hadoop challenges
- hardware / software
- lab : first look at Hadoop
Section 2: HDFS
- Design and architecture
- concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons : NameNode, Secondary NameNode, DataNode
- communications / heartbeats
- data integrity
- read / write path
- NameNode High Availability (HA), Federation
- labs : Interacting with HDFS
Section 3: MapReduce
- concepts and architecture
- daemons (MRv1) : JobTracker / TaskTracker
- phases : driver, mapper, shuffle/sort, reducer
- MapReduce Version 1 and Version 2 (YARN)
- Internals of MapReduce
- Introduction to Java MapReduce programs
- labs : Running a sample MapReduce program
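As a rough illustration of the phases listed above (driver, mapper, shuffle/sort, reducer), here is a minimal in-memory word-count sketch in plain Java. It does not use the Hadoop API and is not taken from the course labs; the class name WordCountSketch and its methods are hypothetical, chosen only to mirror the phase names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory imitation of the MapReduce phases (no Hadoop classes involved).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort phase: group all values by key; the TreeMap keeps keys sorted.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for a single key.
    public static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: wires the phases together, as a job driver would on a real cluster.
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(mapped).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(run(Arrays.asList("hadoop stores data", "hadoop processes data")));
    }
}
```

On a real cluster the map and reduce steps run as distributed tasks and the shuffle moves data between nodes; this sketch only shows how data flows from one phase to the next.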
Section 4: Pig
- Pig vs. Java MapReduce
- Pig job flow
- Pig Latin language
- ETL with Pig
- Transformations & Joins
- User-defined functions (UDFs)
- labs : writing Pig scripts to analyze data
Section 5: Hive
- architecture and design
- data types
- SQL support in Hive
- Creating Hive tables and querying
- partitions
- joins
- text processing
- labs : various labs on processing data with Hive
Section 6: HBase
- concepts and architecture
- HBase vs RDBMS vs Cassandra
- HBase Java API
- Time series data on HBase
- schema design
- labs : interacting with HBase using the shell; programming with the HBase Java API; schema design exercise