- comfortable with basic Linux system administration
- basic scripting skills
Prior knowledge of Hadoop and distributed computing is not required; both are introduced and explained in the course.
Lab environment
Zero install: there is no need to install Hadoop software on students’ machines. A working Hadoop cluster will be provided for students.
Students will need the following:
- an SSH client (Linux and Mac already have SSH clients; for Windows, PuTTY is recommended)
- a browser to access the cluster. We recommend Firefox with the FoxyProxy extension installed
Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot and optimize Hadoop. They will also practice bulk data loading into the cluster, get familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course finishes with a discussion of securing the cluster with Kerberos.
“…The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organized”
— Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising
Audience
Hadoop administrators
Format
Lectures and hands-on labs, approximately 60% lectures and 40% labs.
Introduction
- Hadoop history, concepts
- Ecosystem
- Distributions
- High level architecture
- Hadoop myths
- Hadoop challenges (hardware / software)
- Labs: discuss your Big Data projects and problems
Planning and installation
- Selecting software, Hadoop distributions
- Sizing the cluster, planning for growth
- Selecting hardware and network
- Rack topology
- Installation
- Multi-tenancy
- Directory structure, logs
- Benchmarking
- Labs: cluster install, run performance benchmarks
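As a rough illustration of the sizing topic above, the sketch below estimates how many DataNodes a dataset might need. It is a back-of-the-envelope calculation, not a vendor formula: the function name `nodes_needed`, the 25% scratch-space reserve, the 24 TB of usable disk per node, and the growth figures are all assumptions chosen for the example. Only the replication factor of 3 reflects the HDFS default.

```python
import math


def nodes_needed(data_tb, replication=3, temp_overhead=0.25,
                 disk_per_node_tb=24.0, monthly_growth=0.0, months=0):
    """Estimate the DataNode count for a dataset (hypothetical helper).

    data_tb          -- logical data size today, in TB
    replication      -- HDFS replication factor (3 is the HDFS default)
    temp_overhead    -- assumed fraction of disk reserved for temporary data
    disk_per_node_tb -- assumed usable disk per DataNode, in TB
    monthly_growth   -- assumed compound monthly growth rate (0.05 = 5%)
    months           -- planning horizon in months
    """
    # Project the logical data size forward over the planning horizon.
    projected_tb = data_tb * (1 + monthly_growth) ** months
    # Every block is stored `replication` times.
    raw_tb = projected_tb * replication
    # Leave headroom for intermediate job output and other scratch data.
    raw_with_headroom_tb = raw_tb / (1 - temp_overhead)
    return math.ceil(raw_with_headroom_tb / disk_per_node_tb)


if __name__ == "__main__":
    print(nodes_needed(100))  # 100 TB today, no growth -> 17
    print(nodes_needed(100, monthly_growth=0.05, months=12))
```

With these assumed inputs, 100 TB of data becomes 300 TB after replication and 400 TB once the scratch reserve is included, which rounds up to 17 nodes.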
HDFS operations
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring
- Command-line and browser-based administration
- Adding storage, replacing defective drives
- Labs: getting familiar with the HDFS command-line tools
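To make the replication and rack-awareness concepts above more concrete, here is a minimal sketch of the default HDFS placement idea for three replicas: the first on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The function `place_replicas` and the topology dictionary are hypothetical. The real NameNode also weighs free space and load and falls back gracefully when racks are scarce, which this sketch does not attempt.

```python
import random


def place_replicas(writer, topology, rng=None):
    """Pick three nodes for a block's replicas (simplified illustration).

    writer   -- name of the node writing the block
    topology -- dict mapping rack name to a list of node names
    rng      -- optional random.Random, for reproducible choices
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer]

    # Replica 1: the writer's own node (data locality).
    first = writer

    # Replicas 2 and 3: two different nodes on one remote rack, so the block
    # survives the loss of an entire rack.
    remote_racks = sorted(rack for rack, nodes in topology.items()
                          if rack != local_rack and len(nodes) >= 2)
    if not remote_racks:
        raise ValueError("sketch needs a remote rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)

    return [first, second, third]


if __name__ == "__main__":
    racks = {
        "rack1": ["node1", "node2"],
        "rack2": ["node3", "node4"],
        "rack3": ["node5", "node6"],
    }
    print(place_replicas("node1", racks))
```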
Data ingestion
- Flume for logs and other data ingestion into HDFS
- Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
- Hadoop data warehousing with Hive
- Copying data between clusters (distcp)
- Using S3 as complementary to HDFS
- Data ingestion best practices and architectures
- Labs: setting up and using Flume and Sqoop
MapReduce operations and administration
- Parallel computing before MapReduce: compare HPC vs Hadoop administration
- MapReduce cluster loads
- Nodes and daemons (JobTracker, TaskTracker)
- MapReduce UI walk-through
- MapReduce configuration
- Job config
- Optimizing MapReduce
- Fool-proofing MR: what to tell your programmers
- Labs: running MapReduce examples
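For readers new to the programming model behind these topics, the following sketch mimics the three MapReduce stages (map, shuffle, reduce) for a word count, in plain Python on a single machine. It is only an illustration of the data flow: it does not use the Hadoop API, and the function names are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

On a real cluster the map and reduce calls run as separate tasks on different nodes, and the shuffle moves data across the network, which is why shuffle volume is a common tuning concern.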
YARN: new architecture and new capabilities
- YARN design goals and implementation architecture
- New actors: ResourceManager, NodeManager, ApplicationMaster
- Installing YARN
- Job scheduling under YARN
- Labs: investigate job scheduling
Advanced topics
- Hardware monitoring
- Cluster monitoring
- Adding and removing servers, upgrading Hadoop
- Backup, recovery and business continuity planning
- Oozie job workflows
- Hadoop high availability (HA)
- Hadoop Federation
- Securing your cluster with Kerberos
- Labs: set up monitoring
Optional tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks: installation and use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5)
- Ambari for cluster administration, monitoring, and routine tasks: installation and use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)