Course Code: hadoopadm1
Duration: 21 hours
Prerequisites:
  • comfortable with basic Linux system administration
  • basic scripting skills

No prior knowledge of Hadoop or distributed computing is required; both are introduced and explained in the course.

Lab environment

Zero install: there is no need to install Hadoop software on students’ machines. A working Hadoop cluster will be provided.

Students will need the following:

  • an SSH client (Linux and Mac already include one; for Windows, PuTTY is recommended)
  • a browser to access the cluster; we recommend Firefox with the FoxyProxy extension installed
Overview:

Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot and optimize Hadoop. They will also practice bulk-loading data into a cluster, get familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course finishes with a discussion of securing a cluster with Kerberos.

“…The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organized”
— Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising

Audience

Hadoop administrators

Format

Lectures and hands-on labs, roughly 60% lectures and 40% labs.

Course Outline:
  • Introduction
    • Hadoop history, concepts
    • Ecosystem
    • Distributions
    • High level architecture
    • Hadoop myths
    • Hadoop challenges (hardware / software)
    • Labs: discuss your Big Data projects and problems
  • Planning and installation
    • Selecting software, Hadoop distributions
    • Sizing the cluster, planning for growth
    • Selecting hardware and network
    • Rack topology
    • Installation
    • Multi-tenancy
    • Directory structure, logs
    • Benchmarking
    • Labs: cluster install, run performance benchmarks
  • HDFS operations
    • Concepts (horizontal scaling, replication, data locality, rack awareness)
    • Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
    • Health monitoring
    • Command-line and browser-based administration
    • Adding storage, replacing defective drives
    • Labs: getting familiar with the HDFS command-line tools
  • Data ingestion
    • Flume for logs and other data ingestion into HDFS
    • Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
    • Hadoop data warehousing with Hive
    • Copying data between clusters (distcp)
    • Using S3 as complementary to HDFS
    • Data ingestion best practices and architectures
    • Labs: setting up and using Flume and Sqoop
  • MapReduce operations and administration
    • Parallel computing before MapReduce: compare HPC vs Hadoop administration
    • MapReduce cluster loads
    • Nodes and daemons (JobTracker, TaskTracker)
    • MapReduce UI walkthrough
    • MapReduce configuration
    • Job config
    • Optimizing MapReduce
    • Fool-proofing MR: what to tell your programmers
    • Labs: running MapReduce examples
  • YARN: new architecture and new capabilities
    • YARN design goals and implementation architecture
    • New actors: ResourceManager, NodeManager, Application Master
    • Installing YARN
    • Job scheduling under YARN
    • Labs: investigate job scheduling
  • Advanced topics
    • Hardware monitoring
    • Cluster monitoring
    • Adding and removing servers, upgrading Hadoop
    • Backup, recovery and business continuity planning
    • Oozie job workflows
    • Hadoop high availability (HA)
    • Hadoop Federation
    • Securing your cluster with Kerberos
    • Labs: set up monitoring
  • Optional tracks
    • Cloudera Manager for cluster administration, monitoring, and routine tasks: installation and use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5)
    • Ambari for cluster administration, monitoring, and routine tasks: installation and use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)