Course Code: apacheh
Duration: 35 hours
Prerequisites:
  • Basic Linux administration skills
  • Basic programming skills
Overview:

Audience:

The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment.

Goal:

Deep knowledge of Hadoop cluster administration.

Course Outline:

1: HDFS (17%)

  • Describe the function of HDFS daemons
  • Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing.
  • Identify current features of computing systems that motivate a system like Apache Hadoop.
  • Classify major goals of HDFS Design
  • Given a scenario, identify appropriate use case for HDFS Federation
  • Identify components and daemons of an HDFS HA-Quorum cluster
  • Analyze the role of HDFS security (Kerberos)
  • Determine the best data serialization choice for a given scenario
  • Describe file read and write paths
  • Identify the commands to manipulate files in the Hadoop File System Shell

2: YARN and MapReduce version 2 (MRv2) (17%)

  • Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings
  • Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons
  • Understand basic design strategy for MapReduce v2 (MRv2)
  • Determine how YARN handles resource allocations
  • Identify the workflow of MapReduce job running on YARN
  • Determine which files you must change and how in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN.

3: Hadoop Cluster Planning (16%)

  • Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster.
  • Analyze the choices in selecting an OS
  • Understand kernel tuning and disk swapping
  • Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
  • Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA
  • Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O
  • Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
  • Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario

4: Hadoop Cluster Installation and Administration (25%)

  • Given a scenario, identify how the cluster will handle disk and machine failures
  • Analyze a logging configuration and logging configuration file format
  • Understand the basics of Hadoop metrics and cluster health monitoring
  • Identify the function and purpose of available tools for cluster monitoring
  • Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig
  • Identify the function and purpose of available tools for managing the Apache Hadoop file system

5: Resource Management (10%)

  • Understand the overall design goals of each of the Hadoop schedulers
  • Given a scenario, determine how the FIFO Scheduler allocates cluster resources
  • Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN
  • Given a scenario, determine how the Capacity Scheduler allocates cluster resources

6: Monitoring and Logging (15%)

  • Understand the functions and features of Hadoop’s metric collection abilities
  • Analyze the NameNode and JobTracker Web UIs
  • Understand how to monitor cluster daemons
  • Identify and monitor CPU usage on master nodes
  • Describe how to monitor swap and memory allocation on all nodes
  • Identify how to view and manage Hadoop’s log files
  • Interpret a log file