Course Code: hadoopsparkforadmin
Duration: 35 hours
Prerequisites:
  • System administration experience
  • Experience with Linux command line
  • An understanding of big data concepts

Audience

  • System administrators
  • DBAs
Overview:

Apache Hadoop is a popular framework for processing large data sets across many computers.

This instructor-led, live training (online or onsite) is aimed at system administrators who wish to learn how to set up, deploy and manage Hadoop clusters within their organization.

By the end of this training, participants will be able to:

  • Install and configure Apache Hadoop.
  • Understand the four major components in the Hadoop ecosystem: HDFS, MapReduce, YARN, and Hadoop Common.
  • Use the Hadoop Distributed File System (HDFS) to scale a cluster to hundreds or thousands of nodes.
  • Set up HDFS to operate as the storage engine for on-premises Spark deployments.
  • Set up Spark to access alternative storage solutions such as Amazon S3 and NoSQL database systems such as Redis, Elasticsearch, Couchbase, and Aerospike.
  • Carry out administrative tasks such as provisioning, managing, monitoring, and securing an Apache Hadoop cluster.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request customized training for this course, please contact us.
Course Outline:

Introduction

  • Introduction to Cloud Computing and Big Data solutions
  • Overview of Apache Hadoop Features and Architecture

Setting up Hadoop

  • Planning a Hadoop cluster (on-premises, cloud, etc.)
  • Selecting the OS and Hadoop distribution
  • Provisioning resources (hardware, network, etc.)
  • Downloading and installing the software
  • Sizing the cluster for flexibility

Working with HDFS

  • Understanding the Hadoop Distributed File System (HDFS)
  • Overview of HDFS Command Reference
  • Accessing HDFS
  • Performing Basic File Operations on HDFS
  • Using S3 as a complement to HDFS

Overview of MapReduce

  • Understanding Data Flow in the MapReduce Framework
  • Map, Shuffle, Sort and Reduce
  • Demo: Computing Top Salaries
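
As a rough illustration of the map, shuffle, sort and reduce stages listed above, the following plain-Python sketch computes the top salary per department. It is not the course's demo and does not use Hadoop itself; the sample records and the function names (map_phase, shuffle_and_sort, reduce_phase, top_salaries) are hypothetical and chosen only to mirror the data flow.

```python
from collections import defaultdict

# Hypothetical sample records: (department, employee, salary).
RECORDS = [
    ("engineering", "alice", 120000),
    ("engineering", "bob", 95000),
    ("sales", "carol", 70000),
    ("sales", "dave", 82000),
    ("support", "erin", 58000),
]


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for department, _employee, salary in records:
        yield department, salary


def shuffle_and_sort(pairs):
    """Shuffle: group values by key. Sort: order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values to a single result."""
    return {key: max(values) for key, values in grouped}


def top_salaries(records):
    """Run the three stages in order on an in-memory list of records."""
    return reduce_phase(shuffle_and_sort(map_phase(records)))


if __name__ == "__main__":
    print(top_salaries(RECORDS))
```

In a real MapReduce job the map and reduce functions would run on different nodes and the framework would perform the shuffle and sort between them; here all three stages run in a single process purely to show how the data moves.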

Working with YARN

  • Understanding resource management in Hadoop
  • Working with ResourceManager, NodeManager, Application Master
  • Scheduling jobs under YARN
  • Scheduling for large numbers of nodes and clusters
  • Demo: Job scheduling

Integrating Hadoop with Spark

  • Setting up storage for Spark (HDFS, Amazon S3, NoSQL, etc.)
  • Understanding Resilient Distributed Datasets (RDDs)
  • Creating an RDD
  • Implementing RDD Transformations
  • Demo: Implementing a Text Search Program for Movie Titles

Managing a Hadoop Cluster

  • Monitoring Hadoop
  • Securing a Hadoop cluster
  • Adding and removing nodes
  • Running a performance benchmark
  • Tuning a Hadoop cluster to optimize performance
  • Backup, recovery and business continuity planning
  • Ensuring high availability (HA)

Upgrading and Migrating a Hadoop Cluster

  • Assessing workload requirements
  • Upgrading Hadoop
  • Moving from on-premises to cloud and vice versa
  • Recovering from failures

Troubleshooting

Summary and Conclusion
