Course Code: certhadoopadminbes
Duration: 35 hours
Prerequisites:
  • Participants are expected to have a basic understanding of object-oriented programming (OOP) concepts
  • Any previous experience with development or administration will be a plus
  • Experience with AWS or any other cloud-based environment will be a plus

Audience

  • Database administrators

Course Outline:

What is Hadoop? A basic introduction

  • Introduction
  • Getting Started
  • Use cases
  • Machine Learning and future

ISMAC, Lambda Architecture, AWS

  • Introduction
  • Lambda Architecture Overview and Details
  • Use cases
  • Introduction To Amazon Web Services (AWS)
  • Signup and Billing (Important - Pricing related)
  • Zones and Regions
  • Launch EC2 Instance
  • Simple Storage Service (S3)
  • Log in to EC2 Instance using PuTTY
  • EC2 AMI (Amazon Machine Image)
  • EC2 Spot Instances
  • Relational Database Service (RDS)

Pandas, Python, GitHub and stats using Jupyter

  • Web Scraping. Regular Expressions. Data Reshaping. Data Cleanup. Pandas.
  • Exploratory Data Analysis
  • Scraping, Pandas, Python, and visualization
  • Pandas, SQL, and the Grammar of Data
  • Statistical Models
  • Probability, Distributions, and Frequentist Statistics
  • Bias and Regression
  • Regression and Logistic Regression in sklearn and statsmodels
  • Classification. kNN. Cross Validation. Dimensionality Reduction. PCA. MDS.

HDFS Basics and Cloudera

  • Introduction to HDFS and YARN; MySQL database setup and installation
  • Prepare AWS AMI for Cloudera Installation
  • Cloudera Installation Phases and Paths
  • Cloudera Manager Introduction and Overview
  • Parcels and Repository setup with Apache httpd
  • Cloudera Installation Path B with local repository – prepare the AMI
  • Add Cluster, Add Service and Delete Cluster life cycle

Installing and using Hortonworks Framework

  • Installing a single node cluster of Hortonworks framework
  • Configure a local HDP repository
  • Install HDP using the Ambari install wizard
  • Decommission a node
  • Add a new node to an existing cluster
  • Add an HDP service to a cluster using Ambari
  • Change the configuration of a service using Ambari
  • Configure the location of log files for services

HDFS Basic Shell Commands and HDFS Management - Balancer, Maintenance, Quota Management, Canary Test

  • HDFS shell commands
  • HDFS Trash and HA concepts - Setup, Configure, Test, Verify, Remove
  • HDFS Balancer, Maintenance Mode, Quota Management, Canary Test, Rack Awareness

HDFS Checkpoint - Understand, Manage, Work with Edits, FSImage, Roll Edits

  • HDFS Edits and FSImage Introduction
  • Checkpoint Introduction, Offline Image Viewer (OIV) and Offline Edits Viewer (OEV), Roll Edits, Save Namespace

Snapshot, WebHDFS, Federation, Recovery, HttpFS, Edge Node

  • HDFS Snapshot, Snapshot Policy, Edge Node, WebHDFS, HttpFS
  • FSCK Utility, Recovery, Federation, Home Directory

Cloudera Management - Commission, Decommission and LDAP - Install, Configure OS, phpLDAPadmin Client

  • Cluster Commission and Decommission, Client Configuration
  • Cluster Host Template, phpLDAPadmin - Installation and setup, user authentication with OpenLDAP

YARN In-depth

  • Resource Manager - Scheduler Types
  • Types of Resource Manager, Static Service Pool, FIFO - First In First Out, Fair Scheduler, Capacity Scheduler, Dynamic Resource Pool config, High Availability
  • Dynamic Resource Pool, High Availability

Zookeeper, Hive, Oozie

  • Zookeeper - Introduction & Adding as Service
  • Apache Hive - Manage, Setup HA, Beeline, WebHCat, HCatalog, Warehouse dir config
  • Hive High Availability (HA)
  • Hive Warehouse Directory and Metastore DB
  • Apache Oozie Introduction, Installation and Setup

Hadoop User Experience (HUE) - Setup, Install, LDAP Integration, Extended ACL and Sentry

  • Hadoop User Experience - Setup, Installation and Introduction
  • OpenLDAP Integration for Authentication
  • HDFS Extended Access Control List (ACL)
  • Sentry Introduction and Role Based Authorization
  • Sentry Installation and Configuration
  • HUE Security Module Configuration and Integration with Sentry for Authorization
  • Hive Table Authorization with Sentry
  • Cloudera Manager - OpenLDAP Integration for authentication

Hadoop Impala and Kerberos

  • Impala Introduction and Concepts
  • Impala Installation and configuration
  • Kerberos - Install, Configure, Verify
  • Kerberos Introduction, Architecture and Authentication
  • Kerberos - Prepare Server and Client for Setup and Configuration
  • Configure Cloudera to Kerberize the Cluster
  • Working with Keytab and Service Ticket

Sqoop, HBase, HUE, Flume, Hive

  • Sqoop Introduction, Architecture, Installation and Configuration
  • Sqoop Import and Export between HDFS, Hive, HBase and RDBMS
  • NoSQL Database, HBase Service, HUE Configuration to Work with HBase
  • Working with Tables in HUE Editor
  • Flume Installation and Configuration - Single Agent Scenario
  • Flume Multi Agent Configuration - Log collection from multiple nodes of Cluster
  • Use Hive tables
  • Hive database and table manipulation (e.g. create, query, join, aggregation)
  • Connect Relational Database Management System (RDBMS) to Hadoop
  • Load RDBMS data into Hadoop
  • Execute Extraction, Transformation and Loading (ETL) on Hadoop
  • Use a data visualization tool to connect to Hadoop / Hive

HDFS Encryption, Spark and Kafka introduction

  • HDFS - Encryption Zone, Keystore configuration, Sqoop Import
  • Apache Spark Installation, Configuration and Administration
  • Spark Submit Job - Standalone Cluster
  • Apache Kafka Installation, Configuration and Administration

Hadoop Benchmarking, Memory Management

  • HDFS TeraSort, TeraGen, TeraValidate
  • TestDFSIO
  • Memory Management - Container, JVM, Role, Node Memory and Performance Management
  • Reports, Charts and Dashboard

Build dashboards with a data visualization tool

  • Effective Presentations
  • Tableau

Introduction to Cloudera 6, HDP and other platforms you can use for deployment

Note: All configuration will be done using the free version of CDH. We will also learn HDP and various data frameworks.