- Participants are expected to have a basic understanding of OOP concepts
- Any previous experience with development or administration is a plus
- Experience with AWS or any other cloud-based environment is a plus
Audience
- Database administrators
What is Hadoop? A basic introduction
- Introduction
- Getting Started
- Use cases
- Machine learning and the future
ISMAC, Lambda Architecture, AWS
- Introduction
- Lambda Architecture Overview and Details
- Use cases
- Introduction To Amazon Web Services (AWS)
- Signup and Billing (Important - Pricing related)
- Zones and Regions
- Launch EC2 Instance
- Simple Storage Service (S3)
- Log in to EC2 Instance using PuTTY
- EC2 AMI (Amazon Machine Image)
- EC2 Spot Instances
- Relational Database Service (RDS)
Pandas, Python, GitHub and statistics using Jupyter
- Web Scraping. Regular Expressions. Data Reshaping. Data Cleanup. Pandas.
- Exploratory Data Analysis
- Scraping, Pandas, Python, and visualization
- Pandas, SQL, and the Grammar of Data
- Statistical Models
- Probability, Distributions, and Frequentist Statistics
- Bias and Regression
- Regression, Logistic Regression: in sklearn and statsmodels
- Classification. kNN. Cross Validation. Dimensionality Reduction. PCA. MDS.
HDFS Basics and Cloudera
- Introduction to HDFS and YARN; MySQL database setup and installation
- Prepare AWS AMI for Cloudera Installation
- Cloudera Installation Phases and Paths
- Cloudera Manager Introduction and Overview
- Parcels and Repository setup with Apache httpd
- Cloudera Installation Path B with local repository: prepare the AMI
- Add Cluster, Add Service and Delete Cluster life cycle
Installing and using Hortonworks Framework
- Installing a single node cluster of Hortonworks framework
- Configure a local HDP repository
- Install HDP using the Ambari install wizard
- Decommission a node
- Add a new node to an existing cluster
- Add an HDP service to a cluster using Ambari
- Change the configuration of a service using Ambari
- Configure the location of log files for services
HDFS Basic Shell Commands and HDFS Management - Balancer, Maintenance, Quota Management, Canary Test
- HDFS shell commands
- HDFS Trash and HA concepts: Setup, Configure, Test, Verify, Remove
- HDFS Balancer, Maintenance Mode, Quota Management, Canary Test, Rack Awareness
HDFS Checkpoint - Understand, Manage, Work with Edits, FSImage, Roll Edits
- Introduction to HDFS Edits and FSImage
- Checkpoint Introduction, Offline Image Viewer (OIV) and Offline Edits Viewer (OEV), Roll Edits, Save Namespace
Snapshot, WebHDFS, Federation, Recovery, HttpFS, Edge Node
- HDFS Snapshot, Snapshot Policy, Edge Node, WebHDFS, HttpFS
- FSCK Utility, Recovery, Federation, Home Directory
Cloudera Manage - Commission, Decommission and LDAP - Install, Configure OS, phpLDAPAdmin Client
- Cluster Commission and Decommission, Client Configuration
- Cluster Host Template, phpLDAPAdmin - Installation and setup, user authentication with OpenLDAP
YARN In-depth
- Resource Manager - Scheduler Types
- Types of Resource Manager, Static Service Pool, FIFO (First In First Out), Fair Scheduler, Capacity Scheduler, Dynamic Resource Pool config, High Availability
- Dynamic Resource Pool, High Availability
ZooKeeper, Hive, Oozie
- ZooKeeper - Introduction & Adding as Service
- Apache Hive - Manage, Setup HA, Beeline, WebHCat, HCatalog, Warehouse dir config
- Hive High Availability (HA)
- Hive Warehouse Directory and Metastore DB
- Apache Oozie Introduction, Installation and Setup
Hadoop User Experience (HUE) - Setup, Install, LDAP Integration, Extended ACL and Sentry
- Hadoop User Experience - Setup, Installation and Introduction
- OpenLDAP Integration for Authentication
- HDFS Extended Access Control List (ACL)
- Sentry Introduction and Role Based Authorization
- Sentry Installation and Configuration
- HUE Security Module Configuration and Integration with Sentry for Authorization
- Hive Table Authorization with Sentry
- Cloudera Manager - OpenLDAP Integration for authentication
Hadoop Impala and Kerberos
- Impala Introduction and Concepts
- Impala Installation and configuration
- Kerberos - Install, Configure, Verify
- Kerberos Introduction, Architecture and Authentication
- Kerberos Prepare Server and Client for Setup and config
- Configure Cloudera to Kerberize the Cluster
- Working with Keytab and Service Ticket
Sqoop, HBase, Hue, Flume, Hive
- Sqoop Introduction, Architecture, Installation and Configuration
- Sqoop Import and Export between HDFS, Hive, HBase and RDBMS
- NoSQL Database, HBase Service, HUE Configuration to Work with HBase
- Working with Tables in HUE Editor
- Flume Installation and Configuration - Single Agent Scenario
- Flume Multi Agent Configuration - Log collection from multiple nodes of Cluster
- Use Hive tables
- Hive database and table manipulation (e.g. create, query, join, aggregation)
- Connect Relational Database Management System (RDBMS) to Hadoop
- Load RDBMS data into Hadoop
- Execute Extraction, Transformation and Loading (ETL) on Hadoop
- Use a data visualization tool to connect to Hadoop / Hive
HDFS Encryption, Spark and Kafka introduction
- HDFS - Encryption Zone, Keystore configuration, Sqoop Import
- Apache Spark Installation, Configuration and Administration
- Spark Submit Job - Standalone Cluster
- Apache Kafka Installation, Configuration and Administration
Hadoop Benchmarking, Memory Management
- HDFS TeraSort, TeraGen, TeraValidate
- TestDFSIO
- Memory Management - Container, JVM, Role, Node Memory and Performance Management
- Reports, Charts and Dashboard
Build dashboards with a data visualization tool
- Effective Presentations
- Tableau
Introduction to Cloudera 6, HDP, and other platforms you can use for deployment
Note: All configuration will be done using the free version of CDH, and we will also learn HDP and various data frames.