Course Code:
bsphadp
Duration:
7 hours
Course Outline:
Objectives
- Describe the use case for Hadoop
- Identify Hadoop Ecosystem architectural categories
  - Data Management
  - Data Access
  - Data Governance and Integration
  - Security
  - Operations
- Detail the HDFS architecture
- Describe data ingestion options and frameworks for batch and real-time streaming
- Explain the fundamentals of parallel processing
- See popular data transformation and processing engines in action
  - Apache Hive
  - Apache Pig
  - Apache Spark
- Detail the architecture and features of YARN
- Describe how to secure Hadoop
Demonstrations
- Operational overview with Ambari
- Loading data into HDFS
- Data manipulation with Hive
- Risk analysis with Pig
- Risk analysis with Spark and Zeppelin
- Securing Hive with Ranger