Course Code: bsphadp
Duration: 7 hours
Course Outline:

Objectives

  • Describe the use case for Hadoop
  • Identify Hadoop Ecosystem architectural categories:
      • Data Management
      • Data Access
      • Data Governance and Integration
      • Security
      • Operations
  • Detail the HDFS architecture
  • Describe data ingestion options and frameworks for batch and real-time streaming
  • Explain the fundamentals of parallel processing
  • See popular data transformation and processing engines in action:
      • Apache Hive
      • Apache Pig
      • Apache Spark
  • Detail the architecture and features of YARN
  • Describe how to secure Hadoop

Demonstrations

  • Operational overview with Ambari
  • Loading data into HDFS
  • Data manipulation with Hive
  • Risk analysis with Pig
  • Risk analysis with Spark and Zeppelin
  • Securing Hive with Ranger