Course Code:
bdhat
Duration:
28 hours
Prerequisites:
This course is recommended for data analysts, business analysts, programmers, and administrators who have experience with SQL and/or scripting languages. Prior knowledge of Apache Hadoop is not required.
Overview:
Big Data Analyst Training is a practical course recommended for anyone aiming to become an expert Data Scientist. It focuses on the skills a modern data analyst needs when working with Big Data technology. The course presents tools for accessing, modifying, transforming, and analyzing complex data structures in a Hadoop cluster, and covers topics within the Hadoop Ecosystem (Pig, Hive, Impala, ELK, and others). Participants will learn:
- The functionality of the Pig, Hive, Impala, and ELK tools, which enable data collection, result storage, and analysis.
- How Pig, Hive, and Impala can improve the performance of typical day-to-day analytical tasks.
- Performing real-time interactive analysis of large data sets to obtain valuable business insights, and how to interpret them.
- Executing complex queries on very large data volumes.
Course Outline:
Hadoop fundamentals.
Introduction to Pig.
Basic data analysis using the Pig tool.
Processing complex data with Pig.
Operations on multiple datasets using Pig.
Troubleshooting and optimizing Pig.
Introduction to Hive, Impala, and ELK.
Executing queries in Hive, Impala, and ELK.
Data management in Hive.
Data storage and performance.
Data analysis using Hive and Impala.
Working with Impala and ELK tools.
Text and complex data type analysis.
Optimizing Hive, Pig, Impala, and ELK.
Interoperability and workflow.
Questions, tasks, certification.