Participants do not need any specific prior skills.
None
1. Big Data & HDFS in detail
   - Architecture
   - HDFS (data storage & use)
   - Data Blocks
2. YARN & Resource Management
   - Architecture
   - Working with YARN
3. Sqoop (Sqoop 1 only)
   - Basic imports & exports
   - Result limitation
   - Sqoop performance optimization
4. Hive on Spark
   - Basics
   - Spark processing details
   - Spark stages & tasks
   - Metastore
   - DDL & DML
   - Data Types
   - Spark Shell
   - Optimization (Partitioning, Bucketing)
5. Spark
   - Use cases
   - Spark SQL vs Hive on Spark
   - RDD in detail, partitioning, operations
   - RDD Lineage
   - SparkContext
   - Chaining, caching
   - Other optimizations
6. Using Scala for Spark apps
   - Building & running Spark apps in Scala
   - Settings/configuration
   - Logging