Spark for Developers (sparkdev | 21 hours)

Prerequisites:

Familiarity with Java, Scala, or Python (the labs are in Scala and Python)
Basic understanding of a Linux development environment (command-line navigation; editing files with vi or nano)

Overview:

OBJECTIVE:

This course introduces Apache Spark. Students will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, Spark APIs, Spark SQL, Spark Streaming, machine learning with MLlib, and GraphX.

AUDIENCE:

Developers / Data Analysts

Course Outline:
  1. Scala primer

    • A quick introduction to Scala
    • Labs: Getting to know Scala
  2. Spark Basics

    • Background and history
    • Spark and Hadoop
    • Spark concepts and architecture
    • Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
    • Labs: Installing and running Spark
  3. First Look at Spark

    • Running Spark in local mode
    • Spark web UI
    • Spark shell
    • Analyzing dataset – part 1
    • Inspecting RDDs
    • Labs: Spark shell exploration
  4. RDDs

    • RDD concepts
    • Partitions
    • RDD Operations / transformations
    • RDD types
    • Key-Value pair RDDs
    • MapReduce on RDD
    • Caching and persistence
    • Labs: Creating and inspecting RDDs; caching RDDs
  5. Spark API programming

    • Introduction to Spark API / RDD API
    • Submitting the first program to Spark
    • Debugging / logging
    • Configuration properties
    • Labs: Programming in Spark API, submitting jobs
  6. Spark SQL

    • SQL support in Spark
    • DataFrames
    • Defining tables and importing datasets
    • Querying DataFrames using SQL
    • Storage formats: JSON / Parquet
    • Labs: Creating and querying DataFrames; evaluating data formats
  7. MLlib

    • MLlib intro
    • MLlib algorithms
    • Labs: Writing MLlib applications
  8. GraphX

    • GraphX library overview
    • GraphX APIs
    • Labs: Processing graph data using Spark
  9. Spark Streaming

    • Streaming overview
    • Evaluating Streaming platforms
    • Streaming operations
    • Sliding window operations
    • Labs: Writing Spark Streaming applications
  10. Spark and Hadoop

    • Hadoop Intro (HDFS / YARN)
    • Hadoop + Spark architecture
    • Running Spark on Hadoop YARN
    • Processing HDFS files using Spark
  11. Spark Performance and Tuning

    • Broadcast variables
    • Accumulators
    • Memory management & caching
  12. Spark Operations

    • Deploying Spark in production
    • Sample deployment templates
    • Configurations
    • Monitoring
    • Troubleshooting