Course Title: Apache Spark in the Cloud
Course Code: sparkcloud
Duration: 21 hours
Prerequisites:

Programming skills (preferably Python or Scala)

SQL basics

Overview:

Apache Spark has a steep learning curve at the beginning: it takes considerable effort before the first results appear. This course aims to get participants past that difficult first stage. After taking the course, participants will understand the basics of Apache Spark, clearly differentiate RDDs from DataFrames, know the Python and Scala APIs, and understand executors and tasks. In line with best practices, the course focuses strongly on cloud deployment, specifically Databricks and AWS. Participants will also understand the differences between AWS EMR and AWS Glue, one of the latest Spark services on AWS.

Audience:

Data Engineers, DevOps Engineers, Data Scientists

Course Outline:

Introduction:

  • Apache Spark in the Hadoop ecosystem
  • Short introduction to Python and Scala

Basics (theory):

  • Architecture
  • RDD
  • Transformations and Actions
  • Stages, Tasks, Dependencies
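
As a rough illustration of the transformation/action split covered in this module, the sketch below models lazy evaluation in plain Python. It is not the Spark API: the LazyDataset class and its methods are hypothetical names chosen to mirror the idea that transformations (map, filter) only record work, while actions (collect, count) actually run it.

```python
from typing import Any, Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of Spark)."""

    def __init__(self, source: Iterable[Any],
                 steps: Optional[List[Callable]] = None):
        self._source = list(source)
        self._steps = list(steps or [])

    # Transformations: return a new dataset and compute nothing yet.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        step = lambda rows: (fn(r) for r in rows)
        return LazyDataset(self._source, self._steps + [step])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        step = lambda rows: (r for r in rows if pred(r))
        return LazyDataset(self._source, self._steps + [step])

    # Actions: run the recorded steps and return a concrete result.
    def collect(self) -> List[Any]:
        rows = iter(self._source)
        for step in self._steps:
            rows = step(rows)
        return list(rows)

    def count(self) -> int:
        return len(self.collect())


calls = []  # records every value that the mapped function actually sees


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)

print(len(calls))               # 0: no action has been called yet
print(evens_squared.collect())  # [4, 16]
print(len(calls))               # 5: the action triggered the map step
```

In Spark itself the recorded steps form a lineage that the scheduler splits into stages and tasks; the toy class above only captures the "nothing runs until an action is called" part of that picture.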

Understanding the basics in the Databricks environment (hands-on workshop):

  • Exercises using RDD API
  • Basic action and transformation functions
  • PairRDD
  • Join
  • Caching strategies
  • Exercises using DataFrame API
  • SparkSQL
  • DataFrame: select, filter, group, sort
  • UDF (User Defined Function)
  • A look at the Dataset API
  • Streaming

Understanding deployment in the AWS environment (hands-on workshop):

  • Basics of AWS Glue
  • Understand the differences between AWS EMR and AWS Glue
  • Example jobs in both environments
  • Understand the pros and cons of each

Extra:

  • Introduction to Apache Airflow orchestration