Programming skills (preferably Python or Scala)
SQL basics
Apache Spark has a steep initial learning curve: it takes a lot of effort before you see the first results. This course aims to get participants past that difficult first stage. After taking this course, participants will understand the basics of Apache Spark, clearly differentiate RDDs from DataFrames, learn the Python and Scala APIs, and understand executors and tasks, among other topics. In line with best practices, the course focuses strongly on cloud deployment with Databricks and AWS. Participants will also understand the differences between AWS EMR and AWS Glue, one of the latest Spark services on AWS.
AUDIENCE:
Data Engineer, DevOps, Data Scientist
Introduction:
- Apache Spark in Hadoop Ecosystem
- Short intro to Python and Scala
Basics (theory):
- Architecture
- RDD
- Transformations and Actions
- Stages, Tasks, Dependencies
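To make the "Transformations and Actions" topic concrete, here is a minimal pure-Python sketch of the idea that transformations only describe work while actions trigger it. The ToyRDD class and its methods are hypothetical names made up for illustration; this is not Spark's API or implementation, only a toy model that runs without a cluster.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


class ToyRDD:
    """Toy model of an RDD (hypothetical, not Spark): transformations are
    recorded as a list of steps and only executed when an action is called."""

    def __init__(self, data: Iterable[Any], steps: Tuple = ()):
        self._data = list(data)
        self._steps = tuple(steps)

    # --- transformations: return a new ToyRDD, do no work yet ---
    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self._data, self._steps + (("filter", pred),))

    # --- internal: run every recorded step over every element ---
    def _compute(self) -> Iterator[Any]:
        for x in self._data:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):  # "filter" step rejected the element
                    keep = False
                    break
            if keep:
                yield x

    # --- actions: trigger the computation and return a result ---
    def collect(self) -> list:
        return list(self._compute())

    def count(self) -> int:
        return sum(1 for _ in self._compute())


# Usage: building the pipeline does nothing; collect() runs it.
pipeline = ToyRDD(range(1, 6)).map(lambda x: x * x).filter(lambda v: v % 2 == 1)
print(pipeline.collect())
```

In this toy model every action recomputes the whole chain of steps from the source data, which is the behaviour that motivates the caching strategies covered in the hands-on part.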
Understanding the basics using the Databricks environment (hands-on workshop):
- Exercises using RDD API
- Basic action and transformation functions
- PairRDD
- Join
- Caching strategies
- Exercises using DataFrame API
- SparkSQL
- DataFrame: select, filter, group, sort
- UDF (User Defined Function)
- A look at the Dataset API
- Streaming
Understanding deployment using the AWS environment (hands-on workshop):
- Basics of AWS Glue
- Understand the differences between AWS EMR and AWS Glue
- Example jobs in both environments
- Understand pros and cons
Extra:
- Introduction to Apache Airflow orchestration