Course Code: sparkcloud
Duration: 21 hours
Prerequisites:

Programming skills (preferably Python, Scala)

SQL basics

Overview:

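Apache Spark's learning curve rises slowly at the beginning, and it takes a lot of effort to get the first return. This course aims to get participants past that first tough part. After the course, participants will understand the basics of Apache Spark, clearly distinguish RDDs from DataFrames, learn the Python and Scala APIs, and understand executors, tasks, and related concepts. Following best practices, the course focuses strongly on cloud deployment with Databricks and AWS. Students will also understand the differences between AWS EMR and AWS Glue, one of the latest Spark services from AWS.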

Audience:

Data Engineers, DevOps, Data Scientists

Course Outline:

Introduction:

  • Apache Spark in the Hadoop ecosystem
  • Short introduction to Python and Scala

Basics (theory):

  • Architecture
  • RDD
  • Transformations and actions
  • Stages, tasks, dependencies

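Understanding the basics using the Databricks environment (hands-on workshop):

  • Exercises using the RDD API
  • Basic action and transformation functions
  • Pair RDDs
  • Joins
  • Caching strategies
  • Exercises using the DataFrame API
  • Spark SQL
  • DataFrame: select, filter, group, sort
  • UDF (user-defined function)
  • A look at the Dataset API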

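Understanding deployment using the AWS environment (hands-on workshop):

  • AWS Glue basics
  • Understanding the differences between AWS EMR and AWS Glue
  • Example jobs in both environments
  • Understanding the pros and cons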

Extra:

  • Introduction to Apache Airflow orchestration

Sites Published:

United Arab Emirates - Apache Spark in the Cloud

Qatar - Apache Spark in the Cloud

Egypt - Apache Spark in the Cloud

Saudi Arabia - Apache Spark in the Cloud

South Africa - Apache Spark in the Cloud

Brasil - Apache Spark in the Cloud

Canada - Apache Spark in the Cloud

中国 - Apache Spark in the Cloud

香港 - Apache Spark in the Cloud

澳門 - Apache Spark in the Cloud

台灣 - Apache Spark in the Cloud

USA - Apache Spark in the Cloud

Österreich - Apache Spark in the Cloud

Schweiz - Apache Spark in the Cloud

Deutschland - Apache Spark in the Cloud

Czech Republic - Apache Spark in the Cloud

Denmark - Apache Spark in the Cloud

Estonia - Apache Spark in the Cloud

Finland - Apache Spark in the Cloud

Greece - Apache Spark in the Cloud

Magyarország - Apache Spark in the Cloud

Ireland - Apache Spark in the Cloud

Luxembourg - Apache Spark in the Cloud

Latvia - Apache Spark in the Cloud

España - Apache Spark in the Cloud

Italia - Apache Spark in the Cloud

Lithuania - Apache Spark in the Cloud

Nederland - Apache Spark in the Cloud

Norway - Apache Spark in the Cloud

Portugal - Apache Spark in the Cloud

România - Apache Spark in the Cloud

Sverige - Apache Spark in the Cloud

Türkiye - Apache Spark in the Cloud

Malta - Apache Spark in the Cloud

Belgique - Apache Spark in the Cloud

France - Apache Spark in the Cloud

日本 - Apache Spark in the Cloud

Australia - Apache Spark in the Cloud

Malaysia - Apache Spark in the Cloud

New Zealand - Apache Spark in the Cloud

Philippines - Apache Spark in the Cloud

Singapore - Apache Spark in the Cloud

Thailand - Apache Spark in the Cloud

Vietnam - Apache Spark in the Cloud

India - Apache Spark in the Cloud

Argentina - Apache Spark in the Cloud

Chile - Apache Spark in the Cloud

Costa Rica - Apache Spark in the Cloud

Ecuador - Apache Spark in the Cloud

Guatemala - Apache Spark in the Cloud

Colombia - Apache Spark in the Cloud

México - Apache Spark in the Cloud

Panama - Apache Spark in the Cloud

Peru - Apache Spark in the Cloud

Uruguay - Apache Spark in the Cloud

Venezuela - Apache Spark in the Cloud

Polska - Apache Spark in the Cloud

United Kingdom - Apache Spark in the Cloud

South Korea - Apache Spark in the Cloud

Pakistan - Apache Spark in the Cloud

Sri Lanka - Apache Spark in the Cloud

Bulgaria - Apache Spark in the Cloud

Bolivia - Apache Spark in the Cloud

Indonesia - Apache Spark in the Cloud

Kazakhstan - Apache Spark in the Cloud

Moldova - Apache Spark in the Cloud

Morocco - Apache Spark in the Cloud

Tunisia - Apache Spark in the Cloud

Kuwait - Apache Spark in the Cloud

Oman - Apache Spark in the Cloud

Slovakia - Apache Spark in the Cloud

Kenya - Apache Spark in the Cloud

Nigeria - Apache Spark in the Cloud

Botswana - Apache Spark in the Cloud

Slovenia - Apache Spark in the Cloud

Croatia - Apache Spark in the Cloud

Serbia - Apache Spark in the Cloud

Bhutan - Apache Spark in the Cloud

Nepal - Apache Spark in the Cloud

Uzbekistan - Apache Spark in the Cloud