Course Code: sparkcloud
Duration: 21 hours
Prerequisites:

Programming skills (preferably Python or Scala)

Basic knowledge of SQL

Overview:

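Apache Spark's learning curve rises slowly at first, and it takes a lot of effort to see a first return. This course is designed to get participants past that difficult first stage. After the course, participants will understand the basics of Apache Spark, clearly distinguish RDDs from DataFrames, know the Python and Scala APIs, and understand executors, tasks, and related concepts. In line with best practices, the course focuses strongly on cloud deployment, specifically Databricks and AWS. Students will also learn the differences between AWS EMR and AWS Glue, one of AWS's newest Spark services.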

Audience:

Data Engineers, DevOps, Data Scientists

Course Outline:

Introduction:

  • Apache Spark in the Hadoop ecosystem
  • Short introduction to Python and Scala

Basics (theory):

  • Architecture
  • RDD
  • Transformations and actions
  • Stages, tasks, dependencies

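Using the Databricks environment to understand the basics (hands-on workshop):

  • Exercises using the RDD API
  • Basic action and transformation functions
  • Pair RDD
  • Joins
  • Caching strategies
  • Exercises using the DataFrame API
  • Spark SQL
  • DataFrame: select, filter, group, sort
  • UDF (user-defined function)
  • Looking into the Dataset API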

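Using the AWS environment to understand deployment (hands-on workshop):

  • Basics of AWS Glue
  • Understand the differences between AWS EMR and AWS Glue
  • Example jobs in both environments
  • Understand the pros and cons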

Extra:

  • Introduction to Apache Airflow orchestration