Course Code: sparkpython
Duration: 21 hours
Prerequisites:
  • General programming skills
Overview:

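Spark is a data processing engine used for querying, analyzing, and transforming big data. Python is a high-level programming language known for its clear syntax and code readability. PySpark is Spark's Python API, which lets users work with Spark from Python.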

In this instructor-led, live training, participants will learn how to use Python and Spark together to analyze big data through hands-on exercises.

By the end of this training, participants will be able to:

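  • Understand how to use Spark and Python together to analyze big data
  • Work on exercises that simulate real-world scenarios
  • Use different tools and techniques for big data analysis with PySpark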

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline:

Introduction

Understanding Big Data

Overview of Spark

Overview of Python

Overview of PySpark

  • Distributing data using Resilient Distributed Datasets (RDDs)
  • Distributing computation using Spark API operators
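The two ideas above (data split into partitions, and operators that run on each partition independently before a shuffle) can be sketched in plain Python. This is a conceptual model only, not the PySpark API: the helpers `partition`, `map_partitions` and `reduce_by_key` are hypothetical, and real Spark spreads partitions across executors rather than keeping Python lists in one process. In PySpark the same word count would be roughly `sc.parallelize(words, 2).map(lambda w: (w, 1)).reduceByKey(operator.add)`.

```python
import operator


def partition(data, num_partitions):
    """Split data round-robin into lists, standing in for an RDD's partitions."""
    parts = [[] for _ in range(num_partitions)]
    for i, item in enumerate(data):
        parts[i % num_partitions].append(item)
    return parts


def map_partitions(parts, fn):
    """Narrow transformation: every partition is processed independently."""
    return [[fn(item) for item in part] for part in parts]


def reduce_by_key(parts, op, num_partitions):
    """Wide transformation: combine locally per partition, then shuffle by key."""
    # Map-side combine: merge values for the same key inside each partition.
    local = []
    for part in parts:
        acc = {}
        for key, value in part:
            acc[key] = op(acc[key], value) if key in acc else value
        local.append(acc)
    # Shuffle: route each key to an output partition by hash, then merge.
    shuffled = [{} for _ in range(num_partitions)]
    for acc in local:
        for key, value in acc.items():
            target = shuffled[hash(key) % num_partitions]
            target[key] = op(target[key], value) if key in target else value
    return shuffled


words = "spark python spark data python spark".split()
pairs = map_partitions(partition(words, 2), lambda w: (w, 1))
counts = {}
for part in reduce_by_key(pairs, operator.add, 2):
    counts.update(part)  # each key lives in exactly one output partition
print(counts)
```

The local combine before the shuffle mirrors what `reduceByKey` does in Spark, which reduces the amount of data sent between machines.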

Setting Up Python with Spark

Setting Up PySpark

Using Amazon Web Services (AWS) EC2 Instances for Spark

Setting Up Databricks

Setting Up the AWS EMR Cluster

Learning the Basics of Python Programming

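  • Getting Started with Python
  • Using the Jupyter Notebook
  • Using Variables and Simple Data Types
  • Working with Lists
  • Using if Statements
  • Using User Inputs
  • Working with while Loops
  • Implementing Functions
  • Working with Classes
  • Working with Files and Exceptions
  • Working with Projects, Data, and APIs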
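A small example touching several of the Python topics above: a class, functions, `if` statements, `for` loops, and file and exception handling. The names `RunningStats`, `parse_numbers` and `load_numbers` are illustrative only and are not taken from the course material.

```python
class RunningStats:
    """Tracks the count, total and mean of the numbers added to it."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    @property
    def mean(self):
        # Guard against division by zero when nothing has been added yet.
        if self.count == 0:
            return 0.0
        return self.total / self.count


def parse_numbers(lines):
    """Convert text lines to floats, skipping blank or non-numeric lines."""
    numbers = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            numbers.append(float(text))
        except ValueError:
            print(f"skipping invalid line: {text!r}")
    return numbers


def load_numbers(path):
    """Read numbers from a text file, returning [] if the file does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return parse_numbers(f)
    except FileNotFoundError:
        return []


stats = RunningStats()
for value in parse_numbers(["3", "4.5", "", "n/a", "1.5"]):
    stats.add(value)
print(stats.count, stats.mean)
```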

Learning the Basics of Spark DataFrame

  • Getting Started with Spark DataFrames
  • Implementing Basic Operations with Spark
  • Using Groupby and Aggregate Operations
  • Working with Timestamps and Dates
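In PySpark, grouping by a date part and aggregating looks like `df.groupBy(year("ts")).agg(avg("sales"))`, with `year` and `avg` imported from `pyspark.sql.functions` and assuming `ts` is a timestamp column. The plain-Python sketch below computes the same kind of result on a few hypothetical rows so the semantics can be checked without a Spark installation; the column names `ts`, `store` and `sales` and the helper `group_by_agg` are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows; in Spark these would be rows of a DataFrame.
rows = [
    {"ts": "2023-01-05 10:00:00", "store": "A", "sales": 100.0},
    {"ts": "2023-06-10 12:30:00", "store": "B", "sales": 50.0},
    {"ts": "2024-02-01 09:15:00", "store": "A", "sales": 80.0},
]


def year_of(row):
    """Parse the timestamp string and return its year."""
    return datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S").year


def group_by_agg(rows, key_fn, value_col):
    """Group rows by key_fn(row) and compute count, sum and average of value_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row[value_col])
    return {
        key: {"count": len(vals), "sum": sum(vals), "avg": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }


by_year = group_by_agg(rows, year_of, "sales")
by_store = group_by_agg(rows, lambda r: r["store"], "sales")
print(by_year)
print(by_store)
```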

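Working on a Spark DataFrame Project Exercise

Understanding Machine Learning with MLlib

Working with MLlib, Spark, and Python for Machine Learning

Understanding Regressions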

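  • Learning Linear Regression Theory
  • Implementing Regression Evaluation Code
  • Working on a Sample Linear Regression Exercise
  • Learning Logistic Regression Theory
  • Implementing Logistic Regression Code
  • Working on a Sample Logistic Regression Exercise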
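The regression theory above can be sketched in plain Python: ordinary least squares for a single feature, the RMSE and R² metrics commonly used to evaluate a regression, and logistic regression trained by batch gradient descent on the log loss. This is a textbook sketch under simplifying assumptions (one feature, no regularization, fixed learning rate and epoch count); MLlib's `LinearRegression` and `LogisticRegression` in `pyspark.ml` use their own optimizers, and `RegressionEvaluator` reports metrics such as `"rmse"` and `"r2"`.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Logistic regression on one feature via batch gradient descent on log loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss: mean((sigmoid(w*x + b) - y) * x) and mean(... * 1).
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Linear: data lying exactly on y = 2x + 1.
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
slope, intercept = fit_linear(xs, ys)
preds = [slope * x + intercept for x in xs]
print(slope, intercept, rmse(ys, preds), r2(ys, preds))

# Logistic: negative x values are class 0, positive ones are class 1.
lx, ly = [-2, -1, 1, 2], [0, 0, 1, 1]
w, b = fit_logistic(lx, ly)
print([1 if sigmoid(w * x + b) >= 0.5 else 0 for x in lx])
```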

Understanding Random Forests and Decision Trees

  • Learning Tree Methods Theory
  • Implementing Decision Trees and Random Forest Code
  • Working on a Sample Random Forest Classification Exercise
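The core of tree methods is choosing splits that reduce impurity. The sketch below computes Gini impurity and finds the best threshold on one numeric feature, the step a decision tree repeats at every node; a random forest trains many such trees on bootstrap samples, each considering random subsets of features, and combines them by majority vote. It is a plain-Python illustration, not MLlib's implementation, which among other things bins continuous features before searching for splits (controlled by `maxBins`).

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Best threshold for the rule x <= threshold on one numeric feature.

    Returns (threshold, weighted_gini); threshold is None if no split helps.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_threshold, best_score = None, gini(ys)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Impurity of the split, weighted by the size of each side.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


print(gini([0, 0, 1, 1]))
print(best_split([1, 2, 3, 4], [0, 0, 1, 1]))
```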

Working with K-means Clustering

  • Understanding K-means Clustering Theory
  • Implementing K-means Clustering Code
  • Working on a Sample Clustering Exercise
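The standard K-means procedure (assign each point to its nearest center, move each center to the mean of its points, repeat until nothing changes) is Lloyd's algorithm, sketched below in plain Python. To keep the example deterministic it initializes the centers from the first k points, which is an assumption of this sketch: `pyspark.ml.clustering.KMeans` defaults to the randomized k-means|| initialization (`initMode="k-means||"`).

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centers, labels). Centers start at points[:k]."""
    centers = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels


points = [(0, 0), (10, 10), (0, 1), (10, 11)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```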

Working with Recommender Systems

Implementing Natural Language Processing

  • Understanding Natural Language Processing (NLP)
  • Overview of NLP Tools
  • Working on a Sample NLP Exercise
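A common first NLP pipeline in Spark is tokenization followed by TF-IDF (for example `Tokenizer`, `HashingTF` or `CountVectorizer`, and `IDF` in `pyspark.ml.feature`). The plain-Python sketch below uses the smoothed IDF formula documented for Spark MLlib, idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) the number of documents containing t. The regex tokenization rule and the example sentences are assumptions for illustration, and no feature hashing is done.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on runs of non-word characters (assumed rule)."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def tf_idf(docs):
    """Return one {term: tf * idf} dict per document, using raw term counts as tf."""
    tokenized = [tokenize(doc) for doc in docs]
    m = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()} for tokens in tokenized]


docs = ["Spark is fast.", "Python is easy, Python is fun"]
for scores in tf_idf(docs):
    print(scores)
```

A term that appears in every document, such as "is" here, gets an IDF of log(1) = 0 and therefore contributes nothing.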

Streaming with Spark in Python

  • Overview of Streaming with Spark
  • Sample Spark Streaming Exercise
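Spark's Structured Streaming treats a stream as an unbounded table that is processed in micro-batches, keeping aggregation state between batches. The sketch below imitates a running word count in plain Python; the class `RunningWordCount` is hypothetical. In PySpark a comparable query would combine `groupBy(...).count()` with `writeStream.outputMode("complete")`, which emits the full updated result table after every trigger.

```python
from collections import Counter


class RunningWordCount:
    """Word count whose state persists across micro-batches (hypothetical helper)."""

    def __init__(self):
        self.state = Counter()

    def process_batch(self, lines):
        """Fold one micro-batch of text lines into the state and return the full result."""
        for line in lines:
            self.state.update(line.split())
        # Like "complete" output mode: return the whole updated table, not just changes.
        return dict(self.state)


query = RunningWordCount()
print(query.process_batch(["spark streaming", "spark"]))
print(query.process_batch(["python spark"]))
```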

Summary and Conclusion

Uzbekistan - Python and Spark for Big Data (PySpark)