Course Code: sparkpython
Duration: 21 hours
Prerequisites:
  • General programming skills

Audience

  • Developers
  • IT Professionals
  • Data Scientists
Overview:

Python is a high-level programming language known for its clear syntax and readable code. Spark is a data processing engine used to query, analyze, and transform big data. PySpark is the Python API for Spark, letting users drive Spark from Python code.

In this instructor-led, live training, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.

By the end of this training, participants will be able to:

  • Use Spark with Python to analyze big data.
  • Complete exercises that mimic real-world cases.
  • Apply different tools and techniques for big data analysis with PySpark.

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline:

Introduction

Understanding Big Data

Overview of Spark

Overview of Python

Overview of PySpark

  • Distributing Data Using Resilient Distributed Datasets (RDDs)
  • Distributing Computation Using Spark API Operators

Setting Up Python with Spark

Setting Up PySpark

Using Amazon Web Services (AWS) EC2 Instances for Spark

Setting Up Databricks

Setting Up the AWS EMR Cluster

Learning the Basics of Python Programming

  • Getting Started with Python
  • Using the Jupyter Notebook
  • Using Variables and Simple Data Types
  • Working with Lists
  • Using if Statements
  • Using User Inputs
  • Working with while Loops
  • Implementing Functions
  • Working with Classes
  • Working with Files and Exceptions
  • Working with Projects, Data, and APIs
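
Several of the Python topics above can be seen together in one small script. This is an illustrative sketch only: the WordCounter class, the load_lines helper, and the sample text are hypothetical, chosen to touch variables, lists, if statements, loops, functions, classes, files, and exception handling.

```python
import os
import tempfile


class WordCounter:
    """Counts word occurrences across lines of text (hypothetical example class)."""

    def __init__(self):
        self.counts = {}

    def add_line(self, line):
        for word in line.lower().split():
            self.counts[word] = self.counts.get(word, 0) + 1

    def top(self, n):
        # Sort by descending count, then alphabetically so ties are stable.
        return sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def load_lines(path):
    """Read lines from a file, returning an empty list if the file is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()
    except FileNotFoundError:
        return []


# Write a small sample file, then read it back.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Spark and Python\npython for big data\nspark spark\n")

counter = WordCounter()
for line in load_lines(path):
    if line.strip():  # skip blank lines
        counter.add_line(line)

print(counter.top(2))                               # [('spark', 3), ('python', 2)]
print(load_lines(os.path.join(tempfile.mkdtemp(), "missing.txt")))  # []
```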

Learning the Basics of Spark DataFrames

  • Getting Started with Spark DataFrames
  • Implementing Basic Operations with Spark
  • Using groupBy and Aggregate Operations
  • Working with Timestamps and Dates

Working on a Spark DataFrame Project Exercise

Understanding Machine Learning with MLlib

Working with MLlib, Spark, and Python for Machine Learning

Understanding Regressions

  • Learning Linear Regression Theory
  • Implementing Regression Evaluation Code
  • Working on a Sample Linear Regression Exercise
  • Learning Logistic Regression Theory
  • Implementing Logistic Regression Code
  • Working on a Sample Logistic Regression Exercise

Understanding Random Forests and Decision Trees

  • Learning Tree Methods Theory
  • Implementing Decision Tree and Random Forest Code
  • Working on a Sample Random Forest Classification Exercise

Working with K-means Clustering

  • Understanding K-means Clustering Theory
  • Implementing K-means Clustering Code
  • Working on a Sample Clustering Exercise

Working with Recommender Systems

Implementing Natural Language Processing

  • Understanding Natural Language Processing (NLP)
  • Overview of NLP Tools
  • Working on a Sample NLP Exercise

Streaming with Spark on Python

  • Overview of Streaming with Spark
  • Working on a Sample Spark Streaming Exercise

Closing Remarks
