Course Code: sparkpythonhadoop
Duration: 21 hours
Prerequisites:
  • Experience with Spark and Hadoop
  • Python programming experience

Audience

  • Data scientists
  • Developers
Overview:

Python is a flexible, widely used programming language for data science and machine learning. Spark is a data processing engine for querying, analyzing, and transforming big data, while Hadoop is a software framework for distributed storage and processing of large data sets.

This instructor-led, live training (online or onsite) is aimed at developers and data scientists who wish to use and integrate Spark, Hadoop, and Python to process, analyze, and transform large and complex data sets.

By the end of this training, participants will be able to:

  • Set up the necessary environment to start processing big data with Spark, Hadoop, and Python.
  • Understand the features, core components, and architecture of Spark and Hadoop.
  • Integrate Spark, Hadoop, and Python for big data processing.
  • Explore the tools in the Spark ecosystem (Spark MLlib, Spark Streaming, Kafka, Sqoop, and Flume).
  • Build collaborative filtering recommendation systems similar to those used by Netflix, YouTube, Amazon, Spotify, and Google.
  • Use Apache Mahout to scale machine learning algorithms.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us.
Course Outline:

Introduction

  • Overview of Spark and Hadoop features and architecture
  • Understanding big data
  • Python programming basics

Getting Started

  • Setting up Python, Spark, and Hadoop
  • Understanding data structures in Python
  • Understanding the PySpark API
  • Understanding HDFS and MapReduce

Integrating Spark and Hadoop with Python

  • Implementing Spark RDD in Python
  • Processing data using MapReduce
  • Creating distributed datasets in HDFS
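
As a rough illustration of the MapReduce model named in this module, the sketch below counts words in plain Python, with no cluster involved. The function names (map_phase, shuffle_phase, reduce_phase) and the sample lines are hypothetical and chosen only for this example; they are not part of the Hadoop or Spark APIs. In PySpark, a comparable pipeline is typically written with flatMap, map, and reduceByKey on an RDD.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input standing in for a file stored in HDFS.
sample_lines = ["spark and hadoop", "hadoop and python", "spark"]
counts = word_count(sample_lines)
print(counts)
```

On a real cluster the map and reduce steps run in parallel across nodes and the shuffle moves data over the network; this single-process version only shows how the phases fit together.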

Machine Learning with Spark MLlib

Processing Big Data with Spark Streaming

Working with Recommender Systems
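
To make the idea of collaborative filtering concrete, here is a minimal sketch of a user-based approach in plain Python: a missing rating is predicted as a similarity-weighted average of other users' ratings. This is a neighborhood-based illustration, not the matrix-factorization method (ALS) that Spark MLlib provides, and the ratings data, user names, and function names are all hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 3.0, "D": 5.0},
    "carol": {"B": 1.0, "C": 2.0, "D": 4.0},
}


def cosine_similarity(u, v):
    """Cosine similarity of two rating dicts; unrated items count as zero."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = 0.0
    denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += abs(sim)
    # Return None when no similar user has rated the item.
    return numerator / denominator if denominator else None


predicted = predict("alice", "D")
print(predicted)
```

Because bob's ratings are closer to alice's than carol's are, bob's rating of item D carries more weight in the prediction.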

Working with Kafka, Sqoop, and Flume

Apache Mahout with Spark and Hadoop

Troubleshooting

Summary and Next Steps
