- Experience with Spark and Hadoop
- Python programming experience
Audience
- Data scientists
- Developers
Python is a flexible, widely used programming language for data science and machine learning. Spark is a data processing engine for querying, analyzing, and transforming big data, while Hadoop is a software framework for distributed storage and processing of large data sets.
This instructor-led, live training (online or onsite) is aimed at developers who wish to use and integrate Spark, Hadoop, and Python to process, analyze, and transform large and complex data sets.
By the end of this training, participants will be able to:
- Set up the necessary environment to start processing big data with Spark, Hadoop, and Python.
- Understand the features, core components, and architecture of Spark and Hadoop.
- Learn how to integrate Spark, Hadoop, and Python for big data processing.
- Explore the tools in the Spark ecosystem (Spark MLlib, Spark Streaming, Kafka, Sqoop, and Flume).
- Build collaborative filtering recommendation systems similar to Netflix, YouTube, Amazon, Spotify, and Google.
- Use Apache Mahout to scale machine learning algorithms.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us.
Introduction
- Overview of Spark and Hadoop features and architecture
- Understanding big data
- Python programming basics
Getting Started
- Setting up Python, Spark, and Hadoop
- Understanding data structures in Python
- Understanding PySpark API
- Understanding HDFS and MapReduce
Integrating Spark and Hadoop with Python
- Implementing Spark RDD in Python
- Processing data using MapReduce
- Creating distributed datasets in HDFS
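As a rough illustration of the MapReduce model covered in this module, the sketch below runs a word count in plain Python, with no cluster required. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for this example. On an actual Spark cluster the same pattern is usually written with the RDD methods `flatMap`, `map`, and `reduceByKey`.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three steps, as a MapReduce job would across a cluster."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input standing in for lines read from HDFS.
    sample = ["big data with spark", "big data with hadoop"]
    print(word_count(sample))
```

Keeping the three phases as separate functions mirrors how the framework splits the work: map and reduce are supplied by the user, while the shuffle is handled by the framework between them.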
Machine Learning with Spark MLlib
Processing Big Data with Spark Streaming
Working with Recommender Systems
Working with Kafka, Sqoop, and Flume
Apache Mahout with Spark and Hadoop
Troubleshooting
Summary and Next Steps