Requirements
- An understanding of machine learning, specifically deep learning
- Familiarity with machine learning libraries (TensorFlow, Keras, PyTorch, Apache MXNet)
- Python programming experience
Audience
- Developers
- Data scientists
Horovod is an open source framework for fast, efficient distributed deep learning training with TensorFlow, Keras, PyTorch, and Apache MXNet. It can scale a single-GPU training script to run on multiple GPUs or hosts with minimal code changes.
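To give a sense of what "minimal code changes" can look like, here is a rough sketch of a PyTorch training loop adapted for Horovod. It is an illustration rather than course material: the toy model, the random data, and the helper names `scaled_learning_rate` and `train` are assumptions made for this example, and scaling the learning rate by the number of workers is a common heuristic, not a requirement.

```python
def scaled_learning_rate(base_lr: float, num_workers: int) -> float:
    """Hypothetical helper: scale the learning rate linearly with worker count."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return base_lr * num_workers


def train(steps: int = 10, base_lr: float = 0.01) -> None:
    """Toy training loop; requires torch and horovod to be installed."""
    # Imported here so the helper above stays usable without these packages.
    import torch
    import horovod.torch as hvd

    # Change 1: initialize Horovod and pin each process to one GPU.
    hvd.init()
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(hvd.local_rank())
    device = torch.device("cuda" if use_cuda else "cpu")

    # Placeholder model; a real script would build its own network here.
    model = torch.nn.Linear(10, 1).to(device)

    # Change 2: scale the learning rate by the number of workers (heuristic).
    lr = scaled_learning_rate(base_lr, hvd.size())
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    # Change 3: wrap the optimizer so gradients are averaged across workers.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters()
    )

    # Change 4: start every worker from the same parameters and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for step in range(steps):
        # Random stand-in data; each worker would normally read its own shard.
        x = torch.randn(32, 10, device=device)
        y = torch.randn(32, 1, device=device)

        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()

        # Log from a single worker to avoid duplicated output.
        if hvd.rank() == 0:
            print(f"step {step}: loss {loss.item():.4f}")
```

A script that calls `train()` would typically be launched with something like `horovodrun -np 4 python train.py`, where `train.py` is a placeholder file name and `-np` sets the number of worker processes.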
This instructor-led, live training (online or onsite) is aimed at developers and data scientists who wish to use Horovod to run distributed deep learning training and scale it across multiple GPUs in parallel.
By the end of this training, participants will be able to:
- Set up the development environment needed to run deep learning training jobs.
- Install and configure Horovod to train models with TensorFlow, Keras, PyTorch, and Apache MXNet.
- Scale deep learning training with Horovod to run on multiple GPUs.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- This course is focused on Horovod, but other software tools and frameworks such as TensorFlow, Keras, PyTorch, and Apache MXNet may be required. Please let us know if you have specific requirements or preferences.
- To request customized training for this course, please contact us.
Introduction
- Overview of Horovod features and concepts
- Understanding the supported frameworks
Installing and Configuring Horovod
- Preparing the hosting environment
- Building Horovod for TensorFlow, Keras, PyTorch, and Apache MXNet
- Running Horovod
Running Distributed Training
- Modifying and running training examples with TensorFlow
- Modifying and running training examples with Keras
- Modifying and running training examples with PyTorch
- Modifying and running training examples with Apache MXNet
Optimizing Distributed Training Processes
- Running concurrent operations on multiple GPUs
- Tuning hyperparameters
- Enabling performance autotuning
Troubleshooting
Summary and Conclusion