Requirements
- An understanding of the C/C++ language and parallel programming concepts
- Basic knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers who wish to learn how to use different frameworks for GPU programming and compare their features, performance, and compatibility
- Developers who wish to write portable and scalable code that can run on different platforms and devices
- Programmers who wish to explore the trade-offs and challenges of GPU programming and optimization
GPU programming leverages the parallel processing power of GPUs to accelerate applications that require high-performance computing, such as artificial intelligence, gaming, graphics, and scientific computing. Several frameworks enable GPU programming, each with its own advantages and disadvantages. OpenCL is an open standard that can target CPUs, GPUs, and other devices from different vendors, while CUDA is NVIDIA's proprietary platform for NVIDIA GPUs. ROCm is AMD's open platform for GPU programming on AMD GPUs; it supports OpenCL and, through HIP, offers a source-level portability path for CUDA code.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level developers who wish to use different frameworks for GPU programming and compare their features, performance, and compatibility.
By the end of this training, participants will be able to:
- Set up a development environment that includes an OpenCL SDK, the CUDA Toolkit, the ROCm platform, a device that supports OpenCL, CUDA, or ROCm, and Visual Studio Code.
- Create a basic GPU program that performs vector addition using OpenCL, CUDA, and ROCm, and compare the syntax, structure, and execution of each framework.
- Use the respective APIs to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads.
- Use the respective languages to write kernels that execute on the device and manipulate data.
- Use the respective built-in functions, variables, and libraries to perform common tasks and operations.
- Use the respective memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses.
- Use the respective execution models to control the threads, blocks, and grids that define the parallelism.
- Debug and test GPU programs using tools such as CodeXL, CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight.
- Optimize GPU programs using techniques such as coalescing, caching, prefetching, and profiling.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Introduction
- What is GPU programming?
- Why use GPU programming?
- What are the challenges and trade-offs of GPU programming?
- What are the frameworks for GPU programming?
- Choosing the right framework for your application
OpenCL
- What is OpenCL?
- What are the advantages and disadvantages of OpenCL?
- Setting up the development environment for OpenCL
- Creating a basic OpenCL program that performs vector addition (a minimal sketch follows this list)
- Using the OpenCL API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Using the OpenCL C language to write kernels that execute on the device and manipulate data
- Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations
- Using OpenCL memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
- Using the OpenCL execution model to control the work-items, work-groups, and ND-ranges that define the parallelism
- Debugging and testing OpenCL programs using tools such as CodeXL
- Optimizing OpenCL programs using techniques such as coalescing, caching, prefetching, and profiling
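The vector-addition exercise is the backbone of this module. Below is a minimal host-plus-kernel sketch of what it can look like; it assumes an OpenCL 1.2 SDK and at least one OpenCL-capable device, and omits most error checks for brevity.

```c
/* Minimal OpenCL vector-addition sketch. Assumes an OpenCL 1.2 SDK and at
 * least one available device; most error checks are omitted for brevity. */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

/* Kernel source: one work-item per element, indexed with get_global_id. */
static const char *kernel_src =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void)
{
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_platform_id platform; cl_device_id device; cl_int err;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    /* Device buffers in global memory; inputs are copied from the host at creation. */
    cl_mem d_a = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(a), a, &err);
    cl_mem d_b = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(b), b, &err);
    cl_mem d_c = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof(c), NULL, &err);

    /* Build the kernel from source and set its arguments. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kern = clCreateKernel(prog, "vec_add", &err);
    clSetKernelArg(kern, 0, sizeof(d_a), &d_a);
    clSetKernelArg(kern, 1, sizeof(d_b), &d_b);
    clSetKernelArg(kern, 2, sizeof(d_c), &d_c);

    /* Launch N work-items over a 1-D ND-range, then read the result back (blocking). */
    size_t global = N;
    clEnqueueNDRangeKernel(queue, kern, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, d_c, CL_TRUE, 0, sizeof(c), c, 0, NULL, NULL);

    printf("c[10] = %.1f (expected 30.0)\n", c[10]);

    clReleaseMemObject(d_a); clReleaseMemObject(d_b); clReleaseMemObject(d_c);
    clReleaseKernel(kern); clReleaseProgram(prog);
    clReleaseCommandQueue(queue); clReleaseContext(ctx);
    return 0;
}
```

On a typical Linux setup this builds with something like `gcc vec_add.c -lOpenCL`; the exact include and library paths depend on the installed SDK.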
CUDA
- What is CUDA?
- What are the advantages and disadvantages of CUDA?
- Setting up the development environment for CUDA
- Creating a basic CUDA program that performs vector addition (a minimal sketch follows this list)
- Using the CUDA API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Using the CUDA C/C++ language to write kernels that execute on the device and manipulate data
- Using CUDA built-in functions, variables, and libraries to perform common tasks and operations
- Using CUDA memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
- Using the CUDA execution model to control the threads, blocks, and grids that define the parallelism
- Debugging and testing CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight
- Optimizing CUDA programs using techniques such as coalescing, caching, prefetching, and profiling
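For comparison, here is the same vector-addition exercise sketched in CUDA C++. It assumes the CUDA Toolkit and an NVIDIA GPU (build with nvcc); the vec_add name and the 256-thread block size are illustrative choices, and error checks are again omitted.

```cuda
// Minimal CUDA vector-addition sketch. Assumes the CUDA Toolkit and an NVIDIA
// GPU; build with nvcc. Error checks are omitted for brevity.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread handles one element; the global index is derived from
// the block index, block size, and thread index.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Allocate device (global) memory and copy the inputs host -> device.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover n elements, then wait for completion.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %.1f (expected 30.0)\n", h_c[10]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```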
ROCm
- What is ROCm?
- What are the advantages and disadvantages of ROCm?
- Setting up the development environment for ROCm
- Creating a basic ROCm program that performs vector addition (a minimal HIP sketch follows this list)
- Using the ROCm (HIP) runtime API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Using the HIP C++ language to write kernels that execute on the device and manipulate data
- Using ROCm and HIP built-in functions, variables, and libraries to perform common tasks and operations
- Using ROCm memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
- Using the ROCm execution model to control the threads, blocks, and grids that define the parallelism
- Debugging and testing ROCm programs using tools such as ROCgdb (the ROCm debugger) and ROCProfiler
- Optimizing ROCm programs using techniques such as coalescing, caching, prefetching, and profiling
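The ROCm version of the exercise is written in HIP, whose runtime API deliberately mirrors CUDA's. A minimal sketch, assuming a ROCm installation with hipcc and an AMD GPU, with error checks again omitted:

```cpp
// Minimal HIP vector-addition sketch. Assumes the ROCm platform with HIP and
// an AMD GPU; build with hipcc. Error checks are omitted for brevity.
#include <cstdio>
#include <cstdlib>
#include <hip/hip_runtime.h>

// Kernel: identical structure to the CUDA version; HIP provides the same
// threadIdx / blockIdx / blockDim built-ins.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Allocate device (global) memory and copy the inputs host -> device.
    float *d_a, *d_b, *d_c;
    hipMalloc((void **)&d_a, bytes);
    hipMalloc((void **)&d_b, bytes);
    hipMalloc((void **)&d_c, bytes);
    hipMemcpy(d_a, h_a, bytes, hipMemcpyHostToDevice);
    hipMemcpy(d_b, h_b, bytes, hipMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover n elements, then wait for completion.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    hipLaunchKernelGGL(vec_add, dim3(blocks), dim3(threads), 0, 0, d_a, d_b, d_c, n);
    hipDeviceSynchronize();

    hipMemcpy(h_c, d_c, bytes, hipMemcpyDeviceToHost);
    printf("c[10] = %.1f (expected 30.0)\n", h_c[10]);

    hipFree(d_a); hipFree(d_b); hipFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Setting the two kernels side by side makes the portability argument concrete: apart from the hip-prefixed runtime calls and launch macro, the device code is unchanged from CUDA.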
Comparison
- Comparing the features, performance, and compatibility of OpenCL, CUDA, and ROCm
- Evaluating GPU programs using benchmarks and metrics (a timing sketch follows this list)
- Learning the best practices and tips for GPU programming
- Exploring the current and future trends and challenges of GPU programming
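As a small taste of the benchmarking topic, the sketch below times a trivial copy kernel with CUDA events and converts the result into effective memory bandwidth, one of the metrics used when comparing frameworks. The kernel, problem size, and bandwidth formula are illustrative only; OpenCL (via clGetEventProfilingInfo) and HIP (via hipEvent_t) expose equivalent timing facilities.

```cuda
// Illustrative benchmark metric: kernel time measured with CUDA events and
// converted to effective memory bandwidth. Not a real benchmark suite.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void copy_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main()
{
    const int n = 1 << 24;                     // 16M floats (contents uninitialized; fine for timing)
    const size_t bytes = n * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Events are recorded in the stream, so the measurement covers only
    // device execution, not host-side launch overhead.
    cudaEventRecord(start);
    copy_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = 2.0 * bytes / (ms * 1.0e6);  // one read + one write per element
    printf("copy: %.3f ms, %.2f GB/s effective bandwidth\n", ms, gbps);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```

Dedicated profilers such as NVIDIA Nsight and ROCProfiler provide far finer-grained data than event timing alone, which is where the profiling topics above pick up.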
Summary and Next Steps