Introduction to GPU Programming (gpuprog | 21 hours)
Requirements
- An understanding of the C/C++ language and of parallel programming concepts
- Basic knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers who wish to learn the basics of GPU programming and the main frameworks and tools for developing GPU applications
- Developers who wish to write portable and scalable code that can run on different platforms and devices
- Programmers who wish to explore the benefits and challenges of GPU programming and optimization
GPU programming is a technique that leverages the parallel processing power of GPUs to accelerate applications that require high-performance computing, such as artificial intelligence, gaming, graphics, and scientific computing. There are several frameworks and tools that enable GPU programming, each with its own advantages and disadvantages. Some of the most popular ones are OpenCL, CUDA, ROCm, and HIP.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level developers who wish to learn the basics of GPU programming and the main frameworks and tools for developing GPU applications.
By the end of this training, participants will be able to:
- Understand the difference between CPU and GPU computing and the benefits and challenges of GPU programming.
- Choose the right framework and tool for their GPU application.
- Create a basic GPU program that performs vector addition using one or more of the frameworks and tools (a serial CPU baseline for this exercise follows this list).
- Use the respective APIs, languages, and libraries to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads.
- Use the respective memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses.
- Use the respective execution models, such as work-items, work-groups, threads, blocks, and grids, to control the parallelism.
- Debug and test GPU programs using tools such as CodeXL, CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight.
- Optimize GPU programs using techniques such as coalescing, caching, prefetching, and profiling.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange it.
Introduction
- What is GPU programming?
- Why use GPU programming?
- What are the challenges and trade-offs of GPU programming?
- What are the frameworks and tools for GPU programming?
- Choosing the right framework and tool for your application
OpenCL
- What is OpenCL?
- What are the advantages and disadvantages of OpenCL?
- Setting up the development environment for OpenCL
- Creating a basic OpenCL program that performs vector addition (a minimal sketch follows this list)
- Using OpenCL API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Using OpenCL C language to write kernels that execute on the device and manipulate data
- Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations
- Using OpenCL memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
- Using OpenCL execution model to control the work-items, work-groups, and ND-ranges that define the parallelism
- Debugging and testing OpenCL programs using tools such as CodeXL
- Optimizing OpenCL programs using techniques such as coalescing, caching, prefetching, and profiling
CUDA
- What is CUDA?
- What are the advantages and disadvantages of CUDA?
- Setting up the development environment for CUDA
- Creating a basic CUDA program that performs vector addition (a minimal sketch follows this list)
- Using CUDA API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads
- Using CUDA C/C++ language to write kernels that execute on the device and manipulate data
- Using CUDA built-in functions, variables, and libraries to perform common tasks and operations
- Using CUDA memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
- Using CUDA execution model to control the threads, blocks, and grids that define the parallelism
- Debugging and testing CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight
- Optimizing CUDA programs using techniques such as coalescing, caching, prefetching, and profiling
ROCm
- What is ROCm?
- What are the advantages and disadvantages of ROCm?
- Setting up the development environment for ROCm
- Creating a basic ROCm program that performs vector addition
- Using ROCm API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads (a device-query and memory-management sketch follows this list)
- Using ROCm C/C++ language to write kernels that execute on the device and manipulate data
- Using ROCm built-in functions, variables, and libraries to perform common tasks and operations
- Using ROCm memory spaces, such as global, local, constant, and private, to optimize data transfers and memory accesses
- Using ROCm execution model to control the threads, blocks, and grids that define the parallelism
- Debugging and testing ROCm programs using tools such as ROCm Debugger and ROCm Profiler
- Optimizing ROCm programs using techniques such as coalescing, caching, prefetching, and profiling
HIP
- What is HIP?
- What are the advantages and disadvantages of HIP?
- Setting up the development environment for HIP
- Creating a basic HIP program that performs vector addition (a minimal sketch follows this list)
- Using HIP language to write kernels that execute on the device and manipulate data
- Using HIP built-in functions, variables, and libraries to perform common tasks and operations
- Using HIP memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses
- Using HIP execution model to control the threads, blocks, and grids that define the parallelism
- Debugging and testing HIP programs using tools such as ROCm Debugger and ROCm Profiler
- Optimizing HIP programs using techniques such as coalescing, caching, prefetching, and profiling
Comparison
- Comparing the features, performance, and compatibility of OpenCL, CUDA, ROCm, and HIP
- Evaluating GPU programs using benchmarks and metrics (a timing sketch follows this list)
- Learning the best practices and tips for GPU programming
- Exploring the current and future trends and challenges of GPU programming
Summary and Next Steps