- An understanding of the C/C++ language and of parallel programming concepts
- Basic knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers who wish to learn how to use CUDA to program NVIDIA GPUs and exploit their parallelism
- Developers who wish to write high-performance and scalable code that can run on different CUDA devices
- Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance
CUDA is a parallel computing platform and programming model developed by NVIDIA that enables code to run on NVIDIA GPUs, which are widely used for high-performance computing, artificial intelligence (AI), gaming, and graphics. CUDA exposes the programmer to the hardware details and gives full control over the parallelization process. However, this also requires a good understanding of the device architecture, memory model, execution model, and optimization techniques.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level developers who wish to use CUDA to program NVIDIA GPUs and exploit their parallelism.
By the end of this training, participants will be able to:
- Set up a development environment that includes the CUDA Toolkit, an NVIDIA GPU, and Visual Studio Code.
- Create a basic CUDA program that performs vector addition on the GPU and retrieves the results from GPU memory (a minimal sketch follows this list).
- Use the CUDA API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads.
- Use CUDA C/C++ language to write kernels that execute on the GPU and manipulate data.
- Use CUDA built-in functions, variables, and libraries to perform common tasks and operations.
- Use CUDA memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses.
- Use the CUDA execution model to control the threads, blocks, and grids that define the parallelism.
- Debug and test CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight.
- Optimize CUDA programs using techniques such as coalescing, caching, prefetching, and profiling.
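For reference, the vector-addition objective above usually takes a shape like the following minimal sketch. The array size, the 256-thread block size, and names such as vecAdd are illustrative choices, not fixed by the course:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                   // illustrative problem size
    size_t bytes = n * sizeof(float);

    float *hA = (float*)malloc(bytes), *hB = (float*)malloc(bytes), *hC = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                       // illustrative block size
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    cudaDeviceSynchronize();

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);  // retrieve results
    printf("c[0] = %f\n", hC[0]);            // expect 3.0

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Built and run with the CUDA compiler driver, e.g. `nvcc vecadd.cu -o vecadd && ./vecadd`.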
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Introduction
- What is CUDA?
- CUDA vs OpenCL vs SYCL
- Overview of CUDA features and architecture
- Setting up the Development Environment
Getting Started
- Creating a new CUDA project using Visual Studio Code
- Exploring the project structure and files
- Compiling and running the program
- Displaying output using printf and fprintf (a minimal device-side printf example follows)
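As a first "hello world", device-side printf makes kernel execution visible. A minimal sketch; the 2×4 launch configuration is arbitrary:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// printf is supported inside kernels; output is flushed on synchronization.
__global__ void hello() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    hello<<<2, 4>>>();          // 2 blocks of 4 threads, chosen arbitrarily
    cudaDeviceSynchronize();    // wait for the kernel and flush device printf
    return 0;
}
```

Compile with `nvcc hello.cu -o hello`; note that the print order across threads is not guaranteed.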
CUDA API
- Understanding the role of the CUDA API in the host program
- Using the CUDA API to query device information and capabilities
- Using the CUDA API to allocate and deallocate device memory
- Using the CUDA API to copy data between host and device
- Using the CUDA API to launch kernels and synchronize threads
- Using the CUDA API to handle errors and exceptions (illustrated in the sketch after this list)
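A sketch of host-side API use: enumerating devices and checking every call's return code. The CUDA_CHECK macro is a common convention in CUDA codebases, not part of the CUDA API itself:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Wrap runtime-API calls so failures are reported with file and line.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err = (call);                                         \
        if (err != cudaSuccess) {                                         \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,            \
                    cudaGetErrorString(err));                             \
            return 1;                                                     \
        }                                                                 \
    } while (0)

int main() {
    int count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&count));

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        CUDA_CHECK(cudaGetDeviceProperties(&prop, d));
        printf("Device %d: %s, compute capability %d.%d, %zu MB global memory\n",
               d, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```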
CUDA C/C++
- Understanding the role of CUDA C/C++ in the device program
- Using CUDA C/C++ to write kernels that execute on the GPU and manipulate data
- Using CUDA C/C++ data types, qualifiers, operators, and expressions
- Using CUDA C/C++ built-in functions, such as math, atomic, warp, etc.
- Using CUDA C/C++ built-in variables, such as threadIdx, blockIdx, blockDim, etc. (see the sketch after this list)
- Using CUDA C/C++ libraries, such as cuBLAS, cuFFT, cuRAND, etc.
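A small kernel can exercise several of these language features at once. In this illustrative sketch (the names sumOfRoots, dIn, and dSum are invented for the example), the built-in index variables select an element, the math built-in sqrtf transforms it, and atomicAdd accumulates safely across threads:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Built-in variables pick the element; sqrtf is a device math built-in;
// atomicAdd serializes conflicting updates to the shared accumulator.
__global__ void sumOfRoots(const float *in, float *sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(sum, sqrtf(in[i]));
}

int main() {
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = 4.0f;   // sqrtf(4) = 2, so sum = 2048

    float *dIn, *dSum, hSum = 0.0f;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dSum, sizeof(float));
    cudaMemcpy(dIn, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dSum, &hSum, sizeof(float), cudaMemcpyHostToDevice);

    sumOfRoots<<<(n + 255) / 256, 256>>>(dIn, dSum, n);
    cudaMemcpy(&hSum, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", hSum);                // expect 2048.0

    cudaFree(dIn); cudaFree(dSum);
    return 0;
}
```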
CUDA Memory Model
- Understanding the difference between host and device memory models
- Using CUDA memory spaces, such as global, shared, constant, and local (sketched after this list)
- Using CUDA memory objects, such as pointers, arrays, textures, and surfaces
- Using CUDA memory access modes, such as read-only, write-only, read-write, etc.
- Using CUDA memory consistency model and synchronization mechanisms
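A sketch combining three of these memory spaces: global memory for the data, __shared__ memory as a per-block staging tile, and __constant__ memory for a broadcast coefficient. The 256-thread block size and the reverse-and-scale operation are illustrative:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__constant__ float scale;   // constant memory: cached, read-only in kernels

// Reverse each block's segment via shared memory, then scale it.
__global__ void reverseScale(float *data) {
    __shared__ float tile[256];               // shared: visible to the block
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    tile[t] = data[i];                        // global -> shared
    __syncthreads();                          // all loads done before reads
    data[i] = scale * tile[blockDim.x - 1 - t];
}

int main() {
    const int n = 256;                        // one block, for brevity
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float s = 2.0f, *d;
    cudaMemcpyToSymbol(scale, &s, sizeof(float));  // set constant memory
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    reverseScale<<<1, 256>>>(d);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);              // expect 2 * 255 = 510
    cudaFree(d);
    return 0;
}
```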
CUDA Execution Model
- Understanding the difference between host and device execution models
- Using CUDA threads, blocks, and grids to define the parallelism (see the grid-stride sketch after this list)
- Using CUDA built-in index variables, such as threadIdx, blockIdx, blockDim, etc.
- Using CUDA block-level functions, such as __syncthreads, __threadfence_block, etc.
- Using CUDA grid-level features, such as gridDim and grid-wide synchronization via cooperative groups
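A grid-stride loop is a standard pattern for decoupling the launch configuration from the problem size: each thread uses threadIdx, blockIdx, blockDim, and gridDim to stride over the data. In this sketch the grid is deliberately smaller than the array so the stride is exercised:

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: the whole grid strides over the array, so any
// grid/block configuration covers any problem size.
__global__ void scaleAll(float *x, int n, float s) {
    int stride = gridDim.x * blockDim.x;      // total threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        x[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    dim3 block(128), grid(64);                // far fewer threads than n
    scaleAll<<<grid, block>>>(d, n, 3.0f);    // each thread handles many elements
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}
```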
Debugging
- Understanding common errors and bugs in CUDA programs (see the error-checking sketch after this list)
- Using the Visual Studio Code debugger to inspect variables, breakpoints, call stack, etc.
- Using CUDA-GDB to debug CUDA programs on Linux
- Using CUDA-MEMCHECK to detect memory errors and leaks
- Using NVIDIA Nsight to debug and analyze CUDA programs on Windows
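Kernel launches are asynchronous, so errors surface in two places: configuration errors from cudaGetLastError() immediately after the launch, and execution errors from the next synchronizing call. A sketch with a deliberate bug (the null-pointer write) to make both checks visible:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void oops(int *p) { p[threadIdx.x] = 0; }  // p may be invalid

int main() {
    oops<<<1, 32>>>(nullptr);                 // deliberate bug: null pointer

    // Launch errors (bad configuration) surface immediately:
    cudaError_t launchErr = cudaGetLastError();
    // Execution errors (invalid access) surface on synchronization:
    cudaError_t execErr = cudaDeviceSynchronize();

    printf("launch: %s\nexec:   %s\n",
           cudaGetErrorString(launchErr), cudaGetErrorString(execErr));
    return 0;
}

// Compile with device debug info for CUDA-GDB:  nvcc -g -G bug.cu -o bug
// Memory errors can also be caught with:        cuda-memcheck ./bug
```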
Optimization
- Understanding the factors that affect the performance of CUDA programs
- Using CUDA coalescing techniques to improve memory throughput
- Using CUDA caching and prefetching techniques to reduce memory latency
- Using CUDA shared memory and local memory techniques to optimize memory accesses and bandwidth
- Using profiling tools to measure and improve execution time and resource utilization (a coalescing and timing sketch follows this list)
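A classic coalescing exercise is matrix transpose: the naive kernel writes with a stride of n, while staging a tile in shared memory makes both the global read and the global write coalesced. The 1024×1024 matrix, the 32-wide tile, and the event-based timing are illustrative; in the course, a profiler would quantify the difference:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 32

// Naive transpose: coalesced reads, but writes stride by n (slow).
__global__ void transposeNaive(const float *in, float *out, int n) {
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) out[x * n + y] = in[y * n + x];
}

// Tiled transpose: stage a tile in shared memory so both the global
// read and the global write are coalesced; the +1 column of padding
// avoids shared-memory bank conflicts.
__global__ void transposeTiled(const float *in, float *out, int n) {
    __shared__ float tile[TILE][TILE + 1];
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;      // swap block coordinates
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

int main() {
    const int n = 1024;                       // illustrative matrix size
    float *dIn, *dOut;
    cudaMalloc(&dIn, n * n * sizeof(float));
    cudaMalloc(&dOut, n * n * sizeof(float));

    dim3 block(TILE, TILE), grid(n / TILE, n / TILE);
    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);
    float msNaive = 0.0f, msTiled = 0.0f;

    cudaEventRecord(start);
    transposeNaive<<<grid, block>>>(dIn, dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msNaive, start, stop);

    cudaEventRecord(start);
    transposeTiled<<<grid, block>>>(dIn, dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msTiled, start, stop);

    printf("naive: %.3f ms, tiled: %.3f ms\n", msNaive, msTiled);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(dIn); cudaFree(dOut);
    return 0;
}
```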
Summary and Next Steps