- An understanding of the C/C++ language and of parallel programming concepts
- Basic knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers who wish to learn how to use CUDA to program NVIDIA GPUs and exploit their parallelism
- Developers who wish to write high-performance and scalable code that can run on different CUDA devices
- Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance
CUDA is a parallel computing platform and programming model developed by NVIDIA that enables code to run on NVIDIA GPUs, which are widely used for high-performance computing, artificial intelligence (AI), gaming, and graphics. CUDA exposes the programmer to the hardware details and gives full control over the parallelization process. However, this also requires a good understanding of the device architecture, memory model, execution model, and optimization techniques.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level developers who wish to use CUDA to program NVIDIA GPUs and exploit their parallelism.
By the end of this training, participants will be able to:
- Set up a development environment that includes the CUDA Toolkit, an NVIDIA GPU, and Visual Studio Code.
- Create a basic CUDA program that performs vector addition on the GPU and retrieves the results from GPU memory (a minimal sketch follows this list).
- Use the CUDA API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads.
- Use CUDA C/C++ language to write kernels that execute on the GPU and manipulate data.
- Use CUDA built-in functions, variables, and libraries to perform common tasks and operations.
- Use CUDA memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses.
- Use the CUDA execution model to control the threads, blocks, and grids that define the parallelism.
- Debug and test CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight.
- Optimize CUDA programs using techniques such as coalescing, caching, prefetching, and profiling.
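For reference, the vector-addition objective above usually takes a shape like the following minimal sketch. The array size, the 256-thread block size, and names such as vecAdd are illustrative choices, not fixed by the course:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                   // illustrative problem size
    size_t bytes = n * sizeof(float);

    float *hA = (float*)malloc(bytes), *hB = (float*)malloc(bytes), *hC = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                       // illustrative block size
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    cudaDeviceSynchronize();

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);  // retrieve results
    printf("c[0] = %f\n", hC[0]);            // expect 3.0

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Built and run with the CUDA compiler driver, e.g. `nvcc vecadd.cu -o vecadd && ./vecadd`.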
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Introduction
- What is CUDA?
- CUDA vs OpenCL vs SYCL
- Overview of CUDA features and architecture
- Setting up the Development Environment
Getting Started
- Creating a new CUDA project using Visual Studio Code
- Exploring the project structure and files
- Compiling and running the program
- Displaying output using printf and fprintf (a minimal device-side printf example follows)
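As a first "hello world", device-side printf makes kernel execution visible. A minimal sketch; the 2×4 launch configuration is arbitrary:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// printf is supported inside kernels; output is flushed on synchronization.
__global__ void hello() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    hello<<<2, 4>>>();          // 2 blocks of 4 threads, chosen arbitrarily
    cudaDeviceSynchronize();    // wait for the kernel and flush device printf
    return 0;
}
```

Compile with `nvcc hello.cu -o hello`; note that the print order across threads is not guaranteed.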
CUDA API
- Understanding the role of the CUDA API in the host program
- Using the CUDA API to query device information and capabilities
- Using the CUDA API to allocate and deallocate device memory
- Using the CUDA API to copy data between host and device
- Using the CUDA API to launch kernels and synchronize threads
- Using the CUDA API to handle errors and exceptions (illustrated in the sketch after this list)
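A sketch of host-side API use: enumerating devices and checking every call's return code. The CUDA_CHECK macro is a common convention in CUDA codebases, not part of the CUDA API itself:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Wrap runtime-API calls so failures are reported with file and line.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err = (call);                                         \
        if (err != cudaSuccess) {                                         \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,            \
                    cudaGetErrorString(err));                             \
            return 1;                                                     \
        }                                                                 \
    } while (0)

int main() {
    int count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&count));

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        CUDA_CHECK(cudaGetDeviceProperties(&prop, d));
        printf("Device %d: %s, compute capability %d.%d, %zu MB global memory\n",
               d, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```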
CUDA C/C++
- Understanding the role of CUDA C/C++ in the device program
- Using CUDA C/C++ to write kernels that execute on the GPU and manipulate data
- Using CUDA C/C++ data types, qualifiers, operators, and expressions
- Using CUDA C/C++ built-in functions, such as math, atomic, warp, etc.
- Using CUDA C/C++ built-in variables, such as threadIdx, blockIdx, blockDim, etc. (see the sketch after this list)
- Using CUDA C/C++ libraries, such as cuBLAS, cuFFT, cuRAND, etc.
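A small kernel can exercise several of these language features at once. In this illustrative sketch (the names sumOfRoots, dIn, and dSum are invented for the example), the built-in index variables select an element, the math built-in sqrtf transforms it, and atomicAdd accumulates safely across threads:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Built-in variables pick the element; sqrtf is a device math built-in;
// atomicAdd serializes conflicting updates to the shared accumulator.
__global__ void sumOfRoots(const float *in, float *sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(sum, sqrtf(in[i]));
}

int main() {
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = 4.0f;   // sqrtf(4) = 2, so sum = 2048

    float *dIn, *dSum, hSum = 0.0f;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dSum, sizeof(float));
    cudaMemcpy(dIn, h, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dSum, &hSum, sizeof(float), cudaMemcpyHostToDevice);

    sumOfRoots<<<(n + 255) / 256, 256>>>(dIn, dSum, n);
    cudaMemcpy(&hSum, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", hSum);                // expect 2048.0

    cudaFree(dIn); cudaFree(dSum);
    return 0;
}
```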
CUDA Memory Model
- Understanding the difference between host and device memory models
- Using CUDA memory spaces, such as global, shared, constant, and local (sketched after this list)
- Using CUDA memory objects, such as pointers, arrays, textures, and surfaces
- Using CUDA memory access modes, such as read-only, write-only, read-write, etc.
- Using CUDA memory consistency model and synchronization mechanisms
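A sketch combining three of these memory spaces: global memory for the data, __shared__ memory as a per-block staging tile, and __constant__ memory for a broadcast coefficient. The 256-thread block size and the reverse-and-scale operation are illustrative:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__constant__ float scale;   // constant memory: cached, read-only in kernels

// Reverse each block's segment via shared memory, then scale it.
__global__ void reverseScale(float *data) {
    __shared__ float tile[256];               // shared: visible to the block
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    tile[t] = data[i];                        // global -> shared
    __syncthreads();                          // all loads done before reads
    data[i] = scale * tile[blockDim.x - 1 - t];
}

int main() {
    const int n = 256;                        // one block, for brevity
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float s = 2.0f, *d;
    cudaMemcpyToSymbol(scale, &s, sizeof(float));  // set constant memory
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    reverseScale<<<1, 256>>>(d);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);              // expect 2 * 255 = 510
    cudaFree(d);
    return 0;
}
```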
CUDA Execution Model
- Understanding the difference between host and device execution models
- Using CUDA threads, blocks, and grids to define the parallelism (see the grid-stride sketch after this list)
- Using CUDA built-in index variables, such as threadIdx, blockIdx, blockDim, etc.
- Using CUDA block-level functions, such as __syncthreads, __threadfence_block, etc.
- Using CUDA grid-level features, such as gridDim and grid-wide synchronization via cooperative groups
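A grid-stride loop is a standard pattern for decoupling the launch configuration from the problem size: each thread uses threadIdx, blockIdx, blockDim, and gridDim to stride over the data. In this sketch the grid is deliberately smaller than the array so the stride is exercised:

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: the whole grid strides over the array, so any
// grid/block configuration covers any problem size.
__global__ void scaleAll(float *x, int n, float s) {
    int stride = gridDim.x * blockDim.x;      // total threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        x[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    dim3 block(128), grid(64);                // far fewer threads than n
    scaleAll<<<grid, block>>>(d, n, 3.0f);    // each thread handles many elements
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}
```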
Debugging
- Understanding common errors and bugs in CUDA programs (see the error-checking sketch after this list)
- Using the Visual Studio Code debugger to inspect variables, breakpoints, call stack, etc.
- Using CUDA-GDB to debug CUDA programs on Linux
- Using CUDA-MEMCHECK to detect memory errors and leaks
- Using NVIDIA Nsight to debug and analyze CUDA programs on Windows
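Kernel launches are asynchronous, so errors surface in two places: configuration errors from cudaGetLastError() immediately after the launch, and execution errors from the next synchronizing call. A sketch with a deliberate bug (the null-pointer write) to make both checks visible:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void oops(int *p) { p[threadIdx.x] = 0; }  // p may be invalid

int main() {
    oops<<<1, 32>>>(nullptr);                 // deliberate bug: null pointer

    // Launch errors (bad configuration) surface immediately:
    cudaError_t launchErr = cudaGetLastError();
    // Execution errors (invalid access) surface on synchronization:
    cudaError_t execErr = cudaDeviceSynchronize();

    printf("launch: %s\nexec:   %s\n",
           cudaGetErrorString(launchErr), cudaGetErrorString(execErr));
    return 0;
}

// Compile with device debug info for CUDA-GDB:  nvcc -g -G bug.cu -o bug
// Memory errors can also be caught with:        cuda-memcheck ./bug
```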
Optimization
- Understanding the factors that affect the performance of CUDA programs
- Using CUDA coalescing techniques to improve memory throughput
- Using CUDA caching and prefetching techniques to reduce memory latency
- Using CUDA shared memory and local memory techniques to optimize memory accesses and bandwidth
- Using profiling tools to measure and improve execution time and resource utilization (a coalescing and timing sketch follows this list)
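A classic coalescing exercise is matrix transpose: the naive kernel writes with a stride of n, while staging a tile in shared memory makes both the global read and the global write coalesced. The 1024×1024 matrix, the 32-wide tile, and the event-based timing are illustrative; in the course, a profiler would quantify the difference:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 32

// Naive transpose: coalesced reads, but writes stride by n (slow).
__global__ void transposeNaive(const float *in, float *out, int n) {
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) out[x * n + y] = in[y * n + x];
}

// Tiled transpose: stage a tile in shared memory so both the global
// read and the global write are coalesced; the +1 column of padding
// avoids shared-memory bank conflicts.
__global__ void transposeTiled(const float *in, float *out, int n) {
    __shared__ float tile[TILE][TILE + 1];
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;      // swap block coordinates
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

int main() {
    const int n = 1024;                       // illustrative matrix size
    float *dIn, *dOut;
    cudaMalloc(&dIn, n * n * sizeof(float));
    cudaMalloc(&dOut, n * n * sizeof(float));

    dim3 block(TILE, TILE), grid(n / TILE, n / TILE);
    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);
    float msNaive = 0.0f, msTiled = 0.0f;

    cudaEventRecord(start);
    transposeNaive<<<grid, block>>>(dIn, dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msNaive, start, stop);

    cudaEventRecord(start);
    transposeTiled<<<grid, block>>>(dIn, dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msTiled, start, stop);

    printf("naive: %.3f ms, tiled: %.3f ms\n", msNaive, msTiled);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(dIn); cudaFree(dOut);
    return 0;
}
```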
Summary and Next Steps