Course Code: amdgpuprog
Duration: 28 hours
Prerequisites:
  • An understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors

Audience

  • Developers who wish to learn how to use ROCm and HIP to program AMD GPUs and exploit their parallelism
  • Developers who wish to write high-performance and scalable code that can run on different AMD devices
  • Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance
Overview:

ROCm is an open-source platform for GPU programming on AMD hardware that also offers compatibility paths to CUDA and OpenCL. ROCm exposes hardware details to the programmer and gives full control over the parallelization process; however, this also requires a solid understanding of the device architecture, memory model, execution model, and optimization techniques.

HIP is a C++ runtime API and kernel language for writing portable code that can run on both AMD and NVIDIA GPUs. HIP provides a thin abstraction layer over the native GPU APIs, ROCm and CUDA, and lets you leverage existing GPU libraries and tools.
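
To give a flavor of what this looks like in practice, the following is a hedged, minimal sketch of a complete HIP program (the kernel name and sizes are illustrative, not taken from the course materials). The same source can be built with hipcc for an AMD GPU or, using HIP's NVIDIA platform support, for an NVIDIA GPU.

    #include <hip/hip_runtime.h>
    #include <cstdio>

    // Trivial kernel: each thread writes its global index into the output array.
    __global__ void writeIndices(int* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = i;
    }

    int main() {
        const int n = 256;
        int* d_out = nullptr;
        hipMalloc((void**)&d_out, n * sizeof(int));    // allocate device memory
        writeIndices<<<dim3(1), dim3(n)>>>(d_out, n);  // launch 1 block of n threads
        hipDeviceSynchronize();                        // wait for the kernel to finish

        int h_out[n];
        hipMemcpy(h_out, d_out, n * sizeof(int), hipMemcpyDeviceToHost);
        printf("h_out[42] = %d\n", h_out[42]);         // expect 42
        hipFree(d_out);
        return 0;
    }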

This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level developers who wish to use ROCm and HIP to program AMD GPUs and exploit their parallelism.

By the end of this training, participants will be able to:

  • Set up a development environment that includes the ROCm platform, an AMD GPU, and Visual Studio Code.
  • Create a basic ROCm program that performs vector addition on the GPU and retrieves the results from the GPU memory.
  • Use ROCm API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads.
  • Use HIP language to write kernels that execute on the GPU and manipulate data.
  • Use HIP built-in functions, variables, and libraries to perform common tasks and operations.
  • Use ROCm and HIP memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses.
  • Use ROCm and HIP execution models to control the threads, blocks, and grids that define the parallelism.
  • Debug and test ROCm and HIP programs using tools such as ROCm Debugger and ROCm Profiler.
  • Optimize ROCm and HIP programs using techniques such as coalescing, caching, prefetching, and profiling.

Format of the Course

  • Interactive lecture and discussion.
  • Lots of exercises and practice.
  • Hands-on implementation in a live-lab environment.

Course Customization Options

  • To request a customized training for this course, please contact us to make arrangements.
Course Outline:

Introduction

  • What is ROCm?
  • What is HIP?
  • ROCm vs CUDA vs OpenCL
  • Overview of ROCm and HIP features and architecture
  • Setting up the Development Environment

Getting Started

  • Creating a new ROCm project using Visual Studio Code
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf (see the sketch after this list)
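
A first program in this spirit can stay host-only. The hedged sketch below queries the number of visible devices and reports the result with printf, routing errors to stderr with fprintf. The file name and the hipcc command in the leading comment are assumptions about the project layout, not a prescribed setup.

    // main.cpp -- assumed file name; build with:  hipcc main.cpp -o hello_rocm
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int deviceCount = 0;
        hipError_t status = hipGetDeviceCount(&deviceCount);  // ROCm/HIP runtime call
        if (status != hipSuccess) {
            fprintf(stderr, "hipGetDeviceCount failed: %s\n", hipGetErrorString(status));
            return 1;
        }
        printf("Found %d HIP-capable device(s)\n", deviceCount);
        return 0;
    }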

ROCm API

  • Understanding the role of ROCm API in the host program
  • Using ROCm API to query device information and capabilities
  • Using ROCm API to allocate and deallocate device memory
  • Using ROCm API to copy data between host and device
  • Using ROCm API to launch kernels and synchronize threads
  • Using ROCm API to handle errors and exceptions (these calls are combined in the sketch after this list)
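
The hedged sketch below strings these API calls together into a small vector-addition program; the HIP_CHECK macro is an error-handling convenience written for this example, not part of the ROCm API itself.

    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    // Simple error-handling helper: print the failing call and exit.
    #define HIP_CHECK(call)                                                         \
        do {                                                                        \
            hipError_t err_ = (call);                                               \
            if (err_ != hipSuccess) {                                               \
                fprintf(stderr, "%s failed: %s\n", #call, hipGetErrorString(err_)); \
                exit(1);                                                            \
            }                                                                       \
        } while (0)

    __global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        // Query device information.
        hipDeviceProp_t prop;
        HIP_CHECK(hipGetDeviceProperties(&prop, 0));
        printf("Device 0: %s\n", prop.name);

        // Host data.
        const int n = 1 << 20;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

        // Allocate device memory.
        float *d_a, *d_b, *d_c;
        HIP_CHECK(hipMalloc((void**)&d_a, n * sizeof(float)));
        HIP_CHECK(hipMalloc((void**)&d_b, n * sizeof(float)));
        HIP_CHECK(hipMalloc((void**)&d_c, n * sizeof(float)));

        // Copy inputs host -> device.
        HIP_CHECK(hipMemcpy(d_a, a.data(), n * sizeof(float), hipMemcpyHostToDevice));
        HIP_CHECK(hipMemcpy(d_b, b.data(), n * sizeof(float), hipMemcpyHostToDevice));

        // Launch the kernel and wait for completion.
        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        vectorAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
        HIP_CHECK(hipGetLastError());
        HIP_CHECK(hipDeviceSynchronize());

        // Copy the result device -> host and check one element.
        HIP_CHECK(hipMemcpy(c.data(), d_c, n * sizeof(float), hipMemcpyDeviceToHost));
        printf("c[0] = %f (expected 3.0)\n", c[0]);

        // Release device memory.
        HIP_CHECK(hipFree(d_a));
        HIP_CHECK(hipFree(d_b));
        HIP_CHECK(hipFree(d_c));
        return 0;
    }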

HIP Language

  • Understanding the role of HIP language in the device program
  • Using HIP language to write kernels that execute on the GPU and manipulate data
  • Using HIP data types, qualifiers, operators, and expressions
  • Using HIP built-in functions, variables, and libraries to perform common tasks and operations (see the kernel sketch after this list)
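
A hedged, kernel-side sketch of these language features follows (the function names are illustrative); the host code that launches it would follow the same pattern as the vector-addition example above.

    #include <hip/hip_runtime.h>
    #include <cmath>

    // __device__ function: callable only from device code.
    __device__ float magnitude(float x, float y) {
        return sqrtf(x * x + y * y);   // built-in math function available in device code
    }

    // __global__ kernel: launched from the host, executed on the GPU.
    __global__ void vectorMagnitude(const float* x, const float* y, float* out, int n) {
        // Built-in variables describe this thread's position in the launch.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            out[i] = magnitude(x[i], y[i]);
        }
    }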

ROCm and HIP Memory Model

  • Understanding the difference between host and device memory models
  • Using ROCm and HIP memory spaces, such as global, shared, constant, and local
  • Using ROCm and HIP memory objects, such as pointers, arrays, textures, and surfaces
  • Using ROCm and HIP memory access modes, such as read-only, write-only, read-write, etc.
  • Using ROCm and HIP memory consistency model and synchronization mechanisms (see the shared-memory sketch after this list)
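
As one hedged illustration of these memory spaces, the kernel below sums each block's slice of a global array using shared memory for staging, per-thread local values, and __syncthreads for block-level synchronization. The block size of 256 is an assumption and must match the launch configuration; a power-of-two block size keeps the tree reduction simple.

    #include <hip/hip_runtime.h>

    #define BLOCK_SIZE 256   // assumed block size; must match the launch configuration

    __global__ void blockSum(const float* in, float* partialSums, int n) {
        __shared__ float tile[BLOCK_SIZE];         // shared memory: visible to all threads in the block
        int gid = blockIdx.x * blockDim.x + threadIdx.x;

        float value = (gid < n) ? in[gid] : 0.0f;  // local per-thread value, typically kept in a register
        tile[threadIdx.x] = value;                 // stage it in shared memory
        __syncthreads();                           // make all writes visible to the whole block

        // Tree reduction within the block.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (threadIdx.x < stride) {
                tile[threadIdx.x] += tile[threadIdx.x + stride];
            }
            __syncthreads();
        }

        if (threadIdx.x == 0) {
            partialSums[blockIdx.x] = tile[0];     // one result per block, written back to global memory
        }
    }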

ROCm and HIP Execution Model

  • Understanding the difference between host and device execution models
  • Using ROCm and HIP threads, blocks, and grids to define the parallelism (see the grid-stride sketch after this list)
  • Using ROCm and HIP thread functions, such as hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x, etc.
  • Using ROCm and HIP block functions, such as __syncthreads, __threadfence_block, etc.
  • Using ROCm and HIP grid functions, such as hipGridDim_x, hipGridSync, cooperative groups, etc.
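
A hedged sketch of how threads, blocks, and grids combine into a launch follows; it uses a grid-stride loop so that any grid size can cover any problem size. Current HIP code typically uses the CUDA-style built-ins threadIdx, blockIdx, blockDim, and gridDim directly; the hipThreadIdx_x-style macros listed above are equivalent spellings provided by the HIP headers.

    #include <hip/hip_runtime.h>

    // Grid-stride loop: each thread handles every (gridDim.x * blockDim.x)-th element,
    // so any grid size can cover any problem size n.
    __global__ void scaleArray(float* data, float factor, int n) {
        int stride = gridDim.x * blockDim.x;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
            data[i] *= factor;
        }
    }

    // Host-side launch geometry (device memory setup omitted for brevity):
    //   int n = 1 << 20;
    //   dim3 block(256);                         // threads per block
    //   dim3 grid((n + block.x - 1) / block.x);  // blocks per grid
    //   scaleArray<<<grid, block>>>(d_data, 2.0f, n);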

Debugging

  • Understanding the common errors and bugs in ROCm and HIP programs
  • Using Visual Studio Code debugger to inspect variables, breakpoints, call stack, etc.
  • Using ROCm Debugger to debug ROCm and HIP programs on AMD devices
  • Using ROCm Profiler to analyze ROCm and HIP programs on AMD devices

Optimization

  • Understanding the factors that affect the performance of ROCm and HIP programs
  • Using ROCm and HIP coalescing techniques to improve memory throughput (contrasted in the sketch after this list)
  • Using ROCm and HIP caching and prefetching techniques to reduce memory latency
  • Using ROCm and HIP shared memory and local memory techniques to optimize memory accesses and bandwidth
  • Using ROCm and HIP profiling tools to measure and improve execution time and resource utilization
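
As one hedged illustration of coalescing, the two kernels below copy the same matrix: in the first, consecutive threads touch consecutive addresses (coalesced); in the second, consecutive threads stride through memory (uncoalesced), which a profiler such as ROCm Profiler would typically show as lower memory throughput. The matrix dimensions and row-major layout are assumptions made for the example.

    #include <hip/hip_runtime.h>

    // Coalesced: consecutive threads in a wavefront read/write consecutive addresses.
    __global__ void copyCoalesced(const float* in, float* out, int width, int height) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        if (row < height && col < width) {
            out[row * width + col] = in[row * width + col];    // adjacent threads -> adjacent addresses
        }
    }

    // Uncoalesced: consecutive threads access addresses 'height' elements apart.
    __global__ void copyStrided(const float* in, float* out, int width, int height) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        if (row < height && col < width) {
            out[col * height + row] = in[col * height + row];  // adjacent threads -> strided addresses
        }
    }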

Summary and Next Steps
