- An understanding of the C/C++ language and parallel programming concepts
- Basic knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
- Familiarity with the Windows operating system and PowerShell
Audience
- Developers who wish to learn how to install and use ROCm on Windows to program AMD GPUs and exploit their parallelism
- Developers who wish to write high-performance and scalable code that can run on different AMD devices
- Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance
ROCm is an open-source platform for GPU programming on AMD GPUs that also provides compatibility with CUDA and OpenCL. It exposes hardware details to the programmer and gives full control over the parallelization process. That control, however, requires a good understanding of the device architecture, memory model, execution model, and optimization techniques.
ROCm for Windows is a recent development that lets users install and use ROCm on the Windows operating system, which is widely used for personal and professional purposes. It enables users to apply the power of AMD GPUs to applications such as artificial intelligence, gaming, graphics, and scientific computing.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level developers who wish to install and use ROCm on Windows to program AMD GPUs and exploit their parallelism.
By the end of this training, participants will be able to:
- Set up a development environment that includes the ROCm platform, an AMD GPU, and Visual Studio Code on Windows.
- Create a basic ROCm program that performs vector addition on the GPU and retrieves the results from the GPU memory.
- Use the ROCm API to query device information, allocate and deallocate device memory, copy data between host and device, launch kernels, and synchronize threads.
- Use the HIP language to write kernels that execute on the GPU and manipulate data.
- Use HIP built-in functions, variables, and libraries to perform common tasks and operations.
- Use ROCm and HIP memory spaces, such as global, shared, constant, and local, to optimize data transfers and memory accesses.
- Use ROCm and HIP execution models to control the threads, blocks, and grids that define the parallelism.
- Debug and test ROCm and HIP programs using tools such as ROCm Debugger and ROCm Profiler.
- Optimize ROCm and HIP programs using techniques such as coalescing, caching, prefetching, and profiling.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Introduction
- What is ROCm?
- What is HIP?
- ROCm vs CUDA vs OpenCL
- Overview of ROCm and HIP features and architecture
- ROCm for Windows vs ROCm for Linux
Installation
- Installing ROCm on Windows
- Verifying the installation and checking device compatibility
- Updating or uninstalling ROCm on Windows
- Troubleshooting common installation issues
Getting Started
- Creating a new ROCm project using Visual Studio Code on Windows
- Exploring the project structure and files
- Compiling and running the program
- Displaying the output using printf and fprintf
ROCm API
- Using the ROCm API in the host program
- Querying device information and capabilities
- Allocating and deallocating device memory
- Copying data between host and device
- Launching kernels and synchronizing threads
- Handling errors and exceptions
HIP Language
- Using the HIP language in the device program
- Writing kernels that execute on the GPU and manipulate data
- Using data types, qualifiers, operators, and expressions
- Using built-in functions, variables, and libraries
ROCm and HIP Memory Model
- Using different memory spaces, such as global, shared, constant, and local
- Using different memory objects, such as pointers, arrays, textures, and surfaces
- Using different memory access modes, such as read-only, write-only, read-write, etc.
- Using the memory consistency model and synchronization mechanisms
ROCm and HIP Execution Model
- Using the levels of the execution hierarchy: threads, blocks, and grids
- Using thread built-ins, such as hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x, etc.
- Using block functions, such as __syncthreads, __threadfence_block, etc.
- Using grid functions, such as hipGridDim_x, hipGridSync, cooperative groups, etc.
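The arithmetic that ties threads, blocks, and grids together can be sketched on the host, without a GPU or the HIP toolchain. The block below is a minimal CPU-side model of a one-dimensional launch, not HIP code: it shows how many blocks are needed to cover `n` elements and which element each thread would handle (the value a kernel would typically compute as `hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x`). The helper names `blocks_for`, `global_index`, and `in_bounds` are hypothetical and are not part of any ROCm or HIP API.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; none of these are HIP API calls.

// Number of blocks needed so that blocks * block_size >= n (ceiling division).
constexpr std::size_t blocks_for(std::size_t n, std::size_t block_size) {
    return (n + block_size - 1) / block_size;
}

// Global element index of one thread in a 1D launch. This mirrors the
// expression a kernel would evaluate from its block index, block size,
// and thread index within the block.
constexpr std::size_t global_index(std::size_t block_idx,
                                   std::size_t block_dim,
                                   std::size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// The last block is usually only partly used, so each thread checks that
// its index is inside the data before touching memory.
constexpr bool in_bounds(std::size_t idx, std::size_t n) {
    return idx < n;
}
```

For example, 1000 elements with 256 threads per block need 4 blocks (1024 threads in total), so the last 24 threads of the final block fail the `in_bounds` check and do no work.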
Debugging
- Debugging ROCm and HIP programs on Windows
- Using the Visual Studio Code debugger to set breakpoints and inspect variables, the call stack, etc.
- Using ROCm Debugger to debug ROCm and HIP programs on AMD devices
- Using ROCm Profiler to analyze ROCm and HIP programs on AMD devices
Optimization
- Optimizing ROCm and HIP programs on Windows
- Using coalescing techniques to improve memory throughput
- Using caching and prefetching techniques to reduce memory latency
- Using shared memory and local memory techniques to optimize memory accesses and bandwidth
- Using profiling tools to measure and improve execution time and resource utilization
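Why coalescing improves memory throughput can be illustrated with a small host-side count of how many memory lines one group of threads touches for a given access stride. This is a simplified model, not a description of any specific AMD device: the group size, element size, and line size are parameters chosen by the caller, the base address is assumed to be line-aligned, and the function name `lines_touched` is hypothetical.

```cpp
#include <cstddef>

// Hypothetical model for illustration; not a ROCm or HIP API.
// Counts the distinct memory lines touched when `threads` consecutive
// threads each read one element, with thread t reading element
// t * stride_elems. Assumes stride_elems >= 1 and a base address of 0,
// so addresses never decrease as t grows and a new line can be detected
// by comparing with the previous thread's line.
constexpr std::size_t lines_touched(std::size_t threads,
                                    std::size_t stride_elems,
                                    std::size_t elem_bytes,
                                    std::size_t line_bytes) {
    std::size_t count = 0;
    std::size_t last_line = 0;
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t line = (t * stride_elems * elem_bytes) / line_bytes;
        if (t == 0 || line != last_line) {
            ++count;
            last_line = line;
        }
    }
    return count;
}
```

Under the illustrative assumption of 64 threads, 4-byte elements, and 64-byte lines, a stride of 1 touches 4 lines while a stride of 16 touches 64, one line per thread. Fewer lines per group generally means fewer memory transactions, which is the effect coalesced access patterns aim for.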
Summary and Next Steps