Course Code: poabc
Duration: 21 hours
Prerequisites:
  • Experience working with AI model training or deployment pipelines
  • Understanding of GPU/NPU/MLU compute principles and model optimization
  • Basic familiarity with performance profiling tools and metrics

Audience

  • Performance engineers
  • Machine learning infrastructure teams
  • AI system architects

Overview:

Huawei Ascend, Biren, and Cambricon are Chinese AI accelerator platforms, each with its own acceleration and profiling toolchain for production-scale AI workloads.

This instructor-led, live training (online or onsite) is aimed at advanced-level AI infrastructure and performance engineers who wish to optimize model inference and training workflows across multiple Chinese AI chip platforms.

By the end of this training, participants will be able to:

  • Benchmark models on Ascend, Biren, and Cambricon platforms.
  • Identify system bottlenecks and memory/compute inefficiencies.
  • Apply graph-level, kernel-level, and operator-level optimizations.
  • Tune deployment pipelines to improve throughput and latency.

Format of the Course

  • Interactive lecture and discussion.
  • Hands-on use of profiling and optimization tools on each platform.
  • Guided exercises focused on practical tuning scenarios.

Course Customization Options

  • To request customized training for this course, tailored to your performance environment or model type, please contact us.

Course Outline:

Performance Concepts and Metrics

  • Latency, throughput, power usage, resource utilization
  • System-level vs. model-level bottlenecks
  • Profiling for inference vs training
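
The latency and throughput metrics listed above can be illustrated with a small, vendor-neutral timing harness. This is only a sketch: `benchmark` and `fake_infer` are hypothetical names introduced here, and `fake_infer` is a CPU-only stand-in for a model call. On a real accelerator the timed call would need to wait for the device to finish (most runtimes execute asynchronously) before the timer stops.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], batch_size: int,
              warmup: int = 5, iters: int = 50) -> Dict[str, float]:
    """Time fn() and report latency percentiles (ms) and throughput (samples/s)."""
    for _ in range(warmup):
        fn()  # warm-up runs are excluded from the statistics

    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)

    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "p99_ms": cuts[98] * 1e3,
        "throughput": batch_size * iters / sum(latencies),
    }


def fake_infer() -> int:
    # Hypothetical stand-in for one inference call on a batch.
    return sum(i * i for i in range(10_000))


result = benchmark(fake_infer, batch_size=8, warmup=2, iters=20)
print(result)
```

Reporting a tail percentile alongside the median matters because a deployment can have a good median latency and still miss its service target on the slowest requests.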

Profiling on Huawei Ascend

  • Using CANN Profiler and MindInsight
  • Kernel and operator diagnostics
  • Offload patterns and memory mapping

Profiling on Biren GPU

  • Biren SDK performance monitoring features
  • Kernel fusion, memory alignment, and execution queues
  • Power and temperature-aware profiling

Profiling on Cambricon MLU

  • BANGPy and Neuware performance tools
  • Kernel-level visibility and log interpretation
  • MLU profiler integration with deployment frameworks

Graph and Model-Level Optimization

  • Graph pruning and quantization strategies
  • Operator fusion and computational graph restructuring
  • Input size standardization and batch tuning
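
As one concrete instance of the quantization strategies mentioned above, the following sketch shows symmetric per-tensor int8 quantization in plain Python. It is an illustration of the arithmetic only, not any vendor's toolchain; the function names and the sample weights are assumptions made for this example.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the quantized integers."""
    return [x * scale for x in q]


weights = [0.02, -1.27, 0.5, 0.0]  # hypothetical sample weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per element is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Per-tensor scaling is the simplest choice; per-channel scales usually preserve accuracy better at the cost of more metadata.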

Memory and Kernel Optimization

  • Optimizing memory layout and reuse
  • Efficient buffer management across chipsets
  • Kernel-level tuning techniques per platform
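
The idea of buffer reuse can be sketched independently of any chipset. The example below is a minimal host-side pool keyed by buffer size; `BufferPool` is a hypothetical class written for this illustration, and a real deployment would more likely pool device or pinned memory through the vendor runtime.

```python
from collections import defaultdict
from typing import DefaultDict, List


class BufferPool:
    """Reuse buffers by size instead of allocating a new one per request."""

    def __init__(self) -> None:
        self._free: DefaultDict[int, List[bytearray]] = defaultdict(list)
        self.allocations = 0  # number of fresh allocations performed

    def acquire(self, size: int) -> bytearray:
        if self._free[size]:
            return self._free[size].pop()  # reuse a released buffer
        self.allocations += 1
        return bytearray(size)

    def release(self, buf: bytearray) -> None:
        self._free[len(buf)].append(buf)


pool = BufferPool()
for _ in range(100):
    buf = pool.acquire(1 << 20)
    # ... fill buf with input data and hand it to the runtime here ...
    pool.release(buf)

print(pool.allocations)
```

Because each request releases its buffer before the next one acquires, the loop performs a single allocation rather than one hundred.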

Cross-Platform Best Practices

  • Performance portability: abstraction strategies
  • Building shared tuning pipelines for multi-chip environments
  • Example: tuning an object detection model across Ascend, Biren, and Cambricon MLU
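
One possible abstraction strategy for a shared tuning pipeline is a small backend interface with a registry, so the same benchmark or tuning loop can run against each chip. The sketch below is purely illustrative: `Backend`, `EchoBackend`, `REGISTRY`, and `run_everywhere` are hypothetical names, and the placeholder backend does no real device work. An actual backend would wrap the corresponding vendor runtime.

```python
from typing import Callable, Dict, List, Protocol


class Backend(Protocol):
    """Minimal interface every platform adapter is assumed to implement."""
    name: str

    def run(self, batch: List[float]) -> List[float]: ...


class EchoBackend:
    """Hypothetical placeholder; a real adapter would call a vendor runtime."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, batch: List[float]) -> List[float]:
        return [x * 2.0 for x in batch]


REGISTRY: Dict[str, Callable[[], Backend]] = {}


def register(name: str, factory: Callable[[], Backend]) -> None:
    REGISTRY[name] = factory


# Bind chip as a default argument so each factory keeps its own name.
for chip in ("ascend", "biren", "cambricon"):
    register(chip, lambda chip=chip: EchoBackend(chip))


def run_everywhere(batch: List[float]) -> Dict[str, List[float]]:
    """Run the same batch on every registered backend."""
    return {name: factory().run(batch) for name, factory in REGISTRY.items()}


results = run_everywhere([1.0, 2.0])
print(results)
```

Keeping the interface this narrow lets platform-specific tuning stay inside each adapter while the comparison logic is written once.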

Summary and Next Steps

Sites Published:

Performance Optimization on Ascend, Biren, and Cambricon is published on the following sites:

United Arab Emirates, Qatar, Egypt, Saudi Arabia, South Africa, Brasil, Canada, 中国, 香港, 澳門, 台灣, USA, Österreich, Schweiz, Deutschland, Czech Republic, Denmark, Estonia, Finland, Greece, Magyarország, Ireland, Luxembourg, Latvia, España, Italia, Lithuania, Nederland, Norway, Portugal, România, Sverige, Türkiye, Malta, Belgique, France, 日本, Australia, Malaysia, New Zealand, Philippines, Singapore, Thailand, Vietnam, India, Argentina, Chile, Costa Rica, Ecuador, Guatemala, Colombia, México, Panama, Peru, Uruguay, Venezuela, Polska, United Kingdom, South Korea, Pakistan, Sri Lanka, Bulgaria, Bolivia, Indonesia, Kazakhstan, Moldova, Morocco, Tunisia, Kuwait, Oman, Slovakia, Kenya, Nigeria, Botswana, Slovenia, Croatia, Serbia, Bhutan, Nepal, Uzbekistan