AMD’s Radeon Instinct™ accelerators unleashed, delivering unprecedented machine intelligence capabilities

AMD is launching a new era in instinctive computing with its Radeon Instinct accelerators, shipping soon to partners to power their deep learning and HPC solutions.

First previewed in December 2016, this new line of GPU server accelerators – Radeon Instinct™ MI25, Radeon Instinct MI8, and Radeon Instinct MI6 – together with AMD’s open ROCm 1.6 software platform, will help dramatically increase performance, efficiency, and ease of implementation, accelerating deep learning inference and training to speed the drive to machine intelligence.

Radeon Instinct’s three initial accelerators are designed to address a wide range of machine intelligence applications (a short device-query sketch follows the list):

  • The Radeon Instinct™ MI25 accelerator, based on the “Vega” GPU architecture with a 14nm FinFET process, will be the world’s ultimate training accelerator for large-scale machine intelligence and deep learning datacenter applications. The MI25 delivers superior FP16 and FP32 performance[1] in a passively cooled, single-GPU server card with 24.6 TFLOPS of FP16 or 12.3 TFLOPS of FP32 peak performance through its 64 compute units (4,096 stream processors). With 16GB of ultra-high-bandwidth HBM2 ECC[2] GPU memory and up to 484 GB/s of memory bandwidth, the Radeon Instinct MI25’s design is optimized for massively parallel applications with large datasets for machine intelligence and HPC-class system workloads.
  • The Radeon Instinct™ MI8 accelerator, harnessing the high performance and energy efficiency of the “Fiji” GPU architecture, is a small-form-factor HPC and inference accelerator with 8.2 TFLOPS of peak FP16|FP32 performance at less than 175W board power and 4GB of High-Bandwidth Memory (HBM) delivering 512 GB/s of memory bandwidth. The MI8 is well suited for machine learning inference and HPC applications.
  • The Radeon Instinct™ MI6 accelerator, based on the acclaimed “Polaris” GPU architecture, is a passively cooled inference accelerator with 5.7 TFLOPS of peak FP16|FP32 performance at 150W peak board power and 16GB of ultra-fast GDDR5 GPU memory on a 256-bit memory interface. The MI6 is a versatile accelerator ideal for HPC, machine learning inference, and edge-training deployments.
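
To make the specifications above concrete, here is a minimal device-query sketch against HIP’s runtime API (hipGetDeviceCount and hipGetDeviceProperties, HIP’s counterparts to the CUDA calls of the same names). This is an illustrative sketch assuming a working ROCm/HIP install, not sample code from AMD; on an MI25 it should report 64 compute units and roughly 16GB of memory.

    // Minimal HIP device-query sketch (illustrative; assumes a ROCm/HIP
    // install). Build: hipcc device_query.cpp -o device_query
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
            std::printf("No HIP-capable accelerators found.\n");
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            hipDeviceProp_t prop;
            if (hipGetDeviceProperties(&prop, i) != hipSuccess) continue;
            std::printf("GPU %d: %s\n", i, prop.name);
            std::printf("  Compute units: %d\n", prop.multiProcessorCount);
            std::printf("  Memory: %.1f GB\n",
                        prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
            // clockRate is reported in kHz, as in CUDA.
            std::printf("  Engine clock: %.0f MHz\n", prop.clockRate / 1e3);
        }
        return 0;
    }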
Radeon Instinct hardware is fueled by AMD’s open-source software solutions, including:

  • Planned for a June 29th rollout, the ROCm 1.6 software platform brings performance improvements and adds support for MIOpen 1.0. Scalable and fully open source, it provides a flexible, powerful heterogeneous compute solution for a new class of hybrid Hyperscale and HPC-class system workloads. Comprising an open-source Linux® driver optimized for scalable multi-GPU computing, the ROCm software platform provides multiple programming models, the HIP CUDA conversion tool, and support for GPU acceleration using the Heterogeneous Computing Compiler (HCC); a minimal HIP kernel sketch follows this list.
  • The open-source MIOpen GPU-accelerated library is now available with the ROCm platform and brings planned support for machine intelligence frameworks including Caffe, TensorFlow, and Torch.
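
To illustrate the programming model, here is a minimal HIP SAXPY sketch (an illustrative example assuming a working ROCm/HIP toolchain, not code from AMD’s platform documentation). The runtime calls used – hipMalloc, hipMemcpy, and hipLaunchKernelGGL – mirror the familiar CUDA APIs, which is what makes the HIP conversion path straightforward for existing CUDA code.

    // Minimal HIP SAXPY sketch (illustrative, not AMD sample code).
    // The same single-source kernel style as CUDA, built with hipcc.
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

        float *dx = nullptr, *dy = nullptr;
        hipMalloc(&dx, n * sizeof(float));
        hipMalloc(&dy, n * sizeof(float));
        hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

        // Launch with 256 threads per block; hipLaunchKernelGGL is HIP's
        // equivalent of CUDA's <<<grid, block>>> launch syntax.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        hipLaunchKernelGGL(saxpy, dim3(blocks), dim3(threads), 0, 0,
                           n, 2.0f, dx, dy);

        hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
        std::printf("y[0] = %.1f (expected 4.0)\n", hy[0]);

        hipFree(dx);
        hipFree(dy);
        return 0;
    }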

We plan to ship Radeon Instinct products to our technology partners (including Boxx, Colfax, Exxact Corporation, Gigabyte, Inventec and Supermicro, among others) to power their deep learning and HPC solutions starting in Q3.

For more information, visit Radeon.com/Instinct.

[1] TFLOPS calculations: FLOPS are calculated by taking the engine clock from the highest DPM state, multiplying it by the 64 CUs per GPU, multiplying that result by the 64 shader units in each CU, and then multiplying by 2 FLOPS per clock for FP32 or 4 FLOPS per clock for FP16. The FP64 TFLOPS rate is calculated using a 1/16th rate.
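
As a worked check of this formula, the sketch below reproduces the MI25’s quoted figures. The 1500 MHz peak engine clock used here is an assumption inferred from the quoted 12.3 TFLOPS FP32 figure; it is not stated in this announcement.

    // TFLOPS check for the Radeon Instinct MI25 using the formula in [1].
    // Assumption: 1500 MHz peak engine clock (implied by 12.3 TFLOPS FP32,
    // not stated in this announcement).
    #include <cstdio>

    int main() {
        const double clock_hz   = 1500.0e6; // engine clock, highest DPM state
        const double cus        = 64.0;     // compute units per GPU
        const double sps_per_cu = 64.0;     // shader units per CU
        const double fp32 = clock_hz * cus * sps_per_cu * 2.0; // 2 FLOPS/clock
        const double fp16 = clock_hz * cus * sps_per_cu * 4.0; // 4 FLOPS/clock
        const double fp64 = fp32 / 16.0;                       // 1/16th rate
        std::printf("FP32: %.1f TFLOPS\n", fp32 / 1e12); // ~12.3
        std::printf("FP16: %.1f TFLOPS\n", fp16 / 1e12); // ~24.6
        std::printf("FP64: %.2f TFLOPS\n", fp64 / 1e12); // ~0.77
        return 0;
    }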

[2] ECC support is limited to the HBM2 memory and ECC protection is not provided for internal GPU structures.