A Vector Systolic Accelerator for Multi-Precision Floating-Point High-Performance Computing

K. Li, J. Zhou, B. Li, S. Yang, S. Huang, H. Yu

TCAS-II



Abstract

There is an emerging need for multi-precision floating-point (FP) accelerators in high-performance computing (HPC) applications. However, existing multi-precision designs based on the high-precision-split method and the low-precision-combination method suffer from low hardware utilization and long multi-cycle processing periods, respectively. In this paper, a new pipelined multi-precision FP processing element (FP-PE) is developed using the proposed redundancy-minimized bit-partitioning method. The carefully designed pipeline achieves a 3.8× throughput improvement. Compared with existing multi-precision FP-PE methods, this work improves energy efficiency by 11.7%, 6.7%, and 62.6% for FP16, FP32, and FP64 operations, respectively. Moreover, to further improve system-level throughput and energy efficiency, a vector systolic accelerator is employed. Benefiting from the pipelined vector FP-PE and vector systolic data reuse, the proposed accelerator exhibits the best energy efficiency: 1193 GFLOPS/W at FP16, 298.3 GFLOPS/W at FP32, and 74.6 GFLOPS/W at FP64.
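As a rough illustration of why bit partitioning matters in a multi-precision multiplier, the sketch below multiplies two significands by splitting them into fixed-width limbs and summing the shifted partial products. It is a generic software model, not the paper's redundancy-minimized scheme: the 12-bit limb width and the function names are assumptions chosen only for the example. The significand widths (11, 24, and 53 bits including the implicit leading bit) are those of IEEE 754 binary16, binary32, and binary64.

```python
# Generic limb-based significand multiplication (illustrative only).
# LIMB_BITS is a hypothetical choice, not a value taken from the paper.
LIMB_BITS = 12

# Significand widths including the implicit leading bit (IEEE 754).
SIGNIFICAND_BITS = {"FP16": 11, "FP32": 24, "FP64": 53}


def split_limbs(value, limb_bits, n_limbs):
    """Split an unsigned integer into n_limbs limbs, least significant first."""
    mask = (1 << limb_bits) - 1
    return [(value >> (i * limb_bits)) & mask for i in range(n_limbs)]


def multiply_by_limbs(a, b, width, limb_bits=LIMB_BITS):
    """Multiply two width-bit significands using limb_bits-wide multipliers.

    Returns (product, number of limb multiplications, padding bits per operand).
    """
    n_limbs = -(-width // limb_bits)          # ceiling division
    padding = n_limbs * limb_bits - width     # unused bits in the top limb
    a_limbs = split_limbs(a, limb_bits, n_limbs)
    b_limbs = split_limbs(b, limb_bits, n_limbs)
    product = 0
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            # Each partial product is shifted to its weight and accumulated.
            product += (ai * bj) << ((i + j) * limb_bits)
    return product, n_limbs * n_limbs, padding


if __name__ == "__main__":
    for name, width in SIGNIFICAND_BITS.items():
        a = (1 << width) - 1                  # all-ones significand
        b = (1 << (width - 1)) + 5            # normalized significand
        product, n_mults, padding = multiply_by_limbs(a, b, width)
        assert product == a * b
        print(name, "limb multiplications:", n_mults, "padding bits:", padding)
```

Under these assumed parameters, the padding bits in the top limb are one simple source of unused multiplier capacity; choosing partition widths that fit all three formats well is the kind of trade-off a bit-partitioning method has to address.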
