Reconfigurable Array


PE: Multi-Precision FP Vector

Data Type    Input Shape    Output Shape
INT8         4×32           4×1
BF16         4×16           4×1
FP16         4×16           4×1
FP32         2×8            2×1
TF32         2×8            2×1
FP64         1×4            1×1
Fig1. PE Structure with its Pipeline
Fig2. Multi-Precision ISA
Fig3. PE IO Definition
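
The PE input shapes can be cross-checked with a little arithmetic. The sketch below is an illustration, not the author's RTL: it assumes standard storage widths for each type (TF32 is assumed to sit in a 32-bit container), and the names BITS, PE_INPUT_SHAPE and row_bits are hypothetical. Under those assumptions every input row is the same number of bits wide, and only the row count changes with precision.

```python
# Storage width per element in bits.
# Assumption: TF32 is held in a 32-bit container.
BITS = {"INT8": 8, "BF16": 16, "FP16": 16, "FP32": 32, "TF32": 32, "FP64": 64}

# (rows, elements per row), copied from the PE IO table.
PE_INPUT_SHAPE = {
    "INT8": (4, 32),
    "BF16": (4, 16),
    "FP16": (4, 16),
    "FP32": (2, 8),
    "TF32": (2, 8),
    "FP64": (1, 4),
}


def row_bits(dtype):
    """Bits in one input row of the PE for the given data type."""
    _, elems = PE_INPUT_SHAPE[dtype]
    return elems * BITS[dtype]


for dtype, (rows, elems) in PE_INPUT_SHAPE.items():
    print(f"{dtype}: {rows} rows x {elems} elements = {rows} x {row_bits(dtype)} bits")
```

If the assumed widths hold, this suggests a fixed-width row datapath that is subdivided differently for each precision.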

Array: Reconfigurable Vector Systolic Array

Fig4. Vector Array Architecture
Data Type    Input Matrix         Output Matrix    clk    Notes
INT8         [16×32] * [32×16]    [16×16]          16     INT32 output
BF16         [16×16] * [16×16]    [16×16]          16
FP16         [16×16] * [16×16]    [16×16]          16
FP32         [8×8] * [8×8]        [8×8]            8
TF32         [8×8] * [8×8]        [8×8]            8
FP64         [4×4] * [4×4]        [4×4]            4
Fig5. Configurable Vector Array Architecture
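
The array table implies a throughput figure for each mode. The sketch below assumes the clk column is the number of cycles needed to complete one matrix product of the listed shape; that reading is an assumption, and ARRAY_MODES and macs_per_clk are hypothetical names used only for this illustration.

```python
# (M, K, N, clk) for [M×K] * [K×N], copied from the array table.
ARRAY_MODES = {
    "INT8": (16, 32, 16, 16),
    "BF16": (16, 16, 16, 16),
    "FP16": (16, 16, 16, 16),
    "FP32": (8, 8, 8, 8),
    "TF32": (8, 8, 8, 8),
    "FP64": (4, 4, 4, 4),
}


def macs_per_clk(dtype):
    """Multiply-accumulates per clock, assuming clk cycles per matrix product."""
    m, k, n, clk = ARRAY_MODES[dtype]
    return (m * k * n) // clk


for dtype in ARRAY_MODES:
    print(f"{dtype}: {macs_per_clk(dtype)} MACs/clk")
```

Under this reading, INT8 sustains 512 MACs per clock, BF16 and FP16 256, FP32 and TF32 64, and FP64 16.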

Why Systolic?

  • Data reuse: operands are passed between neighbouring PEs, reducing fan-in and fan-out.
  • Data stationarity: data held in place switches less often, reducing the toggle rate.
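
The data-reuse point can be illustrated with a generic textbook model. The sketch below simulates a scalar, output-stationary systolic array; it is not the vector array described here, and systolic_matmul is a hypothetical name. Each operand enters at an edge once and is then forwarded from neighbour to neighbour, while each accumulator stays in its PE. The returned cycle count includes pipeline fill and drain, so it is not comparable to the clk column in the table above.

```python
def systolic_matmul(A, B):
    """Simulate C = A * B on an M×N output-stationary systolic array.

    A is M×K and enters from the left edge; B is K×N and enters from the top.
    Returns (C, cycles).
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    acc = [[0] * N for _ in range(M)]    # stationary accumulators, one per PE
    a_reg = [[0] * N for _ in range(M)]  # operands moving left to right
    b_reg = [[0] * N for _ in range(M)]  # operands moving top to bottom
    cycles = M + N + K - 2               # includes fill and drain of the skewed inputs

    for t in range(cycles):
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:
                    k = t - i            # row i is skewed by i cycles
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # reuse the left neighbour's operand
                if i == 0:
                    k = t - j            # column j is skewed by j cycles
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # reuse the upper neighbour's operand
        a_reg, b_reg = new_a, new_b
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)
```

In this model only the PEs on the left and top edges read from outside the array; every other PE takes its inputs from a neighbour's register, which is what keeps fan-out low.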

Why Vector?

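  • Vector: economies of scale and tight coupling maximize energy efficiency.
  • Multi-precision is inherently vector: hardware that supports a high-precision data type can perform more low-precision operations.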

Why Pipeline?

  • Shorter critical path: fewer signal glitches and lower static power, improving energy efficiency and throughput.
  • Balances the critical paths across different components and different modes.

Progress & Outlook

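  • The PE is being refactored; the Array is complete.
  • Next step: evaluation and optimization. If the results are good, the work will move on to system-level implementation.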

Junzhuo Zhou

HaoYu lab, SUSTech