Reconfigurable Array
PE: Multi-Precision FP Vector
| Data Type | Input Shape | Output Shape |
|---|---|---|
| INT8 | 4×32 | 4×1 |
| BF16 | 4×16 | 4×1 |
| FP16 | 4×16 | 4×1 |
| FP32 | 2×8 | 2×1 |
| TF32 | 2×8 | 2×1 |
| FP64 | 1×4 | 1×1 |
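
The input shapes in the PE table can be cross-checked with a little arithmetic. The sketch below assumes the usual storage widths for each data type, and assumes TF32 is held in a 32-bit container; neither assumption is stated in the original. The names `ELEM_BITS`, `PE_INPUT_SHAPE`, and `row_bits` are hypothetical. Under those assumptions, every input row is 256 bits wide in every mode.

```python
# Element storage width in bits (assumption: TF32 occupies a 32-bit container).
ELEM_BITS = {"INT8": 8, "BF16": 16, "FP16": 16, "FP32": 32, "TF32": 32, "FP64": 64}

# (rows, elements per row), copied from the PE table.
PE_INPUT_SHAPE = {
    "INT8": (4, 32),
    "BF16": (4, 16),
    "FP16": (4, 16),
    "FP32": (2, 8),
    "TF32": (2, 8),
    "FP64": (1, 4),
}


def row_bits(dtype):
    """Width in bits of one input row for the given data type."""
    _rows, elems_per_row = PE_INPUT_SHAPE[dtype]
    return elems_per_row * ELEM_BITS[dtype]


if __name__ == "__main__":
    for dtype in PE_INPUT_SHAPE:
        print(dtype, row_bits(dtype), "bits per row")
```

What changes between modes is the number of rows (4, 2, or 1), not the row width, which fits the idea of one datapath being regrouped for each precision.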
Array: Reconfigurable Vector Systolic Array
| Data Type | Input Matrix | Output Matrix | clk | Notes |
|---|---|---|---|---|
| INT8 | [16×32] × [32×16] | [16×16] | 16 | INT32 output |
| BF16 | [16×16] × [16×16] | [16×16] | 16 | |
| FP16 | [16×16] × [16×16] | [16×16] | 16 | |
| FP32 | [8×8] × [8×8] | [8×8] | 8 | |
| TF32 | [8×8] × [8×8] | [8×8] | 8 | |
| FP64 | [4×4] × [4×4] | [4×4] | 4 | |
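
The array table also implies a throughput figure for each mode. The sketch below assumes that the clk value is the number of clock cycles needed for one matrix product of the listed size; the original does not say whether this is latency or steady-state throughput. The names `ARRAY_TABLE` and `macs_per_clk` are hypothetical.

```python
# (M, K, N, clk) for an [M×K] by [K×N] product, copied from the array table.
ARRAY_TABLE = {
    "INT8": (16, 32, 16, 16),
    "BF16": (16, 16, 16, 16),
    "FP16": (16, 16, 16, 16),
    "FP32": (8, 8, 8, 8),
    "TF32": (8, 8, 8, 8),
    "FP64": (4, 4, 4, 4),
}


def macs_per_clk(dtype):
    """Multiply-accumulate operations per clock, assuming clk = cycles per product."""
    m, k, n, clk = ARRAY_TABLE[dtype]
    return (m * k * n) // clk


if __name__ == "__main__":
    for dtype in ARRAY_TABLE:
        print(dtype, macs_per_clk(dtype), "MAC/clk")
```

Under that reading, the array sustains 512 MAC/clk for INT8, 256 for BF16 and FP16, 64 for FP32 and TF32, and 16 for FP64.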
Why Systolic?
- Data reuse: operands are passed from one PE to its neighbor, reducing fan-in and fan-out.
- Data-stationary: values held in place do not switch, reducing the toggle rate.
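
The two points above can be illustrated with a small software model. This is a minimal sketch of a textbook output-stationary systolic array, not necessarily the dataflow used in this design; the function `systolic_matmul` and its counters are hypothetical. Each PE only talks to its left and upper neighbors, each accumulator stays in its PE, and every operand is read from the array edge exactly once.

```python
def systolic_matmul(A, B):
    """Multiply A (n×k) by B (k×m) on an n×m grid of PEs.

    PE (i, j) keeps the accumulator for C[i][j] (output-stationary).
    A values move left to right, B values move top to bottom.
    Returns (C, edge_reads, macs).
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]
    a_reg = [[None] * m for _ in range(n)]
    b_reg = [[None] * m for _ in range(n)]
    edge_reads = 0
    macs = 0
    for t in range(n + m + k - 2):
        new_a = [[None] * m for _ in range(n)]
        new_b = [[None] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    s = t - i  # row i is skewed by i cycles
                    if 0 <= s < k:
                        new_a[i][j] = A[i][s]
                        edge_reads += 1
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # from left neighbor
                if i == 0:
                    s = t - j  # column j is skewed by j cycles
                    if 0 <= s < k:
                        new_b[i][j] = B[s][j]
                        edge_reads += 1
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # from upper neighbor
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(m):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
                    macs += 1
    return acc, edge_reads, macs


if __name__ == "__main__":
    A = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
    B = [[(r + c) % 3 for c in range(4)] for r in range(4)]
    C, reads, macs = systolic_matmul(A, B)
    print(C)
    print("edge reads:", reads, "MACs:", macs)  # edge reads: 32 MACs: 64
```

For a 4×4 product, 64 multiply-accumulates are served by 32 edge reads, and the ratio grows with the array size. The loop count in this model includes the fill and drain skew, so it should not be compared directly with the clk figures in the array table.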
Why Vector?
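- Vector: economies of scale and tight coupling maximize energy efficiency.
- Multi-precision is inherently vector: hardware that computes a high-precision data type can instead perform several lower-precision operations.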
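
The second point can be made concrete with integer multiplication. This is a minimal sketch, assuming unsigned operands and ignoring the floating-point exponent and rounding logic a real PE would need; the function `mul16_from_mul8` is hypothetical. One 16-bit product is built from four 8-bit products, so the same four multipliers could equally deliver four independent 8-bit results.

```python
def mul16_from_mul8(a, b):
    """Build an unsigned 16×16-bit product from four 8×8-bit partial products.

    Returns (product, partials) where partials are the four 8-bit products.
    """
    assert 0 <= a < (1 << 16) and 0 <= b < (1 << 16)
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    # Each entry is something one 8×8 multiplier can compute on its own.
    partials = [a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi]
    product = partials[0] + ((partials[1] + partials[2]) << 8) + (partials[3] << 16)
    return product, partials


if __name__ == "__main__":
    product, partials = mul16_from_mul8(0x1234, 0xABCD)
    print(hex(product), product == 0x1234 * 0xABCD)
    print(partials)  # four results available in low-precision mode
```

In high-precision mode the partial products are shifted and summed; in low-precision mode the summation is bypassed and each multiplier serves its own vector lane.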
Why Pipeline?
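- Shorten the critical path, reducing signal glitches and static power and improving energy efficiency and throughput.
- Align the critical paths of different components and different operating modes.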
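
The effect of pipelining on the critical path can be sketched numerically. The block names and delays below are made-up illustrative values, not measurements from this design, and `stage_delays` and `min_clock_period` are hypothetical helpers. The clock period is bounded by the slowest stage, so cutting a long combinational path into balanced stages shortens it.

```python
def stage_delays(block_delays_ns, cuts):
    """Split an ordered list of block delays into pipeline stages.

    `cuts` lists the block indices before which a register is inserted.
    Returns the combinational delay of each resulting stage.
    """
    stages, start = [], 0
    for cut in list(cuts) + [len(block_delays_ns)]:
        stages.append(sum(block_delays_ns[start:cut]))
        start = cut
    return stages


def min_clock_period(block_delays_ns, cuts):
    """The slowest stage sets the minimum clock period."""
    return max(stage_delays(block_delays_ns, cuts))


if __name__ == "__main__":
    # Hypothetical delays (ns) for multiply, align, add, normalize.
    blocks = [1.0, 2.0, 1.5, 0.5]
    print(min_clock_period(blocks, []))         # no pipeline registers
    print(min_clock_period(blocks, [2]))        # two stages
    print(min_clock_period(blocks, [1, 2, 3]))  # one block per stage
```

With these example numbers the period drops from 5.0 ns to 3.0 ns with one register and to 2.0 ns with three, at which point the 2.0 ns block is the new critical path and further cuts elsewhere no longer help.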
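Progress & Outlook
- The PE is being refactored; the Array is complete.
- Next step: evaluation and optimization. If the results are satisfactory, we will move on to system-level implementation.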