Abstract. Deep Neural Networks (DNNs) have demonstrated highly promising results across computer vision and speech recognition, and are becoming foundational for ubiquitous AI. The computational complexity of these algorithms and the need for high energy efficiency have led to a surge in research on hardware accelerators. To reduce the latency and energy costs of accessing DRAM, most DNN accelerators are spatial in nature, with hundreds of processing elements operating in parallel and communicating with each other directly.
DNNs are evolving at a rapid rate, leading to myriad layer/operator types of varying shapes. Given a DNN, there can be myriad computationally efficient implementations (e.g., via pruning), leading to structured and unstructured sparsity. Finally, a given DNN can be tiled and partitioned in myriad ways to exploit data reuse. All of the above can lead to irregular dataflow patterns within the accelerator substrate. Achieving high mapping efficiency for all these cases is highly challenging in today's accelerators, which are often tightly coupled 2D grids with rigid near-neighbor connectivity.
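As a minimal illustration of the tiling-for-reuse point above (not taken from the talk), the sketch below shows a tiled matrix multiply in which the tile sizes TM, TN, and TK are hypothetical knobs; a real accelerator mapping would additionally decide which loop levels map to space (processing elements) versus time.

```python
import numpy as np

def tiled_matmul(A, B, TM=32, TN=32, TK=32):
    """Tiled matrix multiply: each A/B tile is fetched once and reused
    across the tile dimensions, and each C tile stays resident across
    the entire reduction (K) loop."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for m0 in range(0, M, TM):
        for n0 in range(0, N, TN):
            # The C tile is reused across the whole K loop (output reuse).
            c_tile = C[m0:m0 + TM, n0:n0 + TN]
            for k0 in range(0, K, TK):
                # Each input tile is reused across the other tile's dimension.
                a_tile = A[m0:m0 + TM, k0:k0 + TK]
                b_tile = B[k0:k0 + TK, n0:n0 + TN]
                c_tile += a_tile @ b_tile
    return C

# Quick check against the untiled computation.
A = np.random.rand(128, 96)
B = np.random.rand(96, 64)
assert np.allclose(tiled_matmul(A, B), A @ B)
```

Different choices of tile sizes and loop orderings trade off buffer capacity against DRAM traffic, which is why the number of possible mappings for a single layer is so large.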
In this talk, we will demonstrate a systematic methodology for understanding data reuse opportunities within the algorithm, determining the cost versus benefit of exploiting them efficiently in hardware, and a communication-centric approach to accelerator design that can provide $\sim 100\%$ mapping efficiency for arbitrary DNN shapes, sparsity ratios, and mappings.