Friday, February 25

SP1
Daily Remarks and Presentation: SIAG Best Paper Prize: Optimizing Communication-Avoiding Sparse LU Factorization on Multi-GPU Clusters

8:20 AM - 9:15 AM
Chair: Stefan Wild, Argonne National Laboratory, U.S.
Sherry Li, Lawrence Berkeley National Laboratory, U.S.
Keita Teranishi, Sandia National Laboratories, U.S.

We present a highly optimized implementation of the communication-avoiding sparse LU factorization algorithm, specifically targeting pre-exascale multi-GPU clusters such as the Summit supercomputer at Oak Ridge National Laboratory.

Prior to this work, distributed-memory sparse LU factorization used GPUs mostly as co-processors because of their relatively small DRAM capacity and the limited hardware support for GPU-aware message passing on older GPUs. Current pre-exascale multi-GPU clusters have higher DRAM capacity and hardware support for GPU-aware MPI, which allows performing the entire sparse LU factorization on GPUs. The challenge is that sparse LU factorization consists of many operations on small, irregularly sized operands, which makes it difficult to use the GPU effectively during all phases of the factorization.

To overcome these challenges, we (a) redesigned the data structures to reduce the cost of index algebra on GPUs; (b) combined streams with so-called tree parallelism to schedule multiple operations concurrently; and (c) exploited high-bandwidth GPU-to-GPU communication and GPU-aware MPI with look-ahead factorization techniques to effectively overlap communication with computation.
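The tree parallelism in (b) can be illustrated with a level-set schedule of the elimination tree: nodes whose children are all complete are mutually independent and can be dispatched on separate GPU streams. The sketch below is a minimal, hypothetical illustration of that idea in Python (the tree shape, node numbering, and scheduling policy are assumptions for exposition; the actual implementation and data structures are not shown here).

```python
# Hedged sketch: grouping nodes of an elimination tree into level sets.
# Nodes within one level have no dependencies on each other, so each
# level's nodes could, in principle, be issued on distinct GPU streams.
# The tree below and the postorder-numbering assumption are illustrative.

def level_sets(parent):
    """Return lists of tree nodes grouped by level, where a node's level
    is 1 + the maximum level of its children. Assumes nodes are numbered
    in postorder (parent[i] > i), so one forward sweep suffices."""
    n = len(parent)
    level = [0] * n
    for i in range(n):
        p = parent[i]
        if p is not None:
            # a parent can start only after all its children finish
            level[p] = max(level[p], level[i] + 1)
    buckets = {}
    for node, lvl in enumerate(level):
        buckets.setdefault(lvl, []).append(node)
    return [buckets[lvl] for lvl in sorted(buckets)]

# example tree: nodes 0,1 feed 2; nodes 3,4 feed 5; 2,5 feed root 6
parent = [2, 2, 6, 5, 5, 6, None]
print(level_sets(parent))  # → [[0, 1, 3, 4], [2, 5], [6]]
```

Here the four leaves form the first level and could run concurrently, their two parents form the second, and the root runs last; a round-robin assignment of each level's nodes to streams is one simple way to exploit this independence.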

Our proposed optimizations improve the performance of communication-avoiding sparse LU factorization by up to $3\times$ over offload-based GPU acceleration of the same algorithm in single- and multi-node configurations on the Summit supercomputer.

Piyush Sao
Oak Ridge National Laboratory, U.S.

Xiaoye Li
Lawrence Berkeley National Laboratory, U.S.

Rich Vuduc
Georgia Institute of Technology, U.S.
