X-ScaleSolutions will present a tutorial and live demo of its newest software offering, MVAPICH2-DPU. The tutorial, "Accelerating HPC Applications with MVAPICH2-DPU and Live Demos," will give an overview of the MVAPICH2-DPU library, its main features, and its acceleration capabilities for a set of representative HPC applications and benchmarks. The library takes advantage of DPU features to offload communication components of the MPI library and accelerate HPC applications. It integrates key components that enable full overlap of computation and communication, especially with non-blocking collectives. Live demos of these applications will demonstrate the capabilities of the product. The tutorial will be led by Dr. Donglai Dai (Chief Engineer), Richmond Liew (Software Engineer), and Nick Sarkauskas (Software Engineer).
In addition, Dr. Donglai Dai will give a talk titled "Accelerating HPC and DL Applications using DPUs and Efficient Checkpointing." The talk will present an overview of two X-ScaleSolutions products with enhanced capabilities: 1) the MVAPICH2-DPU communication library, which uses NVIDIA BlueField DPUs, and 2) the SCR-Exa checkpoint-restart library for HPC and deep learning (DL) applications. The MVAPICH2-DPU library takes advantage of DPU features to offload communication components of the MPI library and deliver best-in-class scale-up and scale-out performance for HPC and DL applications. It integrates key components that enable full overlap of computation and communication, especially with non-blocking collectives. The SCR-Exa product enhances the existing open-source SCR library with: i) significantly increased portability and flexibility across diverse job launchers, resource managers, and storage devices with a variety of underlying protocols; ii) new capabilities to launch applications with spare nodes for fast and efficient restart and resume; and iii) a new Python interface and internal core for ease of use and improved maintainability and extensibility. The talk will also cover the software architectures of the MVAPICH2-DPU and SCR-Exa products and discuss their underlying designs and benefits.
X-ScaleSolutions is also a sponsor of MUG 2021.