X-ScaleAI-DPU is a high-performance solution that accelerates CPU-based distributed DNN training by using the capabilities of data processing units (DPUs).
Features
- Exploits HPC technologies for CPU-based deep learning
- Offloads DNN training tasks to the DPU
- User-friendly Python interface to run DL applications on the CPU and DPU
- Fine-tuned MPI library for CPU and DPU systems
- Distributed training with PyTorch using Horovod
- “Out of the box” optimal performance on CPU+DPU platforms
- Tested on several DNNs and datasets, with up to 17% improvement in training performance
- Simple installation and execution in one command
- Coming Soon: support for more system configurations
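
For readers unfamiliar with the PyTorch and Horovod stack named in the feature list, the following is a minimal sketch of a generic Horovod data-parallel training loop. It uses only the public Horovod and PyTorch APIs and is not the X-ScaleAI-DPU Python interface, which this page does not show. The model, the synthetic data, the base learning rate, and the helper names `scaled_learning_rate` and `train` are assumptions made for illustration.

```python
def scaled_learning_rate(base_lr: float, world_size: int) -> float:
    """Linear learning-rate scaling, a common convention in data-parallel training."""
    return base_lr * world_size


def train(epochs: int = 1) -> None:
    # Imported inside the function so this file can be loaded on machines
    # where PyTorch or Horovod is not installed.
    import torch
    import torch.nn as nn
    import horovod.torch as hvd

    hvd.init()  # one Horovod process per MPI rank

    # Placeholder model and synthetic data (assumptions); a real run would
    # load an actual DNN and dataset.
    torch.manual_seed(0)
    model = nn.Linear(32, 10)
    inputs = torch.randn(256, 32)
    targets = torch.randint(0, 10, (256,))

    optimizer = torch.optim.SGD(
        model.parameters(), lr=scaled_learning_rate(0.01, hvd.size())
    )
    # Wrap the optimizer so gradients are averaged across ranks via allreduce.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters()
    )

    # Start every rank from the same weights and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    # Each rank trains on its own strided shard of the data.
    shard = slice(hvd.rank(), None, hvd.size())
    x, y = inputs[shard], targets[shard]

    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        if hvd.rank() == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")
```

With plain Horovod, a script that calls `train()` would typically be launched with something like `horovodrun -np 4 python train_sketch.py`, where the script name is hypothetical. The X-ScaleAI-DPU run command may differ.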
Installation
X-ScaleAI-DPU offers a one-command installation process.

Sample Run
X-ScaleAI-DPU also offers a simple run command.

Performance
System Configuration
- Two 16-core Intel(R) Xeon(R) E5-2697A v4 CPUs @ 2.60 GHz (32 cores total)
- NVIDIA BlueField-2 SoC, HDR100 100Gb/s InfiniBand/VPI adapters
- Memory: 256GB DDR4 2400MHz RDIMMs per node
- 1TB 7.2K RPM SSD 2.5″ hard drive per node
- NVIDIA ConnectX-6 HDR/HDR100 200/100Gb/s InfiniBand/VPI adapters with Socket Direct


- Up to 17% improvement in training performance using X-ScaleAI-DPU
- Consistent improvement with scaling up to 32 nodes
- Performance improvement across different DL models and datasets
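
The page does not say how the improvement percentage is defined. As a rough sketch, the snippet below shows two common ways to express a speedup from a pair of training times. The function names and the timings are hypothetical and are not measurements from this system.

```python
def time_reduction_percent(baseline_s: float, offloaded_s: float) -> float:
    """Percentage of training time saved relative to the baseline run."""
    if baseline_s <= 0 or offloaded_s <= 0:
        raise ValueError("times must be positive")
    return 100.0 * (baseline_s - offloaded_s) / baseline_s


def throughput_gain_percent(baseline_s: float, offloaded_s: float) -> float:
    """Percentage increase in throughput (work per second) for the same workload."""
    if baseline_s <= 0 or offloaded_s <= 0:
        raise ValueError("times must be positive")
    return 100.0 * (baseline_s / offloaded_s - 1.0)


# Hypothetical per-epoch timings in seconds, chosen only to illustrate the arithmetic.
baseline_s = 100.0   # CPU-only run (assumed value)
offloaded_s = 83.0   # CPU+DPU run (assumed value)

print(f"time reduction:  {time_reduction_percent(baseline_s, offloaded_s):.1f}%")
print(f"throughput gain: {throughput_gain_percent(baseline_s, offloaded_s):.1f}%")
```

The two definitions give different numbers for the same pair of timings, so it helps to state which one a reported percentage uses.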