

DeepSpeed implements everything described in the ZeRO paper. Currently it provides full support for:

- Optimizer state partitioning (ZeRO stage 1)
- Gradient partitioning (ZeRO stage 2)
- Parameter partitioning (ZeRO stage 3)
- A range of fast CUDA-extension-based optimizers
- ZeRO-Offload to CPU and NVMe

ZeRO-Offload has its own dedicated paper: ZeRO-Offload: Democratizing Billion-Scale Model Training. And NVMe support is described in the paper ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. For general guidance on reducing memory usage and speeding up training, see Performance and Scalability: How To Fit a Bigger Model and Train It Faster. A minimal configuration sketch illustrating the ZeRO stage and offload options follows below.
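To make the list above concrete, here is a minimal sketch of a DeepSpeed configuration dict that selects ZeRO stage 3, offloads optimizer state to CPU, and offloads parameters to NVMe. The NVMe path and the `"auto"` values are placeholders, not values taken from this document; consult the DeepSpeed documentation for real tuning.

```python
# A minimal sketch (not a tuned recipe) of a DeepSpeed config that exercises the
# features listed above: ZeRO stage 3, ZeRO-Offload of optimizer state to CPU,
# and ZeRO-Infinity parameter offload to NVMe. The NVMe path is a placeholder.
ds_config = {
    "zero_optimization": {
        "stage": 3,                              # ZeRO stage: 1, 2 or 3
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload: optimizer state on CPU
        "offload_param": {                       # ZeRO-Infinity: parameters on NVMe
            "device": "nvme",
            "nvme_path": "/local_nvme",          # placeholder path to fast local NVMe storage
        },
    },
    # When this config is handed to the 🤗 Trainer, "auto" lets the Trainer fill
    # in values consistent with its own command-line arguments.
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
```

The same content can equally be saved as a JSON file and passed to the Trainer or the `deepspeed` launcher as a config file path.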

DeepSpeed ZeRO-2 is primarily used only for training, as its features are of no use to inference.

DeepSpeed ZeRO-3 can be used for inference as well, since it allows huge models to be loaded across multiple GPUs, which is not possible on a single GPU (see the sketch below).
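As a rough illustration of ZeRO-3 inference without the Trainer, here is a sketch under the assumption that the script is launched on one or more GPUs with the `deepspeed` launcher; the model name and config values are placeholders.

```python
# Sketch of ZeRO-3 inference without the Trainer. The HfDeepSpeedConfig object
# must be created *before* from_pretrained and kept alive, so that the model is
# instantiated directly in ZeRO-3 partitioned form across the available GPUs.
# Launch with e.g.:  deepspeed --num_gpus=2 zero3_inference.py
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.integrations import HfDeepSpeedConfig  # import path varies across versions

ds_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,   # required field, unused for pure inference
    "fp16": {"enabled": True},
}

dschf = HfDeepSpeedConfig(ds_config)       # keep this reference alive
model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder model name
engine = deepspeed.initialize(model=model, config=ds_config)[0]
engine.module.eval()

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("DeepSpeed ZeRO-3 inference:", return_tensors="pt").to(engine.device)
with torch.no_grad():
    # synced_gpus keeps all ranks stepping together under ZeRO-3
    output = engine.module.generate(**inputs, max_new_tokens=20, synced_gpus=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```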

🤗 Transformers integrates DeepSpeed via 2 options:

1. Integration of the core DeepSpeed features via `Trainer`. This is an everything-done-for-you type of integration: just supply your custom config file or use our template and you have nothing else to do. Most of this document is focused on this feature (a short example follows this list).
2. If you don't use `Trainer` and want to use your own trainer into which you have integrated DeepSpeed yourself, core functions such as `from_pretrained` still hook into the essential parts of the integration, for example `zero.Init` for ZeRO stage 3, as in the inference sketch above.
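For option 1, a minimal sketch of wiring a DeepSpeed config into the Trainer; the model name, output directory, and hyperparameters are placeholders, and the config could equally be a path to a JSON file.

```python
# Sketch of option 1: the Trainer drives DeepSpeed once it is given a config.
# Model name, output dir and hyperparameters are placeholders.
from transformers import AutoModelForSequenceClassification, TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 2},        # ZeRO-2: optimizer + gradient partitioning
    "train_batch_size": "auto",               # "auto": filled in by the Trainer
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
}

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=8,
    fp16=True,
    deepspeed=ds_config,                      # or deepspeed="ds_config_zero2.json"
)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Build Trainer(model=model, args=training_args, train_dataset=...) as usual and
# call trainer.train(). Launch the script with the DeepSpeed launcher, e.g.:
#   deepspeed --num_gpus=2 your_training_script.py
```

Keeping the batch-size and precision fields on `"auto"` is what makes this an everything-done-for-you setup: the Trainer fills those fields in so the DeepSpeed config stays consistent with its own arguments.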
