Composer#
Composer provides well-engineered implementations of efficient training methods, giving you the tools to train better models more cheaply.
Using Composer, you can:
Train an ImageNet model to 76.1% accuracy for $37 (with vanilla PyTorch: $127)
Train a GPT-2 125M to a perplexity of 23.9 for $148 (with vanilla PyTorch: $255)
Use state-of-the-art implementations of methods to speed up your own training loop.
Composer features:
20+ efficient training methods for training better language and vision models! Don't waste hours trying to reproduce research papers when Composer has done the work for you.
Easy-to-use Trainer interface written to be as performant as possible, with best practices built in.
Easy-to-use Functional forms that allow you to integrate efficient training methods into your training loop!
Strong, reproducible baselines to get you started as fast as possible.
See Getting Started for installation and initial usage, the Trainer section for an introduction to our trainer, and Methods for details about our efficiency methods and how to use them in your code.
At MosaicML, we are focused on making training ML models accessible. To do this, we continually productionize state-of-the-art academic research on efficient model training, and also study the combinations of these methods in order to ensure that model training is as efficient as possible.
If you have any questions, please feel free to reach out to us on Twitter, Email, or join our Slack channel!
Composer is part of the broader Machine Learning community, and we welcome any contributions, pull requests, or issues.
Table of Contents#
- Methods Overview
- ALiBi
- AugMix
- BlurPool
- Channels Last
- CutMix
- ColOut
- Cutout
- Decoupled Weight Decay
- Factorize
- Ghost BatchNorm
- Label Smoothing
- Layer Freezing
- MixUp
- Progressive Image Resizing
- RandAugment
- Scale Schedule
- Scaling Laws
- Selective Backprop
- Sequence Length Warmup
- Sharpness Aware Minimization (SAM)
- Squeeze-and-Excitation
- Stochastic Depth (Block-Wise)
- Stochastic Depth (Sample-Wise)
- Stochastic Weight Averaging