FastMoE Related
If you would like your paper listed here, please open an issue on FastMoE’s GitHub repo.
- FasterMoE: modeling and optimizing training of large-scale dynamic pre-trained models PPoPP’22
  - Boost the performance of FastMoE using multiple parallel techniques.
- BaGuaLu: targeting brain scale pretrained models with over 37 million cores PPoPP’22
  - Training a 174-trillion-parameter MoE model based on FastMoE.
- FastMoE: A Fast Mixture-of-Expert Training System arXiv preprint
  - Introduction to the core FastMoE system.
Talks about Fast(er)MoE
Other Systems
- Lita: Accelerating Distributed Training of Sparsely Activated Models arXiv
- SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System arXiv
- Optimizing Mixture of Experts using Dynamic Recompilations arXiv
- HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System arXiv
- DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale arXiv
- Tutel: An efficient mixture-of-experts implementation for large DNN model training GitHub blog arXiv
- BASE Layers: Simplifying Training of Large, Sparse Models ICML’21
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding ICLR’21