Related Works

If you would like your paper listed here, please open an issue on FastMoE’s GitHub repository.

2022

  • FasterMoE: modeling and optimizing training of large-scale dynamic pre-trained models PPoPP’22
    • Boosts the performance of FastMoE using multiple parallelization techniques.
  • BaGuaLu: targeting brain scale pretrained models with over 37 million cores PPoPP’22
    • Trains a 174-trillion-parameter MoE model built on FastMoE.

2021

  • FastMoE: A Fast Mixture-of-Expert Training System arXiv preprint
    • Introduces the core FastMoE system; a minimal usage sketch follows below.
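
Below is a minimal usage sketch of the FastMoE module introduced in the paper above. It is illustrative only: the module path fmoe.transformer.FMoETransformerMLP and the arguments num_expert, d_model, and d_hidden follow the FastMoE repository at the time of writing and may differ between versions, so consult the repository for the authoritative API.

```python
import torch
from fmoe.transformer import FMoETransformerMLP  # assumed module path; check your FastMoE version

# Drop-in MoE replacement for a transformer FFN block, with 4 experts on this process.
# FastMoE's scatter/gather kernels run on GPU, hence the .cuda() call.
moe_ffn = FMoETransformerMLP(num_expert=4, d_model=512, d_hidden=2048).cuda()

x = torch.randn(8, 16, 512, device="cuda")  # (batch, seq_len, d_model)
y = moe_ffn(x)                              # the gate routes each token to its experts
print(y.shape)                              # torch.Size([8, 16, 512])
```

In multi-GPU training, the same module is combined with FastMoE's distributed expert placement so that each process hosts a slice of the experts; see the FastMoE documentation for the distributed setup.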

Talks about Fast(er)MoE

Other Systems

  • Lita: Accelerating Distributed Training of Sparsely Activated Models arXiv
  • SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System arXiv
  • Optimizing Mixture of Experts using Dynamic Recompilations arXiv
  • HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System arXiv
  • DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale arXiv
  • Tutel: An efficient mixture-of-experts implementation for large DNN model training GitHub / blog / arXiv
  • BASE Layers: Simplifying Training of Large, Sparse Models ICML’21
  • GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding ICLR’21

MoE Paper Collections