Get Started #
You can get started with FastMoE either with Docker or by installing it directly on your machine.
Environment Setup #
On host machine #
First, you need to set up the environment on the host machine.
We recommend the official PyTorch Docker image, as the environment there is already well prepared. Note that you should use an image with
devel in its tag, rather than
runtime, because building FastMoE requires the CUDA compiler toolchain. In theory, no PyTorch environment is needed on the host machine itself.
For example, you can run
docker pull pytorch/pytorch:1.8.0-cuda11.1-cudnn8-devel to get a PyTorch Docker image.
Inside the docker #
Run a docker container with commands like:
docker run --name pytorch -it pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel
And use bash to interact with it:
docker exec -ti pytorch /bin/bash
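Note that to use GPUs inside the container, the docker run command typically needs GPU access enabled, e.g. via the NVIDIA Container Toolkit:
docker run --gpus all --name pytorch -it pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel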
For the distributed expert feature, NCCL is required. Inside the container, first check whether NCCL is already installed:
$ apt list --installed | grep nccl
libnccl-dev/unknown,now 2.8.4-1+cuda11.2 amd64 [installed]
libnccl2/unknown,now 2.8.4-1+cuda11.2 amd64 [installed]
If not, you can follow the official documentation to install the version that matches the CUDA version in your container (which can be inspected with
nvcc -V). After that, you need to set up NCCL in your conda environment following the corresponding instructions.
Finally, you can verify NCCL from Python with
torch.cuda.nccl.version(), as shown in the snippet below. NVIDIA also maintains the official nccl-tests repository for more thorough testing, which is optional.
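A minimal check looks like this (the exact return format varies across PyTorch releases):
import torch
# Prints the NCCL version PyTorch uses, e.g. (2, 8, 4) or 2804,
# depending on the PyTorch release.
print(torch.cuda.nccl.version())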
Enter our repo directory inside the prepared Docker container. By default, the distributed expert feature is disabled, so you need to set the environment variable
USE_NCCL=1 to enable it. Install FastMoE with
USE_NCCL=1 python setup.py install
(drop USE_NCCL=1 if you do not need distributed experts), and check the installation with:
$ conda list | grep fastmoe
fastmoe 0.1.1 pypi_0 pypi
Finally, enjoy using FastMoE for training!
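To get a first taste of the API, here is a minimal sketch using the FMoETransformerMLP layer from the FastMoE repository; the layer and argument names follow the repo's examples but may change between versions, so treat it as illustrative:
import torch
from fmoe.transformer import FMoETransformerMLP

# An MoE feed-forward layer with 4 experts (illustrative sizes);
# FastMoE's kernels run on GPU, hence .cuda().
moe_mlp = FMoETransformerMLP(num_expert=4, d_model=512, d_hidden=2048).cuda()

x = torch.randn(8, 16, 512).cuda()  # (batch, sequence, d_model)
y = moe_mlp(x)  # output keeps the input shape
print(y.shape)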
Direct way #
To use FastMoE, CUDA and PyTorch are required.
The CUDA Toolkit is available at https://developer.nvidia.com/cuda-downloads. Select your operating system and follow the instructions on the website to install CUDA. Notice: the CUDA version must match the version of your NVIDIA driver. If you are not sure whether the NVIDIA driver is installed, or you do not know its version, you may use
nvidia-smi to get information about the driver.
Add CUDA to your environment variables. If you work with Linux, use the command
vi ~/.bashrc and add the following content to the end of the file (replace X.X with the version of CUDA you have downloaded):
export PATH=$PATH:/usr/local/cuda-X.X/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-X.X/lib64
export CUDA_HOME=$CUDA_HOME:/usr/local/cuda-X.X
Then don’t forget to run
source ~/.bashrc to apply the new configuration.
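To confirm the variables are in effect, you can, for example, run
echo $CUDA_HOME
and check that the output includes /usr/local/cuda-X.X.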
At this point CUDA is installed; you can use
nvcc --version to check its version.
- PyTorch can be installed with pip. Version
>=1.8.0 is required if you want to use Megatron. After installation, run the following Python code:
import torch
torch.cuda.is_available()
torch.cuda.device_count()
If
torch.cuda.is_available() returns True and
torch.cuda.device_count() returns the number of your GPUs, then congratulations! CUDA and PyTorch run successfully on your device.
If you want to enable the distributed expert feature, please download NCCL at https://developer.nvidia.com/nccl/nccl-legacy-downloads. The NCCL version should be no less than
2.7.5 and should match the version PyTorch expects. You can use the function
torch.cuda.nccl.version() to see which NCCL version is required.
Install the ‘deb’ file. If you use Ubuntu or Debian, just use the following commands (nccl_repo_file is the file you downloaded; XXX and X.X stand for the versions of NCCL and CUDA):
sudo dpkg -i nccl_repo_file.deb
sudo apt update
sudo apt install libnccl2=XXX+cudaX.X libnccl-dev=XXX+cudaX.X
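For instance, with the NCCL 2.8.4 build for CUDA 11.2 shown earlier (substitute the versions matching your setup):
sudo apt install libnccl2=2.8.4-1+cuda11.2 libnccl-dev=2.8.4-1+cuda11.2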
FastMoE Installation #
- Clone the FastMoE repository from https://github.com/laekov/fastmoe, and use the following command to install:
python3 setup.py install
- If you need NCCL, set the environment variable
USE_NCCL=1 before installation. For example, you may use a command as follows:
USE_NCCL=1 python3 setup.py install
- Installation is finished. Enjoy FastMoE now! You can try executing the quick check shown below.
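As a quick sanity check (assuming the package installs the fmoe Python module, as in the upstream repository), verify that FastMoE imports cleanly:
python3 -c "import fmoe"
If no error is raised, the installation works.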