MPS is not required to use MPI.
If you don't use MPS but launch multiple MPI ranks per node (i.e. multiple ranks sharing one GPU), the outcome depends on the GPU's compute mode. With the compute mode set to Default, the GPU activity from those ranks will serialize. With the compute mode set to EXCLUSIVE_PROCESS or EXCLUSIVE_THREAD, you'll get errors when more than one MPI rank attempts to use the same GPU.
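To make the non-MPS cases above concrete, here is a toy Python model. It is not a CUDA or MPI API. The function names, the round-robin rank-to-GPU mapping (`local_rank % num_gpus`), and the mode strings are illustrative assumptions. It only encodes the outcomes described in this answer.

```python
from collections import Counter

# Compute-mode names as used in this answer (illustrative strings, not an API).
EXCLUSIVE_MODES = {"EXCLUSIVE_PROCESS", "EXCLUSIVE_THREAD"}


def gpu_for_rank(local_rank: int, num_gpus: int) -> int:
    """Hypothetical round-robin mapping of a node-local MPI rank to a GPU index."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return local_rank % num_gpus


def outcome_without_mps(ranks_per_node: int, num_gpus: int, compute_mode: str) -> str:
    """Predict what happens without MPS, following the cases described above."""
    ranks_on_gpu = Counter(gpu_for_rank(r, num_gpus) for r in range(ranks_per_node))
    if max(ranks_on_gpu.values(), default=0) <= 1:
        # Each GPU is used by at most one rank, so no GPU is shared.
        return "no sharing"
    if compute_mode == "DEFAULT":
        # Several ranks share a GPU: their GPU activity is serialized.
        return "serialized"
    if compute_mode in EXCLUSIVE_MODES:
        # Only one process (or, for EXCLUSIVE_THREAD, one thread) may use
        # the GPU, so the additional ranks get errors.
        return "error"
    raise ValueError(f"compute mode not modeled here: {compute_mode}")


# Example: 4 ranks on a node with 2 GPUs -> 2 ranks per GPU.
print(outcome_without_mps(4, 2, "DEFAULT"))            # serialized
print(outcome_without_mps(4, 2, "EXCLUSIVE_PROCESS"))  # error
print(outcome_without_mps(2, 2, "DEFAULT"))            # no sharing
```

The round-robin mapping is just one common way to pin ranks to GPUs; any mapping that puts two ranks on the same device leads to the same sharing cases.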
CUDA MPS documentation is available here.