If you are fortunate enough that the cluster is consistent in the types of nodes that host the GPUs, and that the node features are properly specified and distinguish the nodes hosting the different GPU types, you can use the --constraint parameter.
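For the sake of argument, let's assume that the nodes hosting the titanV have haswell CPUs, that those hosting the titanX have skylake CPUs, and that both are defined as features. Then you can request two GPUs and let Slurm choose between the two features. The square brackets mean that a single feature from the list is applied to every node in the allocation, so the job never gets a mix: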
--gres=gpu:2
--constraint=[haswell|skylake]
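Before relying on the constraint, it may help to check which features and generic resources each node actually advertises. The following is a minimal sketch; the function name list_node_features is only illustrative, and the feature names you see will depend on how your administrators configured the cluster.

```shell
# Print one line per node: node name (%N), feature tags (%f),
# and generic resources such as GPUs (%G).
list_node_features() {
    sinfo --Node --format="%N %f %G"
}
```

Call list_node_features and verify that every titanV node carries one feature and every titanX node the other. If you pass the constraint on the command line rather than in an #SBATCH line, quote it, for example --constraint="[haswell|skylake]", so that the shell does not interpret | as a pipe.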
If the above does not apply to your use case, you can submit two jobs and keep only the one that starts first. To do so, give both jobs the same name and use the singleton dependency.
Write a submission script like this one:
#!/bin/bash
#SBATCH --dependency=singleton
#SBATCH --job-name=gpujob
# Other options
# This job has started: cancel any still-pending job with the same name
scancel --state=PENDING --jobname=gpujob
# etc.
and submit it twice with:
$ sbatch --gres=gpu:titanX:2 submit.sh
$ sbatch --gres=gpu:titanV:2 submit.sh
Each job will be assigned only one type of GPU. Because of the singleton dependency, only one job with that name can run at a time, and the first one that starts cancels the other while it is still pending. This approach extends to more than two GPU types.
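With more than two GPU types, the repeated sbatch calls can be wrapped in a loop. This is only a sketch: the function name submit_all and the SBATCH override variable are hypothetical helpers introduced here, and the GPU type names must match those defined in your cluster's GRES configuration.

```shell
# Submit submit.sh once per GPU type given as an argument,
# requesting two GPUs of that type each time.
# Set SBATCH=echo to print the arguments instead of submitting (dry run).
submit_all() {
    local gpu_type
    for gpu_type in "$@"; do
        ${SBATCH:-sbatch} --gres="gpu:${gpu_type}:2" submit.sh
    done
}
```

For example, submit_all titanX titanV submits the same two jobs as above. Since all of them share the job name set in the script, whichever starts first cancels the rest.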