I want to submit a few thousand single-threaded jobs as a Slurm job array, but I want to leave some room on our cluster for others. How can I restrict the array to, say, 4 nodes with 28 cores each? So far I have the test script below, where `%112` limits the number of simultaneously running tasks to the core count of 4 nodes (4*28=112), but the tasks are still spread out over all nodes. Options like `-N 4` or `--exclusive` seem to apply to each individual job in the array, not to the array as a whole.
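For reference, the throttle value is just the total core count of the four nodes (the node and core counts are specific to our cluster):

```shell
# 4 nodes with 28 cores each -> at most 112 tasks running at once
nodes=4
cores_per_node=28
echo $(( nodes * cores_per_node ))   # prints 112
```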
#!/usr/bin/env python3
#SBATCH -D .
####SBATCH -N 4 --exclusive # not successful
#SBATCH -t 1:00:00
#SBATCH -J teeeest
#SBATCH --array=1-2000%112 # limit nr. of tasks
#SBATCH --output=outs/test_%A_%4a.out
#SBATCH --error=errs/test_%A_%4a.err
# Important 3-second task: busy-wait for 3 seconds, then report
# the array task ID and the node it ran on.
import os
from time import time

t1 = time() + 3
su = 0
while time() < t1:
    su += 1e-16
print(
    "hej", su,
    os.environ["SLURM_ARRAY_TASK_ID"],
    os.environ["SLURM_JOB_NODELIST"],
)