Scheduling GPUs

In TORQUE 2.5.4 and later, users can request GPUs on a node at job submission by adding a gpus value to the nodes resource request passed to the qsub -l option. The number of GPUs on each node must be specified in the nodes file (see Server node file configuration). The GPU count is then reported in the output of pbsnodes:

napali
     state = free
     np = 2
     ntype = cluster
     status = rectime=1288888871,varattr=,jobs=,state=free,netload=1606207294,gres=tom:!/home/dbeer/dev/scripts/dynamic_resc.sh,loadave=0.10,ncpus=2,physmem=3091140kb,availmem=32788032348kb,totmem=34653576492kb,idletime=4983,nusers=3,nsessions=14,sessions=3136 1805 2380 2428 1161 3174 3184 3191 3209 3228 3272 3333 20560 32371,uname=Linux napali 2.6.32-25-generic #45-Ubuntu SMP Sat Oct 16 19:52:42 UTC 2010 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 1
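To find out from a script which nodes report GPUs, the pbsnodes output can be filtered. The following is a minimal sketch, not part of TORQUE: the function name nodes_with_gpus is hypothetical, and it assumes that each node name starts in the first column while attribute lines such as "gpus = 1" are indented.

```shell
#!/bin/sh
# Hypothetical helper (not a TORQUE command): read pbsnodes-style output on
# stdin and print "<node> <gpu count>" for every node reporting gpus > 0.
# Assumption: node names are unindented, attribute lines are indented.
nodes_with_gpus() {
    awk '
        /^[^ \t]/ { node = $1 }   # unindented, non-empty line: a new node record
        $1 == "gpus" && $2 == "=" && $3 > 0 { print node, $3 }
    '
}

# Usage on a host with the TORQUE client commands installed:
#   pbsnodes -a | nodes_with_gpus
```

Nodes without a gpus attribute, or with gpus = 0, are simply omitted from the result.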

The $PBS_GPUFILE has been created to include GPU awareness. The GPU appears as a separate line in $PBS_GPUFILE and follows this syntax:

If a job were submitted to run on a server called "napali" (the submit command would look something like: qsub test.sh -l nodes=1:ppn=2:gpus=1), the $PBS_GPUFILE would contain:

It is left up to the job's owner to make sure that the job executes properly on the GPU. By default, TORQUE treats GPUs exactly the same as ppn (which corresponds to CPUs).
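Because the job itself must select the right device, a job script will typically read $PBS_GPUFILE before launching the GPU program. The sketch below is one possible approach under stated assumptions: it assumes each line of the file has the form <hostname>-gpu<index> (for example napali-gpu0), and that the index matches the device numbering used by the GPU runtime. The function name gpu_indices_for_host is hypothetical; CUDA_VISIBLE_DEVICES is the NVIDIA CUDA environment variable, not a TORQUE setting.

```shell
#!/bin/sh
# Hypothetical helper: print a comma-separated list of the GPU indices
# assigned to one host, taken from a GPU file.
# Assumption: one entry per line, of the form <hostname>-gpu<index>.
gpu_indices_for_host() {
    host=$1
    file=$2
    # Keep only this host's entries, strip the "<hostname>-gpu" prefix,
    # then join the remaining indices with commas.
    sed -n "s/^${host}-gpu\([0-9][0-9]*\)\$/\1/p" "$file" | paste -sd, -
}

# Inside a job script one might then write:
#   CUDA_VISIBLE_DEVICES=$(gpu_indices_for_host "$(hostname -s)" "$PBS_GPUFILE")
#   export CUDA_VISIBLE_DEVICES
```

Restricting the visible devices this way keeps a job from using GPUs that were allocated to other jobs on the same node, which TORQUE does not enforce by itself.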
