5.681 Requesting Resources

Various resources can be requested at the time of job submission. A job can request a particular node, a particular node attribute, or even a number of nodes with particular attributes. Either native Torque resources or external scheduler resource extensions may be specified.

qsub -l supports:

5.681.1 Native Torque Resources

The native Torque resources are listed in the following table.

Resource Format Description
arch string Specifies the administrator defined system architecture required. This defaults to whatever the PBS_MACH string is set to in "local.mk".
cput seconds, or [[HH:]MM:]SS Maximum amount of CPU time used by all processes in the job.
cpuclock string

Specify the CPU clock frequency for each node requested for this job. A cpuclock request applies to every processor on every node in the request. Specifying varying CPU frequencies for different nodes or different processors on nodes in a single job request is not supported.

Not all processors support all possible frequencies or ACPI states. If the requested frequency is not supported by the CPU, the nearest frequency is used.

ALPS 1.4 or later is required when using cpuclock on Cray.

The clock frequency can be specified via:

  • a number that indicates the clock frequency (with or without the SI unit suffix).

    qsub -l cpuclock=1800,nodes=2 script.sh
    qsub -l cpuclock=1800mhz,nodes=2 script.sh

    This job requests 2 nodes and specifies their CPU frequencies should be set to 1800 MHz.

  • a Linux power governor policy name. The governor names are:
    • performance: This governor instructs Linux to operate each logical processor at its maximum clock frequency.

      This setting consumes the most power and workload executes at the fastest possible speed.

    • powersave: This governor instructs Linux to operate each logical processor at its minimum clock frequency.

      This setting executes workload at the slowest possible speed. This setting does not necessarily consume the least amount of power since applications execute slower, and may actually consume more energy because of the additional time needed to complete the workload's execution.

    • ondemand: This governor dynamically switches the logical processor's clock frequency to the maximum value when system load is high and to the minimum value when the system load is low.

      This setting causes workload to execute at the fastest possible speed or the slowest possible speed, depending on OS load. The system switches between consuming the most power and the least power.

      The power saving benefits of ondemand might be non-existent due to frequency switching latency if the system load causes clock frequency changes too often.

      This has been especially true for older processors, where changing the clock frequency required putting the processor into the C3 "sleep" state, changing its clock frequency, and then waking it up, all of which took a significant amount of time.

      Newer processors, such as the Intel Xeon E5-2600 Sandy Bridge processors, can change clock frequency dynamically and much faster.

    • conservative: This governor operates like the ondemand governor but is more conservative in switching between frequencies. It switches more gradually and uses all possible clock frequencies.

      This governor can switch to an intermediate clock frequency if it seems appropriate to the system load and usage, which the ondemand governor does not do.

    qsub -l cpuclock=performance,nodes=2 script.sh

    This job requests 2 nodes and specifies their CPU frequencies should be set to the performance power governor policy.

  • an ACPI performance state (or P-state), with or without the P prefix. P-states are a special range of values (0-15) that map to specific frequencies. Not all processors support all 16 states; however, they all start at P0. P0 sets the CPU clock frequency to the highest performance state, which runs at the maximum frequency. P15 sets the CPU clock frequency to the lowest performance state, which runs at the lowest frequency.

    qsub -l cpuclock=3,nodes=2 script.sh
    qsub -l cpuclock=p3,nodes=2 script.sh

    This job requests 2 nodes and specifies their CPU frequencies should be set to a performance state of 3.

When reviewing job or node properties after cpuclock has been used, be mindful of unit conversion: the OS reports frequency in Hz, not MHz or GHz.

epilogue string

Specifies a user owned epilogue script which will be run before the system epilogue and epilogue.user scripts at the completion of a job. The syntax is epilogue=<file>. The file can be designated with an absolute or relative path.

For more information, see Prologue and Epilogue Scripts.
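
For example, a minimal submission assuming a user-owned script at the hypothetical path /home/user/epilogue.sh:

qsub -l epilogue=/home/user/epilogue.sh script.sh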

feature string

Specifies a property or feature for the job. Feature corresponds to Torque node properties and Moab features.

qsub script.sh -l procs=10,feature=bigmem
file size

Sets RLIMIT_FSIZE for each process launched through the TM interface.

See FILEREQUESTISJOBCENTRIC for information on how Moab schedules.
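
For example, a sketch that caps the size of files each process may create at 2 GB (the 2gb value is illustrative):

qsub -l file=2gb script.sh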

host string Name of the host on which the job should be run. This resource is provided for use by the site's scheduling policy. The allowable values and effect on job placement are site dependent.
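
For example, assuming a hypothetical host named node01 exists at the site:

qsub -l host=node01 script.sh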
mem size

Maximum amount of physical memory used by the job. Ignored on Darwin, Digital Unix, FreeBSD, HPUX 11, IRIX, NetBSD, and SunOS. Not implemented on AIX and HPUX 10.

The mem resource will only work for single-node jobs. If your job requires multiple nodes, use pmem instead.
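
For example, a single-node request limiting the whole job to 800 MB of physical memory (the values are illustrative):

qsub -l nodes=1:ppn=2,mem=800mb script.sh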

ncpus integer

The number of processors in one task where a task cannot span nodes.

You cannot request both ncpus and nodes in the same job.
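
For example, a sketch requesting a single task with 8 processors (note that nodes is not requested alongside ncpus):

qsub -l ncpus=8 script.sh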

nice integer Adjusts the process execution priority. The value is a number between -20 (highest priority) and 19 (lowest priority).
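
For example, a submission that lowers the job's execution priority (the value 10 is illustrative):

qsub -l nice=10 script.sh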
nodes {<node_count> |
<hostname>} [:ppn=<ppn>][:gpus=<gpu>]
[:<property>[:<property>]...] [+ ...]

Number and/or type of nodes to be reserved for exclusive use by the job. The value is one or more node_specs joined with the + (plus) character: node_spec[+node_spec...]. Each node_spec is a number of nodes required of the type declared in the node_spec and a name of one or more properties desired for the nodes. The number, the name, and each property in the node_spec are separated by a : (colon). If no number is specified, one (1) is assumed. The name of a node is its hostname. The properties of nodes are:

  • ppn=# - Specify the number of virtual processors per node requested for this job.

    The number of virtual processors available on a node by default is 1, but it can be configured in the TORQUE_HOME/server_priv/nodes file using the np attribute (see Server Node File Configuration). The virtual processor can relate to a physical core on the node or it can be interpreted as an "execution slot" such as on sites that set the node np value greater than the number of physical cores (or hyper-thread contexts). The ppn value is a characteristic of the hardware, system, and site, and its value is to be determined by the administrator.

  • gpus=# - Specify the number of GPUs per node requested for this job.

    The number of GPUs available on a node can be configured in the TORQUE_HOME/server_priv/nodes file using the gpu attribute (see Server Node File Configuration). The GPU value is a characteristic of the hardware, system, and site, and its value is to be determined by the administrator.

  • property - A string assigned by the system administrator specifying a node's features. Check with your administrator as to the node names and properties available to you.
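
For example, a sketch combining these properties to request two nodes, each with four virtual processors and one GPU (the counts are illustrative):

qsub -l nodes=2:ppn=4:gpus=1 script.sh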

Torque does not have a TPN (tasks per node) property. You can specify TPN in Moab Workload Manager with Torque as your resource manager, but Torque does not recognize the property when it is submitted directly to it via qsub.

See qsub -l nodes for examples.

By default, the node resource is mapped to a virtual node (that is, directly to a processor, not a full physical compute node). This behavior can be changed within Maui or Moab by setting the JOBNODEMATCHPOLICY parameter. See Moab Parameters in the Moab Workload Manager Administrator Guide for more information.

All nodes in Torque have their own name as a property. You may request a specific node by using its name in the nodes request. Multiple nodes can be requested this way by using '+' as a delimiter. For example:

qsub -l nodes=node01:ppn=3+node02:ppn=6

See the HOSTLIST RM extension in the Moab Workload Manager Administrator Guide for more information.

opsys string Specifies the administrator defined operating system as defined in the MOM configuration file.
other string

Allows a user to specify site-specific information. This resource is provided for use by the site's scheduling policy. The allowable values and effect on job placement are site dependent.

This does not work for msub using Moab and Maui.

pcput seconds, or [[HH:]MM:]SS Maximum amount of CPU time used by any single process in the job.
pmem size Maximum amount of physical memory used by any single process of the job. (Ignored on Fujitsu. Not implemented on Digital Unix and HPUX.)
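
For example, a sketch limiting each process to 1 GB of physical memory across two nodes (the values are illustrative):

qsub -l nodes=2:ppn=2,pmem=1gb script.sh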
procs procs=<integer>

(Applicable in version 2.5.0 and later.) The number of processors to be allocated to a job. The processors can come from one or more qualified node(s). Only one procs declaration may be used per submitted qsub command.

> qsub -l nodes=3 -l procs=2

procs_bitmap string

A string made up of 1's and 0's in reverse order of the processor cores requested. A procs_bitmap=1110 means the job requests a node that has four available cores, but the job runs exclusively on cores two, three, and four. With this bitmap, core one is not used.

For more information, see Scheduling Cores.
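
For example, a minimal sketch requesting a node with four available cores where the job runs only on cores two, three, and four:

qsub -l procs_bitmap=1110 script.sh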

prologue string

Specifies a user owned prologue script which will be run after the system prologue and prologue.user scripts at the beginning of a job. The syntax is prologue=<file>. The file can be designated with an absolute or relative path.

For more information, see Prologue and Epilogue Scripts.
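
For example, a minimal submission assuming a user-owned script at the hypothetical path /home/user/prologue.sh:

qsub -l prologue=/home/user/prologue.sh script.sh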

pvmem size Maximum amount of virtual memory used by any single process in the job. (Ignored on Unicos.)
size integer

For Torque, this resource has no meaning. It is passed on to the scheduler for interpretation. In the Moab scheduler, the size resource is intended for use in Cray installations only.

software string Allows a user to specify software required by the job. This is useful if certain software packages are only available on certain systems in the site. This resource is provided for use by the site's scheduling policy. The allowable values and effect on job placement are site dependent. See License Management in the Moab Workload Manager Administrator Guide for more information.
vmem size Maximum amount of virtual memory used by all concurrent processes in the job. (Ignored on Unicos.)
walltime seconds, or [[HH:]MM:]SS Maximum amount of real time during which the job can be in the running state.
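
For example, a sketch requesting four processors on one node for at most 2 hours and 30 minutes of wall-clock time:

qsub -l walltime=02:30:00,nodes=1:ppn=4 script.sh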

size

The size format specifies the maximum amount in terms of bytes or words. It is expressed in the form integer[suffix]. The suffix is a multiplier defined in the following table ("b" means bytes [the default] and "w" means words). The size of a word is calculated on the execution server as its word size.

Suffix Multiplier
b, w 1
kb, kw 1,024
mb, mw 1,048,576
gb, gw 1,073,741,824
tb, tw 1,099,511,627,776
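
For example, a sketch using these suffixes to request 512 MB of physical memory and a 2 GB per-process file size limit (the values are illustrative):

qsub -l mem=512mb,file=2gb script.sh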

5.681.2 Interpreting Resource Requests

The table below shows how various requests are interpreted in the qsub -l syntax and corresponding cgroup usage.

Memory parameters (mem, pmem, vmem, pvmem) may specify units (examples: mem=1024mb, mem=320kb, mem=1gb). Recognized units are kb (kilobytes), mb (megabytes), gb (gigabytes), tb (terabytes), pb (petabytes), and eb (exabytes). If units are not specified, mb (megabytes) is assumed.

Example 5-365: Interpreting qsub -l requests

Usage

Description

nodes=X:ppn=Y Creates X tasks that will use Y lprocs per task.
procs=X Creates X tasks that will use 1 lproc each.
ncpus=X Creates 1 task that will use X lprocs.
mem=X The entire job will use X memory, divided evenly among the tasks.*
pmem=X Each task will use X memory. In translation, -l nodes=1:ppn=4,pmem=1gb will use 4 GB of memory.*

vmem=X

The entire job will use X swap, divided evenly among the tasks. If legacy_vmem is set to true in the server, then the entire specified value will be given per host.**
pvmem=X Each task will use X swap. In translation, -l nodes=1:ppn=4,pvmem=1gb will use 4 GB of swap.**

*If both mem and pmem are specified, the less restrictive of the two will be used as the limit for the job. For example, qsub job.sh -l nodes=2:ppn=2,mem=4gb,pmem=1gb will apply the mem requested instead of pmem, because it will allow 2 GB per task (4 GB/2 tasks) instead of 1 GB per task.

**If both vmem and pvmem are specified, the less restrictive of the two will be used as the limit for the job. For example, qsub job.sh -l nodes=2:ppn=2,vmem=4gb,pvmem=1gb will apply pvmem instead of vmem, because it will allow 2 GB swap per task (1 GB * 2 ppn) instead of .5 GB per task (1 GB/2 tasks).

5.681.3 Interpreting Node Requests

The table below shows how various qsub -l nodes requests are interpreted.

Example 5-366: qsub -l nodes

Usage Description
> qsub -l nodes=12 Request 12 nodes of any type
> qsub -l nodes=2:server+14 Request 2 "server" nodes and 14 other nodes (a total of 16) - this specifies two node_specs, "2:server" and "14"
> qsub -l nodes=server:hippi+10:noserver+3:bigmem:hippi Request (a) 1 node that is a "server" and has a "hippi" interface, (b) 10 nodes that are not servers, and (c) 3 nodes that have a large amount of memory and have hippi
> qsub -l nodes=b2005+b1803+b1813 Request 3 specific nodes by hostname
> qsub -l nodes=4:ppn=2 Request 2 processors on each of four nodes
> qsub -l nodes=1:ppn=4 Request 4 processors on one node
> qsub -l nodes=2:blue:ppn=2+red:ppn=3+b1014 Request 2 processors on each of two blue nodes, three processors on one red node, and the compute node "b1014"

Example 5-367:  

This job requests a node with 200MB of available memory:

> qsub -l mem=200mb /home/user/script.sh

Example 5-368:  

This job will wait until node01 is free with 200MB of available memory:

> qsub -l nodes=node01,mem=200mb /home/user/script.sh

5.681.4 Moab Job Extensions

qsub -l supports these Moab scheduler job extensions:

advres, cpuclock, deadline, depend, ddisk, dmem, energy_used, epilogue, feature, flags, gattr, geometry, gmetric, gres, hostlist, image, jgroup, jobflags, latency, loglevel, minprocspeed, minpreempttime, minwclimit, naccesspolicy, nallocpolicy, nodeset, opsys, os, partition, pref, procs, procs_bitmap, prologue, qos, queuejob, reqattr, retrycount, retrycc, rmtype, select, sid, signal, stagein, spriority, subnode, subnode_list, taskdistpolicy, template, termsig, termtime, tid, tpn, trig, trl, var, vcores, wcrequeue
