11.3 Resource Manager Extensions

Not all resource managers are created equal. Capabilities vary widely from system to system, and there is a large body of functionality that many, if not all, resource managers have no concept of. Job QoS is a good example: because most resource managers have no concept of quality of service, they provide no mechanism for users to specify it. In many cases, Moab is able to add capabilities at a global level. However, a number of features require a per-job specification. Resource manager extensions allow this information to be associated with the job.

11.3.1 Resource Manager Extension Specification

Specifying resource manager extensions varies by resource manager. Torque, OpenPBS, PBSPro, Loadleveler, LSF, S3, and Wiki each allow the specification of an extension field as described in the following table:

Resource manager Torque 2.0+
Specification method -l
Example
> qsub -l nodes=3,qos=high sleepy.cmd

Resource manager Torque 1.x/OpenPBS
Specification method -W x=
Example
> qsub -l nodes=3 -W x=qos:high sleepy.cmd

OpenPBS does not support this ability by default but can be patched as described in the PBS Resource Manager Extension Overview.

Resource manager Loadleveler
Specification method #@comment
Example
#@nodes = 3
#@comment = qos:high

Resource manager LSF
Specification method -ext
Example
> bsub -ext advres:system.2

Resource manager PBSPro
Specification method -l
Example
> qsub -l advres=system.2

Use of PBSPro resources requires configuring the server_priv/resourcedef file to define the needed extensions as in the following example:

advres type=string
qos    type=string
sid    type=string
sjid   type=string

Resource manager Wiki
Specification method comment
Example
comment=qos:high
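
As a rough illustration of the specification methods listed above, the following Python sketch builds the fragment that carries one extension for several of the resource managers. The function name and the labels torque2, pbspro, torque1, loadleveler, and wiki are hypothetical and exist only for this example; only the -l, -W x=, #@comment, and comment forms are taken from the table. LSF is left out on purpose.

```python
def extension_fragment(rm, key, value):
    """Return the pieces that attach key/value to a job as an extension.

    The rm labels are made up for this sketch; they are not Moab identifiers.
    """
    if rm in ("torque2", "pbspro"):
        # Torque 2.0+ and PBSPro: -l key=value
        return ["-l", f"{key}={value}"]
    if rm == "torque1":
        # Torque 1.x/OpenPBS: -W x=key:value
        return ["-W", f"x={key}:{value}"]
    if rm == "loadleveler":
        # Loadleveler: a #@comment line in the job command file
        return [f"#@comment = {key}:{value}"]
    if rm == "wiki":
        # Wiki: the comment attribute
        return [f"comment={key}:{value}"]
    raise ValueError(f"no extension form known for {rm!r}")


print(" ".join(["qsub"] + extension_fragment("torque1", "qos", "high")))
```

The same key and value therefore appear as qos=high with -l but as qos:high with -W x= or a comment field.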

11.3.2 Resource Manager Extension Values

Using the resource manager specific method, the following job extensions are currently available:

ADVRES
BANDWIDTH
CPUCLOCK
DDISK
DEADLINE
DEPEND
DMEM
EPILOGUE
EXCLUDENODES
FEATURE
GATTR
GMETRIC
GPUs
GRES and SOFTWARE
HOSTLIST
JGROUP
JOBFLAGS (aka FLAGS)
JOBREJECTPOLICY
MAXMEM
MAXPROC
MEM
MICs
MINPREEMPTTIME
MINPROCSPEED
MINWCLIMIT
MSTAGEIN
MSTAGEOUT
NACCESSPOLICY
NALLOCPOLICY
NCPUS
NMATCHPOLICY
NODESET
NODESETCOUNT
NODESETDELAY
NODESETISOPTIONAL
OPSYS
PARTITION
PMEM
PREF
PROCS
PROLOGUE
PVMEM
QoS
QUEUEJOB
REQATTR
RESFAILPOLICY
RMTYPE
SIGNAL
SOFTWARE (see GRES and SOFTWARE)
SPRIORITY
TEMPLATE
TERMTIME
TPN
TRIG
TRL (Format 1)
TRL (Format 2)
VAR
VC
VMEM
ADVRES
Format [!]<RSVID>
Description

Specifies that reserved resources are required to run the job. If <RSVID> is specified, then only resources within the specified reservation may be allocated (see Job to Reservation Binding).

You can request to not use a specific reservation by using advres=!<reservationname>.

Example
> qsub -l advres=grid.3

Resources for the job must come from grid.3.

> qsub -l advres=!grid.5

Resources for the job must not come from grid.5.

BANDWIDTH
Format <DOUBLE> (in MB/s)
Description Minimum available network bandwidth across allocated resources (see Network Management).
Example
> bsub -ext bandwidth=120 chemjob.txt
CPUCLOCK
Format <STRING>
Description

Specify the CPU clock frequency for each node requested for this job. A cpuclock request applies to every processor on every node in the request. Specifying varying CPU frequencies for different nodes or different processors on nodes in a single job request is not supported.

Not all CPUs support all possible frequencies or ACPI states. If the requested frequency is not supported by the CPU, the nearest frequency is used.

If a job does not place any load on the node then some OSs will drop the frequency below the requested frequency.

Using cpuclock sets NODEACCESSPOLICY to SINGLEJOB.

ALPS 1.4 or later is required when using cpuclock on Cray.

The clock frequency can be specified via:

  • a number that indicates the clock frequency (with or without the SI unit suffix).

  • a Linux power governor policy name. The governor names are:
    • performance: This governor instructs Linux to operate each logical processor at its maximum clock frequency.

      This setting consumes the most power and workload executes at the fastest possible speed.

    • powersave: This governor instructs Linux to operate each logical processor at its minimum clock frequency.

      This setting executes workload at the slowest possible speed. This setting does not necessarily consume the least amount of power since applications execute slower, and may actually consume more energy because of the additional time needed to complete the workload's execution.

    • ondemand: This governor dynamically switches the logical processor's clock frequency to the maximum value when system load is high and to the minimum value when the system load is low.

      This setting causes workload to execute at the fastest possible speed or the slowest possible speed, depending on OS load. The system switches between consuming the most power and the least power.

      The power saving benefits of ondemand might be non-existent due to frequency switching latency if the system load causes clock frequency changes too often.

      This has been true for older processors since changing the clock frequency required putting the processor into the C3 "sleep" state, changing its clock frequency, and then waking it up, all of which required a significant amount of time.

      Newer processors, such as the Intel Xeon E5-2600 Sandy Bridge processors, can change clock frequency dynamically and much faster.

    • conservative: This governor operates like the ondemand governor but is more conservative in switching between frequencies. It switches more gradually and uses all possible clock frequencies.

      This governor can switch to an intermediate clock frequency if it seems appropriate to the system load and usage, which the ondemand governor does not do.

  • an ACPI performance state (or P-state) with or without the P prefix. P-states are a special range of values (0-15) that map to specific frequencies. Not all processors support all 16 states; however, they all start at P0. P0 sets the CPU clock frequency to the highest performance state, which runs at the maximum frequency. P15 sets the CPU clock frequency to the lowest performance state, which runs at the lowest frequency.

When reviewing job or node properties when cpuclock was used, be mindful of unit conversion. The OS reports frequency in Hz, not MHz or GHz.

Example
msub -l cpuclock=1800,nodes=2 script.sh
msub -l cpuclock=1800mhz,nodes=2 script.sh

This job requests 2 nodes and specifies their CPU frequencies should be set to 1800 MHz.

msub -l cpuclock=performance,nodes=2 script.sh

This job requests 2 nodes and specifies their CPU frequencies should be set to the performance power governor policy.

msub -l cpuclock=3,nodes=2 script.sh
msub -l cpuclock=p3,nodes=2 script.sh

This job requests 2 nodes and specifies their CPU frequencies should be set to a performance state of 3.
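
To make the three kinds of cpuclock value easier to tell apart, here is a small Python sketch that classifies a value as a governor name, a P-state, or a frequency. The function name is hypothetical. The rule that a bare number from 0 to 15 is read as a P-state while any larger bare number is read as MHz is an assumption inferred from the examples above, not documented Moab behavior.

```python
import re

GOVERNORS = {"performance", "powersave", "ondemand", "conservative"}


def classify_cpuclock(value):
    """Classify a cpuclock value (illustrative only, not Moab's parser)."""
    v = value.strip().lower()
    if v in GOVERNORS:
        return ("governor", v)
    m = re.fullmatch(r"p(\d+)", v)
    if m and int(m.group(1)) <= 15:
        return ("pstate", int(m.group(1)))
    m = re.fullmatch(r"(\d+)(mhz)?", v)
    if m:
        number, unit = int(m.group(1)), m.group(2)
        # ASSUMPTION: a bare number in 0-15 is a P-state, anything else is MHz.
        if unit is None and number <= 15:
            return ("pstate", number)
        return ("frequency_mhz", number)
    raise ValueError(f"unrecognized cpuclock value: {value!r}")


for sample in ("1800", "1800mhz", "performance", "3", "p3"):
    print(sample, classify_cpuclock(sample))
```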

DDISK
Format <INTEGER>
Default 0
Description Dedicated disk per task in MB.
Example
> qsub -l ddisk=2000
DEADLINE
Format

Relative time: [[[DD:]HH:]MM:]SS

Absolute time: hh:mm:ss_mm/dd/yy

Description

Either a relative completion deadline for the job (measured from job submission time) or an absolute deadline that specifies the date and time by which the job must finish.

Example:
> qsub -l deadline=2:00:00,nodes=4 /tmp/bio3.cmd

The job's deadline is 2 hours after its submission.
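
The relative time format [[[DD:]HH:]MM:]SS can be converted to seconds with a few lines of Python. This is only a sketch of the arithmetic; the function name is hypothetical and the code does not reflect how Moab parses the value internally.

```python
def relative_to_seconds(spec):
    """Convert [[[DD:]HH:]MM:]SS to a number of seconds."""
    parts = [int(p) for p in spec.split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected 1 to 4 colon-separated fields: {spec!r}")
    # Rightmost field is seconds, then minutes, hours, days.
    multipliers = [1, 60, 3600, 86400]
    return sum(v * m for v, m in zip(reversed(parts), multipliers))


print(relative_to_seconds("2:00:00"))  # the deadline from the example above
```

For deadline=2:00:00 this gives 7200 seconds, which matches the two hours described in the example.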

DEPEND
Format [<DEPENDTYPE>:][{jobname|jobid}.]<ID>[:[{jobname|jobid}.]<ID>]...
Description Allows specification of job dependencies for compute or system jobs. If no ID prefix (jobname or jobid) is specified, the ID value is interpreted as a job ID.
Example
# submit job which will run after job 1301 and 1304 complete
> msub -l depend=orion.1301:orion.1304 test.cmd
orion.1322
# submit jobname-based dependency job
> msub -l depend=jobname.data1005 dataetl.cmd
orion.1428
DMEM
Format <INTEGER>
Default 0
Description Dedicated memory per task in bytes.
Example
> msub -l dmem=20480

Moab will dedicate 20 MB of memory to the task.

EPILOGUE
Format <STRING>
Description Specifies a user-owned epilogue script which is run before the system epilogue and epilogue.user scripts at the completion of a job. The syntax is epilogue=<file>. The file can be designated with an absolute or relative path.

This parameter works only with Torque.

Example
> msub -l epilogue=epilogue_script.sh job.sh
EXCLUDENODES
Format {<nodeid>|<node_range>}[:...]
Description Specifies nodes that should not be considered for the given job.
Example
> msub -l excludenodes=k1:k2:k[5-8]
# Comma separated ranges work only with SLURM
> msub -l excludenodes=k[1-2,5-8]
FEATURE
Format <FEATURE>[{:|}<FEATURE>]...
Description Required list of node attribute/node features.

If the pipe (|) character is used as a delimiter, the features are logically OR'd together and the associated job may use resources that match any of the specified features.

Requesting node names as features will result in the job being blocked from running.

Example
> qsub -l feature='fastos:bigio' testjob.cmd
GATTR
Format <STRING>
Description Generic job attribute associated with the job. The maximum size for an attribute is 63 bytes (the core Moab size limit of 64, including a null byte).
Example
> qsub -l gattr=bigjob
GMETRIC
Format Generic metric requirement for allocated nodes where the requirement is specified using the format <GMNAME>[:{lt:,le:,eq:,ge:,gt:,ne:}<VALUE>]
Description Indicates generic constraints that must be found on all allocated nodes. If a <VALUE> is not specified, the node must simply possess the generic metric (see Generic Metrics for more information).
Example
> qsub -l gmetric=bioversion:ge:133244 testj.txt
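
The constraint format <GMNAME>[:{lt:,le:,eq:,ge:,gt:,ne:}<VALUE>] can be pictured as a per-node test. The Python sketch below is a hypothetical model of that test: the function name and the dictionary of node metrics are assumptions for illustration, and values are compared as floating-point numbers.

```python
import operator

COMPARATORS = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ge": operator.ge, "gt": operator.gt, "ne": operator.ne,
}


def node_satisfies(spec, node_metrics):
    """Check one gmetric constraint against a node's metric dictionary."""
    name, _, rest = spec.partition(":")
    if name not in node_metrics:
        return False          # the node must possess the metric at all
    if not rest:
        return True           # no <VALUE>: possession is enough
    op, _, value = rest.partition(":")
    return COMPARATORS[op](float(node_metrics[name]), float(value))


print(node_satisfies("bioversion:ge:133244", {"bioversion": 133300}))
```

A job is eligible only if every allocated node passes the test.
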
GPUs
Format

msub -l nodes=<VALUE>:ppn=<VALUE>:gpus=<VALUE>[:mode][:reseterr]

Where mode is one of:

exclusive - The default setting. The GPU is used exclusively by one process thread.

exclusive_thread - The GPU is used exclusively by one process thread.

exclusive_process - The GPU is used exclusively by one process regardless of process thread.

If present, reseterr resets the ECC memory bit error counters. This only resets the volatile error counts, or errors since the last reboot. The permanent error counts are not affected.

Moab passes the mode and reseterr portion of the request to Torque for processing.

Moab does not support requesting GPUs as a GRES. Submitting msub -l gres=gpus:x does not work.

Description Moab schedules GPUs as a special type of node-locked generic resource. When Torque reports GPUs to Moab, Moab can schedule jobs and correctly assign GPUs to ensure that jobs are scheduled efficiently. To have Moab schedule GPUs, configure them in Torque, then submit jobs using the "GPU" attribute. Moab automatically parses the "GPU" attribute and assigns the GPUs in the correct manner. For information about GPU metrics, see GPGPUMetrics.
Examples
> msub -l nodes=2:ppn=2:gpus=1:exclusive_process:reseterr

Submits a job that requests 2 tasks, 2 processors and 1 GPU per task (2 GPUs total). Each GPU runs only threads related to the task and resets the volatile ECC memory bit error counts at job start time.

> msub -l nodes=4:gpus=1,tpn=2

Submits a job that requests 4 tasks, 1 GPU per node (4 GPUs total), and 2 tasks per node. Each GPU is dedicated exclusively to one task process and the ECC memory bit error counters are not reset.

> msub -l nodes=4:gpus=1:reseterr

Submits a job that requests 4 tasks, 1 processor and 1 GPU per task (4 GPUs total). Each GPU is dedicated exclusively to one task process and resets the volatile ECC memory bit error counts at job start time.

> msub -l nodes=4:gpus=2+1:ppn=2,walltime=600

Submits a job that requests two different types of tasks: the first is 4 tasks, each with 1 processor and 2 GPUs, and the second is 1 task with 2 processors. Each GPU is dedicated exclusively to one task process and the ECC memory bit error counters are not reset.

GRES and SOFTWARE
Format Percent sign (%) delimited list of generic resources where each resource is specified using the format <RESTYPE>[{+|:}<COUNT>]
Description Indicates generic resources required by the job. If the generic resource is node-locked, it is a per-task count. If a <COUNT> is not specified, the resource count defaults to 1.
Example
> qsub -W x=GRES:tape+2%matlab+3 testj.txt

When specifying more than one generic resource with -l, use the percent (%) character to delimit them.

> qsub -l gres=tape+2%matlab+3 testj.txt
> qsub -l software=matlab:2 testj.txt
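
The percent-delimited format <RESTYPE>[{+|:}<COUNT>] can be split into a name-to-count mapping as follows. This Python sketch uses a hypothetical function name and is meant only to illustrate the format, including the default count of 1.

```python
import re


def parse_gres(spec):
    """Split a gres/software value into {resource_name: count}."""
    resources = {}
    for item in spec.split("%"):
        m = re.fullmatch(r"([^+:]+)(?:[+:](\d+))?", item)
        if not m:
            raise ValueError(f"malformed generic resource: {item!r}")
        name, count = m.group(1), m.group(2)
        resources[name] = int(count) if count else 1  # count defaults to 1
    return resources


print(parse_gres("tape+2%matlab+3"))
```
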
HOSTLIST
Format Comma (,) or plus (+) delimited list of hostnames. Ranges and regular expressions are supported in msub only.
Description

Indicates an exact set, superset, or subset of nodes on which the job must run. Use the caret (^) or asterisk (*) characters to specify a host list as superset or subset respectively.

An exact set is defined without a caret or asterisk. An exact set means all the hosts in the specified hostlist must be selected for the job.

A subset means the specified hostlist is used first to select hosts for the job. If the job requires more hosts than are in the subset hostlist, they will be obtained from elsewhere if possible. If the job does not require all of the nodes in the subset hostlist, it will use only the ones it needs.

A superset means the hostlist is the only source of hosts that should be considered for running the job. If the job can't find the necessary resources in the superset hostlist it should not run. No other hosts should be considered in allocating the job.

Torque ignores hostlist as an extension. Hostlist is only supported in Moab.

Examples
> msub -l hostlist=nodeA+nodeB+nodeE

hostlist=foo[1-5]

This is an exact set of (foo1,foo2,...,foo5). The job must run on all these nodes.


hostlist=foo1+foo[3-9]

This is an exact set of (foo1,foo3,foo4,...,foo9). The job must run on all these nodes.


hostlist=foo[1,3-9]

This is an exact set of the same nodes as the previous example.


hostlist=foo[1-3]+bar[72-79]

This is an exact set of (foo1,foo2,foo3,bar72,bar73,...,bar79). The job must run on all these nodes.

hostlist=^node[1-50]

This is a superset of (node1,node2,...,node50). These are the only nodes that can be considered for the job. If the necessary resources for the job are not in this hostlist, the job is not run. If the job does not require all the nodes in this hostlist, it will use only the ones that it needs.

hostlist=*node[15-25]

This is a subset of (node15,node16,...,node25). The nodes in this hostlist are considered first for the job. If the necessary resources for the job are not in this hostlist, Moab tries to obtain the necessary resources from elsewhere. If the job does not require all the nodes in this hostlist, it will use only the ones that it needs.
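
The hostlist examples above can be reproduced with a short Python sketch that expands numeric ranges and reports whether the list is an exact set, a superset (^), or a subset (*). The function names are hypothetical, and only the bracketed range syntax shown in the examples is handled; regular expressions are not.

```python
import re


def expand_host(pattern):
    """Expand name[1,3-9] into name1, name3, ..., name9."""
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]", pattern)
    if not m:
        return [pattern]
    prefix, body = m.groups()
    hosts = []
    for piece in body.split(","):
        low, _, high = piece.partition("-")
        high = high or low
        hosts.extend(f"{prefix}{n}" for n in range(int(low), int(high) + 1))
    return hosts


def parse_hostlist(spec):
    """Return (mode, hosts) where mode is exact, superset, or subset."""
    mode = "exact"
    if spec.startswith("^"):
        mode, spec = "superset", spec[1:]
    elif spec.startswith("*"):
        mode, spec = "subset", spec[1:]
    # Split on ',' or '+' only when the delimiter is outside brackets.
    tokens = re.split(r"[+,](?![^\[]*\])", spec)
    hosts = [h for token in tokens for h in expand_host(token)]
    return mode, hosts


print(parse_hostlist("foo1+foo[3-9]"))
```

With this model, hostlist=foo1+foo[3-9] and hostlist=foo[1,3-9] expand to the same exact set, as the examples state.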

JGROUP
Format <JOBGROUPID>
Description ID of job group to which this job belongs (different from the GID of the user running the job).
Example
> msub -l JGROUP=bluegroup
JOBFLAGS (aka FLAGS)
Format One or more of the following colon delimited job flags including ADVRES[:RSVID], NOQUEUE, NORMSTART, PREEMPTEE, PREEMPTOR, RESTARTABLE, or SUSPENDABLE (see job flag overview for a complete listing).
Description Associates various flags with the job.
Example
> qsub -l nodes=1,walltime=3600,jobflags=advres myjob.py
JOBREJECTPOLICY
Format: One or more of CANCEL, HOLD, IGNORE, MAIL, or RETRY
Default: HOLD
Details:

Specifies the action to take when the scheduler determines that a job can never run. CANCEL issues a call to the resource manager to cancel the job. HOLD places a batch hold on the job preventing the job from being further evaluated until released by an administrator.

Administrators can dynamically alter job attributes and possibly fix the job with mjobctl -m.

With IGNORE, the scheduler will allow the job to exist within the resource manager queue but will neither process it nor report it. MAIL will send email to both the admin and the user when rejected jobs are detected. If RETRY is set, then Moab will allow the job to remain idle and will only attempt to start the job when the policy violation is resolved.  Any combination of attributes may be specified.

This is a per-job policy specified with msub -l. JOBREJECTPOLICY also exists as a global parameter.

 

Also see QOSREJECTPOLICY.

Example:
> msub -l jobrejectpolicy=cancel:mail
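
Because any combination of actions may be given, a colon-delimited value such as cancel:mail can be checked against the allowed set. The Python sketch below is illustrative only: the function name is hypothetical, and case-insensitive matching is an assumption based on the lowercase example above.

```python
VALID_ACTIONS = {"CANCEL", "HOLD", "IGNORE", "MAIL", "RETRY"}


def parse_jobrejectpolicy(spec):
    """Return the list of actions named in a jobrejectpolicy value."""
    # ASSUMPTION: values are matched without regard to case.
    actions = [a.upper() for a in spec.split(":")]
    unknown = [a for a in actions if a not in VALID_ACTIONS]
    if unknown:
        raise ValueError(f"unknown reject action(s): {unknown}")
    return actions


print(parse_jobrejectpolicy("cancel:mail"))
```
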
MAXMEM
Format <INTEGER> (in megabytes)
Description Maximum amount of memory the job may consume across all tasks before the JOBMEM action is taken.
Example
> qsub -W x=MAXMEM:1000mb bw.cmd

If a RESOURCELIMITPOLICY is set for per-job memory utilization, its action will be taken when this value is reached.

MAXPROC
Format <INTEGER>
Description Maximum CPU load the job may consume across all tasks before the JOBPROC action is taken.
Example
> qsub -W x=MAXPROC:4 bw.cmd

If a RESOURCELIMITPOLICY is set for per-job processor utilization, its action will be taken when this value is reached.

MEM
Format <INTEGER>
Description Specifies the maximum amount of physical memory used by the job. If you do not specify MB or GB, Moab uses bytes if your resource manager is Torque and MB if your resource manager is Native.
Example
> msub -l nodes=4:ppn=2,mem=1024mb

The job must have 4 compute nodes with 2 processors per node. The job is limited to 1024 MB of memory.
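
The effect of the default unit can be shown with a small conversion sketch in Python. The function name and the resource manager labels torque and native are hypothetical, and treating MB and GB as binary multiples (1024-based) is an assumption that the text above does not confirm.

```python
import re

# ASSUMPTION: MB and GB are binary multiples.
UNIT_BYTES = {"mb": 1024 ** 2, "gb": 1024 ** 3}
# Default unit when no suffix is given: bytes for Torque, MB for Native.
DEFAULT_UNIT_BYTES = {"torque": 1, "native": 1024 ** 2}


def mem_to_bytes(spec, rm):
    """Convert a mem value such as 1024mb to bytes for the given RM label."""
    m = re.fullmatch(r"(\d+)(mb|gb)?", spec.strip().lower())
    if not m:
        raise ValueError(f"malformed memory value: {spec!r}")
    number, unit = int(m.group(1)), m.group(2)
    factor = UNIT_BYTES[unit] if unit else DEFAULT_UNIT_BYTES[rm]
    return number * factor


print(mem_to_bytes("1024", "torque"), mem_to_bytes("1024", "native"))
```

The same bare number differs by a factor of about one million between the two resource managers, so giving an explicit suffix avoids surprises.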

MICs
Format

msub -l nodes=<VALUE>:ppn=<VALUE>:mics=<VALUE>[:mode]

Where mode is one of:

exclusive - The default setting. The MIC is used exclusively by one process thread.

exclusive_thread - The MIC is used exclusively by one process thread.

exclusive_process - The MIC is used exclusively by one process regardless of process thread.

Moab passes the mode portion of the request to Torque for processing.

Moab does not support requesting MICs as a GRES. Submitting msub -l gres=mics:x does not work.

Description Moab schedules MICs as a special type of node-locked generic resource. When Torque reports MICs to Moab, Moab can schedule jobs and correctly assign MICs to ensure that jobs are scheduled efficiently. To have Moab schedule MICs, configure them in Torque, then submit jobs using the "MIC" attribute. Moab automatically parses the "MIC" attribute and assigns the MICs in the correct manner.
Examples
> msub -l nodes=2:ppn=2:mics=1:exclusive_process

Submits a job that requests 2 tasks, 2 processors and 1 MIC per task (2 MICs total). Each MIC runs only threads related to the task.

> msub -l nodes=4:mics=1,tpn=2

Submits a job that requests 4 tasks, 1 MIC per node (4 MICs total), and 2 tasks per node. Each MIC is dedicated exclusively to one task process.

> msub -l nodes=4:mics=1

Submits a job that requests 4 tasks, 1 processor and 1 MIC per task (4 MICs total). Each MIC is dedicated exclusively to one task process.

> msub -l nodes=4:mics=2+1:ppn=2,walltime=600

Submits a job that requests two different types of tasks: the first is 4 tasks, each with 1 processor and 2 MICs, and the second is 1 task with 2 processors. Each MIC is dedicated exclusively to one task process.

MINPREEMPTTIME
Format [[[DD:]HH:]MM:]SS
Description Minimum time job must run before being eligible for preemption.

Can only be specified if associated QoS allows per-job preemption configuration by setting the preemptconfig flag.

Example
> qsub -l minpreempttime=900 bw.cmd

Job cannot be preempted until it has run for 15 minutes.

MINPROCSPEED
Format <INTEGER>
Default 0
Description Minimum processor speed (in MHz) for every node that this job will run on.
Example
> qsub -W x=MINPROCSPEED:2000 bw.cmd

Every node that runs this job must have a processor speed of at least 2000 MHz.

MINWCLIMIT
Format [[[DD:]HH:]MM:]SS
Default ---
Description Minimum wallclock limit job must run before being eligible for extension (see JOBEXTENDDURATION or JOBEXTENDSTARTWALLTIME).
Example
> qsub -l minwclimit=300,walltime=16000 bw.cmd

Job will run for at least 300 seconds but up to 16,000 seconds if possible (without interfering with other jobs).

MSTAGEIN
Format [<SRCURL>[|<SRCURL>...]%]<DSTURL>
Description

Indicates a job has data staging requirements. The source URL(s) listed will be transferred to the execution system for use by the job. If more than one source URL is specified, the destination URL must be a directory.

The format of <SRCURL> is: [PROTO://][HOST[:PORT]][/PATH] where the path is local.

The format of <DSTURL> is: [PROTO://][HOST[:PORT]][/PATH] where the path is remote.

PROTO can be any of the following protocols: ssh, file, or gsiftp.
HOST is the name of the host where the file resides.
PATH is the path of the source or destination file. The destination path may be a directory when sending a single file and must be a directory when sending multiple files. If a directory is specified, it must end with a forward slash (/).

Valid variables include:
$JOBID
$HOME - Path the script was run from
$RHOME - Home dir of the user on the remote system
$SUBMITHOST
$DEST - This is the Moab where the job will run
$LOCALDATASTAGEHEAD

If no destination is given, the protocol and file name will be set to the same as the source.

The $RHOME (remote home directory) variable is for when a user's home directory on the compute node is different than on the submission host.

Example:
> msub -Wx='mstagein=file://$HOME/helperscript.sh|file:///home/dev/datafile.txt%ssh://host/home/dev/' script.sh
Copy helperscript.sh and datafile.txt from the local machine to /home/dev/ on host for use in execution of script.sh. $HOME is a path containing a preceding / (for example, /home/adaptive).
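
The structure [<SRCURL>[|<SRCURL>...]%]<DSTURL> and the directory rule for multiple sources can be sketched in Python as follows. The function name is hypothetical, and the code only splits and checks the string; it does not resolve variables such as $HOME or contact any host.

```python
def parse_stage_spec(spec):
    """Split a data staging value into (source_urls, destination_url)."""
    head, sep, tail = spec.partition("%")
    if sep:
        sources, destination = head.split("|"), tail
    else:
        # No '%' present: per the format, the whole value is the destination.
        sources, destination = [], spec
    # With more than one source, the destination must be a directory,
    # and a directory must end with a forward slash.
    if len(sources) > 1 and not destination.endswith("/"):
        raise ValueError("multiple sources require a directory destination")
    return sources, destination


example = ("file://$HOME/helperscript.sh|file:///home/dev/datafile.txt"
           "%ssh://host/home/dev/")
print(parse_stage_spec(example))
```
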
MSTAGEOUT
Format [<SRCURL>[|<SRCURL>...]%]<DSTURL>
Description Indicates whether a job has data staging requirements. The source URL(s) listed will be transferred from the execution system after the completion of the job. If more than one source URL is specified, the destination URL must be a directory.

The format of <SRCURL> is: [PROTO://][HOST[:PORT]][/PATH] where the path is remote.

The format of <DSTURL> is: [PROTO://][HOST[:PORT]][/PATH] where the path is local.

PROTO can be any of the following protocols: ssh, file, or gsiftp.
HOST is the name of the host where the file resides.
PATH is the path of the source or destination file. The destination path may be a directory when sending a single file and must be a directory when sending multiple files. If a directory is specified, it must end with a forward slash (/).

Valid variables include:
$JOBID
$HOME - Path the script was run from
$RHOME - Home dir of the user on the remote system
$SUBMITHOST
$DEST - This is the Moab where the job will run
$LOCALDATASTAGEHEAD

If no destination is given, the protocol and file name will be set to the same as the source.

The $RHOME (remote home directory) variable is for when a user's home directory on the compute node is different than on the submission host.

Example
> msub -W x='mstageout=ssh://$DEST/$HOME/resultfile1.txt|ssh://host/home/dev/resultscript.sh%file:///home/dev/' script.sh

Copy resultfile1.txt and resultscript.sh from the execution system to /home/dev/ after the execution of script.sh is complete. $HOME is a path containing a preceding / (for example, /home/adaptive).

NACCESSPOLICY
Format One of SHARED, SINGLEJOB, SINGLETASK, SINGLEUSER, or UNIQUEUSER
Description Specifies how node resources should be accessed. (See Node Access Policies for more information).

The naccesspolicy option can only be used to make node access more constraining than is specified by the system, partition, or node policies. For example, if the effective node access policy is shared, naccesspolicy can be set to singleuser; if the effective node access policy is singlejob, naccesspolicy can be set to singletask.

Example
> qsub -l naccesspolicy=singleuser bw.cmd
> bsub -ext naccesspolicy=singleuser lancer.cmd

Job can only allocate free nodes or nodes running jobs by same user.

> qsub -l naccesspolicy=singlejob jobscript.sh
# OR
> qsub -W x=naccesspolicy:singlejob jobscript.sh

The job can only run on nodes that are not running any other job, regardless of whether those nodes have free cores.

NALLOCPOLICY
Format One of the valid settings for the parameter NODEALLOCATIONPOLICY
Description Specifies how node resources should be selected and allocated to the job. (See Node Allocation Policies for more information.)
Example
> qsub -l nallocpolicy=minresource bw.cmd

Job should use the minresource node allocation policy.

NCPUS
Format <INTEGER>
Description

The number of processors in one task where a task cannot span nodes. If NCPUS is used, then the resource manager's SUBMITPOLICY should be set to NODECENTRIC to get correct behavior. -l ncpus=<#> is equivalent to -l nodes=1:ppn=<#> when JOBNODEMATCHPOLICY is set to EXACTNODE. NCPUS is used when submitting jobs to an SMP. When using GPUs to submit to an SMP, use -l ncpus=<#>:GPUs=<#>.

You cannot request both ncpus and nodes in the same job.

NMATCHPOLICY
Format One of the valid settings for the parameter JOBNODEMATCHPOLICY
Description Specifies how node resources should be selected and allocated to the job.
Example
> qsub -l nodes=2 -W x=nmatchpolicy:exactnode bw.cmd

Job should use the EXACTNODE JOBNODEMATCHPOLICY.

NODESET
Format <SETTYPE>:<SETATTR>[:<SETLIST>]
Description Specifies nodeset constraints for job resource allocation (see the NodeSet Overview for more information).
Example
> qsub -l nodeset=ONEOF:FEATURE:fastos:hiprio:bigmem bw.cmd
NODESETCOUNT
Format <INTEGER>
Description Specifies how many node sets a job uses.
Example
> msub -l nodesetcount=2
NODESETDELAY
Format [[[DD:]HH:]MM:]SS
Description

Causes Moab to attempt to span a job evenly across nodesets unless doing so delays the job beyond the requested NODESETDELAY.

Example
> qsub -l nodesetdelay=300,walltime=16000 bw.cmd
NODESETISOPTIONAL
Format <BOOLEAN>
Description Specifies whether the nodeset constraint is optional (see the NodeSet Overview for more information).

Requires SCHEDCFG[] FLAGS=allowperjobnodesetisoptional.

Example
> msub -l nodesetisoptional=true bw.cmd
OPSYS
Format <OperatingSystem>
Description Specifies the job's required operating system.
Example
> qsub -l nodes=1,opsys=rh73 chem92.cmd
PARTITION
Format <STRING>[:<STRING>]...
Description Specifies the partition (or partitions) in which the job must run.

The job must have access to this partition based on system wide or credential based partition access lists.

Example
> qsub -l nodes=1,partition=math:geology

The job must only run in the math partition or the geology partition.

PMEM
Format <INTEGER>
Description

Specifies the maximum amount of physical memory used by any single process of the job.

Example
> msub -l nodes=4:ppn=2,pmem=1024mb

The job must have 4 compute nodes with 2 processors per node, and each process of the job is limited to 1024 MB of physical memory.

PREF
Format [{feature|variable}:]<STRING>[:<STRING>]...

If feature or variable are not specified, then feature is assumed.

Description Specifies which node features are preferred by the job and should be allocated if available. If preferred node criteria are specified, Moab favors the allocation of matching resources but is not bound to only consider these resources.

Preferences are not honored unless the node allocation policy is set to PRIORITY and the PREF priority component is set within the node's PRIORITYF attribute.

Example
> qsub -l nodes=1,pref=bigmem

The job may run on any nodes but prefers to allocate nodes with the bigmem feature.


PROCS
Format <INTEGER>
Description

Requests a specific number of processors for the job. Instead of trying to determine how many nodes they need, users can decide how many processors they need, and Moab will automatically request the appropriate number of nodes from the RM. This also works with feature requests, such as procs=12[:feature1[:feature2[-]]].

Using this resource request overrides any other processor or node related request, such as nodes=4.

Example
> msub -l procs=32 myjob.pl

Moab will request as many nodes as is necessary to meet the 32-processor requirement for the job.


PROLOGUE
Format <STRING>
Description Specifies a user-owned prologue script which will be run after the system prologue and prologue.user scripts at the beginning of a job. The syntax is prologue=<file>. The file can be designated with an absolute or relative path.

This parameter works only with Torque.

Example
> msub -l prologue=prologue_script.sh job.s

PVMEM
Format <INTEGER>
Description Specify the maximum amount of virtual memory used by any single process in the job.
Example
> msub -l nodes=4:ppn=2,pvmem=1024mb

The job must have 4 compute nodes with 2 processors per node, and each process of the job is limited to 1024 MB of virtual memory.

QoS
Format <STRING>
Description Requests the specified QoS for the job.
Example
> qsub -l walltime=1000,qos=highprio biojob.cmd
QUEUEJOB
Format

<BOOLEAN>

Default TRUE
Description Indicates whether or not the scheduler should queue the job if resources are not available to run the job immediately.
Example
> msub -l nodes=1,queuejob=false test.cmd
REQATTR
Format Required node attributes with version number support: reqattr=[<must|must not|should|should not>]:<ATTRIBUTE>[{>=|>|<=|<|=}<VERSION>]
Description

Indicates required node attributes. Values may include letters, numbers, dashes, underscores, and spaces.

You can choose one of four requirement types for each node attribute you request: 

  • must – The node on which this job runs must include the attribute at the value specified. If no node matches this requirement, Moab will not schedule the job.
  • must not – The node on which this job runs must not include the attribute at the value specified. If no node matches this requirement, Moab will not schedule the job.
  • should – If possible, the node on which this job runs should include the attribute at the value specified. If no node matches this requirement, Moab selects a node without it.
  • should not – If possible, the node on which this job runs should not include the attribute at the value specified. If no node matches this requirement, Moab selects a node without it.

If you do not specify a requirement type, Moab assumes "must."

For information about using reqattr to request dynamic features, see Configuring dynamic features in Torque and Moab.

Example
> qsub -l reqattr=matlab=7.1 testj.txt
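
The pieces of a reqattr value (requirement type, attribute, comparison, version) can be separated with a regular expression. The Python sketch below is a hypothetical parser written for illustration. It accepts the value with or without the leading type and colon, since the example above omits them, and it defaults the type to must as described.

```python
import re

REQATTR_PATTERN = re.compile(
    r"(?:(must not|should not|must|should)?:)?"   # optional requirement type
    r"([A-Za-z0-9_ -]+?)"                         # attribute name
    r"(?:(>=|<=|>|<|=)(.+))?"                     # optional comparison, version
)


def parse_reqattr(value):
    """Return (requirement_type, attribute, comparison, version)."""
    m = REQATTR_PATTERN.fullmatch(value)
    if not m:
        raise ValueError(f"malformed reqattr value: {value!r}")
    req_type, attribute, comparison, version = m.groups()
    return (req_type or "must", attribute, comparison, version)


print(parse_reqattr("matlab=7.1"))
print(parse_reqattr("should not:matlab>=7.1"))
```
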
RESFAILPOLICY
Format One of CANCEL, HOLD, IGNORE, NOTIFY, or REQUEUE
Description Specifies the action to take on an executing job if one or more allocated nodes fail. This setting overrides the global value specified with the NODEALLOCRESFAILUREPOLICY parameter.
Example
> msub -l resfailpolicy=ignore

For this particular job, ignore node failures.

RMTYPE
Format <STRING>
Description One of the resource manager types currently available within the cluster or grid. Typically, this is one of PBS, LSF, LL, SGE, SLURM, BProc, and so forth.
Example
> msub -l rmtype=ll
Only run job on a Loadleveler destination resource manager.
SIGNAL
Format <INTEGER>[@<OFFSET>]
Description Specifies the pre-termination signal to be sent to a job prior to it reaching its walltime limit or being terminated by Moab. The optional offset value specifies how long before job termination the signal should be sent. By default, the pre-termination signal is sent one minute before a job is terminated.
Example
> msub -l signal=32@120 bio45.cmd
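
The <INTEGER>[@<OFFSET>] format can be read as a signal number plus a lead time. The Python sketch below is illustrative only: the function names are hypothetical, and treating the offset as a plain number of seconds is an assumption (the default of one minute is taken from the description above).

```python
def parse_signal(spec, default_offset=60):
    """Return (signal_number, offset_seconds) for a signal value."""
    signal, sep, offset = spec.partition("@")
    # ASSUMPTION: the offset is a whole number of seconds.
    offset_seconds = int(offset) if sep else default_offset
    return int(signal), offset_seconds


def signal_time(termination_time, spec):
    """Time (in seconds) at which the pre-termination signal would be sent."""
    _, offset_seconds = parse_signal(spec)
    return termination_time - offset_seconds


print(parse_signal("32@120"))
```
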
SPRIORITY
Format <INTEGER>
Default 0
Description Allows Moab administrators to set a system priority on a job (similar to setspri). This only works if the job submitter is an administrator.
Example
> qsub -l nodes=16,spriority=100 job.cmd
TEMPLATE
Format <STRING>
Description Specifies a job template to be used as a set template. The requested template must have SELECT=TRUE (see Job Templates).
Example
> msub -l walltime=1000,nodes=16,template=biojob job.cmd
TERMTIME
Format <TIMESPEC>
Default 0
Description Specifies the time at which Moab should cancel a queued or active job (see Job Deadline Support).
Example
> msub -l nodes=10,walltime=600,termtime=12:00_Jun/14 job.cmd
TPN
Format <INTEGER>[+]
Default 0
Description Tasks per node allowed on allocated hosts. If the plus (+) character is specified, the tasks per node value is interpreted as a minimum tasks per node constraint; otherwise it is interpreted as an exact tasks per node constraint.

Differences between TPN and PPN:

There are two key differences between the following: (A) qsub -l nodes=12:ppn=3 and (B) qsub -l nodes=12,tpn=3.

The first difference is that ppn is interpreted as the minimum required tasks per node, while tpn defaults to exact tasks per node. Case (B) executes the job with exactly 3 tasks on each allocated node, while case (A) executes the job with at least 3 tasks on each allocated node (for example, nodeA:4,nodeB:3,nodeC:5).

The second major difference is that nodes=X:ppn=Y actually requests X*Y tasks, whereas nodes=X,tpn=Y requests only X tasks.
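
The task arithmetic for the two forms can be sketched as follows. The function names are hypothetical and exist only for this illustration. The node count for tpn is an inference from the "exact tasks per node" rule and assumes X is divisible by Y.

```shell
# Total tasks requested by "nodes=X:ppn=Y": X*Y.
tasks_for_ppn() {
    echo $(( $1 * $2 ))
}

# Total tasks requested by "nodes=X,tpn=Y": X (tpn only constrains placement).
tasks_for_tpn() {
    echo "$1"
}

# Nodes occupied by X tasks at exactly Y tasks per node.
# Assumption: X is divisible by Y.
nodes_for_tpn() {
    echo $(( $1 / $2 ))
}

tasks_for_ppn 12 3   # case (A): prints 36
tasks_for_tpn 12 3   # case (B): prints 12
nodes_for_tpn 12 3   # case (B): prints 4
```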

TPN with Torque as an RM:

Moab interprets nodes loosely as procs. Torque interprets nodes as a count of the actual nodes listed in your nodes file, not your total number of procs. This means that if Torque is your resource manager and you specify msub -l nodes=16:tpn=8 but do not have 16 nodes, Torque will not run the job. Instead, you should specify msub -l procs=16:tpn=8.

To resolve the problem long term, you can also set server resources_available.nodect to the total number of procs in your system and use msub -l nodes=16:tpn=8 as you would in a non-Torque Moab environment. See resources_available in the Torque 6.0.1 Administrator Guide for more information.

Example
> msub -l nodes=10,walltime=600,tpn=4 job.cmd
TRIG
Format <TRIGSPEC>
Description Adds trigger(s) to the job (See Creating a Trigger for specific syntax.).

Job triggers can only be specified if allowed by the QoS flag trigger. See Enabling Job Triggers for more information.

Example
> qsub -l trig=etype=start\&atype=exec\&action="/tmp/email.sh job.cmd"
TRL (Format 1)
Format <INTEGER>[@<INTEGER>][:<INTEGER>[@<INTEGER>]]...
Default 0
Description Specifies alternate task requests with their optional walltimes (See Malleable Jobs.).
Example
> msub -l trl=2@500:4@250:8@125:16@62 job.cmd

or
> qsub -l trl=2:3:4
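
To make the Format 1 syntax concrete, the following sketch splits a trl value into its task count and walltime pairs. The function parse_trl is hypothetical and is not part of Moab; it only illustrates how the colon and @ separators are read.

```shell
# Print one "tasks walltime" pair per line from a Format 1 trl value.
# The walltime prints as "-" when an entry has no @<walltime> part.
parse_trl() {
    old_ifs=$IFS
    IFS=:
    set -- $1          # split the value on ":" into positional parameters
    IFS=$old_ifs
    for entry in "$@"; do
        case $entry in
            *@*) echo "${entry%@*} ${entry#*@}" ;;
            *)   echo "$entry -" ;;
        esac
    done
}

parse_trl "2@500:4@250:8@125:16@62"
# prints:
# 2 500
# 4 250
# 8 125
# 16 62
```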
TRL (Format 2)
Format <INTEGER>-<INTEGER>
Default 0
Description Specifies a range of task requests that require the same walltime (See Malleable Jobs.).
Example
> msub -l trl=32-64 job.cmd

For optimization purposes, Moab does not perform an exhaustive search of all possible values in the range. It evaluates at least the beginning, the end, and 4 equally distributed values in between.
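
As an illustration of what "the beginning, the end, and 4 equally distributed values" could look like for a given range, the sketch below computes six evenly spaced values with integer arithmetic. This is an assumption for illustration only: the function trl_candidates is hypothetical, and the exact values and rounding Moab uses may differ.

```shell
# Illustration only: six evenly spaced task counts across a trl range.
# Moab's actual selection and rounding may differ.
trl_candidates() {
    lo=${1%-*}         # text before the "-"
    hi=${1#*-}         # text after the "-"
    i=0
    out=""
    while [ "$i" -le 5 ]; do
        out="$out $(( lo + i * (hi - lo) / 5 ))"
        i=$((i + 1))
    done
    echo $out          # unquoted on purpose to trim the leading space
}

trl_candidates 32-64   # prints 32 38 44 51 57 64
```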

VAR
Format <ATTR>[:<VALUE>]
Description Adds a generic variable or variables to the job.
Example
> msub -l VAR=testvar1:testvalue1

Single variable

> msub -l VAR=testvar1:testvalue1+testvar2:testvalue2+testvar3:testvalue3

Multiple variables
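
When the variables come from a script, the value can be assembled programmatically. The sketch below joins name:value pairs with the plus (+) separator shown in the example above; build_var_spec is a hypothetical helper, not a Moab command.

```shell
# Join name:value pairs with "+" to form a VAR extension value.
build_var_spec() {
    spec=""
    for pair in "$@"; do
        if [ -z "$spec" ]; then
            spec=$pair
        else
            spec="$spec+$pair"
        fi
    done
    echo "VAR=$spec"
}

build_var_spec testvar1:testvalue1 testvar2:testvalue2 testvar3:testvalue3
# prints VAR=testvar1:testvalue1+testvar2:testvalue2+testvar3:testvalue3
```

The result can then be passed to the submit command, for example msub -l "$(build_var_spec testvar1:testvalue1 testvar2:testvalue2)".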

VC
Format vc=<NAME>
Description Submits the job or workflow to a virtual container (VC).
Example
vc=vc13
VMEM
Format <INTEGER>
Description Specifies the maximum amount of virtual memory used by all concurrent processes in the job.
Example
> msub -l nodes=4:ppn=2,vmem=1024mb

The job must have 4 compute nodes with 2 processors per node, and the job is limited to 1024 MB of virtual memory.

11.3.3 Resource Manager Extension Examples

If more than one extension is required in a given job, extensions can be concatenated with a semicolon separator using the format <ATTR>:<VALUE>[;<ATTR>:<VALUE>]...

Example 11-1:

#@comment="HOSTLIST:node1,node2;QOS:special;SID:silverA"

Job must run on nodes node1 and node2 using the QoS special. The job is also associated with the system ID silverA, allowing the silver daemon to monitor and control the job.

Example 11-2:

#PBS -W x=\"NODESET:ONEOF:NETWORK;DMEM:64\"

Job will have resources allocated subject to network based nodeset constraints. Further, each task will dedicate 64 MB of memory.

Example 11-3:

> qsub -l nodes=4,walltime=1:00:00 -W x="FLAGS:ADVRES:john.1"

Job will be forced to run within the john.1 reservation.
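
The concatenation format used in these examples can also be generated from a list of extensions. The following sketch uses a hypothetical helper, join_extensions, to join <ATTR>:<VALUE> pairs with semicolons. Because the shell treats an unquoted semicolon as a command separator, the result must stay quoted when it is passed on the command line.

```shell
# Join <ATTR>:<VALUE> pairs with ";" into a single extension string.
join_extensions() {
    ext=""
    for pair in "$@"; do
        ext="${ext:+$ext;}$pair"   # add ";" only between entries
    done
    echo "$ext"
}

join_extensions "HOSTLIST:node1,node2" "QOS:special" "SID:silverA"
# prints HOSTLIST:node1,node2;QOS:special;SID:silverA
```

For example, the string could be passed as qsub -W x="$(join_extensions NODESET:ONEOF:NETWORK DMEM:64)" on a resource manager that uses the -W x= method.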

11.3.4 Configuring dynamic features in Torque and Moab

Used together, the reqattr RM extension and Torque $varattr parameter allow you to create jobs that request resources that may change or disappear. For example, if you want a job to request a certain version of Octave, but different versions are installed on each node and may be updated at any time, you can create a script that searches for the feature and version on the nodes at a specified interval. Your Moab job can then retrieve the dynamic node attributes from the latest poll and use them for scheduling.

This functionality is available when you use the Torque $varattr parameter to configure a script that regularly retrieves updates on the nodes' feature(s) and the reqattr RM extension to require a feature with a certain value.

To set up a dynamic feature in Torque and Moab

  1. Create a script that pulls the information you need. For instance, the following script pulls the version of Octave on each node and prints it.
    #!/bin/bash
    # Pull the version string for Octave and print it for $varattr.
    version_str=$(octave -v | grep version)
    # Escape the dots so they match literally; allow multi-digit components.
    if [[ $version_str =~ ([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+) ]]; then
        echo "Octave: ${BASH_REMATCH[1]}"
    fi
  2. Use the Torque $varattr parameter to configure the script. Specify both the number of seconds between script runs and the path to the script. If you set the seconds to -1, the script will run just once. You may include arguments if desired. In the following example, the $varattr parameter specifies that Torque calls the Octave script every 30 seconds.
    $varattr 30 /usr/local/scripts/octave.sh
  3. Submit your job in Moab, specifying reqattr as a resource. In this example, the job requests a node where the octave feature has a value of 3.2.4 (that is, the node has Octave version 3.2.4 installed).
    > msub -l reqattr=octave=3.2.4 myJob.sh

    Your job requests a node with Octave version 3.2.4. Torque passes the most recent (pulled within the last 30 seconds) version of Octave on each node. Moab then schedules the job on a node that currently has Octave 3.2.4.
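
Before deploying a polling script, it can help to check the version extraction on a machine that does not have Octave installed. The sketch below is a sed-based variant that takes the version text as an argument. The function extract_version is hypothetical, and the sample string "GNU Octave, version 3.2.4" is an assumed example of the line that octave -v prints.

```shell
# Extract an x.y.z version number from a line of text.
# Prints nothing if no version is found.
extract_version() {
    printf '%s\n' "$1" |
        sed -n 's/.*version \([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Assumed sample of the first line of `octave -v` output.
sample="GNU Octave, version 3.2.4"
echo "Octave: $(extract_version "$sample")"
# prints Octave: 3.2.4
```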


© 2016 Adaptive Computing