Not all resource managers are created equal. Capabilities vary widely from system to system, and there is a large body of functionality that many, if not all, resource managers have no concept of. A good example of this is job QoS. Since most resource managers do not have a concept of quality of service, they do not provide a mechanism for users to specify this information. In many cases, Moab is able to add capabilities at a global level. However, a number of features require a per-job specification. Resource manager extensions allow this information to be associated with the job.
Specifying resource manager extensions varies by resource manager. TORQUE, OpenPBS, PBSPro, Loadleveler, LSF, S3, and Wiki each allow the specification of an extension field as described in the following table:
Resource manager | Specification method |
---|---|
TORQUE 2.0+ |
-l > qsub -l nodes=3,qos=high sleepy.cmd |
TORQUE 1.x/OpenPBS |
-W x= > qsub -l nodes=3 -W x=qos:high sleepy.cmd OpenPBS does not support this ability by default but can be patched as described in the PBS Resource Manager Extension Overview. |
Loadleveler |
#@comment #@nodes = 3 #@comment = qos:high |
LSF |
-ext > bsub -ext advres:system.2 |
PBSPro |
-l > qsub -l advres=system.2 Use of PBSPro resources requires configuring the server_priv/resourcedef file to define the needed extensions as in the following example: advres type=string qos type=string sid type=string sjid type=string |
Wiki |
comment comment=qos:high |
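
The mapping in the table above can be sketched as a small POSIX shell helper. This is only an illustration: the function name `rm_ext_arg` and the lowercase resource manager keys are hypothetical, and the helper prints the fragment rather than submitting anything.

```shell
# Print the submission fragment that carries one extension for a given
# resource manager, following the table above. The function name and the
# lowercase keys are hypothetical, chosen for this sketch.
rm_ext_arg() {
    rm=$1 attr=$2 value=$3
    case $rm in
        torque2|pbspro)  printf '%s\n' "-l $attr=$value" ;;
        torque1|openpbs) printf '%s\n' "-W x=$attr:$value" ;;
        lsf)             printf '%s\n' "-ext $attr:$value" ;;
        loadleveler)     printf '%s\n' "#@comment = $attr:$value" ;;
        wiki)            printf '%s\n' "comment=$attr:$value" ;;
        *) echo "unknown resource manager: $rm" >&2; return 1 ;;
    esac
}

rm_ext_arg torque2 qos high
rm_ext_arg torque1 qos high
rm_ext_arg lsf advres system.2
```

The same attribute and value therefore appear as `attr=value` for the `-l` style and as `attr:value` for the `-W x=`, `-ext`, and comment styles.
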
Using the resource manager specific method, the following job extensions are currently available:
ADVRES | |
---|---|
Format | [!]<RSVID> |
Description |
Specifies that reserved resources are required to run the job. If <RSVID> is specified, then only resources within the specified reservation may be allocated (see Job to Reservation Binding). You can request to not use a specific reservation by using advres=!<reservationname>. |
Example |
> qsub -l advres=grid.3
Resources for the job must come from grid.3.
> qsub -l advres=!grid.5
Resources for the job must not come from grid.5. |
BANDWIDTH | |
---|---|
Format | <DOUBLE> (in MB/s) |
Description | Minimum available network bandwidth across allocated resources (See Network Management.). |
Example |
> bsub -ext bandwidth=120 chemjob.txt |
DDISK | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Dedicated disk per task in MB. |
Example |
> qsub -l ddisk=2000 |
DMEM | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Dedicated memory per task in bytes. |
Example |
> msub -l dmem=20480 Moab will dedicate 20 MB of memory to the task. |
FEATURE | |
---|---|
Format | <FEATURE>[{:|}<FEATURE>]... |
Description | Required list of node attribute/node features.
If the pipe (|) character is used as a delimiter, the features are logically ORed together and the associated job may use resources that match any of the specified features. |
Example |
> qsub -l feature='fastos:bigio' testjob.cmd |
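
The difference between the colon (AND) and pipe (OR) delimiters can be illustrated with a short POSIX shell sketch. It is not Moab's matching code: the function name `node_matches` is hypothetical, a node's features are passed as plain arguments, and a specification is assumed to use only one delimiter type.

```shell
# Return success if a node's features satisfy a FEATURE specification.
# ':' means every listed feature is required; '|' means any one suffices.
# Usage: node_matches <spec> <node feature>...
node_matches() {
    spec=$1
    shift
    have=" $* "                 # node features, padded for whole-word matching
    case $spec in
        *'|'*) sep='|'; mode=any ;;
        *)     sep=':'; mode=all ;;
    esac
    if [ "$mode" = all ]; then result=0; else result=1; fi
    old_ifs=$IFS
    IFS=$sep
    set -f                      # no globbing while the spec is split
    for want in $spec; do
        case $have in
            *" $want "*) if [ "$mode" = any ]; then result=0; fi ;;
            *)           if [ "$mode" = all ]; then result=1; fi ;;
        esac
    done
    set +f
    IFS=$old_ifs
    return "$result"
}

node_matches 'fastos:bigio' fastos bigio himem && echo "AND spec satisfied"
node_matches 'fastos|bigio' bigio && echo "OR spec satisfied"
```
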
GMETRIC | |
---|---|
Format | Generic metric requirement for allocated nodes where the requirement is specified using the format <GMNAME>[:{lt:,le:,eq:,ge:,gt:,ne:}<VALUE>] |
Description | Indicates generic constraints that must be found on all allocated nodes. If a <VALUE> is not specified, the node must simply possess the generic metric (See Generic Metrics for more information.). |
Example |
> qsub -l gmetric=bioversion:ge:133244 testj.txt |
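
To make the comparison operators concrete, the following POSIX shell sketch checks one node's reported value against a requirement such as ge:133244. It assumes integer values and that the comparison reads "reported value <operator> required value"; the function name `gmetric_satisfied` is hypothetical and this is not how Moab itself evaluates the constraint.

```shell
# Return success when a reported generic metric value satisfies the
# requirement. Integer values only; the function name is hypothetical.
# Usage: gmetric_satisfied <lt|le|eq|ge|gt|ne> <required> <reported>
gmetric_satisfied() {
    op=$1 required=$2 reported=$3
    case $op in
        # The operator names match the numeric tests of the shell's test command.
        lt|le|eq|ge|gt|ne) [ "$reported" "-$op" "$required" ] ;;
        *) echo "unknown comparison: $op" >&2; return 2 ;;
    esac
}

# A node reporting bioversion 140000 against gmetric=bioversion:ge:133244
gmetric_satisfied ge 133244 140000 && echo "node qualifies"
```
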
GPUs | |
---|---|
Format |
msub -l nodes=<VALUE>:ppn=<VALUE>:gpus=<VALUE>[:mode][:reseterr]
Where mode is one of:
exclusive - The default setting. The GPU is used exclusively by one process thread.
exclusive_thread - The GPU is used exclusively by one process thread.
exclusive_process - The GPU is used exclusively by one process regardless of process thread.
If present, reseterr resets the ECC memory bit error counters. This only resets the volatile error counts, or errors since the last reboot. The permanent error counts are not affected.
Moab passes the mode and reseterr portion of the request to TORQUE for processing.
Moab does not support requesting GPUs as a GRES. Submitting msub -l gres=gpus:x does not work. |
Description | Moab schedules GPUs as a special type of node-locked generic resources. When TORQUE reports GPUs to Moab, Moab can schedule jobs and correctly assign GPUs to ensure that jobs are scheduled efficiently. To have Moab schedule GPUs, configure them in TORQUE then submit jobs using the "GPU" attribute. Moab automatically parses the "GPU" attribute and assigns them in the correct manner. For information about GPU metrics, see GPGPUMetrics. |
Examples |
> msub -l nodes=2:ppn=2:gpus=1:exclusive_process:reseterr
Submits a job that requests 2 tasks, 2 processors and 1 GPU per task (2 GPUs total). Each GPU runs only threads related to the task and resets the volatile ECC memory bit error counts at job start time.
> msub -l nodes=4:gpus=1,tpn=2
Submits a job that requests 4 tasks, 1 GPU per node (4 GPUs total), and 2 tasks per node. Each GPU is dedicated exclusively to one task process and the ECC memory bit error counters are not reset.
> msub -l nodes=4:gpus=1:reseterr
Submits a job that requests 4 tasks, 1 processor and 1 GPU per task (4 GPUs total). Each GPU is dedicated exclusively to one task process and resets the volatile ECC memory bit error counts at job start time.
> msub -l nodes=4:gpus=2+1:ppn=2,walltime=600
Submits a job that requests two different types of tasks: the first is 4 tasks, each with 1 processor and 2 GPUs, and the second is 1 task with 2 processors. Each GPU is dedicated exclusively to one task process and the ECC memory bit error counters are not reset. |
JGROUP | |
---|---|
Format | <JOBGROUPID> |
Description | ID of job group to which this job belongs (different from the GID of the user running the job). |
Example |
> msub -l JGROUP=bluegroup |
JOBFLAGS (aka FLAGS) | |
---|---|
Format | One or more of the following colon delimited job flags including ADVRES[:RSVID], NOQUEUE, NORMSTART, PREEMPTEE, PREEMPTOR, RESTARTABLE, or SUSPENDABLE (see job flag overview for a complete listing). |
Description | Associates various flags with the job. |
Example |
> qsub -l nodes=1,walltime=3600,jobflags=advres myjob.py |
JOBREJECTPOLICY | |
---|---|
Format: | One or more of CANCEL, HOLD, IGNORE (beta), MAIL, or RETRY |
Default: | HOLD |
Details: |
Specifies the action to take when the scheduler determines that a job can never run. CANCEL issues a call to the resource manager to cancel the job. HOLD places a batch hold on the job preventing the job from being further evaluated until released by an administrator. Administrators can dynamically alter job attributes and possibly fix the job with mjobctl -m. With IGNORE (currently in beta), the scheduler will allow the job to exist within the resource manager queue but will neither process it nor report it. MAIL will send email to both the admin and the user when rejected jobs are detected. If RETRY is set, then Moab will allow the job to remain idle and will only attempt to start the job when the policy violation is resolved. Any combination of attributes may be specified. See QOSREJECTPOLICY. This is a per-job policy specified with msub -l. JOBREJECTPOLICY also exists as a global parameter. |
Example: |
> msub -l jobrejectpolicy=cancel:mail |
LOGLEVEL | |
---|---|
Format | <INTEGER> |
Description | Per job log verbosity. |
Example |
> qsub -W x=loglevel:5 bw.cmd |
MAXMEM | |
---|---|
Format | <INTEGER> (in megabytes) |
Description | Maximum amount of memory the job may consume across all tasks before the JOBMEM action is taken. |
Example |
> qsub -W x=MAXMEM:1000mb bw.cmd If a RESOURCELIMITPOLICY is set for per-job memory utilization, its action will be taken when this value is reached. |
MAXPROC | |
---|---|
Format | <INTEGER> |
Description | Maximum CPU load the job may consume across all tasks before the JOBPROC action is taken. |
Example |
> qsub -W x=MAXPROC:4 bw.cmd If a RESOURCELIMITPOLICY is set for per-job processor utilization, its action will be taken when this value is reached. |
MICs | |
---|---|
Format |
msub -l nodes=<VALUE>:ppn=<VALUE>:mics=<VALUE>[:mode]
Where mode is one of:
exclusive - The default setting. The MIC is used exclusively by one process thread.
exclusive_thread - The MIC is used exclusively by one process thread.
exclusive_process - The MIC is used exclusively by one process regardless of process thread.
Moab passes the mode portion of the request to TORQUE for processing.
Moab does not support requesting MICs as a GRES. Submitting msub -l gres=mics:x does not work. |
Description | Moab schedules MICs as a special type of node-locked generic resources. When TORQUE reports MICs to Moab, Moab can schedule jobs and correctly assign MICs to ensure that jobs are scheduled efficiently. To have Moab schedule MICs, configure them in TORQUE then submit jobs using the "MIC" attribute. Moab automatically parses the "MIC" attribute and assigns them in the correct manner. |
Examples |
> msub -l nodes=2:ppn=2:mics=1:exclusive_process
Submits a job that requests 2 tasks, 2 processors and 1 MIC per task (2 MICs total). Each MIC runs only threads related to the task.
> msub -l nodes=4:mics=1,tpn=2
Submits a job that requests 4 tasks, 1 MIC per node (4 MICs total), and 2 tasks per node. Each MIC is dedicated exclusively to one task process.
> msub -l nodes=4:mics=1
Submits a job that requests 4 tasks, 1 processor and 1 MIC per task (4 MICs total). Each MIC is dedicated exclusively to one task process.
> msub -l nodes=4:mics=2+1:ppn=2,walltime=600
Submits a job that requests two different types of tasks: the first is 4 tasks, each with 1 processor and 2 MICs, and the second is 1 task with 2 processors. Each MIC is dedicated exclusively to one task process. |
MINPREEMPTTIME | |
---|---|
Format | [[[DD:]HH:]MM:]SS |
Description | Minimum time job must run before being eligible for preemption.
Can only be specified if associated QoS allows per-job preemption configuration by setting the preemptconfig flag. |
Example |
> qsub -l minpreempttime=900 bw.cmd Job cannot be preempted until it has run for 15 minutes. |
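
As a rough illustration of how a colon-delimited time value with optional day, hour, and minute fields maps to seconds, the following POSIX shell sketch converts one to a second count. The function name `to_seconds` is hypothetical, and the sketch only shows the arithmetic; it is not Moab's parser.

```shell
# Convert a time value of the form SS, MM:SS, HH:MM:SS, or DD:HH:MM:SS
# to seconds. The function name is hypothetical.
to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        NF < 1 || NF > 4 { exit 1 }            # reject empty or overlong input
        {
            # Seconds per field, counted from the rightmost field.
            split("1 60 3600 86400", weight, " ")
            total = 0
            for (i = NF; i >= 1; i--)
                total += $i * weight[NF - i + 1]
            print total
        }'
}

to_seconds 15:00      # the same duration as minpreempttime=900
to_seconds 1:00:00
```
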
MINPROCSPEED | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Minimum processor speed (in MHz) for every node that this job will run on. |
Example |
> qsub -W x=MINPROCSPEED:2000 bw.cmd Every node that runs this job must have a processor speed of at least 2000 MHz. |
MINWCLIMIT | |
---|---|
Format | [[[DD:]HH:]MM:]SS |
Default | --- |
Description | Minimum wallclock limit job must run before being eligible for extension (See JOBEXTENDDURATION or JOBEXTENDSTARTWALLTIME.). |
Example |
> qsub -l minwclimit=300,walltime=16000 bw.cmd Job will run for at least 300 seconds but up to 16,000 seconds if possible (without interfering with other jobs). |
NACCESSPOLICY | |
---|---|
Format | One of SHARED, SINGLEJOB, SINGLETASK, SINGLEUSER, or UNIQUEUSER |
Description | Specifies how node resources should be accessed (See Node Access Policies for more information.).
The naccesspolicy option can only be used to make node access more constraining than is specified by the system, partition, or node policies. For example, if the effective node access policy is shared, naccesspolicy can be set to singleuser; if the effective node access policy is singlejob, naccesspolicy can be set to singletask. |
Example |
> qsub -l naccesspolicy=singleuser bw.cmd > bsub -ext naccesspolicy=singleuser lancer.cmd Job can only allocate free nodes or nodes running jobs by same user. |
NALLOCPOLICY | |
---|---|
Format | One of the valid settings for the parameter NODEALLOCATIONPOLICY |
Description | Specifies how node resources should be selected and allocated to the job. (See Node Allocation Policies for more information.) |
Example |
> qsub -l nallocpolicy=minresource bw.cmd Job should use the minresource node allocation policy. |
NCPUS | |
---|---|
Format | <INTEGER> |
Description |
The number of processors in one task where a task cannot span nodes. If NCPUS is used, then the resource manager's SUBMITPOLICY should be set to NODECENTRIC to get correct behavior. -l ncpus=<#> is equivalent to -l nodes=1:ppn=<#> when JOBNODEMATCHPOLICY is set to EXACTNODE. NCPUS is used when submitting jobs to an SMP. When using GPUs to submit to an SMP, use -l ncpus=<#>:GPUs=<#>. You cannot request both ncpus and nodes in the same queue. |
NMATCHPOLICY | |
---|---|
Format | One of the valid settings for the parameter JOBNODEMATCHPOLICY |
Description | Specifies how node resources should be selected and allocated to the job. |
Example |
> qsub -l nodes=2 -W x=nmatchpolicy:exactnode bw.cmd Job should use the EXACTNODE JOBNODEMATCHPOLICY. |
NODESET | |
---|---|
Format | <SETTYPE>:<SETATTR>[:<SETLIST>] |
Description | Specifies node set constraints for job resource allocation (See the Node Set Overview for more information.). |
Example |
> qsub -l nodeset=ONEOF:FEATURE:fastos:hiprio:bigmem bw.cmd |
NODESETCOUNT | |
---|---|
Format | <INTEGER> |
Description | Specifies how many node sets a job uses. |
Example |
> msub -l nodesetcount=2 |
NODESETISOPTIONAL | |
---|---|
Format | <BOOLEAN> |
Description | Specifies whether the nodeset constraint is optional (See the Node Set Overview for more information.).
Requires SCHEDCFG[] FLAGS=allowperjobnodesetisoptional. |
Example |
> msub -l nodesetisoptional=true bw.cmd |
OPSYS | |
---|---|
Format | <OperatingSystem> |
Description | Specifies the job's required operating system. |
Example |
> qsub -l nodes=1,opsys=rh73 chem92.cmd |
PARTITION | |
---|---|
Format | <STRING>[:<STRING>]... |
Description | Specifies the partition (or partitions) in which the job must run.
The job must have access to this partition based on system-wide or credential-based partition access lists. |
Example |
> qsub -l nodes=1,partition=math:geology The job must only run in the math partition or the geology partition. |
PREF | |
---|---|
Format | [{feature|variable}:]<STRING>[:<STRING>]...
If feature or variable are not specified, then feature is assumed. |
Description | Specifies which node features are preferred by the job and should be allocated if available. If preferred node criteria are specified, Moab favors the allocation of matching resources but is not bound to only consider these resources.
Preferences are not honored unless the node allocation policy is set to PRIORITY and the PREF priority component is set within the node's PRIORITYF attribute. |
Example |
> qsub -l nodes=1,pref=bigmem The job may run on any nodes but prefers to allocate nodes with the bigmem feature. |
QoS | |
---|---|
Format | <STRING> |
Description | Requests the specified QoS for the job. |
Example |
> qsub -l walltime=1000,qos=highprio biojob.cmd |
REQATTR | |
---|---|
Format | Required node attributes with version number support: <ATTRIBUTE>[{>=|>|<=|<|=}<VERSION>] |
Description | Indicates required node attributes. For information about using reqattr to request dynamic features, see Configuring dynamic features in TORQUE and Moab. |
Example |
> qsub -l reqattr=matlab=7.1 testj.txt |
RESFAILPOLICY | |
---|---|
Format | One of CANCEL, HOLD, IGNORE, NOTIFY, or REQUEUE |
Description | Specifies the action to take on an executing job if one or more allocated nodes fail. This setting overrides the global value specified with the NODEALLOCRESFAILUREPOLICY parameter. |
Example |
> msub -l resfailpolicy=ignore For this particular job, ignore node failures. |
SPRIORITY | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Allows Moab administrators to set a system priority on a job (similar to setspri). This only works if the job submitter is an administrator. |
Example |
> qsub -l nodes=16,spriority=100 job.cmd |
TEMPLATE | |
---|---|
Format | <STRING> |
Description | Specifies a job template to be used as a set template. The requested template must have SELECT=TRUE (See Job Templates.). |
Example |
> msub -l walltime=1000,nodes=16,template=biojob job.cmd |
TERMTIME | |
---|---|
Format | <TIMESPEC> |
Default | 0 |
Description | Specifies the time at which Moab should cancel a queued or active job (See Job Deadline Support.). |
Example |
> msub -l nodes=10,walltime=600,termtime=12:00_Jun/14 job.cmd |
TPN | |
---|---|
Format | <INTEGER>[+] |
Default | 0 |
Description | Tasks per node allowed on allocated hosts. If the plus (+) character is specified, the tasks per node value is interpreted as a minimum tasks per node constraint; otherwise it is interpreted as an exact tasks per node constraint.
Differences between TPN and PPN: There are two key differences between (A) qsub -l nodes=12:ppn=3 and (B) qsub -l nodes=12,tpn=3. First, ppn is interpreted as the minimum required tasks per node, while tpn defaults to exact tasks per node. Case (B) executes the job with exactly 3 tasks on each allocated node, while case (A) executes the job with at least 3 tasks on each allocated node (for example, nodeA:4,nodeB:3,nodeC:5). Second, nodes=X:ppn=Y actually requests X*Y tasks, whereas nodes=X,tpn=Y requests only X tasks.
TPN with TORQUE as an RM: Moab interprets nodes loosely as procs. TORQUE interprets nodes as the number of nodes listed in your nodes file, not your total number of procs. This means that if TORQUE is your resource manager and you specify msub -l nodes=16:tpn=8 but do not have 16 nodes, TORQUE will not run the job. Instead, you should specify msub -l procs=16:tpn=8. To resolve the problem long term, you can also set server resources_available.nodect to the total number of procs in your system and use msub -l nodes=16:tpn=8 as you would in a non-TORQUE Moab environment. For more information, see resources_available in the TORQUE Administrator Guide. |
Example |
> msub -l nodes=10,walltime=600,tpn=4 job.cmd |
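
The task-count difference between ppn and tpn described above comes down to simple arithmetic, sketched here in POSIX shell. The function name `requested_tasks` is hypothetical, and the parsing only handles the two request shapes discussed in this section.

```shell
# Print the number of tasks a request asks for:
#   nodes=X:ppn=Y  requests X*Y tasks
#   nodes=X,tpn=Y  requests X tasks
# The function name is hypothetical.
requested_tasks() {
    req=$1
    nodes=${req#nodes=}        # drop the leading "nodes="
    nodes=${nodes%%[:,]*}      # keep the count before the first ':' or ','
    case $req in
        *:ppn=*)
            ppn=${req#*:ppn=}
            ppn=${ppn%%[:,]*}
            echo $(( $nodes * $ppn )) ;;
        *)
            echo "$nodes" ;;
    esac
}

requested_tasks nodes=12:ppn=3   # case (A)
requested_tasks nodes=12,tpn=3   # case (B)
```

Case (A) yields 36 tasks and case (B) yields 12, which is why the two forms are not interchangeable.
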
TRIG | |
---|---|
Format | <TRIGSPEC> |
Description | Adds trigger(s) to the job (See Creating a trigger for specific syntax.). |
Example |
> qsub -l trig=start:exec@/tmp/email.sh job.cmd |
TRL (Format 1) | |
---|---|
Format | <INTEGER>[@<INTEGER>][:<INTEGER>[@<INTEGER>]]... |
Default | 0 |
Description | Specifies alternate task requests with their optional walltimes (See Malleable Jobs.). |
Example |
> msub -l trl=2@500:4@250:8@125:16@62 job.cmd or > qsub -l trl=2:3:4 |
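
To make the Format 1 syntax easier to read, the following POSIX shell sketch splits a trl value into its alternate task requests and optional walltimes. The function name `trl_pairs` and the output layout are hypothetical; the sketch only decodes the string and says nothing about how Moab chooses among the alternatives.

```shell
# Print one line per alternate task request in a TRL (Format 1) value.
# The function name and output layout are hypothetical.
trl_pairs() {
    printf '%s\n' "$1" | tr ':' '\n' | awk -F@ '{
        if (NF == 2) printf "tasks=%s walltime=%s\n", $1, $2
        else         printf "tasks=%s\n", $1     # no walltime given
    }'
}

trl_pairs 2@500:4@250:8@125:16@62
trl_pairs 2:3:4
```
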
TRL (Format 2) | |
---|---|
Format | <INTEGER>-<INTEGER> |
Default | 0 |
Description | Specifies a range of task requests that require the same walltime (See Malleable Jobs.). |
Example |
> msub -l trl=32-64 job.cmd For optimization purposes Moab does not perform an exhaustive search of all possible values but will at least do the beginning, the end, and 4 equally distributed choices in between. |
VC | |
---|---|
Format | vc=<NAME> |
Description | Submits the job or workflow to a virtual container (VC). |
Example |
vc=vc13 |
If more than one extension is required in a given job, extensions can be concatenated with a semicolon separator using the format <ATTR>:<VALUE>[;<ATTR>:<VALUE>]...
Example 12-1:
#@comment="HOSTLIST:node1,node2;QOS:special;SID:silverA"
Job must run on nodes node1 and node2 using the QoS special. The job is also associated with the system ID silverA allowing the silver daemon to monitor and control the job.
Example 12-2:
#PBS -W x="NODESET:ONEOF:NETWORK;DMEM:64"
Job will have resources allocated subject to network based nodeset constraints. Further, each task will dedicate 64 MB of memory.
Example 12-3:
> qsub -l nodes=4,walltime=1:00:00 -W x="FLAGS:ADVRES:john.1"
Job will be forced to run within the john.1 reservation.
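
The semicolon-separated format can be assembled with a small POSIX shell helper, sketched below. The function name `join_ext` is hypothetical, and the last line only prints a command line for inspection instead of submitting a job.

```shell
# Join <ATTR>:<VALUE> pairs with semicolons, as the format above describes.
# The function name is hypothetical.
join_ext() {
    out=
    for pair in "$@"; do
        if [ -z "$out" ]; then
            out=$pair
        else
            out="$out;$pair"
        fi
    done
    printf '%s\n' "$out"
}

ext=$(join_ext HOSTLIST:node1,node2 QOS:special SID:silverA)
# Print, rather than run, a submission line that carries the extensions.
printf 'qsub -W x="%s" job.cmd\n' "$ext"
```

Quoting the joined string matters on the command line, because an unquoted semicolon would be read by the shell as a command separator.
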
Used together, the reqattr RM extension and TORQUE $varattr parameter allow you to create jobs that request resources that may change or disappear. For example, if you wanted a job to request a certain version of Octave but different versions are configured on each node and updated at any time, you can create a script that searches for the feature and version on the nodes at a specified interval. Your Moab job can then retrieve the dynamic node attributes from the latest poll and use them for scheduling.
This functionality is available when you use the TORQUE $varattr parameter to configure a script that regularly retrieves updates on the nodes' feature(s) and the reqattr RM extension to require a feature with a certain value.
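To set up a dynamic feature in TORQUE and Moab
Create a script that prints the feature name and its current value. This example, saved as /usr/local/scripts/octave.sh, reports the installed Octave version:
#!/bin/bash
# Pull the version string for Octave and print it for $varattr.
version_str=$(octave -v | grep version)
# Capture the x.y.z version number; the dots are escaped so they match literally.
[[ $version_str =~ ([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+) ]]
echo "Octave: ${BASH_REMATCH[1]}"
Use the TORQUE $varattr parameter to run the script at a regular interval (every 30 seconds in this example):
$varattr 30 /usr/local/scripts/octave.sh
Submit a job that uses the reqattr RM extension to require the feature with a certain value:
> msub -l reqattr=octave=3.2.4 myJob.sh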
Your job requests a node with Octave version 3.2.4. TORQUE passes the most recent (pulled within the last 30 seconds) version of Octave on each node. Moab then schedules the job on a node that currently has Octave 3.2.4.
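
The version extraction can be tried on a host without Octave by feeding a sample string to a portable equivalent. The sketch below uses POSIX awk instead of bash's regular expression matching; the function name `extract_version` is hypothetical, and the sample string is an assumed example of what octave -v prints.

```shell
# Print the first x.y.z version number found in a string.
# The function name is hypothetical.
extract_version() {
    printf '%s\n' "$1" |
        awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+/) { print substr($0, RSTART, RLENGTH); exit }'
}

# Assumed sample of the version line; real output may differ.
sample='GNU Octave, version 3.2.4'
echo "Octave: $(extract_version "$sample")"
```
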