Not all resource managers are created equal. Capabilities vary widely from system to system, and there is a large body of functionality that many, if not all, resource managers have no concept of. A good example is job QoS: since most resource managers have no concept of quality of service, they provide no mechanism for users to specify this information. In many cases, Moab can add capabilities at a global level. However, a number of features require a per-job specification. Resource manager extensions allow this information to be associated with the job.
11.3.1 Resource Manager Extension Specification
Specifying resource manager extensions varies by resource manager. Torque, OpenPBS, PBSPro, Loadleveler, LSF, S3, and Wiki each allow the specification of an extension field as described in the following table:
Resource manager | Specification method |
---|---|
Torque 2.0+ | -l
> qsub -l nodes=3,qos=high sleepy.cmd |
Torque 1.x/OpenPBS | -W x=
> qsub -l nodes=3 -W x=qos:high sleepy.cmd
OpenPBS does not support this ability by default but can be patched as described in the PBS Resource Manager Extension Overview. |
Loadleveler | #@comment
#@nodes = 3
#@comment = qos:high |
LSF | -ext
> bsub -ext advres:system.2 |
PBSPro | -l
> qsub -l advres=system.2
Use of PBSPro resources requires configuring the server_priv/resourcedef file to define the needed extensions as in the following example:
advres type=string
qos type=string
sid type=string
sjid type=string |
Wiki | comment
comment=qos:high |
11.3.2 Resource Manager Extension Values
All of the following job extensions will work with "msub -l" (or "msub -W x=..."). However, "qsub -l" only provides legacy support for a subset of these extensions; see Requesting Resources in the Torque Resource Manager Administrator Guide for the list.
If your configuration primarily uses qsub to submit jobs, Adaptive Computing recommends you use the "qsub -W x=" syntax for all submissions with Moab job extensions to avoid qsub rejection for any unsupported (non-legacy) extensions.
The following job extensions are supported when using the resource manager-specific method:
ADVRES | |
---|---|
Format | [!]<RSVID> |
Description |
Specifies that reserved resources are required to run the job. If <RSVID> is specified, then only resources within the specified reservation may be allocated (see Job to Reservation Binding). You can request to not use a specific reservation by using advres=!<reservationname>. |
Example |
> qsub -l advres=grid.3
Resources for the job must come from grid.3.
> qsub -l advres=!grid.5
Resources for the job must not come from grid.5. |
BANDWIDTH | |
---|---|
Format | <DOUBLE> (in MB/s) |
Description | Minimum available network bandwidth across allocated resources (See Network Management.). |
Example |
> bsub -ext bandwidth=120 chemjob.txt |
DDISK | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Dedicated disk per task in MB. |
Example |
> qsub -l ddisk=2000 |
DMEM | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Dedicated memory per task in bytes. |
Example |
> msub -l dmem=20480 Moab will dedicate 20 MB of memory to the task. |
FEATURE | |
---|---|
Format | <FEATURE>[{:|}<FEATURE>]... |
Description | Required list of node attribute/node features.
If the pipe (|) character is used as a delimiter, the features are logically OR'd together and the associated job may use resources that match any of the specified features. Requesting node names as features will result in the job being blocked from running. |
Example |
> qsub -l feature='fastos:bigio' testjob.cmd |
GMETRIC | |
---|---|
Format | Generic metric requirement for allocated nodes where the requirement is specified using the format <GMNAME>[:{lt:,le:,eq:,ge:,gt:,ne:}<VALUE>] |
Description | Indicates generic constraints that must be found on all allocated nodes. If a <VALUE> is not specified, the node must simply possess the generic metric (See Generic Metrics for more information.). |
Example |
> qsub -l gmetric=bioversion:ge:133244 testj.txt |
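The comparison operators in the format above can be read as ordinary numeric comparisons between the value a node reports and the value the job requests. The following shell sketch illustrates that reading only. `gmetric_satisfied` is a hypothetical helper, not a Moab command, and it assumes that the metric values are numeric.

```shell
#!/bin/sh
# Hypothetical helper, not part of Moab: succeed (exit 0) when a node's
# reported metric value satisfies <operator>:<required value>.
gmetric_satisfied() {
    awk -v v="$1" -v op="$2" -v r="$3" 'BEGIN {
        v += 0; r += 0                  # force numeric comparison
        if      (op == "lt") ok = (v <  r)
        else if (op == "le") ok = (v <= r)
        else if (op == "eq") ok = (v == r)
        else if (op == "ge") ok = (v >= r)
        else if (op == "gt") ok = (v >  r)
        else if (op == "ne") ok = (v != r)
        else                 ok = 0     # unknown operator never matches
        exit(ok ? 0 : 1)
    }'
}

# A node reporting bioversion=133300 against gmetric=bioversion:ge:133244
if gmetric_satisfied 133300 ge 133244; then
    echo "node satisfies the requirement"
fi
```

Under this reading, a node reporting a lower value such as 133000 would not be eligible for the job in the example.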
GPUs | |
---|---|
Format |
msub -l nodes=<VALUE>:ppn=<VALUE>:gpus=<VALUE>[:mode][:reseterr]
Where mode is one of:
exclusive - The default setting. The GPU is used exclusively by one process thread.
exclusive_thread - The GPU is used exclusively by one process thread.
exclusive_process - The GPU is used exclusively by one process regardless of process thread.
If present, reseterr resets the ECC memory bit error counters. This only resets the volatile error counts, or errors since the last reboot. The permanent error counts are not affected.
Moab passes the mode and reseterr portion of the request to Torque for processing.
Moab does not support requesting GPUs as a GRES. Submitting msub -l gres=gpus:x does not work. |
Description | Moab schedules GPUs as a special type of node-locked generic resources. When Torque reports GPUs to Moab, Moab can schedule jobs and correctly assign GPUs to ensure that jobs are scheduled efficiently. To have Moab schedule GPUs, configure them in Torque then submit jobs using the "GPU" attribute. Moab automatically parses the "GPU" attribute and assigns them in the correct manner. For information about GPU metrics, see GPGPUMetrics. |
Examples |
> msub -l nodes=2:ppn=2:gpus=1:exclusive_process:reseterr
Submits a job that requests 2 tasks, 2 processors and 1 GPU per task (2 GPUs total). Each GPU runs only threads related to the task and resets the volatile ECC memory bit error counts at job start time.
> msub -l nodes=4:gpus=1,tpn=2
Submits a job that requests 4 tasks, 1 GPU per node (4 GPUs total), and 2 tasks per node. Each GPU is dedicated exclusively to one task process and the ECC memory bit error counters are not reset.
> msub -l nodes=4:gpus=1:reseterr
Submits a job that requests 4 tasks, 1 processor and 1 GPU per task (4 GPUs total). Each GPU is dedicated exclusively to one task process and resets the volatile ECC memory bit error counts at job start time.
> msub -l nodes=4:gpus=2+1:ppn=2,walltime=600
Submits a job that requests two different types of tasks: the first is 4 tasks, each with 1 processor and 2 GPUs, and the second is 1 task with 2 processors. Each GPU is dedicated exclusively to one task process and the ECC memory bit error counters are not reset. |
JGROUP | |
---|---|
Format | <JOBGROUPID> |
Description | ID of job group to which this job belongs (different from the GID of the user running the job). |
Example |
> msub -l JGROUP=bluegroup |
JOBFLAGS (aka FLAGS) | |
---|---|
Format | One or more of the following colon delimited job flags including ADVRES[:RSVID], NOQUEUE, NORMSTART, PREEMPTEE, PREEMPTOR, RESTARTABLE, or SUSPENDABLE (see job flag overview for a complete listing). |
Description | Associates various flags with the job. |
Example |
> qsub -l nodes=1,walltime=3600,jobflags=advres myjob.py |
JOBREJECTPOLICY | |
---|---|
Format | One or more of CANCEL, HOLD, IGNORE, MAIL, or RETRY |
Default | HOLD |
Description |
Specifies the action to take when the scheduler determines that a job can never run. CANCEL issues a call to the resource manager to cancel the job. HOLD places a batch hold on the job preventing the job from being further evaluated until released by an administrator. Administrators can dynamically alter job attributes and possibly fix the job with mjobctl -m. With IGNORE, the scheduler will allow the job to exist within the resource manager queue but will neither process it nor report it. MAIL will send email to both the admin and the user when rejected jobs are detected. If RETRY is set, then Moab will allow the job to remain idle and will only attempt to start the job when the policy violation is resolved. Any combination of attributes may be specified. This is a per-job policy specified with msub -l. JOBREJECTPOLICY also exists as a global parameter.
Also see QOSREJECTPOLICY. |
Example |
> msub -l jobrejectpolicy=cancel:mail |
MAXMEM | |
---|---|
Format | <INTEGER> (in megabytes) |
Description | Maximum amount of memory the job may consume across all tasks before the JOBMEM action is taken. |
Example |
> qsub -W x=MAXMEM:1000mb bw.cmd If a RESOURCELIMITPOLICY is set for per-job memory utilization, its action will be taken when this value is reached. |
MAXPROC | |
---|---|
Format | <INTEGER> |
Description | Maximum CPU load the job may consume across all tasks before the JOBPROC action is taken. |
Example |
> qsub -W x=MAXPROC:4 bw.cmd If a RESOURCELIMITPOLICY is set for per-job processor utilization, its action will be taken when this value is reached. |
MICs | |
---|---|
Format |
msub -l nodes=<VALUE>:ppn=<VALUE>:mics=<VALUE>[:mode]
Where mode is one of:
exclusive - The default setting. The MIC is used exclusively by one process thread.
exclusive_thread - The MIC is used exclusively by one process thread.
exclusive_process - The MIC is used exclusively by one process regardless of process thread.
Moab passes the mode portion of the request to Torque for processing.
Moab does not support requesting MICs as a GRES. Submitting msub -l gres=mics:x does not work. |
Description | Moab schedules MICs as a special type of node-locked generic resources. When Torque reports MICs to Moab, Moab can schedule jobs and correctly assign MICs to ensure that jobs are scheduled efficiently. To have Moab schedule MICs, configure them in Torque then submit jobs using the "MIC" attribute. Moab automatically parses the "MIC" attribute and assigns them in the correct manner. |
Examples |
> msub -l nodes=2:ppn=2:mics=1:exclusive_process
Submits a job that requests 2 tasks, 2 processors and 1 MIC per task (2 MICs total). Each MIC runs only threads related to the task.
> msub -l nodes=4:mics=1,tpn=2
Submits a job that requests 4 tasks, 1 MIC per node (4 MICs total), and 2 tasks per node. Each MIC is dedicated exclusively to one task process.
> msub -l nodes=4:mics=1
Submits a job that requests 4 tasks, 1 processor and 1 MIC per task (4 MICs total). Each MIC is dedicated exclusively to one task process.
> msub -l nodes=4:mics=2+1:ppn=2,walltime=600
Submits a job that requests two different types of tasks: the first is 4 tasks, each with 1 processor and 2 MICs, and the second is 1 task with 2 processors. Each MIC is dedicated exclusively to one task process. |
MINPREEMPTTIME | |
---|---|
Format | [[[DD:]HH:]MM:]SS |
Description | Minimum time job must run before being eligible for preemption.
Can only be specified if associated QoS allows per-job preemption configuration by setting the preemptconfig flag. |
Example |
> qsub -l minpreempttime=900 bw.cmd Job cannot be preempted until it has run for 15 minutes. |
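The example passes the time as a plain number of seconds, but the format also allows colon-separated days, hours, minutes, and seconds, with the leading fields optional. As a rough sketch, assuming only that fields are read from the right as seconds, minutes, hours, and days, a hypothetical helper (not part of Moab or Torque) can convert such a value to seconds to sanity-check it before submission:

```shell
#!/bin/sh
# Hypothetical helper, not part of Moab: convert a DD:HH:MM:SS style value
# (leading fields optional) to a number of seconds.
timespec_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        NF < 1 || NF > 4 { exit 1 }     # reject empty or over-long input
        {
            # Unit sizes in seconds, indexed from the rightmost field.
            mult[1] = 1; mult[2] = 60; mult[3] = 3600; mult[4] = 86400
            total = 0
            for (i = 1; i <= NF; i++)
                total += $i * mult[NF - i + 1]
            print total
        }'
}

timespec_to_seconds 15:00    # prints 900, the value used in the example
```

The helper returns a non-zero status when the value has more than four fields, which makes it usable as a simple pre-submission check in a wrapper script.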
MINPROCSPEED | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Minimum processor speed (in MHz) for every node that this job will run on. |
Example |
> qsub -W x=MINPROCSPEED:2000 bw.cmd Every node that runs this job must have a processor speed of at least 2000 MHz. |
MINWCLIMIT | |
---|---|
Format | [[[DD:]HH:]MM:]SS |
Default | --- |
Description | Minimum wallclock limit job must run before being eligible for extension (See JOBEXTENDDURATION or JOBEXTENDSTARTWALLTIME.). |
Example |
> qsub -l minwclimit=300,walltime=16000 bw.cmd Job will run for at least 300 seconds but up to 16,000 seconds if possible (without interfering with other jobs). |
MSTAGEIN | |
---|---|
Format | [<SRCURL>[|<SRCURL>...]%]<DSTURL> |
Description |
Indicates that a job has data staging requirements. The source URL(s) listed will be transferred to the execution system for use by the job. If more than one source URL is specified, the destination URL must be a directory.
The format of <SRCURL> and <DSTURL> is [PROTO://][HOST[:PORT]][/PATH]. PROTO can be any of the following protocols: ssh, file, or gsiftp. HOST is the name of the host where the file resides. PATH is the path of the source or destination file. The destination path may be a directory when sending a single file and must be a directory when sending multiple files. If a directory is specified, it must end with a forward slash (/).
Valid variables include:
$JOBID
$HOME - Path the script was run from
$RHOME - Home dir of the user on the remote system
$SUBMITHOST
$DEST - This is the Moab where the job will run
$LOCALDATASTAGEHEAD
If no destination is given, the protocol and file name will be set to the same as the source. The $RHOME (remote home directory) variable is for when a user's home directory on the compute node is different than on the submission host. |
Example |
> msub -W x='mstagein=file://$HOME/helperscript.sh|file:///home/dev/datafile.txt%ssh://host/home/dev/' script.sh |
MSTAGEOUT | |
---|---|
Format | [<SRCURL>[|<SRCURL>...]%]<DSTURL> |
Description | Indicates that a job has data staging requirements. The source URL(s) listed will be transferred from the execution system after the completion of the job. If more than one source URL is specified, the destination URL must be a directory.
The format of <SRCURL> is [PROTO://][HOST[:PORT]][/PATH], where the path is remote. The format of <DSTURL> is [PROTO://][HOST[:PORT]][/PATH], where the path is local. PROTO can be any of the following protocols: ssh, file, or gsiftp. HOST is the name of the host where the file resides. PATH is the path of the source or destination file. The destination path may be a directory when sending a single file and must be a directory when sending multiple files. If a directory is specified, it must end with a forward slash (/).
Valid variables include:
$JOBID
$HOME - Path the script was run from
$RHOME - Home dir of the user on the remote system
$SUBMITHOST
$DEST - This is the Moab where the job will run
$LOCALDATASTAGEHEAD
If no destination is given, the protocol and file name will be set to the same as the source. The $RHOME (remote home directory) variable is for when a user's home directory on the compute node is different than on the submission host. |
Example |
> msub -W x='mstageout=ssh://$DEST/$HOME/resultfile1.txt|ssh://host/home/dev/resultscript.sh%file:///home/dev/' script.sh
Copy resultfile1.txt and resultscript.sh from the execution system to /home/dev/ after the execution of script.sh is complete. $HOME is a path containing a preceding / (for example, /home/adaptive). |
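A staging value is made up of source URLs separated by pipes, then a percent sign, then the destination URL. The shell sketch below assembles such a value and applies the one rule stated above that is easy to check locally: with more than one source, the destination must be a directory ending in a forward slash. `build_stage_spec` is a hypothetical helper, not a Moab command, and the sketch prints the msub command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper, not part of Moab: build a staging value from one
# destination URL followed by zero or more source URLs.
build_stage_spec() {
    dest=$1
    shift
    # With several sources, the destination must be a directory ending in "/".
    if [ "$#" -gt 1 ]; then
        case $dest in
            */) ;;
            *)  echo "destination must be a directory ending in /" >&2
                return 1 ;;
        esac
    fi
    srcs=""
    for src in "$@"; do
        srcs="${srcs:+$srcs|}$src"
    done
    if [ -n "$srcs" ]; then
        printf '%s%%%s\n' "$srcs" "$dest"
    else
        printf '%s\n' "$dest"
    fi
}

# Single quotes keep $HOME literal so it is not expanded by the local shell.
spec=$(build_stage_spec 'ssh://host/home/dev/' \
    'file://$HOME/helperscript.sh' 'file:///home/dev/datafile.txt') || exit 1
printf "msub -W x='mstagein=%s' script.sh\n" "$spec"
```

The same helper can be used for mstageout values, since both extensions share the same overall format.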
NACCESSPOLICY | |
---|---|
Format | One of SHARED, SINGLEJOB, SINGLETASK, SINGLEUSER, or UNIQUEUSER |
Description | Specifies how node resources should be accessed (see Node Access Policies for more information).
The naccesspolicy option can only be used to make node access more constraining than is specified by the system, partition, or node policies. For example, if the effective node access policy is shared, naccesspolicy can be set to singleuser; if the effective node access policy is singlejob, naccesspolicy can be set to singletask. |
Example |
> qsub -l naccesspolicy=singleuser bw.cmd
> bsub -ext naccesspolicy=singleuser lancer.cmd
Job can only allocate free nodes or nodes running jobs by the same user.
> qsub -l naccesspolicy=singlejob jobscript.sh
# OR
> qsub -W x=naccesspolicy:singlejob jobscript.sh
Jobs can only run on specific nodes, regardless of whether the machine has free cores. |
NALLOCPOLICY | |
---|---|
Format | One of the valid settings for the parameter NODEALLOCATIONPOLICY |
Description | Specifies how node resources should be selected and allocated to the job. (See Node Allocation Policies for more information.) |
Example |
> qsub -l nallocpolicy=minresource bw.cmd Job should use the minresource node allocation policy. |
NCPUS | |
---|---|
Format | <INTEGER> |
Description |
The number of processors in one task where a task cannot span nodes. If NCPUS is used, then the resource manager's SUBMITPOLICY should be set to NODECENTRIC to get correct behavior. -l ncpus=<#> is equivalent to -l nodes=1:ppn=<#> when JOBNODEMATCHPOLICY is set to EXACTNODE. NCPUS is used when submitting jobs to an SMP. When using GPUs to submit to an SMP, use -l ncpus=<#>:GPUs=<#>. You cannot request both ncpus and nodes in the same job. |
NMATCHPOLICY | |
---|---|
Format | One of the valid settings for the parameter JOBNODEMATCHPOLICY |
Description | Specifies how node resources should be selected and allocated to the job. |
Example |
> qsub -l nodes=2 -W x=nmatchpolicy:exactnode bw.cmd Job should use the EXACTNODE JOBNODEMATCHPOLICY. |
NODESET | |
---|---|
Format | <SETTYPE>:<SETATTR>[:<SETLIST>] |
Description | Specifies nodeset constraints for job resource allocation (See the NodeSet Overview for more information.). |
Example |
> qsub -l nodeset=ONEOF:FEATURE:fastos:hiprio:bigmem bw.cmd |
NODESETCOUNT | |
---|---|
Format | <INTEGER> |
Description | Specifies how many node sets a job uses. |
Example |
> msub -l nodesetcount=2 |
NODESETISOPTIONAL | |
---|---|
Format | <BOOLEAN> |
Description | Specifies whether the nodeset constraint is optional (See the NodeSet Overview for more information.).
Requires SCHEDCFG[] FLAGS=allowperjobnodesetisoptional. |
Example |
> msub -l nodesetisoptional=true bw.cmd |
OPSYS | |
---|---|
Format | <OperatingSystem> |
Description | Specifies the job's required operating system. |
Example |
> qsub -l nodes=1,opsys=rh73 chem92.cmd |
PARTITION | |
---|---|
Format | <STRING>[:<STRING>]... |
Description | Specifies the partition (or partitions) in which the job must run.
The job must have access to this partition based on system wide or credential based partition access lists. |
Example |
> qsub -l nodes=1,partition=math:geology The job must only run in the math partition or the geology partition. |
PREF | |
---|---|
Format | [{feature|variable}:]<STRING>[:<STRING>]...
If feature or variable are not specified, then feature is assumed. |
Description | Specifies which node features are preferred by the job and should be allocated if available. If preferred node criteria are specified, Moab favors the allocation of matching resources but is not bound to only consider these resources.
Preferences are not honored unless the node allocation policy is set to PRIORITY and the PREF priority component is set within the node's PRIORITYF attribute. |
Example |
> qsub -l nodes=1,pref=bigmem The job may run on any nodes but prefers to allocate nodes with the bigmem feature. |
QoS | |
---|---|
Format | <STRING> |
Description | Requests the specified QoS for the job. |
Example |
> qsub -l walltime=1000,qos=highprio biojob.cmd |
REQATTR | |
---|---|
Format | Required node attributes with version number support: reqattr=[<must|must not|should|should not>]:<ATTRIBUTE>[{>=|>|<=|<|=}<VERSION>] |
Description |
Indicates required node attributes. Values may include letters, numbers, dashes, underscores, periods, and spaces. You can choose one of four requirement types for each node attribute you request: must, must not, should, or should not.
If you do not specify a requirement type, Moab assumes "must." For information about using reqattr to request dynamic features, see Configuring dynamic features in Torque and Moab. |
Example |
> qsub -l reqattr=matlab=7.1 testj.txt |
RESFAILPOLICY | |
---|---|
Format | One of CANCEL, HOLD, IGNORE, NOTIFY, or REQUEUE |
Description | Specifies the action to take on an executing job if one or more allocated nodes fail. This setting overrides the global value specified with the NODEALLOCRESFAILUREPOLICY parameter. |
Example |
> msub -l resfailpolicy=ignore For this particular job, ignore node failures. |
SPRIORITY | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Allows Moab administrators to set a system priority on a job (similar to setspri). This only works if the job submitter is an administrator. |
Example |
> qsub -l nodes=16,spriority=100 job.cmd |
TEMPLATE | |
---|---|
Format | <STRING> |
Description | Specifies a job template to be used as a set template. The requested template must have SELECT=TRUE (See Job Templates.). |
Example |
> msub -l walltime=1000,nodes=16,template=biojob job.cmd |
TERMTIME | |
---|---|
Format | <TIMESPEC> |
Default | 0 |
Description | Specifies the time at which Moab should cancel a queued or active job (See Job Deadline Support.). |
Example |
> msub -l nodes=10,walltime=600,termtime=12:00_Jun/14 job.cmd |
TRIG | |
---|---|
Format | <TRIGSPEC> |
Description | Adds trigger(s) to the job (see Creating a Trigger for specific syntax).
Job triggers can only be specified if allowed by the QoS flag trigger. See Enabling Job Triggers for more information. |
Example |
> qsub -l trig=etype=start\&atype=exec\&action="/tmp/email.sh job.cmd" |
TRL (Format 1) | |
---|---|
Format | <INTEGER>[@<INTEGER>][:<INTEGER>[@<INTEGER>]]... |
Default | 0 |
Description | Specifies alternate task requests with their optional walltimes (see Malleable Jobs). |
Example |
> msub -l trl=2@500:4@250:8@125:16@62 job.cmd or > qsub -l trl=2:3:4 |
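A Format 1 value is a colon-separated list of alternatives, each a task count optionally followed by @ and a walltime. The shell sketch below splits a value into those alternatives so that a request can be checked by eye before submission. `list_trl_alternatives` is a hypothetical helper, not a Moab command, and the output labels are chosen here for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of Moab: list the alternatives in a
# TRL Format 1 value, one per line.
list_trl_alternatives() {
    printf '%s\n' "$1" | tr ':' '\n' |
    while IFS=@ read -r tasks walltime; do
        # A missing "@<walltime>" part is reported as "unspecified".
        printf 'tasks=%s walltime=%s\n' "$tasks" "${walltime:-unspecified}"
    done
}

list_trl_alternatives '2@500:4@250:8@125:16@62'
```

For the first example above this lists four alternatives, from 2 tasks with a walltime of 500 down to 16 tasks with a walltime of 62.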
TRL (Format 2) | |
---|---|
Format | <INTEGER>-<INTEGER> |
Default | 0 |
Description | Specifies a range of task requests that require the same walltime (See Malleable Jobs.). |
Example |
> msub -l trl=32-64 job.cmd For optimization purposes Moab does not perform an exhaustive search of all possible values but will at least do the beginning, the end, and 4 equally distributed choices in between. |
VC | |
---|---|
Format | vc=<NAME> |
Description | Submits the job or workflow to a virtual container (VC). |
Example |
vc=vc13 |
11.3.3 Resource Manager Extension Examples
If more than one extension is required in a given job, extensions can be concatenated with a semicolon separator using the format <ATTR>:<VALUE>[;<ATTR>:<VALUE>]...
Example 11-1:
#@comment="HOSTLIST:node1,node2;QOS:special;SID:silverA"
Job must run on nodes node1 and node2 using the QoS special. The job is also associated with the system ID silverA, allowing the silver daemon to monitor and control the job.
Example 11-2:
#PBS -W x=\"NODESET:ONEOF:NETWORK;DMEM:64\"
Job will have resources allocated subject to network based nodeset constraints. Further, each task will dedicate 64 MB of memory.
Example 11-3:
> qsub -l nodes=4,walltime=1:00:00 -W x="FLAGS:ADVRES:john.1"
Job will be forced to run within the john.1 reservation.
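The concatenation format described at the start of this section can be sketched as a small shell helper that joins <ATTR>:<VALUE> pairs with semicolons. `join_rm_ext` and the script name job.cmd are hypothetical and used only for illustration; the sketch prints the qsub command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper, not part of Moab or Torque: join ATTR:VALUE pairs
# with semicolons, following <ATTR>:<VALUE>[;<ATTR>:<VALUE>]...
join_rm_ext() {
    joined=""
    for pair in "$@"; do
        joined="${joined:+$joined;}$pair"
    done
    printf '%s\n' "$joined"
}

# Build the extension string, then print the command it would be used in.
ext=$(join_rm_ext "NODESET:ONEOF:NETWORK" "DMEM:64")
printf 'qsub -l nodes=4 -W x="%s" job.cmd\n' "$ext"
```

Quoting the whole extension string, as the printed command does, keeps the shell from treating the semicolons as command separators.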
11.3.4 Configuring dynamic features in Torque and Moab
Used together, the reqattr RM extension and Torque $varattr parameter allow you to create jobs that request resources that may change or disappear. For example, if you wanted a job to request a certain version of Octave but different versions are configured on each node and updated at any time, you can create a script that searches for the feature and version on the nodes at a specified interval. Your Moab job can then retrieve the dynamic node attributes from the latest poll and use them for scheduling.
This functionality is available when you use the Torque $varattr parameter to configure a script that regularly retrieves updates on the nodes' feature(s) and the reqattr RM extension to require a feature with a certain value.
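To set up a dynamic feature in Torque and Moab
Create a script that prints the feature and its current version in <name>=<value> form. For example:
#!/bin/bash
# pull the version string for octave and print it for $varattr
version_str=$(octave -v | grep version)
[[ $version_str =~ ([[:digit:]]\.[[:digit:]]\.[[:digit:]]) ]]
echo "octave=${BASH_REMATCH[1]}"
Use the Torque $varattr parameter to run the script at a set interval:
$varattr 30 /usr/local/scripts/octave.sh
Submit a job that uses the reqattr RM extension to require the feature with a certain value:
> msub -l reqattr=octave=3.2.4 myJob.sh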
Your job requests a node with Octave version 3.2.4. Torque passes the most recent (pulled within the last 30 seconds) version of Octave on each node. Moab then schedules the job on a node that currently has Octave 3.2.4.
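Before a script like octave.sh is wired into $varattr, its parsing can be checked offline against a sample version line. The sketch below is a portable sh variant of that parsing step, not the script from this guide. `format_varattr` is a hypothetical helper, and the sample line "GNU Octave, version 3.2.4" is assumed input for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the first X.Y.Z version number from a line of
# text and print it in name=value form. Lines with other digits before the
# version number are not handled by this simple pattern.
format_varattr() {
    name=$1
    line=$2
    version=$(printf '%s\n' "$line" |
        sed -n 's/^[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p')
    # Fail rather than print an empty value when no version is found.
    [ -n "$version" ] || return 1
    printf '%s=%s\n' "$name" "$version"
}

format_varattr octave "GNU Octave, version 3.2.4"
```

Returning a non-zero status when no version is found makes the failure visible during testing, rather than emitting an attribute with an empty value.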