Under TORQUE, queue configuration is accomplished using the qmgr command. With this tool, the first step is to create the queue. This is accomplished using the create subcommand of qmgr as in the following example:
> qmgr -c "create queue batch queue_type=execution"
Once created, the queue must be configured to be operational. At a minimum, this includes setting the options started and enabled. Further configuration is possible using any combination of the attributes listed in what follows.
For boolean attributes, T, t, 1, Y, and y are all synonymous with true, and F, f, 0, N, and n all mean false.
For queue_type, E and R are synonymous with Execution and Routing.
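As a minimal sketch of the boolean synonyms above, the following shell function maps a token to true or false. The function name normalize_bool is hypothetical and not part of TORQUE; accepting the spelled-out words true and false is an assumption based on the examples in this section, which use values such as true and True.

```shell
# Hypothetical helper (not part of TORQUE): map a qmgr-style boolean token
# to "true" or "false". Unrecognized tokens print nothing and return 1.
normalize_bool() {
    case "$1" in
        T|t|1|Y|y|true|True|TRUE)    echo true ;;
        F|f|0|N|n|false|False|FALSE) echo false ;;
        *) return 1 ;;
    esac
}

normalize_bool Y   # prints true
normalize_bool 0   # prints false
```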
acl_groups | |||
Format: | <GROUP>[@<HOST>][+<USER>[@<HOST>]]... | ||
Default: | --- | ||
Description: | Specifies the list of groups which may submit jobs to the queue. If acl_group_enable is set to true, only users with a primary group listed in acl_groups may utilize the queue.
| ||
Example: |
> qmgr -c "set queue batch acl_groups=staff"
> qmgr -c "set queue batch acl_groups+=ops@h2"
> qmgr -c "set queue batch acl_groups+=staff@h3"
|
||
acl_group_enable | |||
Format: | <BOOLEAN> | ||
Default: | FALSE | ||
Description: | If TRUE, constrains TORQUE to only allow jobs submitted from groups specified by the acl_groups parameter. | ||
Example: | qmgr -c "set queue batch acl_group_enable=true" |
||
acl_group_sloppy | |||
Format: | <BOOLEAN> | ||
Default: | FALSE | ||
Description: | If TRUE, acl_groups will be checked against all groups of which the job user is a member. | ||
Example: | --- | ||
acl_hosts | |||
Format: | <HOST>[+<HOST>]... | ||
Default: | --- | ||
Description: | Specifies the list of hosts that may submit jobs to the queue. | ||
Example: | qmgr -c "set queue batch acl_hosts=h1+h2+h3"
|
||
acl_host_enable | |||
Format: | <BOOLEAN> | ||
Default: | FALSE | ||
Description: | If TRUE, constrains TORQUE to only allow jobs submitted from hosts specified by the acl_hosts parameter. | ||
Example: | qmgr -c "set queue batch acl_host_enable=true" |
||
acl_logic_or | |||
Format: | <BOOLEAN> | ||
Default: | FALSE | ||
Description: | If TRUE, user and group acls are logically OR'd together, meaning that either acl may be met to allow access. If false or unset, then both acls are AND'd, meaning that both acls must be satisfied. | ||
Example: | qmgr -c "set queue batch acl_logic_or=true" |
||
acl_users | |||
Format: | <USER>[@<HOST>][+<USER>[@<HOST>]]... | ||
Default: | --- | ||
Description: | Specifies the list of users who may submit jobs to the queue. If acl_user_enable is set to TRUE, only users listed in acl_users may use the queue | ||
Example: |
> qmgr -c "set queue batch acl_users=john"
> qmgr -c "set queue batch acl_users+=steve@h2"
> qmgr -c "set queue batch acl_users+=stevek@h3"
|
||
acl_user_enable | |||
Format: | <BOOLEAN> | ||
Default: | FALSE | ||
Description: | If TRUE, constrains TORQUE to only allow jobs submitted from users specified by the acl_users parameter. | ||
Example: | qmgr -c "set queue batch acl_user_enable=true" |
||
disallowed_types | |||
Format: | <type>[+<type>]... | ||
Default: | --- | ||
Description: | Specifies classes of jobs that are not allowed to be submitted to this queue. Valid types are interactive, batch, rerunable, nonrerunable, fault_tolerant (as of version 2.4.0 and later), fault_intolerant (as of version 2.4.0 and later), and job_array (as of version 2.4.1 and later). | ||
Example: | qmgr -c "set queue batch disallowed_types = interactive"
qmgr -c "set queue batch disallowed_types += job_array" |
||
enabled | |||
Format: | <BOOLEAN> | ||
Default: | FALSE | ||
Description: | Specifies whether the queue accepts new job submissions. | ||
Example: | qmgr -c "set queue batch enabled=true" |
||
keep_completed | |||
Format: | <INTEGER> | ||
Default: | 0 | ||
Description: | Specifies the number of seconds jobs should be held in the Completed state after exiting. | ||
Example: | qmgr -c "set queue batch keep_completed=120" |
||
kill_delay | |||
Format: | <INTEGER> | ||
Default: | 2 | ||
Description: | Specifies the number of seconds between sending a SIGTERM and a SIGKILL to a job being cancelled. | ||
Example: | qmgr -c "set queue batch kill_delay=30" |
||
max_queuable | |||
Format: | <INTEGER> | ||
Default: | unlimited | ||
Description: | Specifies the maximum number of jobs allowed in the queue at any given time (includes idle, running, and blocked jobs). | ||
Example: | qmgr -c "set queue batch max_queuable=20" |
||
max_running | |||
Format: | <INTEGER> | ||
Default: | unlimited | ||
Description: | Specifies the maximum number of jobs in the queue allowed to run at any given time. | ||
Example: | qmgr -c "set queue batch max_running=20" |
||
max_user_queuable | |||
Format: | <INTEGER> | ||
Default: | unlimited | ||
Description: | Specifies the maximum number of jobs, per user, allowed in the queue at any given time (includes idle, running, and blocked jobs). Version 2.1.3 and greater. | ||
Example: | qmgr -c "set queue batch max_user_queuable=20" |
||
max_user_run | |||
Format: | <INTEGER> | ||
Default: | unlimited | ||
Description: | Specifies the maximum number of jobs, per user, in the queue allowed to run at any given time. | ||
Example: | qmgr -c "set queue batch max_user_run=10" |
||
priority | |||
Format: | <INTEGER> | ||
Default: | 0 | ||
Description: | Specifies the priority value associated with the queue. | ||
Example: | qmgr -c "set queue batch priority=20" |
||
queue_type | |||
Format: | one of e, execution, r, or route | ||
Default: | --- | ||
Description: | Specifies the queue type.
|
||
Example: | qmgr -c "set queue batch queue_type=execution" |
||
resources_available | |||
Format: | <STRING> | ||
Default: | --- | ||
Description: | Specifies the cumulative resources available to all jobs running in the queue. | ||
Example: | qmgr -c "set queue batch resources_available.nodect=20"
|
||
resources_default | |||
Format: | <STRING> | ||
Default: | N/A | ||
Description: | Specifies default resource requirements for jobs submitted to the queue. | ||
Example: | qmgr -c "set queue batch resources_default.walltime=3600" |
||
resources_max | |||
Format: | <STRING> | ||
Default: | N/A | ||
Description: | Specifies the maximum resource limits for jobs submitted to the queue. | ||
Example: | qmgr -c "set queue batch resources_max.nodect=16" |
||
resources_min | |||
Format: | <STRING> | ||
Default: | N/A | ||
Description: | Specifies the minimum resource limits for jobs submitted to the queue. | ||
Example: | qmgr -c "set queue batch resources_min.nodect=2" |
||
route_destinations | |||
Format: | <queue>[@<host>][+<queue>[@<host>]]... | ||
Default: | N/A | ||
Description: | Specifies the potential destination queues for jobs submitted to the associated routing queue.
|
||
Example: |
> qmgr -c "set queue route route_destinations=fast"
> qmgr -c "set queue route route_destinations+=slow"
> qmgr -c "set queue route route_destinations+=medium@hostname" |
||
started | |||
Format: | <BOOLEAN> | ||
Default: | FALSE | ||
Description: | Specifies whether jobs in the queue are allowed to execute. | ||
Example: | qmgr -c "set queue batch started=true" |
Resources may include one or more of the following: arch, mem, nodes, ncpus, nodect, procct, pvmem, and walltime.
Administrators can use resource limits to help direct different kinds of jobs to different queues. Resource limits can be set on four queue attributes: resources_available, resources_default, resources_max, and resources_min. The resources that can be limited with these attributes are arch, mem, ncpus, nodect, nodes, procct, pvmem, vmem, and walltime.
Resource | Format | Description |
---|---|---|
arch | string | Specifies the administrator defined system architecture required. |
mem | size | Amount of physical memory used by the job. (Ignored on Darwin, Digital Unix, FreeBSD, HPUX 11, IRIX, NetBSD, and SunOS. Also ignored on Linux if the number of nodes is not 1. Not implemented on AIX and HPUX 10.) |
ncpus | integer | Number of processors requested. Note that in job centric mode, if a job does not have an attribute set, the server and routing queue defaults are not applied when queue resource limits are checked. Consequently, a job that requests 32 nodes (not ncpus=32) will not be checked against a resources_min.ncpus limit. |
nodect | integer | Sets the number of nodes available. By default, TORQUE will set the number of nodes available to the number of nodes listed in the $TORQUE_HOME/server_priv/nodes file. nodect can be set to be greater than or less than that number. Generally, it is used to set the node count higher than the number of physical nodes in the cluster. |
nodes | integer | Number of nodes. |
procct | integer | Sets limits on the total number of execution slots (procs) allocated to a job. The number of procs is calculated by summing the products of all node and ppn entries for a job. For example, qsub -l nodes=2:ppn=2+3:ppn=4 job.sh would yield a procct of 16: 2*2 (2:ppn=2) + 3*4 (3:ppn=4). |
pvmem | size | Amount of virtual memory used by any single process in a job. |
vmem | size | Amount of virtual memory used by all concurrent processes in the job. |
walltime | seconds, or [[HH:]MM:]SS | Amount of real time during which a job can be in a running state. |
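The procct arithmetic described in the table can be sketched as a small shell function. This is an illustration only, not how TORQUE computes the value internally. The name procct is reused here for a hypothetical helper, and two behaviors are assumptions: an entry without ppn counts one slot per node, and an entry that starts with a host name counts as a single node.

```shell
# Hypothetical helper: compute procct for a "-l nodes=" specification such
# as "2:ppn=2+3:ppn=4" by summing (node count * ppn) over each "+" entry.
procct() {
    spec=$1
    total=0
    old_ifs=$IFS
    IFS='+'
    set -- $spec                    # split the specification on "+"
    IFS=$old_ifs
    for chunk in "$@"; do
        count=${chunk%%:*}          # text before the first ":"
        case "$count" in
            ''|*[!0-9]*) count=1 ;; # assumption: a host name is one node
        esac
        ppn=1                       # assumption: ppn defaults to 1
        case "$chunk" in
            *ppn=*)
                ppn=${chunk##*ppn=} # text after "ppn="
                ppn=${ppn%%:*}      # drop any trailing node properties
                ;;
        esac
        total=$((total + count * ppn))
    done
    echo "$total"
}

procct "2:ppn=2+3:ppn=4"   # prints 16
```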
The following series of qmgr commands will create and configure a queue named batch:
qmgr -c "create queue batch queue_type=execution"
qmgr -c "set queue batch started=true"
qmgr -c "set queue batch enabled=true"
qmgr -c "set queue batch resources_default.nodes=1"
qmgr -c "set queue batch resources_default.walltime=3600"
This queue will accept new jobs and, if not explicitly specified in the job, will assign a nodecount of 1 and a walltime of 1 hour to each job.
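Walltime values appear in this section both as plain seconds (3600) and in [[HH:]MM:]SS form (12:00:00). The following sketch converts either form to seconds, which can help when comparing limits. The function name walltime_seconds is hypothetical and not part of TORQUE.

```shell
# Hypothetical helper: convert a walltime written as [[HH:]MM:]SS (or as
# plain seconds) to a number of seconds.
walltime_seconds() {
    old_ifs=$IFS
    IFS=':'
    set -- $1                       # split the value on ":"
    IFS=$old_ifs
    secs=0
    for field in "$@"; do
        # Strip leading zeros so that "08" is not read as an octal number.
        field=${field#"${field%%[!0]*}"}
        secs=$((secs * 60 + ${field:-0}))
    done
    echo "$secs"
}

walltime_seconds 01:00:00   # prints 3600
walltime_seconds 3600       # prints 3600
```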
By default, a job must explicitly specify which queue it is to run in. To change this behavior, the server parameter default_queue may be specified as in the following example:
qmgr -c "set server default_queue=batch"
TORQUE does not currently provide a simple mechanism for mapping queues to nodes. However, schedulers such as Moab and Maui can provide this functionality.
The simplest method is to set resources_default.neednodes on an execution queue to a particular node attribute. Maui/Moab will use this information to ensure that jobs in that queue are assigned nodes with that attribute. For example, suppose some nodes were bought with money from the chemistry department and some were paid for by the biology department.
$TORQUE_HOME/server_priv/nodes:
node01 np=2 chem
node02 np=2 chem
node03 np=2 bio
node04 np=2 bio

qmgr:
set queue chem resources_default.neednodes=chem
set queue bio resources_default.neednodes=bio
This example does not preclude other queues from accessing those nodes. One solution is to use some other generic attribute with all other nodes and queues.
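One possible reading of the generic attribute suggestion is sketched below. The node names node05 and node06, the attribute name general, and the queue name batch are hypothetical and chosen only for illustration.

```
$TORQUE_HOME/server_priv/nodes:
node01 np=2 chem
node02 np=2 chem
node03 np=2 bio
node04 np=2 bio
node05 np=2 general
node06 np=2 general

qmgr:
set queue chem resources_default.neednodes=chem
set queue bio resources_default.neednodes=bio
set queue batch resources_default.neednodes=general
```

Under this assumption, jobs in the batch queue would request nodes carrying the general attribute, leaving the chem and bio nodes to their own queues.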
More advanced configurations can be made with standing reservations and QoSes.
A routing queue will steer a job to a destination queue based on job attributes and queue constraints. It is set up by creating a queue of queue_type Route with a route_destinations attribute set, as in the following example.
# routing queue
create queue route
set queue route queue_type = Route
set queue route route_destinations = reg_64
set queue route route_destinations += reg_32
set queue route route_destinations += reg
set queue route enabled = True
set queue route started = True

# queue for jobs using 1-15 nodes
create queue reg
set queue reg queue_type = Execution
set queue reg resources_min.ncpus = 1
set queue reg resources_min.nodect = 1
set queue reg resources_default.ncpus = 1
set queue reg resources_default.nodes = 1
set queue reg enabled = True
set queue reg started = True

# queue for jobs using 16-31 nodes
create queue reg_32
set queue reg_32 queue_type = Execution
set queue reg_32 resources_min.ncpus = 31
set queue reg_32 resources_min.nodes = 16
set queue reg_32 resources_default.walltime = 12:00:00
set queue reg_32 enabled = True
set queue reg_32 started = True

# queue for jobs using 32+ nodes
create queue reg_64
set queue reg_64 queue_type = Execution
set queue reg_64 resources_min.ncpus = 63
set queue reg_64 resources_min.nodes = 32
set queue reg_64 resources_default.walltime = 06:00:00
set queue reg_64 enabled = True
set queue reg_64 started = True

# have all jobs go through the routing queue
set server default_queue = route
set server resources_default.ncpus = 1
set server resources_default.walltime = 24:00:00
...
In this example, the compute nodes are dual-processor machines, and default walltimes are set according to the number of processors/nodes of a job. Jobs with 32 nodes (63 processors) or more will be given a default walltime of 6 hours. Jobs with 16-31 nodes (31-62 processors) will be given a default walltime of 12 hours. All other jobs will have the server default walltime of 24 hours.
The ordering of the route_destinations is important. In a routing queue, a job is assigned to the first possible destination queue based on the resources_max, resources_min, acl_users, and acl_groups attributes. In the preceding example, the attributes of a single processor job would first be checked against the reg_64 queue, then the reg_32 queue, and finally the reg queue.
Adding the following settings to the earlier configuration elucidates the queue resource requirements:
set queue reg resources_max.ncpus = 30
set queue reg resources_max.nodect = 15
set queue reg_32 resources_max.ncpus = 62
set queue reg_32 resources_max.nodect = 31
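The first-match behavior of route_destinations can be illustrated with a short shell simulation. This is a sketch only; the real check happens inside the TORQUE server and considers more attributes than node count. The function name route_job and the DESTINATIONS variable are hypothetical, and the limits assume the node ranges described for this example (reg: 1-15, reg_32: 16-31, reg_64: 32 or more).

```shell
# Illustrative simulation only, not TORQUE code.
# Each entry is "queue:min:max" in route_destinations order; an empty max
# means the queue has no upper limit.
DESTINATIONS="reg_64:32: reg_32:16:31 reg:1:15"

# Print the first destination whose limits accept the given node count.
route_job() {
    nodect=$1
    for entry in $DESTINATIONS; do
        queue=${entry%%:*}
        rest=${entry#*:}
        min=${rest%%:*}
        max=${rest#*:}
        [ "$nodect" -lt "$min" ] && continue
        [ -n "$max" ] && [ "$nodect" -gt "$max" ] && continue
        echo "$queue"
        return 0
    done
    echo "rejected"
    return 1
}

route_job 1    # prints reg
route_job 20   # prints reg_32
route_job 40   # prints reg_64
```

Because the list is scanned in order, a job is tested against reg_64 first and falls through to the smaller queues only when it fails that queue's limits.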
The time at which server and queue defaults are enforced is important in this example. TORQUE applies server and queue defaults differently in job centric and queue centric modes. In job centric mode, TORQUE waits to apply the server and queue defaults until the job is assigned to its final execution queue. In queue centric mode, it enforces server defaults before the job is placed in the routing queue. In either mode, queue defaults override the server defaults. TORQUE defaults to job centric mode. To select queue centric mode, set queue_centric_limits as follows:
set server queue_centric_limits = true
An artifact of job centric mode is that if a job does not have an attribute set, the server and routing queue defaults are not applied when queue resource limits are checked. Consequently, a job that requests 32 nodes (not ncpus=32) will not be checked against a resources_min.ncpus limit. Also, for the preceding example, a job without any attributes set will be placed in the reg_64 queue, since the server ncpus default will be applied after the job is assigned to an execution queue.
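The difference between the two modes can be modeled with a short shell sketch. It is a simplified model of the behavior described above, not TORQUE code. The names route_by_ncpus, DESTINATIONS, and SERVER_DEFAULT_NCPUS are hypothetical, and only the resources_min.ncpus limits from the routing example are considered.

```shell
# Simplified model, not TORQUE code.
# Entries are "queue:resources_min.ncpus" in route_destinations order.
DESTINATIONS="reg_64:63 reg_32:31 reg:1"
SERVER_DEFAULT_NCPUS=1

# usage: route_by_ncpus MODE [NCPUS]
# MODE is job_centric or queue_centric; NCPUS may be omitted (unset).
route_by_ncpus() {
    mode=$1
    ncpus=${2:-}
    # Queue centric mode applies the server default before routing.
    if [ "$mode" = "queue_centric" ] && [ -z "$ncpus" ]; then
        ncpus=$SERVER_DEFAULT_NCPUS
    fi
    for entry in $DESTINATIONS; do
        queue=${entry%%:*}
        min=${entry#*:}
        # An unset attribute is not checked against the queue limit.
        if [ -z "$ncpus" ] || [ "$ncpus" -ge "$min" ]; then
            echo "$queue"
            return 0
        fi
    done
    return 1
}

route_by_ncpus job_centric      # prints reg_64
route_by_ncpus queue_centric    # prints reg
```

In this model, the job with no ncpus value passes every minimum check in job centric mode and therefore lands in the first destination, while queue centric mode applies the server default of 1 first and the job falls through to reg.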
Routing queue defaults are NOT applied to job attributes in versions 2.1.0 and before.
If the error message 'qsub: Job rejected by all possible destinations' is reported when submitting a job, it may be necessary to add queue location information (for example, in the routing queue's route_destinations attribute, change 'batch' to 'batch@localhost').