TORQUE Resource Manager

4.1 Queue Configuration

  • 4.1.1 Queue Attributes
  • 4.1.2 Example Queue Configuration
  • 4.1.3 Setting a Default Queue
  • 4.1.4 Mapping a Queue to a Subset of Resources
  • 4.1.5 Creating a Routing Queue

Under TORQUE, queues are configured with the qmgr command. The first step is to create the queue using the create subcommand of qmgr, as in the following example:

> qmgr -c "create queue batch queue_type=execution"

Once created, the queue must be configured to be operational. At a minimum, this means setting the started and enabled attributes. Further configuration is possible using any combination of the attributes listed below.

For boolean attributes, T, t, 1, Y, and y are all synonymous with true, and F, f, 0, N, and n all mean false.

For queue_type, E and R are synonymous with Execution and Routing.
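The synonym rules above can be modeled in a few lines. This is a minimal sketch, not TORQUE code: the helper names parse_bool and parse_queue_type are hypothetical, and case-insensitive matching of the spelled-out words is an assumption based on the mixed-case values used in the examples on this page.

```python
# Hypothetical helpers that mirror the documented synonyms; not part of TORQUE.
TRUE_VALUES = {"t", "1", "y", "true"}
FALSE_VALUES = {"f", "0", "n", "false"}

# Accepted spellings for queue_type, mapped to a canonical name.
QUEUE_TYPES = {
    "e": "Execution",
    "execution": "Execution",
    "r": "Route",
    "route": "Route",
}


def parse_bool(value: str) -> bool:
    """Map a boolean attribute value to True or False."""
    key = value.strip().lower()  # assumption: matching is case-insensitive
    if key in TRUE_VALUES:
        return True
    if key in FALSE_VALUES:
        return False
    raise ValueError(f"not a recognized boolean value: {value!r}")


def parse_queue_type(value: str) -> str:
    """Map a queue_type value to its canonical name."""
    key = value.strip().lower()
    if key not in QUEUE_TYPES:
        raise ValueError(f"not a recognized queue type: {value!r}")
    return QUEUE_TYPES[key]
```

Under these assumptions, parse_bool("Y") and parse_bool("true") both give True, and parse_queue_type("E") gives "Execution".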

4.1.1 Queue Attributes

acl_groups
<GROUP>[@<HOST>][+<USER>[@<HOST>]]...
---
Specifies the list of groups which may submit jobs to the queue. If acl_group_enable is set to true, only users with a primary group listed in acl_groups may utilize the queue.

Note If the PBSACLUSEGROUPLIST variable is set in the pbs_server environment, acl_groups checks against all groups of which the job user is a member.

> qmgr -c "set queue batch acl_groups=staff"
> qmgr -c "set queue batch [email protected]"
> qmgr -c "set queue batch [email protected]"

Note Used in conjunction with acl_group_enable.

   
acl_group_enable
<BOOLEAN>
FALSE
If TRUE, constrains TORQUE to only allow jobs submitted from groups specified by the acl_groups parameter.
qmgr -c "set queue batch acl_group_enable=true"
   
acl_group_sloppy
<BOOLEAN>
FALSE
If TRUE, acl_groups will be checked against all groups of which the job user is a member.
---
   
acl_hosts
<HOST>[+<HOST>]...
---
Specifies the list of hosts that may submit jobs to the queue.
qmgr -c "set queue batch acl_hosts=h1+h2+h3" 

Note Used in conjunction with acl_host_enable.

   
acl_host_enable
<BOOLEAN>
FALSE
If TRUE, constrains TORQUE to only allow jobs submitted from hosts specified by the acl_hosts parameter.
qmgr -c "set queue batch acl_host_enable=true"
   
acl_logic_or
<BOOLEAN>
FALSE
If TRUE, user and group acls are logically OR'd together, meaning that either acl may be met to allow access. If false or unset, then both acls are AND'd, meaning that both acls must be satisfied.
qmgr -c "set queue batch acl_logic_or=true"
   
acl_users
<USER>[@<HOST>][+<USER>[@<HOST>]]...
---
Specifies the list of users who may submit jobs to the queue. If acl_user_enable is set to TRUE, only users listed in acl_users may use the queue.
> qmgr -c "set queue batch acl_users=john"
> qmgr -c "set queue batch [email protected]"
> qmgr -c "set queue batch [email protected]"

Note Used in conjunction with acl_user_enable.

   
acl_user_enable
<BOOLEAN>
FALSE
If TRUE, constrains TORQUE to only allow jobs submitted from users specified by the acl_users parameter.
qmgr -c "set queue batch acl_user_enable=true"
   
disallowed_types
<type>[+<type>]...
---
Specifies classes of jobs that are not allowed to be submitted to this queue. Valid types are interactive, batch, rerunable, nonrerunable, fault_tolerant (version 2.4.0 and later), fault_intolerant (version 2.4.0 and later), and job_array (version 2.4.1 and later).
qmgr -c "set queue batch disallowed_types = interactive"
qmgr -c "set queue batch disallowed_types += job_array"
   
enabled
<BOOLEAN>
FALSE
Specifies whether the queue accepts new job submissions.
qmgr -c "set queue batch enabled=true"
   
keep_completed
<INTEGER>
0
Specifies the number of seconds jobs should be held in the Completed state after exiting.
qmgr -c "set queue batch keep_completed=120"
   
kill_delay
<INTEGER>
2
Specifies the number of seconds between sending a SIGTERM and a SIGKILL to a job being cancelled.
qmgr -c "set queue batch kill_delay=30"
   
max_queuable
<INTEGER>
unlimited
Specifies the maximum number of jobs allowed in the queue at any given time (includes idle, running, and blocked jobs).
qmgr -c "set queue batch max_queuable=20"
   
max_running
<INTEGER>
unlimited
Specifies the maximum number of jobs in the queue allowed to run at any given time.
qmgr -c "set queue batch max_running=20"
   
max_user_queuable
<INTEGER>
unlimited
Specifies the maximum number of jobs, per user, allowed in the queue at any given time (includes idle, running, and blocked jobs). Available in version 2.1.3 and later.
qmgr -c "set queue batch max_user_queuable=20"
   
max_user_run
<INTEGER>
unlimited
Specifies the maximum number of jobs, per user, in the queue allowed to run at any given time.
qmgr -c "set queue batch max_user_run=10"
   
priority
<INTEGER>
0
Specifies the priority value associated with the queue.
qmgr -c "set queue batch priority=20"
   
queue_type
one of e, execution, r, or route
---
Specifies the queue type.

Note This value must be explicitly set for all queues.

qmgr -c "set queue batch queue_type=execution"
   
resources_available
<STRING>
---
Specifies the cumulative resources available to all jobs running in the queue.
qmgr -c "set queue batch resources_available.nodect=20"

Note pbs_server must be restarted for changes to take effect. Also, resources_available is constrained by the smaller of queue.resources_available and server.resources_available.

   
resources_default
<STRING>
N/A
Specifies default resource requirements for jobs submitted to the queue.
qmgr -c "set queue batch resources_default.walltime=3600"
   
resources_max
<STRING>
N/A
Specifies the maximum resource limits for jobs submitted to the queue.
qmgr -c "set queue batch resources_max.nodect=16"
   
resources_min
<STRING>
N/A
Specifies the minimum resource limits for jobs submitted to the queue.
qmgr -c "set queue batch resources_min.nodect=2"
   
route_destinations
<queue>[@<host>][+<queue>[@<host>]]...
N/A
Specifies the potential destination queues for jobs submitted to the associated routing queue.

Note This attribute is only valid for routing queues.

> qmgr -c "set queue route route_destinations=fast"
> qmgr -c "set queue route route_destinations+=slow"
> qmgr -c "set queue route [email protected]"
   
started
<BOOLEAN>
FALSE
Specifies whether jobs in the queue are allowed to execute.
qmgr -c "set queue batch started=true"

Note Resources may include one or more of the following: arch, mem, nodes, ncpus, nodect, procct, pvmem, and walltime.
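The interaction between the acl_users, acl_groups, acl_user_enable, acl_group_enable, and acl_logic_or attributes described above can be sketched as a small decision function. This is an illustrative model under stated assumptions, not TORQUE's implementation: the QueueAcl class and may_submit function are hypothetical names, host qualifiers and acl_group_sloppy are ignored, and the behavior when only one of the two ACLs is enabled (only the enabled ACL is consulted) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class QueueAcl:
    """Hypothetical container for the user and group ACL attributes of a queue."""
    acl_users: set = field(default_factory=set)
    acl_groups: set = field(default_factory=set)
    acl_user_enable: bool = False
    acl_group_enable: bool = False
    acl_logic_or: bool = False


def may_submit(acl: QueueAcl, user: str, primary_group: str) -> bool:
    """Return True if the modeled ACLs would admit the job."""
    checks = []
    if acl.acl_user_enable:
        checks.append(user in acl.acl_users)
    if acl.acl_group_enable:
        checks.append(primary_group in acl.acl_groups)
    if not checks:
        # No ACL is enabled, so nothing restricts submission.
        return True
    # acl_logic_or=true: either ACL may be met; otherwise both must be met.
    return any(checks) if acl.acl_logic_or else all(checks)
```

For example, with acl_users=john, acl_groups=staff, and both ACLs enabled, a job from user john whose primary group is students is rejected by default but admitted once acl_logic_or is set to true.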

Assigning Queue Resource Limits

Administrators can use resource limits to help direct what kinds of jobs go to different queues. There are four queue attributes where resource limits can be set: resources_available, resources_default, resources_max, and resources_min. The resources that can be limited with these attributes are arch, mem, ncpus, nodect, nodes, procct, pvmem, vmem, and walltime.

Resource | Format | Description
arch | string | Specifies the administrator-defined system architecture required.
mem | size | Amount of physical memory used by the job. (Ignored on Darwin, Digital Unix, FreeBSD, HPUX 11, IRIX, NetBSD, and SunOS. Also ignored on Linux if the number of nodes is not 1. Not implemented on AIX and HPUX 10.)
ncpus | integer | An artifact of job centric mode is that if a job does not have an attribute set, the server and routing queue defaults are not applied when queue resource limits are checked. Consequently, a job that requests 32 nodes (not ncpus=32) will not be checked against a resources_min.ncpus limit.
nodect | integer | Sets the number of nodes available. By default, TORQUE will set the number of nodes available to the number of nodes listed in the $TORQUE_HOME/server_priv/nodes file. nodect can be set to be greater than or less than that number. Generally, it is used to set the node count higher than the number of physical nodes in the cluster.
nodes | integer | Number of nodes.
procct | integer | Sets limits on the total number of execution slots (procs) allocated to a job. The number of procs is calculated by summing the products of all node and ppn entries for a job. For example, qsub -l nodes=2:ppn=2+3:ppn=4 job.sh would yield a procct of 16: 2*2 (2:ppn=2) + 3*4 (3:ppn=4).
pvmem | size | Amount of virtual memory used by any single process in a job.
vmem | size | Amount of virtual memory used by all concurrent processes in the job.
walltime | seconds, or [[HH:]MM:]SS | Amount of real time during which a job can be in a running state.
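The procct arithmetic described above (summing node count times ppn over each chunk of the nodes request) can be reproduced with a short function. This is a sketch for illustration only: the function name procct is hypothetical, it assumes every chunk starts with a numeric node count rather than a hostname, and it assumes ppn counts as 1 when omitted.

```python
def procct(nodes_spec: str) -> int:
    """Sum node_count * ppn over the '+'-separated chunks of a nodes request."""
    total = 0
    for chunk in nodes_spec.split("+"):
        parts = chunk.split(":")
        count = int(parts[0])  # assumption: a numeric count, not a hostname
        ppn = 1                # assumption: ppn counts as 1 when not given
        for part in parts[1:]:
            if part.startswith("ppn="):
                ppn = int(part[len("ppn="):])
        total += count * ppn
    return total


print(procct("2:ppn=2+3:ppn=4"))  # 2*2 + 3*4 = 16
```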

4.1.2 Example Queue Configuration

The following series of qmgr commands will create and configure a queue named batch:

qmgr -c "create queue batch queue_type=execution"
qmgr -c "set queue batch started=true"
qmgr -c "set queue batch enabled=true"
qmgr -c "set queue batch resources_default.nodes=1"
qmgr -c "set queue batch resources_default.walltime=3600"

This queue will accept new jobs and, if not explicitly specified in the job, will assign a node count of 1 and a walltime of 1 hour to each job.
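The walltime value 3600 used here and the HH:MM:SS values used later on this page are two spellings of the same format, seconds or [[HH:]MM:]SS. The following sketch converts either form to seconds; the function name walltime_to_seconds is hypothetical, and the function does no range checking on the individual fields.

```python
def walltime_to_seconds(value: str) -> int:
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' to a number of seconds."""
    fields = [int(f) for f in value.split(":")]
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"expected [[HH:]MM:]SS, got {value!r}")
    seconds = 0
    for f in fields:
        # Each field to the left is worth 60 times the next one.
        seconds = seconds * 60 + f
    return seconds


print(walltime_to_seconds("3600"))      # 3600
print(walltime_to_seconds("01:00:00"))  # 3600
```

So resources_default.walltime=3600 and a value of 01:00:00 both describe the one-hour default mentioned above.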

4.1.3 Setting a Default Queue

By default, a job must explicitly specify which queue it is to run in. To change this behavior, the server parameter default_queue may be specified as in the following example:

qmgr -c "set server default_queue=batch"

4.1.4 Mapping a Queue to a Subset of Resources

TORQUE does not currently provide a simple mechanism for mapping queues to nodes. However, schedulers such as Moab and Maui can provide this functionality.

The simplest method is to set resources_default.neednodes on an execution queue to a particular node attribute. Maui/Moab will use this information to ensure that jobs in that queue are assigned nodes with that attribute. For example, suppose some nodes were bought with money from the chemistry department and others were paid for by the biology department.

$TORQUE_HOME/server_priv/nodes:
node01 np=2 chem
node02 np=2 chem
node03 np=2 bio
node04 np=2 bio

qmgr:
set queue chem resources_default.neednodes=chem
set queue bio  resources_default.neednodes=bio

Note This example does not preclude other queues from accessing those nodes. One solution is to use some other generic attribute with all other nodes and queues.
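The effect of the mapping, and the caveat in the note, can be illustrated with a small model of the node selection a scheduler might perform. This is a sketch, not Maui or Moab behavior: the parse_nodes and eligible_nodes functions are hypothetical, and treating every token without an equals sign as a node attribute is an assumption about the nodes file.

```python
NODES_FILE = """\
node01 np=2 chem
node02 np=2 chem
node03 np=2 bio
node04 np=2 bio
"""


def parse_nodes(text: str) -> dict:
    """Map each node name to the set of attributes listed after it."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        # Assumption: tokens such as 'np=2' are settings, the rest are attributes.
        nodes[fields[0]] = {f for f in fields[1:] if "=" not in f}
    return nodes


def eligible_nodes(nodes: dict, neednodes=None) -> list:
    """Nodes a job may use, given its queue's resources_default.neednodes."""
    if neednodes is None:
        # A queue without neednodes is not restricted to any subset.
        return sorted(nodes)
    return sorted(name for name, attrs in nodes.items() if neednodes in attrs)


nodes = parse_nodes(NODES_FILE)
print(eligible_nodes(nodes, "chem"))  # ['node01', 'node02']
print(eligible_nodes(nodes))          # all four nodes
```

The second call shows why the note matters: a queue with no neednodes default can still reach the chem and bio nodes in this model.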

More advanced configurations can be made with standing reservations and QoSes.

4.1.5 Creating a Routing Queue

A routing queue will steer a job to a destination queue based on job attributes and queue constraints. It is set up by creating a queue of queue_type Route with a route_destinations attribute set, as in the following example.

qmgr
# routing queue
create queue route
set queue route queue_type = Route
set queue route route_destinations = reg_64
set queue route route_destinations += reg_32
set queue route route_destinations += reg
set queue route enabled = True
set queue route started = True

# queue for jobs using 1-15 nodes
create queue reg
set queue reg queue_type = Execution
set queue reg resources_min.ncpus = 1
set queue reg resources_min.nodect = 1
set queue reg resources_default.ncpus = 1
set queue reg resources_default.nodes = 1
set queue reg enabled = True
set queue reg started = True

# queue for jobs using 16-31 nodes
create queue reg_32
set queue reg_32 queue_type = Execution
set queue reg_32 resources_min.ncpus = 31
set queue reg_32 resources_min.nodes = 16
set queue reg_32 resources_default.walltime = 12:00:00
set queue reg_32 enabled = True
set queue reg_32 started = True

# queue for jobs using 32+ nodes
create queue reg_64
set queue reg_64 queue_type = Execution
set queue reg_64 resources_min.ncpus = 63
set queue reg_64 resources_min.nodes = 32
set queue reg_64 resources_default.walltime = 06:00:00
set queue reg_64 enabled = True
set queue reg_64 started = True

# have all jobs go through the routing queue
set server default_queue = route
set server resources_default.ncpus = 1
set server resources_default.walltime = 24:00:00
  ...

In this example, the compute nodes are dual processors and default walltimes are set according to the number of processors/nodes of a job. Jobs with 32 nodes (63 processors) or more will be given a default walltime of 6 hours. Also, jobs with 16-31 nodes (31-62 processors) will be given a default walltime of 12 hours. All other jobs will have the server default walltime of 24 hours.

The ordering of the route_destinations is important. In a routing queue, a job is assigned to the first possible destination queue based on the resources_max, resources_min, acl_users, and acl_groups attributes. In the preceding example, the attributes of a single processor job would first be checked against the reg_64 queue, then the reg_32 queue, and finally the reg queue.
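To see why the ordering matters, the first-match rule can be modeled with the resources_min values from the preceding example. This is a simplified sketch rather than TORQUE's routing code: the first_match function is hypothetical, only resources_min is modeled (no resources_max or ACLs), resource values are treated as plain integers, and limits on resources the job did not request are skipped.

```python
# resources_min values copied from the example configuration above.
RESOURCES_MIN = {
    "reg_64": {"ncpus": 63, "nodes": 32},
    "reg_32": {"ncpus": 31, "nodes": 16},
    "reg": {"ncpus": 1, "nodect": 1},
}


def first_match(job: dict, destinations: list):
    """Return the first destination whose minimum limits the job satisfies."""
    for queue in destinations:
        limits = RESOURCES_MIN[queue]
        # Limits on resources the job did not request are skipped.
        if all(job[res] >= minimum
               for res, minimum in limits.items() if res in job):
            return queue
    return None  # the job is rejected by all possible destinations


small = {"ncpus": 1, "nodes": 1, "nodect": 1}
large = {"ncpus": 64, "nodes": 32, "nodect": 32}

print(first_match(small, ["reg_64", "reg_32", "reg"]))  # reg
print(first_match(large, ["reg_64", "reg_32", "reg"]))  # reg_64
print(first_match(large, ["reg", "reg_32", "reg_64"]))  # reg
```

With the destinations reversed, the large job lands in reg because reg has only minimum limits in this model and is tried first.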

Adding the following settings to the earlier configuration elucidates the queue resource requirements:

qmgr
set queue reg resources_max.ncpus = 30
set queue reg resources_max.nodect = 15

set queue reg_32 resources_max.ncpus = 62
set queue reg_32 resources_max.nodect = 31

The time at which server and queue defaults are enforced is important in this example. TORQUE applies server and queue defaults differently in job centric and queue centric modes. In job centric mode, TORQUE waits to apply the server and queue defaults until the job is assigned to its final execution queue. In queue centric mode, TORQUE enforces server defaults before the job is placed in the routing queue. In either mode, queue defaults override the server defaults. TORQUE defaults to job centric mode. To enable queue centric mode, set queue_centric_limits, as follows:

qmgr
set server queue_centric_limits = true

An artifact of job centric mode is that if a job does not have an attribute set, the server and routing queue defaults are not applied when queue resource limits are checked. Consequently, a job that requests 32 nodes (not ncpus=32) will not be checked against a resources_min.ncpus limit. Also, for the preceding example, a job without any attributes set will be placed in the reg_64 queue, since the server ncpus default will be applied after the job is assigned to an execution queue.
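The difference between the two modes can be made concrete with a small model of the example configuration. This is a hedged sketch, not TORQUE's logic: the route function and its queue_centric flag are hypothetical, only resources_min and the server ncpus default are modeled, and resource values are treated as plain integers.

```python
SERVER_DEFAULTS = {"ncpus": 1}  # from 'set server resources_default.ncpus = 1'
ROUTE_DESTINATIONS = ["reg_64", "reg_32", "reg"]
RESOURCES_MIN = {
    "reg_64": {"ncpus": 63, "nodes": 32},
    "reg_32": {"ncpus": 31, "nodes": 16},
    "reg": {"ncpus": 1, "nodect": 1},
}


def route(job: dict, queue_centric: bool):
    """Pick the destination queue for a job under the modeled mode."""
    attrs = dict(job)
    if queue_centric:
        # Queue centric: server defaults are applied before routing.
        for res, value in SERVER_DEFAULTS.items():
            attrs.setdefault(res, value)
    for queue in ROUTE_DESTINATIONS:
        limits = RESOURCES_MIN[queue]
        # Job centric: attributes the job never set are simply not checked.
        if all(attrs[res] >= minimum
               for res, minimum in limits.items() if res in attrs):
            return queue
    return None


print(route({}, queue_centric=False))             # reg_64
print(route({}, queue_centric=True))              # reg
print(route({"nodes": 32}, queue_centric=False))  # reg_64
```

In this model, a job with no attributes passes every check in job centric mode and so lands in the first destination, reg_64. With queue centric limits, the ncpus default of 1 is visible during routing and the job falls through to reg.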

Note Routing queue defaults are NOT applied to job attributes in versions 2.1.0 and before.

Note If the error message 'qsub: Job rejected by all possible destinations' is reported when submitting a job, it may be necessary to add queue location information, (i.e., in the routing queue's route_destinations attribute, change 'batch' to '[email protected]').

See Also