Creating a Routing Queue

A routing queue will steer a job to a destination queue based on job attributes and queue constraints. It is set up by creating a queue of queue_type "Route" with a route_destinations attribute set, as in the following example.

qmgr

 

# routing queue

create queue route

set queue route queue_type = Route

set queue route route_destinations = reg_64

set queue route route_destinations += reg_32

set queue route route_destinations += reg

set queue route enabled = True

set queue route started = True

 

# queue for jobs using 1-15 nodes

create queue reg

set queue reg queue_type = Execution

set queue reg resources_min.ncpus = 1

set queue reg resources_min.nodect = 1

set queue reg resources_default.ncpus = 1

set queue reg resources_default.nodes = 1

set queue reg enabled = True

set queue reg started = True

 

# queue for jobs using 16-31 nodes

create queue reg_32

set queue reg_32 queue_type = Execution

set queue reg_32 resources_min.ncpus = 31

set queue reg_32 resources_min.nodes = 16

set queue reg_32 resources_default.walltime = 12:00:00

set queue reg_32 enabled = True

set queue reg_32 started = True

 

# queue for jobs using 32+ nodes

create queue reg_64

set queue reg_64 queue_type = Execution

set queue reg_64 resources_min.ncpus = 63

set queue reg_64 resources_min.nodes = 32

set queue reg_64 resources_default.walltime = 06:00:00

set queue reg_64 enabled = True

set queue reg_64 started = True

 

# have all jobs go through the routing queue

set server default_queue = route

set server resources_default.ncpus = 1

set server resources_default.walltime = 24:00:00

  ...

In this example, the compute nodes are dual-processor machines, and default walltimes are set according to the number of processors/nodes a job requests. Jobs with 32 nodes (63 processors) or more are given a default walltime of 6 hours. Jobs with 16-31 nodes (31-62 processors) are given a default walltime of 12 hours. All other jobs receive the server default walltime of 24 hours.

The ordering of the route_destinations is important. In a routing queue, a job is assigned to the first possible destination queue based on the resources_max, resources_min, acl_users, and acl_groups attributes. In the preceding example, the attributes of a single processor job would first be checked against the reg_64 queue, then the reg_32 queue, and finally the reg queue.
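
The first-match behavior described above can be sketched in a few lines of Python. This is a simplified model, not TORQUE's implementation: the names QUEUE_MIN and route are hypothetical, only the resources_min values from the example are modeled, and nodes and nodect are collapsed into a single node count.

```python
# Simplified model of first-match routing; not TORQUE's actual code.
# Minimums are copied from the example configuration above.
QUEUE_MIN = {
    "reg_64": {"ncpus": 63, "nodes": 32},
    "reg_32": {"ncpus": 31, "nodes": 16},
    "reg": {"ncpus": 1, "nodes": 1},
}


def route(job, destinations):
    """Return the first destination whose minimums the job satisfies."""
    for name in destinations:
        minimums = QUEUE_MIN[name]
        if all(job[res] >= low for res, low in minimums.items()):
            return name
    return None  # models rejection by every destination


# Order used in the example: largest queue first.
ordered = ["reg_64", "reg_32", "reg"]
# Hypothetical reversed order, to show why ordering matters.
reversed_order = ["reg", "reg_32", "reg_64"]

small_job = {"ncpus": 1, "nodes": 1}
large_job = {"ncpus": 128, "nodes": 64}
```

With only minimums defined, the reversed order would send large_job to reg, because reg is checked first and the job exceeds its minimums. Listing the most restrictive queue first avoids this.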

Adding the following settings to the earlier configuration makes the queue resource requirements explicit:

qmgr

 

set queue reg resources_max.ncpus = 30

set queue reg resources_max.nodect = 15

set queue reg_32 resources_max.ncpus = 62

set queue reg_32 resources_max.nodect = 31
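
As a rough illustration of what the maximums add, the sketch below lists every queue whose combined minimum and maximum limits accept a job. It is a model under stated assumptions, not TORQUE's logic: LIMITS and accepting_queues are hypothetical names, the job is assumed to set both ncpus and a node count, and nodes and nodect are treated as the same value.

```python
# Simplified model of the min/max windows; not TORQUE's actual code.
# Each entry is (minimum, maximum); None means no maximum is configured.
LIMITS = {
    "reg_64": {"ncpus": (63, None), "nodes": (32, None)},
    "reg_32": {"ncpus": (31, 62), "nodes": (16, 31)},
    "reg": {"ncpus": (1, 30), "nodes": (1, 15)},
}


def accepting_queues(job):
    """Return the names of all queues whose limits the job fits within."""
    accepted = []
    for name, limits in LIMITS.items():
        fits = True
        for res, (low, high) in limits.items():
            value = job[res]
            if value < low or (high is not None and value > high):
                fits = False
        if fits:
            accepted.append(name)
    return accepted
```

In this model a job that sets both values falls into at most one queue, so the ranges no longer overlap for fully specified jobs.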

The point at which server and queue defaults are enforced is important in this example. TORQUE applies server and queue defaults differently in job centric and queue centric modes. In job centric mode, TORQUE waits to apply the server and queue defaults until the job is assigned to its final execution queue. In queue centric mode, TORQUE enforces server defaults before the job is placed in the routing queue. In either mode, queue defaults override the server defaults. TORQUE defaults to job centric mode. To select queue centric mode, set queue_centric_limits as follows:

qmgr

 

set server queue_centric_limits = true

An artifact of job centric mode is that if a job does not have an attribute set, the server and routing queue defaults are not applied when queue resource limits are checked. Consequently, a job that requests 32 nodes (not ncpus=32) will not be checked against a resources_min.ncpus limit. Also, for the preceding example, a job without any attributes set will be placed in the reg_64 queue, since the server ncpus default is applied only after the job is assigned to an execution queue.
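
The difference between the two modes can be modeled roughly as follows. This is a sketch under assumptions, not TORQUE's implementation: the function route and the queue_centric flag are hypothetical, only resources_min and the server ncpus default from the example are modeled, and the queue centric result is inferred from the description above rather than taken from TORQUE itself.

```python
# Simplified model of job centric vs. queue centric limit checks;
# not TORQUE's actual code. Values come from the example configuration.
SERVER_DEFAULTS = {"ncpus": 1}
DESTINATIONS = [
    ("reg_64", {"ncpus": 63, "nodes": 32}),
    ("reg_32", {"ncpus": 31, "nodes": 16}),
    ("reg", {"ncpus": 1, "nodes": 1}),
]


def route(job, queue_centric=False):
    """Return the first queue that accepts the job under the given mode."""
    requested = dict(job)
    if queue_centric:
        # Assumption: server defaults are filled in before routing.
        for res, value in SERVER_DEFAULTS.items():
            requested.setdefault(res, value)
    for name, minimums in DESTINATIONS:
        # A limit is checked only for resources the job actually carries.
        if all(requested[res] >= low
               for res, low in minimums.items() if res in requested):
            return name
    return None


no_attributes = {}
nodes_only = {"nodes": 32}
```

In this model, route(no_attributes) passes every check on reg_64 because nothing is set, which mirrors the job centric artifact described above. Calling route(no_attributes, queue_centric=True) applies ncpus=1 first, so the job falls through to reg.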

Routing queue defaults are not applied to job attributes in versions 2.1.0 and before.

If the error message "qsub: Job rejected by all possible destinations" is reported when submitting a job, it may be necessary to add queue location information (for example, in the routing queue's route_destinations attribute, change "batch" to "batch@localhost").

© 2015 Adaptive Computing