5.412 Usage Limits/Throttling Policies

A number of Moab policies allow an administrator to control job flow through the system. These throttling policies work as filters, allowing or disallowing a job to be considered for scheduling by specifying limits on system usage at any given moment. These policies may be specified as global constraints or as specific constraints on a per user, group, account, QoS, or class basis.

Fairness via Throttling Policies
- Basic Fairness Policies
- Multi-Dimension Fairness Policies
Override Limits
Idle Job Limits
Hard and Soft Limits
Per-partition Limits
Usage-based limits

5.412.1 Fairness via Throttling Policies

Moab allows significant flexibility with usage limits, or throttling policies. At a high level, Moab allows resource usage limits to be specified in three primary workload categories: (1) active, (2) idle, and (3) system job limits.

5.412.1.A Basic Fairness Policies

Workload category	Description
Active job limits	Constrain the total cumulative resources available to active jobs at a given time.
Idle job limits	Constrain the total cumulative resources available to idle jobs at a given time.
System job limits	Constrain the maximum resource requirements of any single job.

These limits can be applied to any job credential (user, group, account, QoS, and class), or on a system-wide basis. Using the keyword DEFAULT, a site may also specify the default setting for the desired user, group, account, QoS, and class. Additionally, you may configure QoS to allow limit overrides to any particular policy.

To run, a job must meet all policy limits. Limits are applied using the *CFG set of parameters, particularly USERCFG, GROUPCFG, ACCOUNTCFG, QOSCFG, CLASSCFG, and SYSCFG. Limits are specified by associating the desired limit to the individual or default object. The usage limits currently supported are listed in the following table.

MAXARRAYJOB
Units	Number of simultaneous active array job sub-jobs.
Description	Limits the number of simultaneously active (starting or running) array sub-jobs a credential can have.
Example	USERCFG[gertrude] MAXARRAYJOB=10 Gertrude can have a maximum of 10 active job array sub-jobs.

MAXGRES
Units	# of concurrent uses of a generic resource
Description	Limits the concurrent usage of a generic resource to a specific quantity or quantity range.
Example	USERCFG[joe] MAXGRES[matlab]=2 USERCFG[jim] MAXGRES[matlab]=2,4

MAXJOB
Units	# of jobs
Description	Limits the number of jobs a credential may have active (starting or running) at any given time. Moab places a hold on all new jobs submitted by that credential once it has reached its maximum number of allowable jobs. `MAXJOB=0` is not supported. You can, however, achieve similar results by using the HOLD attribute of the USERCFG parameter: USERCFG[john] HOLD=yes
Example	USERCFG[DEFAULT] MAXJOB=8 GROUPCFG[staff] MAXJOB=2,4

MAXMEM
Units	total memory in MB
Description	Limits the total amount of dedicated memory (in MB) that can be allocated by a credential's active jobs at any given time.
Example	ACCOUNTCFG[jasper] MAXMEM=2048

MAXNODE
Units	# of nodes
Description	Limits the total number of compute nodes that can be in use by active jobs at any given time. Adaptive Computing recommends that you set JOBNODEMATCHPOLICY EXACTNODE when using MAXNODE. This ensures jobs submitted using the msub/qsub "-l nodes=#" syntax will have a node count associated with the request. On some systems (including Torque/PBS), nodes have been softly defined rather than strictly defined; that is, a job may request 2 nodes but Torque will translate this request into 1 node with 2 processors. This can prevent Moab from enforcing a MAXNODE policy correctly for a single job. Correct behavior can be achieved using MAXPROC.
Example	CLASSCFG[batch] MAXNODE=64

MAXPE
Units	# of processor equivalents
Description	Limits the total number of dedicated processor-equivalents that can be allocated by active jobs at any given time.
Example	QOSCFG[base] MAXPE=128

MAXPROC
Units	# of processors
Description	Limits the total number of dedicated processors that can be allocated by active jobs at any given time per credential. To set MAXPROC per job, use msub -W.
Example	CLASSCFG[debug] MAXPROC=32

MAXPS
Units	<# of processors> * <walltime>
Description	Limits the number of outstanding processor-seconds a credential may have allocated at any given time. For example, if a user has a 4-processor job that will complete in 1 hour and a 2-processor job that will complete in 6 hours, they have 4 * 1 * 3600 + 2 * 6 * 3600 = 16 * 3600 outstanding processor-seconds. The outstanding processor-second usage of each credential is updated each scheduling iteration, decreasing as jobs approach their completion time.
Example	USERCFG[DEFAULT] MAXPS=720000
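
The processor-second arithmetic in the MAXPS description can be checked with a short sketch. The Python below is illustrative only and is not part of Moab; the function name and the (processors, remaining seconds) tuples are assumptions made for this example.

```python
def outstanding_proc_seconds(jobs):
    """Sum processors * remaining walltime (in seconds) over active jobs.

    `jobs` is a list of (processors, remaining_seconds) tuples; this layout
    is a hypothetical stand-in for whatever the scheduler tracks internally.
    """
    return sum(procs * remaining for procs, remaining in jobs)


# The two jobs from the MAXPS description:
# 4 processors with 1 hour left, 2 processors with 6 hours left.
jobs = [(4, 1 * 3600), (2, 6 * 3600)]
print(outstanding_proc_seconds(jobs))  # 57600, that is 16 * 3600
```

With MAXPS=720000, these two jobs (57600 outstanding processor-seconds) would be well under the limit.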

MAXSUBMITJOBS
Units	# of jobs
Description	Limits the number of jobs a credential may submit and have in the system at once. Moab will reject any job submitted beyond this limit. If you use a Torque resource manager, you should also set max_user_queuable in case the user submits jobs via qsub instead of msub. See "Queue Attributes" in the Torque 6.1.0 Administrator Guide for more information.
Example	USERCFG[DEFAULT] MAXSUBMITJOBS=5

MAXWC
Units	job duration [[[DD:]HH:]MM:]SS
Description	Limits the cumulative remaining walltime a credential may have associated with active jobs. It behaves like the MAXPS limit (listed earlier), except that it lacks the processor weighting. Like MAXPS, the cumulative remaining walltime of each credential is updated each scheduling iteration. MAXWC does not limit the maximum wallclock limit per job. For this capability, use MAX.WCLIMIT.
Example	USERCFG[ops] MAXWC=72:00:00

The following example demonstrates a simple limit specification:

USERCFG[DEFAULT]  MAXJOB=4
USERCFG[john]     MAXJOB=8

This example allows user john to run up to 8 jobs while all other users may only run up to 4.

Simultaneous limits of different types may be applied per credential and multiple types of credentials may have limits specified. The next example demonstrates this mixing of limits and is a bit more complicated.

USERCFG[steve]    MAXJOB=2 MAXNODE=30
GROUPCFG[staff]   MAXJOB=5
CLASSCFG[DEFAULT] MAXNODE=16
CLASSCFG[batch]   MAXNODE=32

This configuration may apply multiple limits to a single job. As discussed previously, a job may run only if it satisfies all applicable limits. Thus, in this example, the scheduler allows at most 2 simultaneous jobs for user steve, with an aggregate consumption of no more than 30 nodes. However, if the job is submitted to a class other than batch, it may be limited further: only 16 total nodes may be used simultaneously by jobs running in any given class, with the exception of the class batch. If steve submitted a job to the class interactive, for example, and jobs already running in that class were using a total of 14 nodes, the default limit of 16 nodes per class would block his job unless it requested 2 or fewer nodes.
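
The "must satisfy every applicable limit" rule can be sketched as a simple check. This is a minimal illustration in Python, not Moab's implementation; the function name, the dictionary keys, and the usage figures are assumptions chosen to mirror the interactive-class scenario described here.

```python
def violated_limits(limits, usage, job_nodes):
    """Return the limits that starting one more job would exceed.

    `limits` and `usage` map (credential, limit_name) to a number.
    A MAXJOB limit grows by 1 per job; a MAXNODE limit grows by the
    job's node count. Both structures are hypothetical.
    """
    violations = []
    for (cred, kind), cap in limits.items():
        increment = 1 if kind == "MAXJOB" else job_nodes
        if usage.get((cred, kind), 0) + increment > cap:
            violations.append((cred, kind))
    return violations


# Limits that apply to a job from user steve, group staff, class interactive.
limits = {
    ("user:steve", "MAXJOB"): 2,
    ("user:steve", "MAXNODE"): 30,
    ("group:staff", "MAXJOB"): 5,
    ("class:interactive", "MAXNODE"): 16,  # from CLASSCFG[DEFAULT]
}
# Assumed current usage: 14 nodes already busy in class interactive.
usage = {("class:interactive", "MAXNODE"): 14}

print(violated_limits(limits, usage, job_nodes=2))  # []
print(violated_limits(limits, usage, job_nodes=3))  # [('class:interactive', 'MAXNODE')]
```

A job runs only when the returned list is empty; a single violated limit is enough to block it.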

5.412.1.B Multi-Dimension Fairness Policies and Per Credential Overrides

Multi-dimensional fairness policies allow a site to specify policies based on combinations of job credentials. A common example might be setting a maximum number of jobs allowed per queue per user or a total number of processors per group per QoS. As with basic fairness policies, multi-dimension policies are specified using the *CFG parameters or through the identity manager interface. Moab supports the most commonly used multi-dimensional fairness policies (listed in the table below) using the following format:

*CFG[X] <LIMITTYPE>[<CRED>]=<LIMITVALUE>

*CFG is one of USERCFG, GROUPCFG, ACCOUNTCFG, QOSCFG, or CLASSCFG; <LIMITTYPE> is one of the policies listed in the table under Basic Fairness Policies; and <CRED> is of the format <CREDTYPE>[:<VALUE>], with <CREDTYPE> being one of USER, GROUP, ACCT, QOS, or CLASS. The optional <VALUE> setting can be used to specify that the policy only applies to a specific credential value. For example, the following configuration sets limits on the class fast, controlling the maximum number of jobs any group can have active at any given time, the number of processors in use at any given time for user steve, and the number of idle jobs any user can have at any given time.

CLASSCFG[fast] MAXJOB[GROUP]=12
CLASSCFG[fast] MAXPROC[USER:steve]=50
CLASSCFG[fast] MAXIJOB[USER]=10

The following example configuration may clarify further:

# allow class batch to run up to 3 simultaneous jobs
# allow any user to use up to 8 total nodes within class
CLASSCFG[batch] MAXJOB=3 MAXNODE[USER]=8
# allow users steve and bob to use up to 3 and 4 total processors respectively within class
CLASSCFG[fast] MAXPROC[USER:steve]=3 MAXPROC[USER:bob]=4

Multi-dimensional policies cannot be applied on DEFAULT credentials.

The table below lists the currently implemented, multi-dimensional usage limit permutations. The "slmt" stands for "Soft Limit" and "hlmt" stands for "Hard Limit."

Multi-dimension usage limit permutations
ACCOUNTCFG[name]	MAXIJOB[QOS]=hlmt MAXIJOB[QOS:qosname]=hlmt
MAXIPROC[QOS]=hlmt MAXIPROC[QOS:qosname]=hlmt
MAXJOB[QOS]=slmt,hlmt MAXJOB[QOS:qosname]=slmt,hlmt
MAXJOB[USER]=slmt,hlmt MAXJOB[USER:username]=slmt,hlmt
MAXMEM[USER]=slmt,hlmt MAXMEM[USER:username]=slmt,hlmt
MAXNODE[USER]=slmt,hlmt MAXNODE[USER:username]=slmt,hlmt
MAXPE[QOS]=slmt,hlmt MAXPE[QOS:qosname]=slmt,hlmt
MAXPROC[USER]=slmt,hlmt MAXPROC[USER:username]=slmt,hlmt
MAXPROC[QOS]=slmt,hlmt MAXPROC[QOS:qosname]=slmt,hlmt
MAXPS[QOS]=slmt,hlmt MAXPS[QOS:qosname]=slmt,hlmt
MAXPS[USER]=slmt,hlmt MAXPS[USER:username]=slmt,hlmt
MAXWC[USER]=slmt,hlmt MAXWC[USER:username]=slmt,hlmt
CLASSCFG[name]	MAXIJOB[USER]=hlmt MAXJOB[GROUP]=slmt,hlmt MAXJOB[GROUP:groupname]=slmt,hlmt
MAXJOB[QOS:qosname]=hlmt
MAXJOB[USER]=slmt,hlmt MAXJOB[USER:username]=slmt,hlmt
MAXMEM[GROUP]=slmt,hlmt MAXMEM[GROUP:groupname]=slmt,hlmt
MAXMEM[QOS:qosname]=hlmt
MAXMEM[USER]=slmt,hlmt MAXMEM[USER:username]=slmt,hlmt
MAXNODE[GROUP]=slmt,hlmt MAXNODE[GROUP:groupname]=slmt,hlmt
MAXNODE[QOS:qosname]=hlmt
MAXNODE[USER]=slmt,hlmt MAXNODE[USER:username]=slmt,hlmt
MAXPE[GROUP]=slmt,hlmt MAXPE[GROUP:groupname]=slmt,hlmt
MAXPE[QOS:qosname]=hlmt
MAXPE[USER]=slmt,hlmt MAXPE[USER:username]=slmt,hlmt
MAXPROC[GROUP]=slmt,hlmt MAXPROC[GROUP:groupname]=slmt,hlmt
MAXPROC[QOS:qosname]=hlmt
MAXPROC[USER]=slmt,hlmt MAXPROC[USER:username]=slmt,hlmt
MAXPS[GROUP]=slmt,hlmt MAXPS[GROUP:groupname]=slmt,hlmt
MAXPS[QOS:qosname]=hlmt
MAXPS[USER]=slmt,hlmt MAXPS[USER:username]=slmt,hlmt
MAXWC[GROUP]=slmt,hlmt MAXWC[GROUP:groupname]=slmt,hlmt
MAXWC[QOS:qosname]=hlmt
MAXWC[USER]=slmt,hlmt MAXWC[USER:username]=slmt,hlmt
GROUPCFG[name]	MAXJOB[CLASS:classname]=slmt,hlmt
MAXJOB[USER]=slmt,hlmt MAXJOB[USER:username]=slmt,hlmt
MAXMEM[CLASS:classname]=slmt,hlmt
MAXMEM[USER]=slmt,hlmt MAXMEM[USER:username]=slmt,hlmt
MAXNODE[CLASS:classname]=slmt,hlmt
MAXNODE[USER]=slmt,hlmt MAXNODE[USER:username]=slmt,hlmt
MAXPE[CLASS:classname]=slmt,hlmt
MAXPE[USER]=slmt,hlmt MAXPE[USER:username]=slmt,hlmt
MAXPROC[CLASS:classname]=slmt,hlmt
MAXPROC[USER]=slmt,hlmt MAXPROC[USER:username]=slmt,hlmt
MAXPS[CLASS:classname]=slmt,hlmt
MAXPS[USER]=slmt,hlmt MAXPS[USER:username]=slmt,hlmt
MAXWC[CLASS:classname]=slmt,hlmt
MAXWC[USER]=slmt,hlmt MAXWC[USER:username]=slmt,hlmt
QOSCFG[name]	MAXIJOB[ACCT]=hlmt MAXIJOB[ACCT:accountname]=hlmt MAXIJOB[USER]=hlmt MAXIJOB[USER:class+classname]=hlmt
MAXINODE[ACCT]=slmt,hlmt MAXINODE[ACCT:accountname]=slmt,hlmt
MAXINODE[USER]=hlmt MAXINODE[USER:username]=slmt,hlmt
MAXIPROC[ACCT]=hlmt MAXIPROC[ACCT:accountname]=hlmt
MAXJOB[ACCT]=slmt,hlmt MAXJOB[ACCT:accountname]=slmt,hlmt
MAXJOB[USER]=slmt,hlmt MAXJOB[USER:username]=slmt,hlmt
MAXMEM[USER]=slmt,hlmt MAXMEM[USER:username]=slmt,hlmt
MAXNODE[USER]=slmt,hlmt MAXNODE[USER:username]=slmt,hlmt
MAXPE[ACCT]=slmt,hlmt MAXPE[ACCT:accountname]=slmt,hlmt
MAXPE[USER]=slmt,hlmt MAXPE[USER:username]=slmt,hlmt
MAXPROC[ACCT]=slmt,hlmt MAXPROC[ACCT:accountname]=slmt,hlmt
MAXPROC[USER]=slmt,hlmt MAXPROC[USER:username]=slmt,hlmt
MAXPS[ACCT]=slmt,hlmt MAXPS[ACCT:accountname]=slmt,hlmt
MAXPS[USER]=slmt,hlmt MAXPS[USER:username]=slmt,hlmt
MAXWC[USER]=slmt,hlmt MAXWC[USER:username]=slmt,hlmt
USERCFG[name]	MAXJOB[GROUP]=slmt,hlmt MAXJOB[GROUP:groupname]=slmt,hlmt
MAXMEM[GROUP]=slmt,hlmt MAXMEM[GROUP:groupname]=slmt,hlmt
MAXNODE[GROUP]=slmt,hlmt MAXNODE[GROUP:groupname]=slmt,hlmt
MAXPE[GROUP]=slmt,hlmt MAXPE[GROUP:groupname]=slmt,hlmt
MAXPROC[GROUP]=slmt,hlmt MAXPROC[GROUP:groupname]=slmt,hlmt
MAXPS[GROUP]=slmt,hlmt MAXPS[GROUP:groupname]=slmt,hlmt
MAXWC[GROUP]=slmt,hlmt MAXWC[GROUP:groupname]=slmt,hlmt

5.412.2 Override Limits

Like all job credentials, the QoS object may be associated with resource usage limits. However, this credential can also be given special override limits that supersede the limits of other credentials, effectively causing all other limits of the same type to be ignored. See QoS Usage Limits and Overrides for a complete list of policies that can be overridden. The following configuration provides an example of this in the last line:

USERCFG[steve]    MAXJOB=2   MAXNODE=30
GROUPCFG[staff]   MAXJOB=5
CLASSCFG[DEFAULT] MAXNODE=16
CLASSCFG[batch]   MAXNODE=32
QOSCFG[hiprio]    OMAXJOB=3  OMAXNODE=64

Only 3 hiprio QoS jobs may run simultaneously and hiprio QoS jobs may run with up to 64 nodes per credential ignoring other credential MAXNODE limits.

Given the preceding configuration, assume a job is submitted with the following credentials: user steve, group staff, class batch, and QoS hiprio.

Such a job will start so long as running it satisfies all of the following conditions:

Total nodes used by user steve does not exceed 64.
Total active jobs associated with user steve does not exceed 2.
Total active jobs associated with group staff does not exceed 5.
Total nodes dedicated to class batch does not exceed 64.
Total active jobs associated with QoS hiprio does not exceed 3.

While the preceding example is a bit complicated for most sites, similar combinations may be required to enforce policies found on many systems.

5.412.3 Idle Job Limits

Idle (or queued) job limits control which jobs are eligible for scheduling. To be eligible for scheduling, a job must meet the following conditions:

Be idle as far as the resource manager is concerned (no holds).
Have all job prerequisites satisfied (no outstanding job or data dependencies).
Meet all idle job throttling policies.

If a job fails to meet any of these conditions, it will not be considered for scheduling and will not accrue service based job prioritization. (See Service (SERVICE) Component and JOBPRIOACCRUALPOLICY.) The primary purpose of idle job limits is to ensure fairness among competing users by preventing queue stuffing and other similar abuses. Queue stuffing occurs when a single entity submits large numbers of jobs, perhaps thousands, all at once so they begin accruing queue time based priority and remain first to run despite subsequent submissions by other users.

Idle limits are specified in a manner almost identical to active job limits with the insertion of the capital letter I into the middle of the limit name. The following tables describe the MAXIARRAYJOB, MAXIJOB, and MAXINODE limits, which are idle limit equivalents to MAXARRAYJOB, MAXJOB, and MAXNODE limits, respectively.

MAXIARRAYJOB
Units	Number of simultaneous idle array job sub-jobs.
Description	Limits the number of simultaneously idle (eligible) job array sub-jobs across all job arrays submitted by a credential.
Example	USERCFG[gertrude] MAXARRAYJOB=10 MAXIARRAYJOB=5 Gertrude can have a maximum of 10 active job array sub-jobs and 5 eligible job array sub-jobs.

MAXIJOB
Units	# of jobs
Description	Limits the number of idle (eligible) jobs a credential may have at any given time.
Example	USERCFG[DEFAULT] MAXIJOB=8 GROUPCFG[staff] MAXIJOB=4

MAXINODE
Units	# of nodes
Description	Limits the total number of compute nodes that can be requested by jobs in the eligible/idle queue at any time. Once the limit is exceeded, the remaining jobs will be placed in the blocked queue. The number of nodes is determined by <tasks> / <maximumProcsOnOneNode> or, if using JOBNODEMATCHPOLICY EXACTNODE, by the number of nodes requested.
Example	USERCFG[DEFAULT] MAXINODE=2

Idle limits can constrain the total number of jobs considered to be eligible on a per credential basis. Further, like active job limits, idle job limits can also constrain eligible jobs based on aggregate requested resources. This could, for example, allow a site to indicate that for a given user, only jobs requesting up to a total of 64 processors, or 3200 processor-seconds, would be considered at any given time. Moab selects jobs by prioritizing all idle jobs and then adding jobs to the eligible list one at a time in priority order until no more jobs can be added. This eligible job selection is done only once per scheduling iteration; consequently, idle job limits support only a single hard limit specification. Any specified soft limit is ignored.
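
The eligible-job selection described here can be approximated with the sketch below. It is a simplified Python model rather than Moab's algorithm: it considers only a per-user MAXIJOB-style limit, it assumes that a job exceeding the limit is skipped while lower-priority jobs from other users are still considered, and the job tuples and function name are hypothetical.

```python
def select_eligible(idle_jobs, max_ijob):
    """Split idle jobs into (eligible, blocked) under a per-user idle limit.

    `idle_jobs` is a list of (job_id, user, priority) tuples. Jobs are
    visited from highest to lowest priority, once per call, which models
    the single selection pass made in each scheduling iteration.
    """
    eligible, blocked = [], []
    per_user = {}  # user -> number of jobs already marked eligible
    for job_id, user, _priority in sorted(
        idle_jobs, key=lambda job: job[2], reverse=True
    ):
        if per_user.get(user, 0) < max_ijob:
            per_user[user] = per_user.get(user, 0) + 1
            eligible.append(job_id)
        else:
            blocked.append(job_id)
    return eligible, blocked


jobs = [("a", "steve", 10), ("b", "steve", 30), ("c", "steve", 20), ("d", "bob", 5)]
print(select_eligible(jobs, max_ijob=2))  # (['b', 'c', 'd'], ['a'])
```

Because the pass runs once per iteration, there is no second, looser pass in which a soft limit could be applied.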

All single dimensional job limit types supported as active job limits are also supported as idle job limits. In addition, Moab also supports MAXIJOB[USER] and MAXIPROC[USER] policies on a per class basis. (See Basic Fairness Policies.)

Example:

USERCFG[steve]    MAXIJOB=2
GROUPCFG[staff]   MAXIJOB=5
CLASSCFG[batch]   MAXIJOB[USER]=2 MAXIJOB[USER:john]=6
QOSCFG[hiprio]    MAXIJOB=3

5.412.4 Hard and Soft Limits

Hard and soft limit specification allows a site to balance both fairness and utilization on a given system. Typically, throttling limits are used to constrain the quantity of resources a given credential (such as user or group) is allowed to consume. These limits can be very effective in enforcing fair usage among a group of users. However, in a lightly loaded system, or one in which there are significant swings in usage from project to project, these limits can reduce system utilization by blocking jobs even when no competing jobs are queued.

Soft limits help address this problem by providing additional scheduling flexibility. They allow sites to specify two tiers of limits: the more constraining soft limits are in effect in heavily loaded situations and reflect tight fairness constraints. The more flexible hard limits specify how flexible the scheduler can be in selecting jobs when there are idle resources available after all jobs meeting the tighter soft limits have started. Soft and hard limits are specified in the format [<SOFTLIMIT>,]<HARDLIMIT>. For example, a given site may want to use the following configuration:

USERCFG[DEFAULT]  MAXJOB=2,8

With this configuration, the scheduler would select all jobs that meet the per user MAXJOB limit of 2. It would then attempt to start and reserve resources for all of these selected jobs. If after doing so there still remain available resources, the scheduler would then select all jobs that meet the less constraining hard per user MAXJOB limit of 8 jobs. These jobs would then be scheduled and reserved as available resources allow.

If no soft limit is specified or the soft limit is less constraining than the hard limit, the soft limit is set equal to the hard limit.
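
The two-tier selection can be illustrated with a small model. The Python below is a sketch under simplifying assumptions, not Moab's scheduler: every job occupies exactly one resource slot, reservations are ignored, and the function and variable names are hypothetical.

```python
def two_pass_start(queue, free_slots, hard, soft=None):
    """Start jobs under the soft limit first, then under the hard limit.

    `queue` is a list of (job_id, user) tuples in priority order and
    `free_slots` is the number of jobs the system can still run.
    """
    # A missing or looser soft limit is treated as equal to the hard limit.
    if soft is None or soft > hard:
        soft = hard
    active = {}   # user -> number of started jobs
    started = []
    for limit in (soft, hard):
        for job_id, user in queue:
            if free_slots == 0:
                return started
            if job_id in started or active.get(user, 0) >= limit:
                continue
            active[user] = active.get(user, 0) + 1
            started.append(job_id)
            free_slots -= 1
    return started


queue = [("a1", "alice"), ("a2", "alice"), ("a3", "alice"),
         ("a4", "alice"), ("b1", "bob")]
# Equivalent in spirit to MAXJOB=2,8 with room for four jobs.
print(two_pass_start(queue, free_slots=4, hard=8, soft=2))
# ['a1', 'a2', 'b1', 'a3']
```

In the first pass alice is held to two jobs so that bob's job starts; only the slot left over afterward goes to alice's third job under the hard limit.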

Example:

USERCFG[steve]    MAXJOB=2,4 MAXNODE=15,30
GROUPCFG[staff]   MAXJOB=2,5
CLASSCFG[DEFAULT] MAXNODE=16,32
CLASSCFG[batch]   MAXNODE=12,32
QOSCFG[hiprio]    MAXJOB=3,5 MAXNODE=32,64

Job preemption status can be adjusted based on whether the job violates a soft policy using the ENABLESPVIOLATIONPREEMPTION parameter.

5.412.5 Per-partition Limits

Per-partition scheduling can set limits and enforce credentials and policies on a per-partition basis.

To enable per-partition scheduling, add the following to moab.cfg:

PERPARTITIONSCHEDULING TRUE
JOBMIGRATEPOLICY JUSTINTIME

With per-partition scheduling, it is recommended that limits go on the specific partitions and not on the global level. If limits are specified on both levels, Moab will take the more restrictive of the limits. Also, note that a DEFAULT policy on the global partition is not overridden by any policy on a specific partition.

5.412.5.A Per-partition Limits

You can configure per-job limits and credential usage limits on a per-partition basis in the moab.cfg file. Here is a sample configuration for partitions g02 and g03 in moab.cfg.

PARCFG[g02]   CONFIGFILE=/opt/moab/parg02.cfg
PARCFG[g03]   CONFIGFILE=/opt/moab/parg03.cfg

You can then add per-partition limits in each partition configuration file:

# /opt/moab/parg02.cfg
CLASSCFG[pbatch]   MAXJOB=5

# /opt/moab/parg03.cfg
CLASSCFG[pbatch]   MAXJOB=10

You can configure Moab so that jobs submitted to any partition besides g02 and g03 get the default limits in moab.cfg:

CLASSCFG[pbatch]  MAXJOB=2

5.412.5.B Supported Credentials and Limits

The user, group, account, QoS, and class credentials are supported in per-partition scheduling.

The following per-job limits are supported:

The following credential usage limits are supported:

MAXJOB
MAXNODE
MAXPROC
MAXWC
MAXSUBMITJOBS

Multi-dimensional limits are supported for the listed credentials and per-job limits. For example:

CLASSCFG[pbatch]   MAXJOB[user:frank]=10

5.412.6 Usage-based limits

Resource usage limits constrain the amount of resources a given job may consume. These limits are generally proportional to the resources requested and may include walltime, any standard resource, or any specified generic resource. The parameter RESOURCELIMITPOLICY controls which resources are limited, what limit policy is enforced per resource, and what actions the scheduler should take in the event of a policy violation.

5.412.6.A Configuring Actions

The RESOURCELIMITPOLICY parameter accepts a number of policies, resources, and actions using the format and values defined below.

If walltime is the resource to be limited, be sure that the resource manager is configured to not interfere if a job surpasses its given walltime. For Torque, this is done by using $ignwalltime in the configuration on each MOM node.

5.412.6.B Format

RESOURCELIMITPOLICY <RESOURCE>:[<SPOLICY>,]<HPOLICY>:[<SACTION>,]<HACTION>[:[<SVIOLATIONTIME>,]<HVIOLATIONTIME>]...

Resource	Description
CPUTIME	Maximum total job proc-seconds used by any single job (allows scheduler enforcement of cpulimit).
DISK	Local disk space (in MB) used by any single job task.
JOBMEM	Maximum real memory/RAM (in MB) used by any single job. JOBMEM will only work with the MAXMEM flag.
JOBPROC	Maximum processor load associated with any single job. You must set MAXPROC to use JOBPROC.
MEM	Maximum real memory/RAM (in MB) used by any single job task.
MINJOBPROC	Minimum processor load associated with any single job (action taken if job is using 5% or less of potential CPU usage).
NETWORK	Maximum network load associated with any single job task.
PROC	Maximum processor load associated with any single job task.
SWAP	Maximum virtual memory/SWAP (in MB) used by any single job task.
WALLTIME	Requested job walltime.

Policy	Description
ALWAYS	take action whenever a violation is detected
EXTENDEDVIOLATION	take action only if a violation is detected and persists for greater than the specified time limit
BLOCKEDWORKLOADONLY	take action only if a violation is detected and the constrained resource is required by another job

Action	Description
CANCEL	terminate the job
CHECKPOINT	checkpoint and terminate job
MIGRATE	requeue the job and require a different set of hosts for execution
NOTIFY	notify admins and job owner regarding violation
REQUEUE	terminate and requeue the job
SUSPEND	suspend the job and leave it suspended for an amount of time defined by the MINADMINSTIME parameter

Example 5-190: Notify and then cancel job if requested memory is exceeded

# if job exceeds memory usage, immediately notify owner
# if job exceeds memory usage for more than 5 minutes, cancel the job
RESOURCELIMITPOLICY MEM:ALWAYS,EXTENDEDVIOLATION:NOTIFY,CANCEL:00:05:00

Example 5-191: Checkpoint job on walltime violations

# if job exceeds requested walltime, checkpoint job
RESOURCELIMITPOLICY WALLTIME:ALWAYS:CHECKPOINT
# when checkpointing, send term signal, followed by kill 1 minute later
RMCFG[base] TYPE=PBS CHECKPOINTTIMEOUT=00:01:00 CHECKPOINTSIG=SIGTERM

Example 5-192: Cancel jobs that use 5% or less of potential CPU usage for more than 5 minutes

RESOURCELIMITPOLICY MINJOBPROC:EXTENDEDVIOLATION:CANCEL:5:00

Example 5-193: Migrating a job when it blocks other workload

RESOURCELIMITPOLICY JOBPROC:BLOCKEDWORKLOADONLY:MIGRATE

5.412.6.C Specifying Hard and Soft Policy Violations

Moab is able to perform different actions for both hard and soft policy violations. In most resource management systems, a mechanism does not exist to allow the user to specify both hard and soft limits. To address this, Moab provides the RESOURCELIMITMULTIPLIER parameter that allows per partition and per resource multiplier factors to be specified to generate the actual hard and soft limits to be used. If the factor is less than one, the soft limit will be lower than the specified value and a Moab action will be taken before the specified limit is reached. If the factor is greater than one, the hard limit will be set higher than the specified limit allowing a buffer space before the hard limit action is taken.

In the following example, job owners will be notified by email when their memory reaches 100% of the target, and the job will be canceled if it reaches 125% of the target. For wallclock usage, the job will be requeued when it reaches 90% of the specified limit if another job is waiting for its resources, and it will be checkpointed when it reaches the full limit.

RESOURCELIMITPOLICY       MEM:ALWAYS,ALWAYS:NOTIFY,CANCEL
RESOURCELIMITPOLICY       WALLTIME:BLOCKEDWORKLOADONLY,ALWAYS:REQUEUE,CHECKPOINT
RESOURCELIMITMULTIPLIER   MEM:1.25,WALLTIME:0.9
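
How a multiplier turns one requested value into a soft and a hard threshold can be sketched as follows. This Python is an illustration of the rule as described in this section, not Moab code; the function name and the sample request sizes (2048 MB, 3600 seconds) are assumptions.

```python
def soft_hard_limits(requested, factor):
    """Return (soft, hard) thresholds derived from one requested value.

    A factor below one lowers the soft limit beneath the request;
    a factor above one raises the hard limit above it.
    """
    if factor < 1:
        return requested * factor, requested
    return requested, requested * factor


# MEM:1.25 with a hypothetical 2048 MB request: notify at 2048, cancel at 2560.
print(soft_hard_limits(2048, 1.25))

# WALLTIME:0.9 with a hypothetical 3600 s request: requeue at 3240, checkpoint at 3600.
print(soft_hard_limits(3600, 0.9))
```

The soft threshold triggers the first action in the RESOURCELIMITPOLICY pair and the hard threshold triggers the second.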

5.412.6.D Constraining Walltime Usage

While Moab constrains walltime using the parameter RESOURCELIMITPOLICY like other resources, it also allows walltime exception policies which are not available with other resources. In particular, Moab allows jobs to exceed the requested wallclock limit by an amount specified on a global basis using the JOBMAXOVERRUN parameter or on a per credential basis using the WCOVERRUN attribute of the CLASSCFG parameter.

JOBMAXOVERRUN    00:10:00
CLASSCFG[debug]  wcoverrun=00:00:30
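
The effect of an overrun allowance on a job's latest permitted runtime can be worked through with a short sketch. The Python below is illustrative only; it assumes the overrun values use the same [[[DD:]HH:]MM:]SS notation shown for MAXWC, and the function names and sample walltime requests are hypothetical. Which allowance applies when both a global and a class value are set is not covered by this sketch.

```python
def to_seconds(spec):
    """Convert a [[[DD:]HH:]MM:]SS string to seconds."""
    weights = [1, 60, 3600, 86400]  # SS, MM, HH, DD
    fields = [int(field) for field in spec.split(":")]
    return sum(value * weight for value, weight in zip(reversed(fields), weights))


def latest_allowed_runtime(requested, overrun):
    """Requested walltime plus the permitted overrun, in seconds."""
    return to_seconds(requested) + to_seconds(overrun)


# A hypothetical 1-hour job under JOBMAXOVERRUN 00:10:00.
print(latest_allowed_runtime("01:00:00", "00:10:00"))  # 4200

# A hypothetical 30-minute job under an overrun of 00:00:30.
print(latest_allowed_runtime("00:30:00", "00:00:30"))  # 1830
```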