Moab Workload Manager

6.1 Fairness Overview

The concept of cluster fairness varies widely from person to person and site to site. While some interpret it as giving all users equal access to compute resources, more complicated concepts incorporating historical resource usage, political issues, and job value are equally valid. While no scheduler can address all possible definitions of fair, Moab provides one of the industry's most comprehensive and flexible sets of tools, giving most sites the ability to address their many and varied fairness management needs.

Under Moab, most fairness policies are addressed by a combination of the facilities described in the following table:

Job Prioritization
Specifies what is most important to the scheduler. Using service-based priority factors allows a site to balance job turnaround time, expansion factor, or other scheduling performance metrics.
SERVICEWEIGHT    1
QUEUETIMEWEIGHT 10

Causes jobs to increase in priority by 10 points for every minute they remain in the queue.
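
The arithmetic behind this example can be sketched in a few lines of Python. This is a simplified model, not Moab's implementation: it assumes the queue-time contribution is the product of the two weights and the minutes queued, and that no other priority components are active. The function name is illustrative.

```python
def queue_time_priority(minutes_queued, service_weight=1, queuetime_weight=10):
    """Queue-time contribution to job priority (simplified, assumed model)."""
    return service_weight * queuetime_weight * minutes_queued

# A job that has waited 30 minutes has gained 300 priority points.
print(queue_time_priority(30))  # 300
```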

   
Usage Limits (Throttling Policies)
Specifies limits on exactly what resources can be used at any given instant.
USERCFG[john]     MAXJOB=3
GROUPCFG[DEFAULT] MAXPROC=64
GROUPCFG[staff]   MAXPROC=128

Allows john to run only 3 jobs at a time. Allows the group staff to use up to 128 total processors and every other group to use up to 64 processors.
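
How these two limits combine can be illustrated with a small Python model. It is only a sketch of the decision described above; the dictionaries and the may_start function are hypothetical names, and the scheduler's real bookkeeping is more involved.

```python
# Hypothetical limits mirroring the configuration above.
USER_MAXJOB = {"john": 3}
GROUP_MAXPROC = {"DEFAULT": 64, "staff": 128}

def may_start(user, group, job_procs, active_jobs_by_user, active_procs_by_group):
    """Return True if starting the job keeps usage within both limits."""
    max_jobs = USER_MAXJOB.get(user)  # None means no per-user job limit
    if max_jobs is not None and active_jobs_by_user.get(user, 0) + 1 > max_jobs:
        return False
    # Groups without their own entry fall back to the DEFAULT limit.
    max_procs = GROUP_MAXPROC.get(group, GROUP_MAXPROC["DEFAULT"])
    return active_procs_by_group.get(group, 0) + job_procs <= max_procs
```

For example, a fourth job from john is held back regardless of size, while a 28-processor job from a staff member can start when the group already has 100 processors in use.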

   
Fairshare
Specifies usage targets to limit resource access or adjust priority based on historical cluster and grid level resource usage.
USERCFG[steve] FSTARGET=25.0+
FSWEIGHT       1
FSUSERWEIGHT   10

Enables priority-based fairshare and specifies a fairshare target for user steve such that his jobs are favored in an attempt to keep them using at least 25.0% of delivered compute cycles.
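
The effect of a floor target such as 25.0+ can be sketched as follows. This is a deliberately simplified model: it assumes the priority adjustment is proportional to the gap between the target and the measured usage percentage, and that a floor target only ever raises priority. The real fairshare calculation also depends on how historical usage is measured and weighted. The function name is illustrative.

```python
def fairshare_component(usage_pct, target_pct, floor=True,
                        fs_weight=1, fs_user_weight=10):
    """Priority adjustment from a user fairshare target (assumed model)."""
    delta = target_pct - usage_pct
    if floor:
        delta = max(delta, 0.0)  # floor target: boost when below, never penalize
    return fs_weight * fs_user_weight * delta

print(fairshare_component(15.0, 25.0))  # 100.0: steve is below target, so he is favored
print(fairshare_component(40.0, 25.0))  # 0.0: already above the floor, no boost
```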

   
Allocation Management
Specifies long-term, credential-based resource usage limits.
AMCFG[bank] TYPE=GOLD HOST=server.sys.net

Enables the GOLD allocation management interface. Within the allocation manager, project-based or account-based allocations may be configured. These allocations may, for example, allow project X to use up to 100,000 processor-hours per quarter, provide various QoS-sensitive charge rates, and share allocation access.
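
The processor-hour bookkeeping in that example can be pictured with a short Python sketch. The Allocation class and its charge method are hypothetical and do not reflect the allocation manager's interface; the charge is assumed to be processors times hours times a charge rate.

```python
class Allocation:
    """Hypothetical per-project budget in processor-hours."""

    def __init__(self, limit_proc_hours):
        self.remaining = limit_proc_hours

    def charge(self, procs, hours, rate=1.0):
        """Debit a job's cost; refuse it if the budget would be exceeded."""
        cost = procs * hours * rate
        if cost > self.remaining:
            return False
        self.remaining -= cost
        return True

project_x = Allocation(100_000)   # quarterly budget from the example above
project_x.charge(64, 24)          # 64 processors for 24 hours = 1,536 processor-hours
```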

   
Quality of Service
Specifies additional resource and service access for particular users, groups, and accounts. QoS facilities can provide special priorities, policy exemptions, reservation access, and other benefits (as well as special charge rates).
QOSCFG[orion] PRIORITY=1000 XFTARGET=1.2 
QOSCFG[orion] QFLAGS=PREEMPTOR,IGNSYSTEM,RESERVEALWAYS

Gives jobs requesting the orion QoS a priority increase, an expansion factor target to improve response time, the ability to preempt other jobs, an exemption from system-level job size policies, and the ability to always reserve needed resources if they cannot start immediately.
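
To make the expansion factor target concrete, the sketch below assumes the usual definition of expansion factor as queue time plus wallclock limit, divided by wallclock limit. The function name is illustrative.

```python
def expansion_factor(queue_time_s, wallclock_limit_s):
    """Expansion factor = (queue time + wallclock limit) / wallclock limit."""
    return (queue_time_s + wallclock_limit_s) / wallclock_limit_s

# A job with a 10-hour limit reaches the 1.2 target after 2 hours in the queue.
print(expansion_factor(2 * 3600, 10 * 3600))  # 1.2
```

Under this definition, a target of 1.2 means short jobs are expected to wait proportionally less than long ones.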

   
Standing Reservations
Reserves blocks of resources within the cluster for specific, periodic time frames under the constraints of a flexible access control list.
SRCFG[jupiter] HOSTLIST=node01[1-4]
SRCFG[jupiter] STARTTIME=9:00:00 ENDTIME=17:00:00
SRCFG[jupiter] USERLIST=john,steve ACCOUNTLIST=jupiter

Reserves nodes node011 through node014 from 9:00 AM until 5:00 PM for use by jobs from users john or steve or from the project jupiter.
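
The access rule described here can be modeled in Python as a check on host, time of day, and credentials. This is an illustrative sketch only: the names are hypothetical, and it covers just the access control list, not how the scheduler treats jobs outside the reservation.

```python
from datetime import time

RESERVED_HOSTS = {f"node01{i}" for i in range(1, 5)}  # node011 .. node014
START, END = time(9, 0), time(17, 0)
USERS, ACCOUNTS = {"john", "steve"}, {"jupiter"}

def may_use_reservation(user, account, host, when):
    """True if a job with these credentials may use the reserved host now."""
    in_window = START <= when < END
    on_acl = user in USERS or account in ACCOUNTS  # either credential suffices
    return host in RESERVED_HOSTS and in_window and on_acl
```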

   
Class/Queue Constraints
Associates users, resources, priorities, and limits with cluster classes or cluster queues that can be assigned to or selected by end-users.
CLASSCFG[long] MIN.WCLIMIT=24:00:00
CLASSCFG[long] PRIORITY=10000
CLASSCFG[long] HOSTLIST=acn[1-4][0-9]

Assigns long jobs a high priority but allows them to run only on certain nodes.

Selecting the Correct Policy Approach

Moab supports a rich set of policy controls, in some cases allowing a particular policy to be enforced in more than one way. For example, cycle distribution can be controlled using usage limits, fairshare, or even queue definitions. Selecting the most appropriate policy depends on site objectives and needs; consider the following when making such a decision:

  • Minimal End-user Training
    • Does the solution use an approach familiar to or easily learned by existing users?
  • End-user Transparency
    • Can the configuration be enabled/disabled without impacting user behavior or job submission?
  • Impact on System Utilization and System Responsiveness
  • Solution Complexity
    • Is the impact of the configuration readily intuitive and is it easy to identify possible side effects?
  • Solution Extensibility and Flexibility
    • Will the proposed approach allow the solution to be easily tuned and extended as cluster needs evolve?
