In addition to standing and administrative reservations, Moab can also create priority reservations. These reservations provide the benefits of out-of-order execution (such as is available with backfill) without the side effect of job starvation. Starvation can occur in any system where a job can be overlooked by the scheduler for an indefinite period. In the case of backfill, small jobs may keep claiming resources as they become free while a large job sits in the queue, never finding enough nodes available simultaneously to run.
To avoid such situations, priority reservations are created for high priority jobs that cannot run immediately. When making these reservations, the scheduler determines the earliest time the job could start and then reserves these resources for use by this job at that future time.
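The earliest-start calculation can be pictured with a short Python sketch. It is a simplified illustration, not Moab's algorithm: it assumes every running job ends exactly at its stated end time, ignores other reservations and node-level constraints, and the function name `earliest_start` is hypothetical.

```python
def earliest_start(now, total_nodes, running, needed):
    """Return the earliest time at which `needed` nodes are free at once.

    running: list of (nodes_in_use, end_time) tuples for active jobs.
    Assumption: jobs release their nodes exactly at end_time.
    """
    if needed > total_nodes:
        raise ValueError("job can never fit on this system")
    free = total_nodes - sum(nodes for nodes, _ in running)
    if free >= needed:
        return now  # the job could start immediately
    # Release nodes in order of job completion until enough are free.
    for nodes, end_time in sorted(running, key=lambda job: job[1]):
        free += nodes
        if free >= needed:
            return end_time


# A 10-node system with three running jobs and one idle node.
running = [(4, 100), (3, 50), (2, 200)]
print(earliest_start(0, 10, running, 6))  # 100
```

A priority reservation for the 6-node job would then hold the required nodes starting at time 100, so that backfilled jobs cannot push its start any later.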
6.5.1.A Priority Reservation Creation Policy
Organizations have the ability to control how priority reservations are created and maintained. It is possible that one job can be at the top of the priority queue for a time and then get bypassed by another job submitted later. The parameter RESERVATIONPOLICY allows a site to determine how existing reservations should be handled when new reservations are made.
Under the HIGHEST policy, all jobs that have ever received a priority reservation, up to the RESERVATIONDEPTH number, maintain that reservation until they run, even if other jobs later bypass them in priority.
For example, suppose there are four jobs with priorities of 8, 10, 12, and 20 and the following configuration:
RESERVATIONPOLICY HIGHEST
RESERVATIONDEPTH 3
Only the jobs with priorities 20, 12, and 10 get priority reservations. Later, if a job with priority higher than 20 is submitted into the queue, it will also get a priority reservation along with the jobs listed previously. If four jobs with priority higher than 20 were submitted into the queue, only three of them would get priority reservations, in accordance with the limit set by RESERVATIONDEPTH.
With HIGHEST, Moab may appear to exceed the RESERVATIONDEPTH if it has already created the maximum number of priority reservations and users then submit jobs with higher priority than those already holding a priority reservation. Moab keeps all of the previously created priority reservations and creates new ones for the higher-priority jobs (again up to the quantity specified with RESERVATIONDEPTH). This means that, if RESERVATIONDEPTH is set to 3, Moab can potentially create up to 3 new priority reservations each scheduling iteration, as long as new higher-priority jobs are continually submitted. This behavior ensures that the highest-priority jobs receive attention while the former highest-priority jobs do not lose their priority reservations.
|CURRENTHIGHEST||Only the current top <RESERVATIONDEPTH> priority jobs receive reservations. Under this policy, all job reservations are destroyed each iteration when the queue is re-prioritized. The top jobs in the queue are then given new reservations.|
|NEVER||No priority reservations are made.|
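The difference between the three policies can be illustrated with a small Python simulation of one scheduling iteration. This is a sketch of the behavior described above, not Moab's implementation; the function name `update_reservations` and the job identifiers are hypothetical.

```python
def update_reservations(policy, depth, queue, reserved):
    """Return the set of jobs holding a priority reservation after one iteration.

    queue:    dict mapping job id -> priority for jobs that cannot run yet
    reserved: set of job ids that held a reservation before this iteration
    """
    if policy == "NEVER":
        return set()
    # The `depth` highest-priority queued jobs are candidates this iteration.
    top = set(sorted(queue, key=queue.get, reverse=True)[:depth])
    if policy == "CURRENTHIGHEST":
        # Old reservations are discarded; only the current top jobs keep one.
        return top
    if policy == "HIGHEST":
        # Earlier reservations persist while the job is still queued.
        still_queued = {job for job in reserved if job in queue}
        return still_queued | top
    raise ValueError(f"unknown policy: {policy}")


queue = {"j8": 8, "j10": 10, "j12": 12, "j20": 20}
first = update_reservations("HIGHEST", 3, queue, set())
queue["j25"] = 25  # a higher-priority job arrives later
second = update_reservations("HIGHEST", 3, queue, first)
print(sorted(first))   # ['j10', 'j12', 'j20']
print(sorted(second))  # ['j10', 'j12', 'j20', 'j25']
```

With CURRENTHIGHEST and the same inputs, the second iteration would leave only the jobs with priorities 25, 20, and 12 holding reservations, because the job with priority 10 has dropped out of the top three.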
6.5.1.B Priority Reservation Depth
By default, only the highest priority job receives a priority reservation. However, this behavior is configurable via the RESERVATIONDEPTH policy. Moab's default behavior of only reserving the highest priority job allows backfill to be used in a form known as liberal backfill. Liberal backfill tends to maximize system utilization and minimize overall average job turnaround time. However, it does lead to the potential of some lower priority jobs being indirectly delayed and may lead to greater variance in job turnaround time. The RESERVATIONDEPTH parameter can be set to a very large value, essentially enabling what is called conservative backfill where every job that cannot run is given a reservation. Most sites prefer the liberal backfill approach associated with the default RESERVATIONDEPTH of 1 or else select a slightly higher value. It is important to note that to prevent starvation in conjunction with reservations, monotonically increasing priority factors such as queue time or job XFactor should be enabled. See the Prioritization Overview for more information on priority factors.
Another important consequence of backfill and reservation depth is how they affect job priority. In Moab, all jobs are prioritized. Backfill allows jobs to be run out of order and thus, to some extent, job priority to be ignored. This effect, known as priority dilution, can cause many site policies implemented via Moab prioritization policies to be ineffective. Setting the RESERVATIONDEPTH parameter to a higher value gives job priority more teeth at the cost of slightly lower system utilization. This lower utilization results from the constraints of these additional reservations, decreasing the scheduler's freedom and its ability to find additional optimizing schedules. Anecdotal evidence indicates that these utilization losses are fairly minor, rarely exceeding 8%.
It is difficult a priori to know the right setting for the RESERVATIONDEPTH parameter. Surveys indicate that the vast majority of sites use the default value of 1. Sites that do modify this value typically set it somewhere in the range of 2 to 10. The following guidelines may be useful in determining if and how to adjust this parameter:
6.5.1.C Reasons to Increase RESERVATIONDEPTH
6.5.1.D Reasons to Decrease RESERVATIONDEPTH
6.5.1.E Assigning Per-QoS Reservation Creation Rules
QoS-based reservation depths can be enabled via the RESERVATIONQOSLIST parameter. This parameter allows different reservation depths to be associated with different sets of job QoSs. For example, the following configuration creates two reservation depth groupings:
RESERVATIONDEPTH 8
RESERVATIONQOSLIST highprio,interactive,debug
RESERVATIONDEPTH 2
RESERVATIONQOSLIST batch
This example causes the top 8 jobs belonging to the aggregate group of highprio, interactive, and debug QoS jobs to receive priority reservations. Additionally, the top two batch QoS jobs also receive priority reservations. Use of this feature allows sites to maintain high throughput for important jobs by guaranteeing that a significant proportion of these jobs progress toward starting through use of the priority reservation.
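The grouping logic can be sketched in Python as follows. This is an illustrative model only: the function name `select_reserved` and the job tuples are hypothetical, and the sketch assumes each grouping independently picks its top jobs by priority.

```python
def select_reserved(jobs, groups):
    """Pick the jobs that receive priority reservations under per-QoS depths.

    jobs:   list of (job_id, priority, qos) tuples for queued jobs
    groups: list of (depth, set_of_qos_names) pairs, one per grouping
    """
    reserved = []
    for depth, qos_names in groups:
        # All queued jobs whose QoS belongs to this grouping, best first.
        members = [job for job in jobs if job[2] in qos_names]
        members.sort(key=lambda job: job[1], reverse=True)
        reserved.extend(job_id for job_id, _, _ in members[:depth])
    return reserved


groups = [(8, {"highprio", "interactive", "debug"}), (2, {"batch"})]
jobs = [
    ("a", 50, "batch"),
    ("b", 70, "batch"),
    ("c", 60, "batch"),
    ("d", 10, "debug"),
    ("e", 90, "highprio"),
]
print(select_reserved(jobs, groups))  # ['e', 'd', 'b', 'c']
```

Note that the low-priority debug job "d" receives a reservation while the batch job "a" does not, because each grouping is limited only by its own depth.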
By default, the following parameters are set inside Moab:
RESERVATIONDEPTH[DEFAULT] 1
RESERVATIONQOSLIST[DEFAULT] ALL
This allows only the single highest-priority job to get a reservation. These values can be overridden by modifying the DEFAULT policy.
Moab allows organizations to control how to best respond to a number of real-world issues. Occasionally when a reservation becomes active and a job attempts to start, various resource manager race conditions or corrupt state situations will prevent the job from starting. By default, Moab assumes the resource manager is corrupt, releases the reservation, and attempts to re-create the reservation after a short timeout. However, in the interval between the reservation release and the re-creation timeout, other priority reservations may allocate the newly available resources, reserving them before the original reservation gets an opportunity to reallocate them. Thus, when the original job reservation is re-established, its original resource may be unavailable and the resulting new reservation may be delayed several hours from the earlier start time. The parameter RESERVATIONRETRYTIME allows a site that is experiencing frequent resource manager race conditions and/or corruption situations to tell Moab to hold on to the reserved resource for a period of time in an attempt to allow the resource manager to correct its state.
By default, when a standing or administrative reservation is created, Moab allocates nodes in accordance with the specified taskcount, node expression, node constraints, and the MINRESOURCE node allocation policy.
If an accounting manager is configured within Moab, resources consumed by jobs are tracked and charged by default. However, resources dedicated to a reservation are not charged although they are recorded within the reservation event record. In particular, total processor-seconds reserved by the reservation are recorded as are total unused processor-seconds reserved (processor-seconds not consumed by an active job). While this information is available in real-time using the mdiag -r command (see the "Active PH" field), it is not written to the event log until reservation completion.
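As a worked illustration of the two recorded quantities, the following Python sketch computes total and unused processor-seconds for a reservation. The function name `reservation_usage` and the input format are hypothetical, and the sketch assumes each job's consumption is simply its processor count times its run time inside the reservation.

```python
def reservation_usage(procs, duration, job_usage):
    """Return (total_reserved, unused) processor-seconds for a reservation.

    procs:     number of processors the reservation holds
    duration:  reservation length in seconds
    job_usage: list of (procs_used, seconds_run) for jobs run inside it
    """
    total = procs * duration
    consumed = sum(p * s for p, s in job_usage)
    return total, total - consumed


# A 2-processor, 2-hour reservation in which two jobs ran.
total, unused = reservation_usage(2, 7200, [(1, 3600), (2, 1800)])
print(total, unused)  # 14400 7200
```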
The default behavior for reservation tracking and charging via an accounting manager is defined by the AMCFG ALWAYSCHARGERESERVATIONS parameter. The default value for this attribute is False, meaning that charging will not normally occur for reservations (administrative or standing), unless specifically requested for the individual reservation. Likewise, if ALWAYSCHARGERESERVATIONS is set to True, idle cycles will be charged for all reservations (administrative or standing) unless specifically disabled for the individual reservation.
If ALWAYSCHARGERESERVATIONS is set to False (the default), charging may be enabled for individual reservations by specifying the CHARGEACCOUNT and CHARGEUSER attributes for the reservation. For standing reservations, these are set via the SRCFG CHARGEACCOUNT and CHARGEUSER parameters. For administrative reservations, these are set via the -S aaccount and auser options.
Example 6-3: Enabling charging in a standing reservation
SRCFG[foo] PERIOD=DAY DAYS=Mon,Tue,Wed,Thu,Fri DEPTH=1 USERLIST=amy CHARGEACCOUNT=chemistry CHARGEUSER=amy RESOURCES=PROCS:1 TASKCOUNT=2
Example 6-4: Enabling charging in an administrative reservation
mrsvctl -c -a USER=amy -S aaccount=chemistry -S auser=amy -R procs=1 -t 1 -d 7200
If ALWAYSCHARGERESERVATIONS is set to True, charging may be disabled for individual reservations by specifying the reservation Charge attribute with a value of False. For standing reservations, this is set via the SRCFG CHARGE parameter. For administrative reservations, this is set via the -S charge option.
Example 6-5: Disabling charging in a standing reservation
SRCFG[foo] PERIOD=DAY DAYS=Mon,Tue,Wed,Thu,Fri DEPTH=1 USERLIST=amy CHARGE=False RESOURCES=PROCS:1 TASKCOUNT=2
Example 6-6: Disabling charging in an administrative reservation
mrsvctl -c -a USER=amy -S charge=False -R procs=1 -t 1 -d 7200
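The interaction between the global default and the per-reservation attributes can be summarized with a small Python decision function. This is a sketch of the rules described above, not Moab code: the function name `reservation_is_charged` is hypothetical, and it assumes that both CHARGEACCOUNT and CHARGEUSER must be present to enable charging when the global default is False.

```python
def reservation_is_charged(always_charge, charge=None,
                           charge_account=None, charge_user=None):
    """Decide whether a reservation's idle cycles are charged.

    always_charge:  value of AMCFG ALWAYSCHARGERESERVATIONS
    charge:         per-reservation Charge attribute (None if unset)
    charge_account: per-reservation CHARGEACCOUNT (None if unset)
    charge_user:    per-reservation CHARGEUSER (None if unset)
    """
    if always_charge:
        # Charged unless explicitly disabled for this reservation.
        return charge is not False
    # Not charged unless charging credentials are specified.
    # Assumption: both the account and the user are required.
    return charge_account is not None and charge_user is not None


# Default (False) with credentials, as in Examples 6-3 and 6-4.
print(reservation_is_charged(False, charge_account="chemistry",
                             charge_user="amy"))  # True
# Global True with Charge=False, as in Examples 6-5 and 6-6.
print(reservation_is_charged(True, charge=False))  # False
```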