Moab Workload Manager

16.3.1 Simulation Overview

This section explains the following concepts:

16.3.1.1 Determining Performance Metrics

The first step of most simulations is to determine their primary purpose. Purposes may include identifying the impact of resource or workload changes on current cluster performance. Simulations may also focus on system utilization or on workload distribution across resources or credentials. Further, simulations may be used for training, allowing risk-free evaluation of behavior, facilities, and commands. With the purpose known, metrics of success can be specified and a proper simulation created. While performance metrics may not be critical to training-based simulations, they are key to successful evaluation in most other cases.

16.3.1.2 Selecting Resources

As in the real world, a simulation requires a set of resources (compute hosts) on which to run. In Moab, this is specified using a resource trace file. The resource trace file may be captured from existing hardware or generated to match the needs of the simulation.

16.3.1.3 Selecting Workload

In addition to resources, a simulation also requires a workload (batch jobs) to schedule onto the available resources. This workload is specified within a workload trace file. Like the resource traces, this workload information may be based on recorded data or generated to meet the need of the particular simulation.

16.3.1.4 Selecting Policies

The final aspect of a simulation is the set of policies and configuration to be used to determine how a workload is to be scheduled onto the available resources. This configuration is placed in the moab.cfg file just as would be done in production (or normal) mode operation.

16.3.1.5 Initial Configuration Using the Sample Traces

While mastering simulations may take some time, initial configuration is straightforward. To start, edit the moab.cfg file and do the following:

  • Change the SCHEDCFG attribute MODE from NORMAL or MONITOR to SIMULATION.
  • Add the following lines:
    SIMRESOURCETRACEFILE samples/resource.testcluster.txt
    SIMWORKLOADTRACEFILE samples/workload.testcluster.txt
    SIMSTOPITERATION     0
    

    The preceding steps specify that the scheduler should run in simulation mode and use the referenced resource and workload trace files. In addition, leaving the SIMSTOPITERATION parameter at zero indicates that Moab should stop before the first scheduling iteration and wait for further instructions. If you want the simulation to run as soon as you start Moab, remove (or comment out) this line. To continue scheduling, run the mschedctl -r command.

  • You also may need to add these lines to the moab.cfg file:
    CREDDISCOVERY        TRUE
    SIMAUTOSHUTDOWN      FALSE

    SIMSTARTTIME         1196987696

    USERCFG[DEFAULT]     ENABLEPROFILING=true
    GROUPCFG[DEFAULT]    ENABLEPROFILING=true
    ACCOUNTCFG[DEFAULT]  ENABLEPROFILING=true
    CLASSCFG[DEFAULT]    ENABLEPROFILING=true
    QOSCFG[DEFAULT]      ENABLEPROFILING=true
    

    The second set of parameters is helpful if you want to generate charts or reports from Moab Cluster Manager.

    Since events in the workload trace may reference credentials that are not listed in your moab.cfg file, set CREDDISCOVERY to TRUE, which allows Moab to create simulated credentials for credentials that do not yet exist. Setting SIMAUTOSHUTDOWN to FALSE prevents Moab from terminating after it has run all the jobs in the workload trace, allowing you to generate charts after all the simulated jobs have finished.

    Ensure that SIMSTARTTIME is set to the epoch time (in seconds) of the first event in your workload trace file. This sets Moab's internal clock to the trace's first event and avoids problems caused by the gap between the time the trace was created and the time reported by the CPU clock. Otherwise, Moab treats the CPU clock time as the current time, while simulated jobs that showq reports as currently running actually ran at the time the trace was created. To avoid this confusion, always set SIMSTARTTIME.

    The ENABLEPROFILING=true lines are necessary for Moab to track the statistics generated by simulated jobs. Without them, charts and reports will contain only zero values.
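SIMSTARTTIME takes epoch seconds, which are awkward to read by eye. Assuming GNU date is available, you can convert in both directions from the shell (1196987696 is the value from the example above):

```shell
# Epoch seconds -> human-readable date (UTC)
date -u -d @1196987696

# Human-readable date -> epoch seconds, suitable for SIMSTARTTIME
date -u -d '2007-12-07 00:34:56' +%s
```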

16.3.1.6 Starting a Simulation

As in all cases, Moab is started by issuing the command moab. Note that in simulation mode, Moab does not daemonize itself, so it will not run in the background on its own. Verify proper operation with any common user command such as showq, which displays the number of jobs currently in the scheduler's queue. The jobs displayed by showq are taken from the workload trace file specified earlier, and those marked as running are running on resources described in the resource trace file. At any point, a detailed summary of available resources may be obtained by running the mdiag -n command.

16.3.1.7 Interactive Tutorial

The rest of this section is an interactive tutorial demonstrating the basics of the simulator's capabilities in Moab. Commands to issue are shown as > showq, followed by the expected output.

The tutorial uses the showq, showstats, checkjob, mschedctl, showconfig, showres, checknode, mdiag, and mrsvctl commands.

Start by running Moab:

> moab&

Next, verify that Moab is running by executing showq:

> showq

active jobs------------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

fr8n01.187.0            570    Running    20  1:00:00:00  Mon Feb 16 11:54:03
fr8n01.189.0            570    Running    20  1:00:00:00  Mon Feb 16 11:54:03
fr8n01.190.0            570    Running    20  1:00:00:00  Mon Feb 16 11:54:03
fr8n01.191.0            570    Running    20  1:00:00:00  Mon Feb 16 11:54:03
fr8n01.276.0            550    Running    20  1:00:00:00  Mon Feb 16 11:54:03
fr1n04.369.0            550    Running    20  1:00:00:00  Mon Feb 16 11:54:03
fr1n04.487.0            550    Running    20  1:00:00:00  Mon Feb 16 11:54:03

     7 active jobs     140 of  196 Processors Active (71.43%)

eligible jobs----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

fr1n04.362.0            550       Idle    20  1:00:00:00  Mon Feb 16 11:53:33
fr1n04.363.0            550       Idle    20  1:00:00:00  Mon Feb 16 11:53:33
fr1n04.365.0            550       Idle    20  1:00:00:00  Mon Feb 16 11:53:33
fr1n04.366.0            550       Idle    20  1:00:00:00  Mon Feb 16 11:53:33
fr1n04.501.0            570       Idle    20  1:00:00:00  Mon Feb 16 11:53:33
fr1n04.580.0            570       Idle    20  1:00:00:00  Mon Feb 16 11:53:33
fr1n04.597.0            570       Idle    20  1:00:00:00  Mon Feb 16 11:53:33
fr1n04.598.0            570       Idle    20  1:00:00:00  Mon Feb 16 11:53:33
fr1n04.602.0            570       Idle    20  1:00:00:00  Mon Feb 16 11:53:33

9 eligible jobs

blocked jobs-----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

0 blocked jobs

Total jobs:  16

Out of the thousands of jobs in the workload trace, only 16 are either active or eligible because of the default setting of the SIMINITIALQUEUEDEPTH parameter. Sixteen jobs are placed in the idle queue; seven of them immediately run. Issuing the showq -r command gives a more detailed look at the active (running) jobs. That output is sorted by job completion time and shows that the first job will complete in one day (1:00:00:00).
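If you want more of the trace admitted to the queue at once, you can raise SIMINITIALQUEUEDEPTH in moab.cfg before starting the simulation. A sketch (32 is an arbitrary illustrative value, not a recommendation):

```
SIMINITIALQUEUEDEPTH  32
```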

While showq details information about the queues, scheduler statistics can be viewed with the showstats command. For example, the Current Active/Total Procs field shows current system utilization.

> showstats

moab active for      00:00:30  stats initialized on Mon Feb 16 11:53:33

Eligible/Idle Jobs:                    9/9         (100.000%)
Active Jobs:                           0
Successful/Completed Jobs:             0/0         (0.000%)
Avg/Max QTime (Hours):              0.00/0.00
Avg/Max XFactor:                    0.00/0.00

Dedicated/Total ProcHours:          1.17/1.63      (71.429%)

Current Active/Total Procs:          140/196       (71.429%)

Avg WallClock Accuracy:             N/A
Avg Job Proc Efficiency:            N/A
Est/Avg Backlog (Hours):            N/A / N/A

You might be wondering why only 140 of 196 Processors are Active (as shown by showq) when the first job in the queue (fr1n04.362.0) requires only 20 processors. Use the checkjob command, which reports detailed state information and diagnostic output for a particular job, to determine why it is not running:

> checkjob fr1n04.362.0

job fr1n04.362.0

State: Idle
...
Network: hps_user  Memory >= 256M  Disk >= 0  Swap >= 0
...
Job Eligibility Analysis -------

job cannot run in partition DEFAULT (idle procs do not meet requirements : 8 of 20 procs found)
idle procs:  56  feasible procs:   8

Rejection Reasons: [Memory : 48][State : 140]

The checkjob output reports the job's wallclock limit and requested node count (elided above) and explains why the job was rejected. The Job Eligibility Analysis shows that 48 processors rejected this job due to memory limitations and that another 140 rejected it because of their state (that is, they are running other jobs). Note the Memory >= 256M requirement.

If you run checkjob with the ID of a running job, it also shows exactly which nodes have been allocated to that job. The checkjob command page describes its output in more detail.

Advance the simulator one iteration:

> mschedctl -S

scheduling will stop in 00:00:30 at iteration 1

The scheduler control command, mschedctl, controls various aspects of scheduling behavior. It can be used to manage scheduling activity, kill the scheduler, and create resource trace files. The -S argument indicates that the scheduler should run for a single iteration and then stop. Specifying a number n after -S causes the simulator to advance n steps. You can determine which iteration you are currently on using showstats -v.

> showstats -v

current scheduler time: Mon Feb 16 11:54:03 1998 (887655243)
moab active for      00:01:00  stats initialized on Mon Feb 16 11:53:33
statistics for iteration     1  scheduler started on Wed Dec 31 17:00:00
...

The line that starts with statistics for iteration <X> specifies the iteration you are currently on. Each iteration advances the simulator RMPOLLINTERVAL seconds. By default, RMPOLLINTERVAL is set to 30 seconds. To see what RMPOLLINTERVAL is set to, use the showconfig command:

> showconfig | grep RMPOLLINTERVAL

RMPOLLINTERVAL                  00:00:30
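Since each iteration advances simulated time by one RMPOLLINTERVAL, converting an iteration count to elapsed simulated time is simple arithmetic. A small shell sketch, assuming the default 30-second interval:

```shell
# Simulated time elapsed after N iterations at 30 s per iteration
iterations=580
secs=$((iterations * 30))
printf '%d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
# prints 4:50:00
```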

The showq -r command can be used to display the running (active) jobs to see what happened in the last iteration:

> showq -r

active jobs------------------------
JOBID               S PAR  EFFIC  XFACTOR  Q      USER    GROUP    MHOST PROCS   REMAINING            STARTTIME

fr8n01.804.0        R   1 ------      1.0  -       529      519   fr9n16     5    00:05:00  Mon Feb 16 11:54:03
fr8n01.187.0        R   1 ------      1.0  -       570      519   fr7n15    20  1:00:00:00  Mon Feb 16 11:54:03
...
fr8n01.960.0        R   1 ------      1.0  -       588      519   fr9n11    32  1:00:00:00  Mon Feb 16 11:54:03

     9 active jobs     177 of  196 Processors Active (90.31%)

Total jobs:  9

Notice that two new jobs started (without waiting in the eligible queue). Also notice that job fr8n01.187.0, along with the rest that are summarized in the ellipsis, did NOT advance its REMAINING or STARTTIME. The simulator needs one iteration to do a sanity check. Setting the parameter SIMSTOPITERATION to 1 causes Moab to stop after the first scheduling iteration and wait for further instructions.

The showq -i command displays the idle (eligible) jobs.

> showq -i

eligible jobs----------------------
JOBID                 PRIORITY  XFACTOR  Q      USER    GROUP  PROCS     WCLIMIT     CLASS      SYSTEMQUEUETIME

fr1n04.362.0*                1      1.0  -       550      519     20  1:00:00:00     batch  Mon Feb 16 11:53:33
fr1n04.363.0                 1      1.0  -       550      519     20  1:00:00:00     batch  Mon Feb 16 11:53:33
fr1n04.365.0                 1      1.0  -       550      519     20  1:00:00:00     batch  Mon Feb 16 11:53:33
fr1n04.366.0                 1      1.0  -       550      519     20  1:00:00:00     batch  Mon Feb 16 11:53:33
fr1n04.501.0                 1      1.0  -       570      519     20  1:00:00:00     batch  Mon Feb 16 11:53:33
fr1n04.580.0                 1      1.0  -       570      519     20  1:00:00:00     batch  Mon Feb 16 11:53:33
fr1n04.597.0                 1      1.0  -       570      519     20  1:00:00:00     batch  Mon Feb 16 11:53:33
fr1n04.598.0                 1      1.0  -       570      519     20  1:00:00:00     batch  Mon Feb 16 11:53:33
fr1n04.602.0                 1      1.0  -       570      519     20  1:00:00:00     batch  Mon Feb 16 11:53:33
fr1n04.743.0                 1      1.0  -       570      519     20  1:00:00:00     batch  Mon Feb 16 11:54:03
fr1n04.744.0                 1      1.0  -       570      519     20  1:00:00:00     batch  Mon Feb 16 11:54:03
fr1n04.746.0                 1      1.0  -       570      519     20  1:00:00:00     batch  Mon Feb 16 11:54:03
fr1n04.747.0                 1      1.0  -       570      519     20  1:00:00:00     batch  Mon Feb 16 11:54:03
fr8n01.388.0                 1      1.0  -       550      519     20  1:00:00:00     batch  Mon Feb 16 11:54:03

14 eligible jobs

Total jobs:  14

Notice that none of the eligible jobs requests 19 or fewer processors (the number of idle processors). Also notice the * after the job ID fr1n04.362.0, which means that this job now has a reservation. The showres command shows all reservations currently on the system.

> showres

ReservationID       Type S       Start         End    Duration    N/P    StartTime

fr8n01.187.0         Job R    00:00:00  1:00:00:00  1:00:00:00   20/20   Mon Feb 16 11:54:03
fr8n01.189.0         Job R    00:00:00  1:00:00:00  1:00:00:00   20/20   Mon Feb 16 11:54:03
fr8n01.190.0         Job R    00:00:00  1:00:00:00  1:00:00:00   20/20   Mon Feb 16 11:54:03
fr8n01.191.0         Job R    00:00:00  1:00:00:00  1:00:00:00   20/20   Mon Feb 16 11:54:03
fr8n01.276.0         Job R    00:00:00  1:00:00:00  1:00:00:00   20/20   Mon Feb 16 11:54:03
fr1n04.362.0         Job I  1:00:00:00  2:00:00:00  1:00:00:00   20/20   Tue Feb 17 11:54:03
fr1n04.369.0         Job R    00:00:00  1:00:00:00  1:00:00:00   20/20   Mon Feb 16 11:54:03
fr1n04.487.0         Job R    00:00:00  1:00:00:00  1:00:00:00   20/20   Mon Feb 16 11:54:03
fr8n01.804.0         Job R    00:00:00    00:05:00    00:05:00    5/5    Mon Feb 16 11:54:03
fr8n01.960.0         Job R    00:00:00  1:00:00:00  1:00:00:00   32/32   Mon Feb 16 11:54:03

10 reservations located

Here, the S column is the job's state (R = running, I = idle). All the active jobs have reservations, as does idle job fr1n04.362.0. This reservation was created by the backfill scheduler for the highest priority idle job as a way to prevent starvation while lower priority jobs are backfilled. (The backfill documentation describes the mechanics of backfill scheduling more fully.)

To display information about the nodes that job fr1n04.362.0 has reserved, use showres -n <JOBID>.

> showres -n fr1n04.362.0

reservations on Mon Feb 16 11:54:03

NodeName                   Type      ReservationID   JobState Task       Start    Duration  StartTime

fr5n09                      Job       fr1n04.362.0       Idle    1  1:00:00:00  1:00:00:00  Tue Feb 17 11:54:03
...
fr7n15                      Job       fr1n04.362.0       Idle    1  1:00:00:00  1:00:00:00  Tue Feb 17 11:54:03

20 nodes reserved

Now advance the simulator an iteration to allow some jobs to actually run.

> mschedctl -S

scheduling will stop in 00:00:30 at iteration 2

Next, check the queues to see what happened.

> showq

active jobs------------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

fr8n01.804.0            529    Running     5    00:04:30  Mon Feb 16 11:54:03
fr8n01.187.0            570    Running    20    23:59:30  Mon Feb 16 11:54:03
...

     9 active jobs     177 of  196 Processors Active (90.31%)

eligible jobs----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

...
fr8n01.963.0            586       Idle    32     9:00:00  Mon Feb 16 11:54:33
fr8n01.1016.0           570       Idle    20  1:00:00:00  Mon Feb 16 11:54:33

16 eligible jobs
...

Two new jobs, fr8n01.963.0 and fr8n01.1016.0, are in the eligible queue. Also note that the first job will now complete in 4 minutes 30 seconds rather than 5 minutes because simulated time has just advanced by 30 seconds, one RMPOLLINTERVAL. It is important to note that when the simulated jobs were created, both the job's wallclock limit and its actual run time were recorded. The wallclock limit is the user's best estimate of an upper bound on how long the job will run. The run time is how long the job actually ran before completing and releasing its allocated resources. For example, a job with a wallclock limit of one hour is given the needed resources for up to an hour but may complete in only 20 minutes.

Stop the simulation at iteration 6.

> mschedctl -s 6I

scheduling will stop in 00:03:00 at iteration 6

The -s 6I argument indicates that the scheduler will stop at iteration 6 and will (I)gnore user input until it gets there. This prevents the possibility of obtaining showq output from iteration 5 rather than iteration 6.

> showq

active jobs------------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

fr8n01.804.0            529    Running     5    00:02:30  Mon Feb 16 11:54:03
...
fr1n04.501.0            570    Running    20  1:00:00:00  Mon Feb 16 11:56:33
fr8n01.388.0            550    Running    20  1:00:00:00  Mon Feb 16 11:56:33

     9 active jobs     177 of  196 Processors Active (90.31%)
...
    14 eligible jobs
...

Job fr8n01.804.0 is still 2 minutes 30 seconds from completing, as expected, but notice that jobs fr8n01.189.0 and fr8n01.191.0 have both completed early. Although they had almost 24 hours of wallclock limit remaining, they terminated. Most likely they failed on the real-world system from which the trace file was recorded. Their completion freed 40 processors, which the scheduler immediately used to start several more jobs.

Note the system statistics:

> showstats

...
Successful/Completed Jobs:             0/2         (0.000%)
...
Avg WallClock Accuracy:           0.150%
Avg Job Proc Efficiency:        100.000%
Est/Avg Backlog (Hours):            0.00/3652178.74

A few more fields are filled in now that some jobs have completed, providing data from which to generate statistics.

Decrease the default LOGLEVEL with mschedctl -m to avoid unnecessary logging and speed up the simulation.

> mschedctl -m LOGLEVEL 0

INFO:  parameter modified

You can use mschedctl -m to immediately change the value of any parameter. The change is only made to the currently running Moab server and is not propagated to the configuration file. Changes can also be made by modifying the configuration file and restarting the scheduler.
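Because mschedctl -m affects only the running server, a change you want to survive a restart belongs in moab.cfg. For this example, that would be the following line (sketch):

```
LOGLEVEL  0
```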

Stop at iteration 580 and pull up the scheduler's statistics.

> mschedctl -s 580I; showq

scheduling will stop in 4:47:00 at iteration 580

...
    11 active jobs     156 of  196 Processors Active (79.59%)

eligible jobs----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

fr8n01.963.0            586       Idle    32     9:00:00  Mon Feb 16 11:54:33
fr8n01.1075.0           560       Idle    32    23:56:00  Mon Feb 16 11:58:33
fr8n01.1076.0           560       Idle    16    23:56:00  Mon Feb 16 11:59:33
fr1n04.1953.0           520       Idle    46     7:45:00  Mon Feb 16 12:03:03
...
16 eligible jobs
...

You may note that showq hangs for a while as the scheduler simulates up to iteration 580. The output shows that only 156 of the 196 processors are currently busy, yet at first glance three jobs (fr8n01.963.0, fr8n01.1075.0, and fr8n01.1076.0) appear ready to run.

> checkjob fr8n01.963.0; checkjob fr8n01.1075.0; checkjob fr8n01.1076.0

job fr8n01.963.0
...
Network: hps_user  Memory >= 256M  Disk >= 0  Swap >= 0
...
Job Eligibility Analysis -------

job cannot run in partition DEFAULT (idle procs do not meet requirements : 20 of 32 procs found)
idle procs:  40  feasible procs:  20

Rejection Reasons: [Memory : 20][State : 156]

job fr8n01.1075.0
...
Network: hps_user  Memory >= 256M  Disk >= 0  Swap >= 0
...
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 32 procs found)
idle procs:  40  feasible procs:   0

Rejection Reasons: [Memory : 20][State : 156][ReserveTime : 20]

job fr8n01.1076.0
...
Network: hps_user  Memory >= 256M  Disk >= 0  Swap >= 0
...
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 16 procs found)
idle procs:  40  feasible procs:   0

Rejection Reasons: [Memory : 20][State : 156][ReserveTime : 20]

The checkjob command reveals that job fr8n01.963.0 found only 20 of the 32 processors it needs. The remaining 20 idle processors could not be used because their configured memory does not meet the job's requirements. The other two jobs cannot find enough nodes because of ReserveTime: the processors are idle, but they have a reservation in place that will start before the job being checked could complete.

Use the mdiag -n command to verify that the idle nodes either lack sufficient configured memory or are already reserved. This command provides detailed information about the state of the nodes Moab is currently tracking. The mdiag command can be used with various flags to obtain detailed information about accounts, fair share, groups, jobs, nodes, QoS, queues, reservations, the resource manager, and users. It also performs a number of sanity checks on the data provided and presents warning messages if discrepancies are detected.

> mdiag -n -v | grep -e Name -e Idle

Name      State  Procs Memory         Disk          Swap      Speed  Opsys   Arch Par   Load Rsv  ...
fr10n09   Idle   1:1   256:256      9780:9780   411488:411488  1.00  AIX43  R6000 DEF   0.00 001  .
fr10n11   Idle   1:1   256:256      8772:8772   425280:425280  1.00  AIX43  R6000 DEF   0.00 001  . 
fr10n13   Idle   1:1   256:256      9272:9272   441124:441124  1.00  AIX43  R6000 DEF   0.00 001  .
fr10n15   Idle   1:1   256:256      8652:8652   440776:440776  1.00  AIX43  R6000 DEF   0.00 001  
fr11n01   Idle   1:1   256:256      7668:7668   438624:438624  1.00  AIX43  R6000 DEF   0.00 001 
fr11n03   Idle   1:1   256:256      9548:9548   424584:424584  1.00  AIX43  R6000 DEF   0.00 001 
fr11n05   Idle   1:1   256:256     11608:11608  454476:454476  1.00  AIX43  R6000 DEF   0.00 001 
fr11n07   Idle   1:1   256:256      9008:9008   425292:425292  1.00  AIX43  R6000 DEF   0.00 001 
fr11n09   Idle   1:1   256:256      8588:8588   424684:424684  1.00  AIX43  R6000 DEF   0.00 001 
fr11n11   Idle   1:1   256:256      9632:9632   424936:424936  1.00  AIX43  R6000 DEF   0.00 001 
fr11n13   Idle   1:1   256:256      9524:9524   425432:425432  1.00  AIX43  R6000 DEF   0.00 001 
fr11n15   Idle   1:1   256:256      9388:9388   425728:425728  1.00  AIX43  R6000 DEF   0.00 001 
fr14n01   Idle   1:1   256:256      6848:6848   424260:424260  1.00  AIX43  R6000 DEF   0.00 001 
fr14n03   Idle   1:1   256:256      9752:9752   424192:424192  1.00  AIX43  R6000 DEF   0.00 001 
fr14n05   Idle   1:1   256:256      9920:9920   434088:434088  1.00  AIX43  R6000 DEF   0.00 001 
fr14n07   Idle   1:1   256:256      2196:2196   434224:434224  1.00  AIX43  R6000 DEF   0.00 001 
fr14n09   Idle   1:1   256:256      9368:9368   434568:434568  1.00  AIX43  R6000 DEF   0.00 001 
fr14n11   Idle   1:1   256:256      9880:9880   434172:434172  1.00  AIX43  R6000 DEF   0.00 001 
fr14n13   Idle   1:1   256:256      9760:9760   433952:433952  1.00  AIX43  R6000 DEF   0.00 001 
fr14n15   Idle   1:1   256:256     25000:25000  434044:434044  1.00  AIX43  R6000 DEF   0.00 001 
fr17n05   Idle   1:1   128:128     10016:10016  182720:182720  1.00  AIX43  R6000 DEF   0.00 000 
...
Total Nodes: 196  (Active: 156  Idle: 40  Down: 0)

The grep filter keeps the header line and the idle-node entries. All the idle nodes with 256 MB of memory installed already have a reservation (see the Rsv column). The remaining idle nodes have only 128 MB of memory.

> checknode fr10n09

node fr10n09

State:      Idle  (in current state for 4:21:00)
Configured Resources: PROCS: 1  MEM: 256M  SWAP: 401G  DISK: 9780M
Utilized   Resources: [NONE]
Dedicated  Resources: [NONE]
..
Total Time: 4:50:00  Up: 4:50:00 (100.00%)  Active: 00:34:30 (11.90%)

Reservations:
  Job 'fr8n01.963.0'(x1)  3:25:00 -> 12:25:00 (9:00:00)

The checknode output reveals that job fr8n01.963.0 holds the reservation.

Moving ahead:

> mschedctl -S 500I; showstats -v

scheduling will stop in 4:10:00 at iteration 1080
...
Eligible/Idle Jobs:                   16/16        (100.000%)
Active Jobs:                          11
Successful/Completed Jobs:             2/25        (8.000%)
Preempt Jobs:                          0
Avg/Max QTime (Hours):              0.00/0.00
Avg/Max XFactor:                    0.00/1.04
Avg/Max Bypass:                     0.00/13.00

Dedicated/Total ProcHours:       1545.44/1765.63   (87.529%)
Preempt/Dedicated ProcHours:        0.00/1545.44   (0.000%)

Current Active/Total Procs:          156/196       (79.592%)

Avg WallClock Accuracy:           9.960%
Avg Job Proc Efficiency:        100.000%
Min System Utilization:          79.592% (on iteration 33)
Est/Avg Backlog (Hours):            0.00/20289.84

We now know that the scheduler is scheduling efficiently; so far, system utilization as reported by showstats -v looks very good. An important and subjective question is whether the scheduler is also scheduling fairly. Look at the user and group statistics to see whether there are any glaring problems.

> showstats -u

statistics initialized Wed Dec 31 17:00:00
         |------ Active ------|--------------------------------- Completed -----------------------------------|
user      Jobs Procs ProcHours Jobs    %    PHReq    %    PHDed    %   FSTgt  AvgXF  MaxXF  AvgQH  Effic  WCAcc
520          1    46    172.88    1   0.00  356.5   0.00  541.3   0.00 -----   1.04   0.00   0.35 100.00 100.00
550          1    20    301.83    7   0.00 3360.0   0.00  283.7   0.00 -----   0.03   0.00   0.06 100.00   3.17
524          1    32    239.73 ---- ------ ------ ------  272.3   0.00 ----- ------ ------ ------ 100.00 ------
570          1    20    301.00   14   0.00 6720.0   0.00  199.5   0.00 -----   0.01   0.00   0.20 100.00   0.34
588          0     0      0.00    1   0.00  768.0   0.00  159.7   0.00 -----   0.21   0.00   0.00 100.00  20.80
578          6     6    146.82 ---- ------ ------ ------   53.2   0.00 ----- ------ ------ ------ 100.00 ------
586          1    32    265.07 ---- ------ ------ ------   22.9   0.00 ----- ------ ------ ------ 100.00 ------
517          0     0      0.00    1   0.00  432.0   0.00    4.8   0.00 -----   0.02   0.00   0.12 100.00   1.10
529          0     0      0.00    1   0.00    0.4   0.00    1.3   0.00 -----   1.00   0.00   0.00 100.00 100.00

> showstats -g

statistics initialized Wed Dec 31 17:00:00

         |------ Active ------|--------------------------------- Completed -----------------------------------|
group     Jobs Procs ProcHours Jobs    %    PHReq    %    PHDed    %   FSTgt  AvgXF  MaxXF  AvgQH  Effic  WCAcc
503          1    32    239.73    1   0.00  432.0   0.00  277.1   0.00 -----   0.02   0.00   0.12 100.00   1.10
501          1    32    265.07 ---- ------ ------ ------   22.9   0.00 ----- ------ ------ ------ 100.00 ------
519          9    92    922.54   24   0.00 11204.9   0.00 1238.6   0.00 -----   0.11   0.00   0.15 100.00  10.33

Suppose you now need to take down the entire system for maintenance from 2:00 to 8:00 a.m. on February 17. To do this, create a reservation with mrsvctl -c.

> mrsvctl -c -t ALL -s 2:00_02/17 -d 6:00:00

Shut down the scheduler.

> mschedctl -k

moab will be shutdown immediately