The first step in most simulations is to determine the simulation's primary purpose. Purposes may include identifying the impact of resource or workload changes on current cluster performance. Simulations may also focus on system utilization or on workload distribution across resources or credentials. They may also be used for training, allowing risk-free evaluation of behavior, facilities, and commands. Once the purpose is known, metrics of success can be specified and a proper simulation created. While performance metrics may not be critical to training-based simulations, they are key to successful evaluation in most other cases.
As in the real world, a simulation requires a set of resources (compute hosts) on which to run. In Moab, this is specified using a resource trace file. This resource trace file may be obtained from specific hardware or generated for the specific purpose.
In addition to resources, a simulation also requires a workload (batch jobs) to schedule onto the available resources. This workload is specified within a workload trace file. Like the resource traces, this workload information may be based on recorded data or generated to meet the need of the particular simulation.
The final aspect of a simulation is the set of policies and configuration to be used to determine how a workload is to be scheduled onto the available resources. This configuration is placed in the moab.cfg file just as would be done in production (or normal) mode operation.
While mastering simulations may take some time, initial configuration is straightforward. To start, edit the moab.cfg file and do the following:
SIMRESOURCETRACEFILE  samples/resource.testcluster.txt
SIMWORKLOADTRACEFILE  samples/workload.testcluster.txt
SIMSTOPITERATION      0
The preceding steps specify that the scheduler should run in simulation mode and use the referenced resource and workload trace files. In addition, leaving the SIMSTOPITERATION parameter at zero indicates that Moab should stop before the first scheduling iteration and wait for further instructions. If you want the simulation to run as soon as you start Moab, remove (or comment out) this line. To continue scheduling, run the mschedctl -r command.
CREDDISCOVERY        TRUE
SIMAUTOSHUTDOWN      false
SIMSTARTTIME         1196987696
USERCFG[DEFAULT]     ENABLEPROFILING=true
GROUPCFG[DEFAULT]    ENABLEPROFILING=true
ACCOUNTCFG[DEFAULT]  ENABLEPROFILING=true
CLASSCFG[DEFAULT]    ENABLEPROFILING=true
QOSCFG[DEFAULT]      ENABLEPROFILING=true
The second set of parameters is helpful if you want to generate charts or reports from Moab Cluster Manager. Since events in the workload trace may reference credentials that are not listed in your moab.cfg file, set CREDDISCOVERY to true, which allows Moab to create simulated credentials for credentials that do not yet exist. Setting SIMAUTOSHUTDOWN to false prevents Moab from terminating after it has finished running all the jobs in the workload trace, and it allows you to generate charts after all the simulated jobs have finished. Ensure that SIMSTARTTIME is set to the epoch time (in seconds) of the first event in your workload trace file. This causes the internal clock in Moab to be set to the workload trace's first event, which prevents issues caused by the difference between the time the workload trace was created and the time reported by the CPU clock. Otherwise, Moab thinks the current time is the time that the CPU clock reports, yet simulated jobs that are reported by showq as currently running will really be running at the time the workload trace was created. To avoid confusion, set the SIMSTARTTIME. The lines that specify ENABLEPROFILING=true are necessary for Moab to keep track of the statistics generated by the simulated jobs. Not setting these lines will cause charts and reports to contain all zero values.
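Because SIMSTARTTIME is given in epoch seconds, it can help to convert the value to a readable date before trusting it. The following is a minimal sketch, not part of Moab; the helper names epoch_to_utc and utc_to_epoch are hypothetical, and the value used is the one from the example configuration above.

```python
from datetime import datetime, timezone

def epoch_to_utc(epoch_seconds: int) -> datetime:
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

def utc_to_epoch(moment: datetime) -> int:
    """Convert a timezone-aware datetime back to epoch seconds."""
    return int(moment.timestamp())

# Value taken from the SIMSTARTTIME line in the example configuration.
sim_start = epoch_to_utc(1196987696)
print(sim_start.isoformat())    # 2007-12-07T00:34:56+00:00
print(utc_to_epoch(sim_start))  # 1196987696
```

If the printed date does not match the first event you expect in the workload trace, the SIMSTARTTIME value is probably wrong.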
As in all cases, start Moab by issuing the moab command. Note that in simulation mode Moab does not daemonize, so it does not put itself in the background. To verify proper operation, run any common user command such as showq, which displays the jobs currently in the scheduler's queue. The jobs displayed by showq are taken from the workload trace file specified earlier, and those marked as running are running on resources described in the resource trace file. At any point, a detailed summary of available resources may be obtained by running the mdiag -n command.
The rest of this section provides an interactive tutorial that demonstrates the basics of the simulator's capabilities in Moab. Commands to issue are shown with a leading prompt, for example > showq, followed by the expected output.
The following commands are used: moab, showq, showstats, checkjob, mschedctl, showconfig, showres, mdiag, checknode, and mrsvctl.
Start by running Moab:
> moab&
Next, verify that Moab is running by executing showq:
> showq
active jobs------------------------
JOBNAME        USERNAME  STATE    PROC  REMAINING   STARTTIME
fr8n01.187.0   570       Running  20    1:00:00:00  Mon Feb 16 11:54:03
fr8n01.189.0   570       Running  20    1:00:00:00  Mon Feb 16 11:54:03
fr8n01.190.0   570       Running  20    1:00:00:00  Mon Feb 16 11:54:03
fr8n01.191.0   570       Running  20    1:00:00:00  Mon Feb 16 11:54:03
fr8n01.276.0   550       Running  20    1:00:00:00  Mon Feb 16 11:54:03
fr1n04.369.0   550       Running  20    1:00:00:00  Mon Feb 16 11:54:03
fr1n04.487.0   550       Running  20    1:00:00:00  Mon Feb 16 11:54:03
7 active jobs   140 of 196 Processors Active (71.43%)
eligible jobs----------------------
JOBNAME        USERNAME  STATE    PROC  WCLIMIT     QUEUETIME
fr1n04.362.0   550       Idle     20    1:00:00:00  Mon Feb 16 11:53:33
fr1n04.363.0   550       Idle     20    1:00:00:00  Mon Feb 16 11:53:33
fr1n04.365.0   550       Idle     20    1:00:00:00  Mon Feb 16 11:53:33
fr1n04.366.0   550       Idle     20    1:00:00:00  Mon Feb 16 11:53:33
fr1n04.501.0   570       Idle     20    1:00:00:00  Mon Feb 16 11:53:33
fr1n04.580.0   570       Idle     20    1:00:00:00  Mon Feb 16 11:53:33
fr1n04.597.0   570       Idle     20    1:00:00:00  Mon Feb 16 11:53:33
fr1n04.598.0   570       Idle     20    1:00:00:00  Mon Feb 16 11:53:33
fr1n04.602.0   570       Idle     20    1:00:00:00  Mon Feb 16 11:53:33
9 eligible jobs
blocked jobs-----------------------
JOBNAME        USERNAME  STATE    PROC  WCLIMIT     QUEUETIME
0 blocked jobs
Total jobs: 16
Out of the thousands of jobs in the workload trace, only 16 jobs are either active or eligible because of the default settings of the SIMINITIALQUEUEDEPTH parameter. Sixteen jobs are put in the idle queue, seven of which immediately run. Issuing the command showq -r allows a more detailed look at the active (or running) jobs. The output is sorted by job completion time and indicates that the first job will complete in one day (1:00:00:00).
While showq details information about the queues, scheduler statistics may be viewed using the showstats command. The field Current Active/Total Procs shows current system utilization, for example.
> showstats
moab active for 00:00:30  stats initialized on Mon Feb 16 11:53:33
Eligible/Idle Jobs:          9/9 (100.000%)
Active Jobs:                 0
Successful/Completed Jobs:   0/0 (0.000%)
Avg/Max QTime (Hours):       0.00/0.00
Avg/Max XFactor:             0.00/0.00
Dedicated/Total ProcHours:   1.17/1.63 (71.429%)
Current Active/Total Procs:  140/196 (71.429%)
Avg WallClock Accuracy:      N/A
Avg Job Proc Efficiency:     N/A
Est/Avg Backlog (Hours):     N/A / N/A
You might be wondering why only 140 of 196 processors are active (as shown by showq) when the first job in the queue (fr1n04.362.0) requires only 20 processors. Use the checkjob command, which reports detailed job state information and diagnostic output for a particular job, to determine why it is not running:
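The processor-hour and utilization figures in this output can be reproduced by hand. The sketch below assumes that processor-hours are simply processors multiplied by elapsed hours, which is consistent with the numbers shown but is not taken from Moab's source; the helper name proc_hours is hypothetical.

```python
def proc_hours(procs: int, elapsed_seconds: float) -> float:
    """Processor-hours consumed by `procs` processors over the elapsed time."""
    return procs * elapsed_seconds / 3600.0

total_procs = 196      # from "140/196" in the showstats output
active_procs = 140
elapsed_seconds = 30   # "moab active for 00:00:30"

dedicated = proc_hours(active_procs, elapsed_seconds)
total = proc_hours(total_procs, elapsed_seconds)
utilization = 100.0 * active_procs / total_procs

print(f"{dedicated:.2f}/{total:.2f} ({utilization:.3f}%)")  # 1.17/1.63 (71.429%)
```

The printed line matches the Dedicated/Total ProcHours field, which suggests the two utilization figures are the same ratio seen over time and at the current instant.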
> checkjob fr1n04.362.0
job fr1n04.362.0
State: Idle
...
Network: hps_user  Memory >= 256M  Disk >= 0  Swap >= 0
...
Job Eligibility Analysis -------
job cannot run in partition DEFAULT (idle procs do not meet requirements : 8 of 20 procs found)
idle procs: 56  feasible procs: 8
Rejection Reasons: [Memory : 48][State : 140]
Checkjob not only tells us the job's wallclock limit and the number of requested nodes (they're in the ellipsis) but explains why the job was rejected from running. The Job Eligibility Analysis tells us that 48 of the processors rejected this job due to memory limitations and that another 140 processors rejected it because of their state (that is, they're running other jobs). Notice the >= 256 M(B) memory requirement.
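The numbers in the eligibility analysis add up, and a short script can check them. This is a sketch under the assumption that each processor is counted under exactly one rejection reason, which holds for the output shown here; parse_rejections is a hypothetical helper, not a Moab tool.

```python
import re

def parse_rejections(line: str) -> dict:
    """Turn '[Memory : 48][State : 140]' into {'Memory': 48, 'State': 140}."""
    pairs = re.findall(r"\[(\w+)\s*:\s*(\d+)\]", line)
    return {name: int(count) for name, count in pairs}

total_procs = 196
reasons = parse_rejections("Rejection Reasons: [Memory : 48][State : 140]")

idle_procs = total_procs - reasons["State"]          # busy processors excluded
feasible_procs = total_procs - sum(reasons.values()) # nothing rejected them

print(reasons)         # {'Memory': 48, 'State': 140}
print(idle_procs)      # 56
print(feasible_procs)  # 8
```

The results agree with the idle procs and feasible procs values that checkjob reports: 8 feasible processors are not enough for a job that requests 20.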
If you run checkjob with the ID of a running job, it also reports exactly which nodes have been allocated to that job. The checkjob command page describes the additional information in more detail.
Advance the simulator by one iteration:
> mschedctl -S
scheduling will stop in 00:00:30 at iteration 1
The scheduler control command, mschedctl, controls various aspects of scheduling behavior. It can be used to manage scheduling activity, kill the scheduler, and create resource trace files. The -S argument indicates that the scheduler should run for a single iteration and then stop. Specifying a number, n, after -S causes the simulator to advance n steps. You can determine which iteration you are currently on using showstats -v.
> showstats -v
current scheduler time: Mon Feb 16 11:54:03 1998 (887655243)
moab active for 00:01:00  stats initialized on Mon Feb 16 11:53:33
statistics for iteration 1  scheduler started on Wed Dec 31 17:00:00
...
The line that starts with statistics for iteration <X> specifies the iteration you are currently on. Each iteration advances the simulator RMPOLLINTERVAL seconds. By default, RMPOLLINTERVAL is set to 30 seconds. To see what RMPOLLINTERVAL is set to, use the showconfig command:
> showconfig | grep RMPOLLINTERVAL
RMPOLLINTERVAL 00:00:30
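Since every iteration advances the simulated clock by RMPOLLINTERVAL, the scheduler time at any iteration can be estimated. The sketch below assumes the clock starts at the "stats initialized" time from the output above and that the interval stays at its 30-second default; time_at_iteration is a hypothetical helper.

```python
from datetime import datetime, timedelta

RMPOLLINTERVAL = timedelta(seconds=30)        # default reported by showconfig
SIM_START = datetime(1998, 2, 16, 11, 53, 33) # "stats initialized on" time

def time_at_iteration(iteration: int) -> datetime:
    """Estimated simulated clock after the given number of iterations."""
    return SIM_START + iteration * RMPOLLINTERVAL

print(time_at_iteration(1))  # 1998-02-16 11:54:03
```

For iteration 1 this gives 11:54:03, the same value showstats -v reports as the current scheduler time.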
The showq -r command can be used to display the running (active) jobs to see what happened in the last iteration:
> showq -r
active jobs------------------------
JOBID          S  PAR  EFFIC   XFACTOR  Q  USER  GROUP  MHOST   PROCS  REMAINING   STARTTIME
fr8n01.804.0   R  1    ------  1.0      -  529   519    fr9n16  5      00:05:00    Mon Feb 16 11:54:03
fr8n01.187.0   R  1    ------  1.0      -  570   519    fr7n15  20     1:00:00:00  Mon Feb 16 11:54:03
...
fr8n01.960.0   R  1    ------  1.0      -  588   519    fr9n11  32     1:00:00:00  Mon Feb 16 11:54:03
9 active jobs   177 of 196 Processors Active (90.31%)
Total jobs: 9
Notice that two new jobs started (without waiting in the eligible queue). Also notice that job fr8n01.187.0, along with the rest that are summarized in the ellipsis, did NOT advance its REMAINING or STARTTIME. The simulator needs one iteration to do a sanity check. Setting the parameter SIMSTOPITERATION to 1 causes Moab to stop after the first scheduling iteration and wait for further instructions.
The showq -i command displays the idle (eligible) jobs.
> showq -i
eligible jobs----------------------
JOBID          PRIORITY  XFACTOR  Q  USER  GROUP  PROCS  WCLIMIT     CLASS  SYSTEMQUEUETIME
fr1n04.362.0*  1         1.0      -  550   519    20     1:00:00:00  batch  Mon Feb 16 11:53:33
fr1n04.363.0   1         1.0      -  550   519    20     1:00:00:00  batch  Mon Feb 16 11:53:33
fr1n04.365.0   1         1.0      -  550   519    20     1:00:00:00  batch  Mon Feb 16 11:53:33
fr1n04.366.0   1         1.0      -  550   519    20     1:00:00:00  batch  Mon Feb 16 11:53:33
fr1n04.501.0   1         1.0      -  570   519    20     1:00:00:00  batch  Mon Feb 16 11:53:33
fr1n04.580.0   1         1.0      -  570   519    20     1:00:00:00  batch  Mon Feb 16 11:53:33
fr1n04.597.0   1         1.0      -  570   519    20     1:00:00:00  batch  Mon Feb 16 11:53:33
fr1n04.598.0   1         1.0      -  570   519    20     1:00:00:00  batch  Mon Feb 16 11:53:33
fr1n04.602.0   1         1.0      -  570   519    20     1:00:00:00  batch  Mon Feb 16 11:53:33
fr1n04.743.0   1         1.0      -  570   519    20     1:00:00:00  batch  Mon Feb 16 11:54:03
fr1n04.744.0   1         1.0      -  570   519    20     1:00:00:00  batch  Mon Feb 16 11:54:03
fr1n04.746.0   1         1.0      -  570   519    20     1:00:00:00  batch  Mon Feb 16 11:54:03
fr1n04.747.0   1         1.0      -  570   519    20     1:00:00:00  batch  Mon Feb 16 11:54:03
fr8n01.388.0   1         1.0      -  550   519    20     1:00:00:00  batch  Mon Feb 16 11:54:03
14 eligible jobs
Total jobs: 14
Notice that none of the eligible jobs requests 19 or fewer processors (the number of idle processors). Also notice the * after the job ID fr1n04.362.0. This means that this job now has a reservation. The showres command shows all reservations currently on the system.
> showres
ReservationID  Type  S  Start       End         Duration    N/P    StartTime
fr8n01.187.0   Job   R  00:00:00    1:00:00:00  1:00:00:00  20/20  Mon Feb 16 11:54:03
fr8n01.189.0   Job   R  00:00:00    1:00:00:00  1:00:00:00  20/20  Mon Feb 16 11:54:03
fr8n01.190.0   Job   R  00:00:00    1:00:00:00  1:00:00:00  20/20  Mon Feb 16 11:54:03
fr8n01.191.0   Job   R  00:00:00    1:00:00:00  1:00:00:00  20/20  Mon Feb 16 11:54:03
fr8n01.276.0   Job   R  00:00:00    1:00:00:00  1:00:00:00  20/20  Mon Feb 16 11:54:03
fr1n04.362.0   Job   I  1:00:00:00  2:00:00:00  1:00:00:00  20/20  Tue Feb 17 11:54:03
fr1n04.369.0   Job   R  00:00:00    1:00:00:00  1:00:00:00  20/20  Mon Feb 16 11:54:03
fr1n04.487.0   Job   R  00:00:00    1:00:00:00  1:00:00:00  20/20  Mon Feb 16 11:54:03
fr8n01.804.0   Job   R  00:00:00    00:05:00    00:05:00    5/5    Mon Feb 16 11:54:03
fr8n01.960.0   Job   R  00:00:00    1:00:00:00  1:00:00:00  32/32  Mon Feb 16 11:54:03
10 reservations located
Here, the S column is the job's state (R = running, I = idle). All the active jobs have a reservation, as does idle job fr1n04.362.0. This reservation was created by the backfill scheduler for the highest priority idle job as a way to prevent starvation while lower priority jobs were being backfilled. (The backfill documentation describes the mechanics of backfill scheduling more fully.)
To display information about the nodes that job fr1n04.362.0 has reserved, use showres -n <JOBID>.
> showres -n fr1n04.362.0
reservations on Mon Feb 16 11:54:03
NodeName  Type  ReservationID  JobState  Task  Start       Duration    StartTime
fr5n09    Job   fr1n04.362.0   Idle      1     1:00:00:00  1:00:00:00  Tue Feb 17 11:54:03
...
fr7n15    Job   fr1n04.362.0   Idle      1     1:00:00:00  1:00:00:00  Tue Feb 17 11:54:03
20 nodes reserved
Now advance the simulator an iteration to allow some jobs to actually run.
> mschedctl -S
scheduling will stop in 00:00:30 at iteration 2
Next, check the queues to see what happened.
> showq
active jobs------------------------
JOBNAME        USERNAME  STATE    PROC  REMAINING   STARTTIME
fr8n01.804.0   529       Running  5     00:04:30    Mon Feb 16 11:54:03
fr8n01.187.0   570       Running  20    23:59:30    Mon Feb 16 11:54:03
...
9 active jobs   177 of 196 Processors Active (90.31%)
eligible jobs----------------------
JOBNAME        USERNAME  STATE    PROC  WCLIMIT     QUEUETIME
...
fr8n01.963.0   586       Idle     32    9:00:00     Mon Feb 16 11:54:33
fr8n01.1016.0  570       Idle     20    1:00:00:00  Mon Feb 16 11:54:33
16 eligible jobs
...
Two new jobs, fr8n01.963.0 and fr8n01.1016.0, are in the eligible queue. Also, note that the first job will now complete in 4 minutes 30 seconds rather than 5 minutes because the simulator has just advanced by 30 seconds, one RMPOLLINTERVAL. It is important to note that when the simulated jobs were created, both the job's wallclock limit and its actual run time were recorded. The wallclock limit is specified by the user and indicates their best estimate of an upper bound on how long the job will run. The run time is how long the job actually ran before completing and releasing its allocated resources. For example, a job with a wallclock limit of 1 hour will be given the needed resources for up to an hour but may complete in only 20 minutes.
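The relationship between wallclock limit and run time can be expressed as a simple ratio. The sketch below assumes that accuracy is run time divided by wallclock limit, expressed as a percentage; this is an illustrative definition and is not confirmed here as the exact formula Moab uses for its WallClock Accuracy statistic.

```python
def wallclock_accuracy(run_seconds: float, limit_seconds: float) -> float:
    """Percentage of the requested wallclock limit that the job actually used."""
    if limit_seconds <= 0:
        raise ValueError("wallclock limit must be positive")
    return 100.0 * run_seconds / limit_seconds

# The example from the text: a 1 hour limit, finished in 20 minutes.
accuracy = wallclock_accuracy(run_seconds=20 * 60, limit_seconds=60 * 60)
print(f"{accuracy:.1f}%")  # 33.3%
```

A low value means the user overestimated the run time, so the scheduler held resources in reserve for longer than the job needed.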
Stop the simulation at iteration 6.
> mschedctl -s 6I
scheduling will stop in 00:03:00 at iteration 6
The -s 6I argument indicates that the scheduler will stop at iteration 6 and will (I)gnore user input until it gets there. This prevents the possibility of obtaining showq output from iteration 5 rather than iteration 6.
> showq
active jobs------------------------
JOBNAME        USERNAME  STATE    PROC  REMAINING   STARTTIME
fr8n01.804.0   529       Running  5     00:02:30    Mon Feb 16 11:54:03
...
fr1n04.501.0   570       Running  20    1:00:00:00  Mon Feb 16 11:56:33
fr8n01.388.0   550       Running  20    1:00:00:00  Mon Feb 16 11:56:33
9 active jobs   177 of 196 Processors Active (90.31%)
...
14 eligible jobs
...
Job fr8n01.804.0 is still 2 minutes 30 seconds away from completing, as expected, but notice that jobs fr8n01.189.0 and fr8n01.191.0 have both completed early. Although they had almost 24 hours of wallclock limit remaining, they terminated. In reality, they probably failed on the real-world system where the trace file was being created. Their completion freed up 40 processors, which the scheduler was able to use immediately by starting two more 20-processor jobs, fr1n04.501.0 and fr8n01.388.0.
Note the system statistics:
> showstats
...
Successful/Completed Jobs:  0/2 (0.000%)
...
Avg WallClock Accuracy:     0.150%
Avg Job Proc Efficiency:    100.000%
Est/Avg Backlog (Hours):    0.00/3652178.74
A few more fields are filled in now that some jobs have completed, providing data from which to generate statistics.
Decrease the default LOGLEVEL with mschedctl -m to avoid unnecessary logging, and speed up the simulation.
> mschedctl -m LOGLEVEL 0
INFO: parameter modified
You can use mschedctl -m to immediately change the value of any parameter. The change is only made to the currently running Moab server and is not propagated to the configuration file. Changes can also be made by modifying the configuration file and restarting the scheduler.
Stop at iteration 580 and display the scheduler's queues.
> mschedctl -s 580I; showq
scheduling will stop in 4:47:00 at iteration 580
...
11 active jobs   156 of 196 Processors Active (79.59%)
eligible jobs----------------------
JOBNAME        USERNAME  STATE  PROC  WCLIMIT   QUEUETIME
fr8n01.963.0   586       Idle   32    9:00:00   Mon Feb 16 11:54:33
fr8n01.1075.0  560       Idle   32    23:56:00  Mon Feb 16 11:58:33
fr8n01.1076.0  560       Idle   16    23:56:00  Mon Feb 16 11:59:33
fr1n04.1953.0  520       Idle   46    7:45:00   Mon Feb 16 12:03:03
...
16 eligible jobs
...
You may notice that showq hangs for a while as the scheduler simulates up to iteration 580. The output shows that only 156 of the 196 processors are currently busy, yet at first glance three jobs, fr8n01.963.0, fr8n01.1075.0, and fr8n01.1076.0, appear to be ready to run.
> checkjob fr8n01.963.0; checkjob fr8n01.1075.0; checkjob fr8n01.1076.0
job fr8n01.963.0
...
Network: hps_user  Memory >= 256M  Disk >= 0  Swap >= 0
...
Job Eligibility Analysis -------
job cannot run in partition DEFAULT (idle procs do not meet requirements : 20 of 32 procs found)
idle procs: 40  feasible procs: 20
Rejection Reasons: [Memory : 20][State : 156]
job fr8n01.1075.0
...
Network: hps_user  Memory >= 256M  Disk >= 0  Swap >= 0
...
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 32 procs found)
idle procs: 40  feasible procs: 0
Rejection Reasons: [Memory : 20][State : 156][ReserveTime : 20]
job fr8n01.1076.0
...
Network: hps_user  Memory >= 256M  Disk >= 0  Swap >= 0
...
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 16 procs found)
idle procs: 40  feasible procs: 0
Rejection Reasons: [Memory : 20][State : 156][ReserveTime : 20]
The checkjob command reveals that job fr8n01.963.0 found only 20 of the 32 processors it needs. The other 20 idle processors could not be used because the configured memory on their nodes did not meet the job's requirements. The other two jobs cannot find enough processors because of ReserveTime. This indicates that the processors are idle, but that they have a reservation in place that will start before the job being checked could complete.
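The "scheduling will stop in 4:47:00" message can be checked with simple arithmetic: the simulator was at iteration 6 and must reach iteration 580 at 30 seconds per iteration. This is a minimal sketch assuming the default RMPOLLINTERVAL; the helper names are hypothetical.

```python
RMPOLLINTERVAL_SECONDS = 30  # default value, assumed unchanged

def format_duration(seconds: int) -> str:
    """Format a number of seconds as H:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

def simulated_time_between(current_iteration: int, target_iteration: int) -> str:
    """Simulated time needed to advance from one iteration to another."""
    steps = target_iteration - current_iteration
    return format_duration(steps * RMPOLLINTERVAL_SECONDS)

print(simulated_time_between(6, 580))  # 4:47:00
```

Keep in mind that this is simulated time; the wall-clock time spent waiting depends on how quickly Moab can process each iteration.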
Verify that the idle nodes do not have enough memory configured and they are already reserved with the mdiag -n command, which provides detailed information about the state of nodes Moab is currently tracking. The mdiag command can be used with various flags to obtain detailed information about accounts, fair share, groups, jobs, nodes, QoS, queues, reservations, the resource manager, and users. The command also performs a number of sanity checks on the data provided and will present warning messages if discrepancies are detected.
> mdiag -n -v | grep -e Name -e Idle
Name     State  Procs  Memory   Disk         Swap           Speed  Opsys  Arch   Par  Load  Rsv
...
fr10n09  Idle   1:1    256:256  9780:9780    411488:411488  1.00   AIX43  R6000  DEF  0.00  001 .
fr10n11  Idle   1:1    256:256  8772:8772    425280:425280  1.00   AIX43  R6000  DEF  0.00  001 .
fr10n13  Idle   1:1    256:256  9272:9272    441124:441124  1.00   AIX43  R6000  DEF  0.00  001 .
fr10n15  Idle   1:1    256:256  8652:8652    440776:440776  1.00   AIX43  R6000  DEF  0.00  001
fr11n01  Idle   1:1    256:256  7668:7668    438624:438624  1.00   AIX43  R6000  DEF  0.00  001
fr11n03  Idle   1:1    256:256  9548:9548    424584:424584  1.00   AIX43  R6000  DEF  0.00  001
fr11n05  Idle   1:1    256:256  11608:11608  454476:454476  1.00   AIX43  R6000  DEF  0.00  001
fr11n07  Idle   1:1    256:256  9008:9008    425292:425292  1.00   AIX43  R6000  DEF  0.00  001
fr11n09  Idle   1:1    256:256  8588:8588    424684:424684  1.00   AIX43  R6000  DEF  0.00  001
fr11n11  Idle   1:1    256:256  9632:9632    424936:424936  1.00   AIX43  R6000  DEF  0.00  001
fr11n13  Idle   1:1    256:256  9524:9524    425432:425432  1.00   AIX43  R6000  DEF  0.00  001
fr11n15  Idle   1:1    256:256  9388:9388    425728:425728  1.00   AIX43  R6000  DEF  0.00  001
fr14n01  Idle   1:1    256:256  6848:6848    424260:424260  1.00   AIX43  R6000  DEF  0.00  001
fr14n03  Idle   1:1    256:256  9752:9752    424192:424192  1.00   AIX43  R6000  DEF  0.00  001
fr14n05  Idle   1:1    256:256  9920:9920    434088:434088  1.00   AIX43  R6000  DEF  0.00  001
fr14n07  Idle   1:1    256:256  2196:2196    434224:434224  1.00   AIX43  R6000  DEF  0.00  001
fr14n09  Idle   1:1    256:256  9368:9368    434568:434568  1.00   AIX43  R6000  DEF  0.00  001
fr14n11  Idle   1:1    256:256  9880:9880    434172:434172  1.00   AIX43  R6000  DEF  0.00  001
fr14n13  Idle   1:1    256:256  9760:9760    433952:433952  1.00   AIX43  R6000  DEF  0.00  001
fr14n15  Idle   1:1    256:256  25000:25000  434044:434044  1.00   AIX43  R6000  DEF  0.00  001
fr17n05  Idle   1:1    128:128  10016:10016  182720:182720  1.00   AIX43  R6000  DEF  0.00  000
...
Total Nodes: 196 (Active: 156 Idle: 40 Down: 0)
The grep filter keeps the header line and the idle nodes. All the idle nodes with 256 MB of memory installed already have a reservation. (See the Rsv column.) The rest of the idle nodes have only 128 MB of memory.
> checknode fr10n09
node fr10n09
State: Idle (in current state for 4:21:00)
Configured Resources: PROCS: 1 MEM: 256M SWAP: 401G DISK: 9780M
Utilized Resources: [NONE]
Dedicated Resources: [NONE]
..
Total Time: 4:50:00  Up: 4:50:00 (100.00%)  Active: 00:34:30 (11.90%)
Reservations:
Job 'fr8n01.963.0'(x1) 3:25:00 -> 12:25:00 (9:00:00)
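The same check can be scripted against the node listing. The sketch below assumes the column order shown in the header (Name, State, Procs, Memory, ..., Rsv) and whitespace-separated fields; the function names and the labels "Memory", "Reserved", and "Usable" are hypothetical and only illustrate the reasoning above.

```python
def parse_node_row(row: str) -> dict:
    """Parse one node row; column positions are assumed from the header line."""
    fields = row.split()
    return {
        "name": fields[0],
        "state": fields[1],
        # Memory prints as two colon-separated numbers. They are equal in this
        # listing, so taking the second one is an arbitrary choice.
        "memory_mb": int(fields[3].split(":")[1]),
        "reservations": int(fields[11]),  # Rsv column, for example "001"
    }

def classify_idle_node(node: dict, required_memory_mb: int) -> str:
    """Explain why an idle node may be unavailable to a waiting job."""
    if node["memory_mb"] < required_memory_mb:
        return "Memory"
    if node["reservations"] > 0:
        return "Reserved"
    return "Usable"

rows = [
    "fr10n09 Idle 1:1 256:256 9780:9780 411488:411488 1.00 AIX43 R6000 DEF 0.00 001",
    "fr17n05 Idle 1:1 128:128 10016:10016 182720:182720 1.00 AIX43 R6000 DEF 0.00 000",
]
for row in rows:
    node = parse_node_row(row)
    print(node["name"], classify_idle_node(node, required_memory_mb=256))
```

For the two sample rows this prints Reserved for fr10n09 and Memory for fr17n05, matching the two groups of idle nodes described in the text.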
Using checknode revealed that Job fr8n01.963.0 has the reservation.
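The ReserveTime rejection follows from comparing a job's wallclock limit with the time remaining before the reservation begins. The following sketch illustrates that comparison using the durations printed above; it is a simplification of backfill logic under that single assumption, and the helper names are hypothetical.

```python
def parse_duration(text: str) -> int:
    """Convert '[D:]HH:MM:SS' (for example '1:00:00:00' or '3:25:00') to seconds."""
    parts = [int(part) for part in text.split(":")]
    while len(parts) < 4:          # pad missing leading fields (days, hours)
        parts.insert(0, 0)
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def fits_before_reservation(wclimit: str, time_until_reservation: str) -> bool:
    """True if a job with this wallclock limit would finish before the reservation starts."""
    return parse_duration(wclimit) <= parse_duration(time_until_reservation)

# The reservation on fr10n09 starts in 3:25:00; fr8n01.1075.0 asks for 23:56:00.
print(fits_before_reservation("23:56:00", "3:25:00"))  # False
# A short job, such as one with a 5 minute limit, would fit.
print(fits_before_reservation("00:05:00", "3:25:00"))  # True
```

This is why fr8n01.1075.0 and fr8n01.1076.0 cannot use the reserved nodes even though those nodes are idle right now.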
Moving ahead:
> mschedctl -S 500I;showstats -v
scheduling will stop in 4:10:00 at iteration 1080
...
Eligible/Idle Jobs:           16/16 (100.000%)
Active Jobs:                  11
Successful/Completed Jobs:    2/25 (8.000%)
Preempt Jobs:                 0
Avg/Max QTime (Hours):        0.00/0.00
Avg/Max XFactor:              0.00/1.04
Avg/Max Bypass:               0.00/13.00
Dedicated/Total ProcHours:    1545.44/1765.63 (87.529%)
Preempt/Dedicated ProcHours:  0.00/1545.44 (0.000%)
Current Active/Total Procs:   156/196 (79.592%)
Avg WallClock Accuracy:       9.960%
Avg Job Proc Efficiency:      100.000%
Min System Utilization:       79.592% (on iteration 33)
Est/Avg Backlog (Hours):      0.00/20289.84
We now know that the scheduler is scheduling efficiently. So far, system utilization as reported by showstats -v looks very good. An important and subjective question is whether the scheduler is scheduling fairly. Look at the user and group statistics to see if there are any glaring problems.
> showstats -u
statistics initialized Wed Dec 31 17:00:00
      |------ Active ------|--------------------------------- Completed -----------------------------------|
user  Jobs  Procs  ProcHours  Jobs  %       PHReq    %       PHDed   %     FSTgt  AvgXF   MaxXF   AvgQH   Effic   WCAcc
520   1     46     172.88     1     0.00    356.5    0.00    541.3   0.00  -----  1.04    0.00    0.35    100.00  100.00
550   1     20     301.83     7     0.00    3360.0   0.00    283.7   0.00  -----  0.03    0.00    0.06    100.00  3.17
524   1     32     239.73     ----  ------  ------   ------  272.3   0.00  -----  ------  ------  ------  100.00  ------
570   1     20     301.00     14    0.00    6720.0   0.00    199.5   0.00  -----  0.01    0.00    0.20    100.00  0.34
588   0     0      0.00       1     0.00    768.0    0.00    159.7   0.00  -----  0.21    0.00    0.00    100.00  20.80
578   6     6      146.82     ----  ------  ------   ------  53.2    0.00  -----  ------  ------  ------  100.00  ------
586   1     32     265.07     ----  ------  ------   ------  22.9    0.00  -----  ------  ------  ------  100.00  ------
517   0     0      0.00       1     0.00    432.0    0.00    4.8     0.00  -----  0.02    0.00    0.12    100.00  1.10
529   0     0      0.00       1     0.00    0.4      0.00    1.3     0.00  -----  1.00    0.00    0.00    100.00  100.00
> showstats -g
statistics initialized Wed Dec 31 17:00:00
       |------ Active ------|--------------------------------- Completed -----------------------------------|
group  Jobs  Procs  ProcHours  Jobs  %       PHReq    %       PHDed   %     FSTgt  AvgXF   MaxXF   AvgQH   Effic   WCAcc
503    1     32     239.73     1     0.00    432.0    0.00    277.1   0.00  -----  0.02    0.00    0.12    100.00  1.10
501    1     32     265.07     ----  ------  ------   ------  22.9    0.00  -----  ------  ------  ------  100.00  ------
519    9     92     922.54     24    0.00    11204.9  0.00    1238.6  0.00  -----  0.11    0.00    0.15    100.00  10.33
Suppose you now need to take down the entire system for maintenance on Tuesday, February 17, from 2:00 to 8:00 a.m. To do this, create a reservation with mrsvctl -c.
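One rough way to look at fairness is each group's share of the processors that are currently active. The sketch below uses the Active Procs column from the group statistics above; treating this share as a fairness indicator is an assumption made for illustration, not a Moab metric.

```python
# Active processors per group, copied from the "Procs" column of showstats -g.
active_procs_by_group = {"503": 32, "501": 32, "519": 92}

total_active = sum(active_procs_by_group.values())
shares = {
    group: 100.0 * procs / total_active
    for group, procs in active_procs_by_group.items()
}

print(total_active)  # 156
for group, share in shares.items():
    print(f"group {group}: {share:.1f}% of active processors")
```

The total of 156 agrees with the Current Active/Total Procs field, and group 519 holds roughly 59% of the active processors. Whether that is fair depends on site policy, which is why the question is described as subjective.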
> mrsvctl -c -t ALL -s 2:00_02/17 -d 6:00:00
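The start time and duration given to mrsvctl determine when the reservation ends. The sketch below parses the two values used above and computes the end of the window; the year 1998 is an assumption taken from the simulated clock in this tutorial, and the parsing covers only the HH:MM_MM/DD form shown here.

```python
from datetime import datetime, timedelta

def parse_start(spec: str, year: int) -> datetime:
    """Parse a start time written as 'HH:MM_MM/DD' (the form used above)."""
    clock, date = spec.split("_")
    hour, minute = (int(part) for part in clock.split(":"))
    month, day = (int(part) for part in date.split("/"))
    return datetime(year, month, day, hour, minute)

def parse_hms(spec: str) -> timedelta:
    """Parse a duration written as 'H:MM:SS'."""
    hours, minutes, seconds = (int(part) for part in spec.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

start = parse_start("2:00_02/17", year=1998)  # year assumed from the trace
end = start + parse_hms("6:00:00")

print(start)  # 1998-02-17 02:00:00
print(end)    # 1998-02-17 08:00:00
```

The computed window runs from 2:00 to 8:00 a.m., matching the maintenance period the reservation is meant to cover.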
> mschedctl -k
moab will be shutdown immediately