The showres command shows all reservations currently on the system.
> showres
ReservationID Type S Start End Duration N/P StartTime
fr8n01.187.0 Job R 00:00:00 1:00:00:00 1:00:00:00 20/20 Mon Feb 16 11:54:03
fr8n01.189.0 Job R 00:00:00 1:00:00:00 1:00:00:00 20/20 Mon Feb 16 11:54:03
fr8n01.190.0 Job R 00:00:00 1:00:00:00 1:00:00:00 20/20 Mon Feb 16 11:54:03
fr8n01.191.0 Job R 00:00:00 1:00:00:00 1:00:00:00 20/20 Mon Feb 16 11:54:03
fr8n01.276.0 Job R 00:00:00 1:00:00:00 1:00:00:00 20/20 Mon Feb 16 11:54:03
fr1n04.362.0 Job I 1:00:00:00 2:00:00:00 1:00:00:00 20/20 Tue Feb 17 11:54:03
fr1n04.369.0 Job R 00:00:00 1:00:00:00 1:00:00:00 20/20 Mon Feb 16 11:54:03
fr1n04.487.0 Job R 00:00:00 1:00:00:00 1:00:00:00 20/20 Mon Feb 16 11:54:03
fr8n01.804.0 Job R 00:00:00 00:05:00 00:05:00 5/5 Mon Feb 16 11:54:03
fr8n01.960.0 Job R 00:00:00 1:00:00:00 1:00:00:00 32/32 Mon Feb 16 11:54:03
10 reservations located
Here, the S column is the job's state (R = running, I = idle). All the active jobs have a reservation, as does the idle job fr1n04.362.0. That reservation was created by the backfill scheduler for the highest-priority idle job to prevent starvation while lower-priority jobs were being backfilled. (The backfill documentation describes the mechanics of backfill scheduling more fully.)
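The reservation behavior seen here is governed by scheduler parameters. Below is a minimal moab.cfg sketch; BACKFILLPOLICY, RESERVATIONPOLICY, and RESERVATIONDEPTH are standard Moab parameters, but the values shown are illustrative assumptions, not the settings used in this simulation.
# illustrative moab.cfg fragment (values are assumptions)
BACKFILLPOLICY    FIRSTFIT        # let lower-priority jobs backfill into idle resources
RESERVATIONPOLICY CURRENTHIGHEST  # reserve resources for the current highest-priority idle job
RESERVATIONDEPTH  1               # number of priority reservations to maintain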
To display information about the nodes that job fr1n04.362.0 has reserved, use showres -n <JOBID>.
> showres -n fr1n04.362.0
reservations on Mon Feb 16 11:54:03
NodeName Type ReservationID JobState Task Start Duration StartTime
fr5n09 Job fr1n04.362.0 Idle 1 1:00:00:00 1:00:00:00 Tue Feb 17 11:54:03
...
fr7n15 Job fr1n04.362.0 Idle 1 1:00:00:00 1:00:00:00 Tue Feb 17 11:54:03
20 nodes reserved
Now advance the simulator an iteration to allow some jobs to actually run.
> mschedctl -S
scheduling will stop in 00:00:30 at iteration 2
Next, check the queues to see what happened.
> showq
active jobs------------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
fr8n01.804.0 529 Running 5 00:04:30 Mon Feb 16 11:54:03
fr8n01.187.0 570 Running 20 23:59:30 Mon Feb 16 11:54:03
...
9 active jobs 177 of 196 Processors Active (90.31%)
eligible jobs----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
...
fr8n01.963.0 586 Idle 32 9:00:00 Mon Feb 16 11:54:33
fr8n01.1016.0 570 Idle 20 1:00:00:00 Mon Feb 16 11:54:33
16 eligible jobs
...
Two new jobs, fr8n01.963.0 and fr8n01.1016.0, are in the eligible queue. Also, note that the first job will now complete in 4 minutes 30 seconds rather than 5 minutes because simulated time has just advanced by 30 seconds, one RMPOLLINTERVAL. It is important to note that when the simulated jobs were created, both each job's wallclock limit and its actual run time were recorded. The wallclock limit is specified by the user as a best estimate of an upper bound on how long the job will run. The run time is how long the job actually ran before completing and releasing its allocated resources. For example, a job with a wallclock limit of 1 hour will be given the needed resources for up to an hour but may complete in only 20 minutes.
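For context, the 30-second iteration step comes from the RMPOLLINTERVAL parameter. The sketch below shows a minimal simulation-mode configuration; the parameter names are standard Moab parameters, but the scheduler name, trace file paths, and values are hypothetical.
# illustrative moab.cfg fragment for simulation mode (paths and values are assumptions)
SCHEDCFG[sim]        MODE=SIMULATION
RMPOLLINTERVAL       00:00:30              # each iteration advances simulated time by 30 seconds
SIMRESOURCETRACEFILE traces/resource.trace # hypothetical path to the node trace
SIMWORKLOADTRACEFILE traces/workload.trace # hypothetical path to the job trace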
Stop the simulation at iteration 6.
> mschedctl -s 6I
scheduling will stop in 00:03:00 at iteration 6
The -s 6I argument indicates that the scheduler will stop at iteration 6 and will (I)gnore user input until it gets there. This prevents the possibility of obtaining showq output from iteration 5 rather than iteration 6.
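For reference, the stepping and stopping forms used in this tutorial are summarized below; the resume form is assumed from standard mschedctl usage and may vary by version.
> mschedctl -s        (stop scheduling at the next opportunity)
> mschedctl -s <X>I   (stop at absolute iteration X, ignoring user input until then)
> mschedctl -S <N>I   (step N more iterations, then stop)
> mschedctl -r        (resume normal scheduling)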
> showq
active jobs------------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
fr8n01.804.0 529 Running 5 00:02:30 Mon Feb 16 11:54:03
...
fr1n04.501.0 570 Running 20 1:00:00:00 Mon Feb 16 11:56:33
fr8n01.388.0 550 Running 20 1:00:00:00 Mon Feb 16 11:56:33
9 active jobs 177 of 196 Processors Active (90.31%)
...
14 eligible jobs
...
As expected, job fr8n01.804.0 is still 2 minutes 30 seconds away from completing, but notice that jobs fr8n01.189.0 and fr8n01.191.0 have both completed early. Although they had almost 24 hours of wallclock limit remaining, they terminated; in reality, they probably failed on the real-world system where the trace file was created. Their completion freed 40 processors, which the scheduler immediately used to start several more jobs.
Note the system statistics:
> showstats
...
Successful/Completed Jobs: 0/2 (0.000%)
...
Avg WallClock Accuracy: 0.150%
Avg Job Proc Efficiency: 100.000%
Est/Avg Backlog (Hours): 0.00/3652178.74
A few more fields are filled in now that some jobs have completed, providing information from which to generate statistics.
Decrease the default LOGLEVEL with mschedctl -m to avoid unnecessary logging and speed up the simulation.
> mschedctl -m LOGLEVEL 0
INFO: parameter modified
You can use mschedctl -m to immediately change the value of any parameter. The change is only made to the currently running Moab server and is not propagated to the configuration file. Changes can also be made by modifying the configuration file and restarting the scheduler.
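For example, to raise the log level again at run time using the same syntax, or to make a setting permanent by editing the configuration file (assumed here to be the standard moab.cfg):
> mschedctl -m LOGLEVEL 3
# or add the following line to moab.cfg and restart Moab:
LOGLEVEL 3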
Stop at iteration 580 and pull up the scheduler's statistics.
> mschedctl -s 580I; showq
scheduling will stop in 4:47:00 at iteration 580
...
11 active jobs 156 of 196 Processors Active (79.59%)
eligible jobs----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
fr8n01.963.0 586 Idle 32 9:00:00 Mon Feb 16 11:54:33
fr8n01.1075.0 560 Idle 32 23:56:00 Mon Feb 16 11:58:33
fr8n01.1076.0 560 Idle 16 23:56:00 Mon Feb 16 11:59:33
fr1n04.1953.0 520 Idle 46 7:45:00 Mon Feb 16 12:03:03
...
16 eligible jobs
...
You may note that showq hangs for a while as the scheduler simulates up to iteration 580. The output shows that only 156 of the 196 nodes are currently busy, yet at first glance three jobs, fr8n01.963.0, fr8n01.1075.0, and fr8n01.1076.0, appear ready to run.
> checkjob fr8n01.963.0; checkjob fr8n01.1075.0; checkjob fr8n01.1076.0
job fr8n01.963.0
...
Network: hps_user Memory >= 256M Disk >= 0 Swap >= 0
...
Job Eligibility Analysis -------
job cannot run in partition DEFAULT (idle procs do not meet requirements : 20 of 32 procs found)
idle procs: 40 feasible procs: 20
Rejection Reasons: [Memory : 20][State : 156]
job fr8n01.1075.0
...
Network: hps_user Memory >= 256M Disk >= 0 Swap >= 0
...
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 32 procs found)
idle procs: 40 feasible procs: 0
Rejection Reasons: [Memory : 20][State : 156][ReserveTime : 20]
job fr8n01.1076.0
...
Network: hps_user Memory >= 256M Disk >= 0 Swap >= 0
...
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 16 procs found)
idle procs: 40 feasible procs: 0
Rejection Reasons: [Memory : 20][State : 156][ReserveTime : 20]
The checkjob command reveals that job fr8n01.963.0 found only 20 of the 32 processors it needs. The remaining 20 idle processors could not be used because the configured memory on those nodes did not meet the job's requirements. The other two jobs cannot find enough processors because of ReserveTime rejections, which indicate that the processors are idle but already hold a reservation that will start before the job being checked could complete.
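For a more detailed per-requirement analysis, checkjob can also be run with its verbose flag; a minimal sketch:
> checkjob -v fr8n01.963.0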
Use the mdiag -n command to verify that the remaining idle nodes either lack sufficient configured memory or are already reserved; it provides detailed information about the state of every node Moab is currently tracking. The mdiag command can be used with various flags to obtain detailed information about accounts, blocked jobs, fairshare, groups, jobs, nodes, QoS, reservations, the resource manager, and users. It also performs a number of sanity checks on the data and presents warning messages if discrepancies are detected.
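The per-object reports are selected with single-letter flags; the mapping below is assumed from typical Moab usage and may vary by version. Only the node report (-n) is needed next.
> mdiag -a   (accounts)
> mdiag -b   (blocked jobs)
> mdiag -f   (fairshare)
> mdiag -g   (groups)
> mdiag -j   (jobs)
> mdiag -n   (nodes)
> mdiag -q   (QoS)
> mdiag -r   (reservations)
> mdiag -R   (resource managers)
> mdiag -u   (users)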
> mdiag -n -v | grep -e Name -e Idle
Name State Procs Memory Disk Swap Speed Opsys Arch Par Load Rsv ...
fr10n09 Idle 1:1 256:256 9780:9780 411488:411488 1.00 AIX43 R6000 DEF 0.00 001 .
fr10n11 Idle 1:1 256:256 8772:8772 425280:425280 1.00 AIX43 R6000 DEF 0.00 001 .
fr10n13 Idle 1:1 256:256 9272:9272 441124:441124 1.00 AIX43 R6000 DEF 0.00 001 .
fr10n15 Idle 1:1 256:256 8652:8652 440776:440776 1.00 AIX43 R6000 DEF 0.00 001
fr11n01 Idle 1:1 256:256 7668:7668 438624:438624 1.00 AIX43 R6000 DEF 0.00 001
fr11n03 Idle 1:1 256:256 9548:9548 424584:424584 1.00 AIX43 R6000 DEF 0.00 001
fr11n05 Idle 1:1 256:256 11608:11608 454476:454476 1.00 AIX43 R6000 DEF 0.00 001
fr11n07 Idle 1:1 256:256 9008:9008 425292:425292 1.00 AIX43 R6000 DEF 0.00 001
fr11n09 Idle 1:1 256:256 8588:8588 424684:424684 1.00 AIX43 R6000 DEF 0.00 001
fr11n11 Idle 1:1 256:256 9632:9632 424936:424936 1.00 AIX43 R6000 DEF 0.00 001
fr11n13 Idle 1:1 256:256 9524:9524 425432:425432 1.00 AIX43 R6000 DEF 0.00 001
fr11n15 Idle 1:1 256:256 9388:9388 425728:425728 1.00 AIX43 R6000 DEF 0.00 001
fr14n01 Idle 1:1 256:256 6848:6848 424260:424260 1.00 AIX43 R6000 DEF 0.00 001
fr14n03 Idle 1:1 256:256 9752:9752 424192:424192 1.00 AIX43 R6000 DEF 0.00 001
fr14n05 Idle 1:1 256:256 9920:9920 434088:434088 1.00 AIX43 R6000 DEF 0.00 001
fr14n07 Idle 1:1 256:256 2196:2196 434224:434224 1.00 AIX43 R6000 DEF 0.00 001
fr14n09 Idle 1:1 256:256 9368:9368 434568:434568 1.00 AIX43 R6000 DEF 0.00 001
fr14n11 Idle 1:1 256:256 9880:9880 434172:434172 1.00 AIX43 R6000 DEF 0.00 001
fr14n13 Idle 1:1 256:256 9760:9760 433952:433952 1.00 AIX43 R6000 DEF 0.00 001
fr14n15 Idle 1:1 256:256 25000:25000 434044:434044 1.00 AIX43 R6000 DEF 0.00 001
fr17n05 Idle 1:1 128:128 10016:10016 182720:182720 1.00 AIX43 R6000 DEF 0.00 000
...
Total Nodes: 196 (Active: 156 Idle: 40 Down: 0)
The grep keeps the header line and the idle nodes. All the idle nodes with 256 MB of memory installed already have a reservation (see the Rsv column); the rest of the idle nodes have only 128 MB of memory. Use checknode on one of the reserved nodes to see which reservation holds it.
> checknode fr10n09
node fr10n09
State: Idle (in current state for 4:21:00)
Configured Resources: PROCS: 1 MEM: 256M SWAP: 401G DISK: 9780M
Utilized Resources: [NONE]
Dedicated Resources: [NONE]
...
Total Time: 4:50:00 Up: 4:50:00 (100.00%) Active: 00:34:30 (11.90%)
Reservations:
Job 'fr8n01.963.0'(x1) 3:25:00 -> 12:25:00 (9:00:00)
Using checknode reveals that job fr8n01.963.0 holds the reservation.
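To see which resources are free for immediate use (and therefore available to backfill), the showbf command can be run; a minimal sketch, with the duration flag assumed from standard showbf usage:
> showbf
> showbf -d 9:00:00   (resources available for at least 9 hours)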
Moving ahead:
> mschedctl -S 500I; showstats -v
scheduling will stop in 4:10:00 at iteration 1080
...
Eligible/Idle Jobs: 16/16 (100.000%)
Active Jobs: 11
Successful/Completed Jobs: 2/25 (8.000%)
Preempt Jobs: 0
Avg/Max QTime (Hours): 0.00/0.00
Avg/Max XFactor: 0.00/1.04
Avg/Max Bypass: 0.00/13.00
Dedicated/Total ProcHours: 1545.44/1765.63 (87.529%)
Preempt/Dedicated ProcHours: 0.00/1545.44 (0.000%)
Current Active/Total Procs: 156/196 (79.592%)
Avg WallClock Accuracy: 9.960%
Avg Job Proc Efficiency: 100.000%
Min System Utilization: 79.592% (on iteration 33)
Est/Avg Backlog (Hours): 0.00/20289.84
So far, the scheduler appears to be scheduling efficiently; system utilization as reported by showstats -v looks very good.