8.4.3.1 Using SUSPEND
Setting PREEMPTPOLICY to SUSPEND causes preempted jobs to stop executing but remain in memory on their allocated compute nodes.
A job is suspended only if it is marked SUSPENDABLE; otherwise, it is requeued or canceled when preempted. If your resource manager supports it, you can set the SUSPENDABLE flag at submission time with the msub -r option. Otherwise, set it with the JOBFLAGS attribute of the associated class or QoS credential, as in this example:

    CLASSCFG[low] JOBFLAGS=SUSPENDABLE
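The same flag can be set through a QoS credential instead of a class. This is a minimal sketch: the QoS name low is hypothetical, chosen only to mirror the class example, and the QOSCFG[...] JOBFLAGS=SUSPENDABLE syntax follows the test1 configuration used later in this section.

```
# Hypothetical QoS "low": every job that runs under it is marked suspendable
QOSCFG[low] JOBFLAGS=SUSPENDABLE
```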
The following outlines some benefits of using SUSPEND and also lists some things you should be aware of if you choose to use it.
Advantages:
Cautions:
When using SUSPEND, you must increase JOBRETRYTIME from its default of 60 seconds; a value of 300 seconds (5 minutes) is recommended.
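In moab.cfg, this recommendation amounts to a single parameter line. The sketch below assumes JOBRETRYTIME accepts Moab's usual [[[DD:]HH:]MM:]SS time format, in which 00:05:00 and 300 both denote five minutes.

```
# Assumed time format [[[DD:]HH:]MM:]SS: 00:05:00 = 300 seconds,
# raised from the 60-second default as recommended for SUSPEND
JOBRETRYTIME 00:05:00
```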
To use SUSPEND
The following steps explain and illustrate how to set up preemption with SUSPEND.
The complete configuration for this example is:

    GUARANTEEDPREEMPTION TRUE
    PREEMPTPOLICY SUSPEND
    QOSCFG[test1] QFLAGS=PREEMPTEE JOBFLAGS=RESTARTABLE,SUSPENDABLE MEMBERULIST=john PRIORITY=100
    QOSCFG[test2] QFLAGS=PREEMPTOR MEMBERULIST=john PRIORITY=10000
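Set the preemption policy to SUSPEND:

    PREEMPTPOLICY SUSPEND

Configure the preemptee QoS, test1. Its jobs are flagged SUSPENDABLE, which is required for them to be suspended rather than requeued or canceled:

    QOSCFG[test1] QFLAGS=PREEMPTEE JOBFLAGS=RESTARTABLE,SUSPENDABLE MEMBERULIST=john PRIORITY=100

Configure the preemptor QoS, test2, with a higher priority than test1:

    QOSCFG[test2] QFLAGS=PREEMPTOR MEMBERULIST=john PRIORITY=10000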
Submit a job to the preemptee QoS, test1, that requests all 128 processors:

    [john@g06]$ echo sleep 120 | msub -l procs=128,walltime=120 -l qos=test1
The showq output shows job Moab.7 running on all 128 processors:
    Moab.7
    [john@g06]# showq

    active jobs------------------------
    JOBID       USERNAME      STATE  PROCS   REMAINING            STARTTIME

    Moab.7          john    Running    128    00:01:59  Thu Nov 10 12:28:44

    1 active job    128 of 128 processors in use by local jobs (100.00%)
                    2 of 2 nodes active (100.00%)

    eligible jobs----------------------
    JOBID       USERNAME      STATE  PROCS     WCLIMIT            QUEUETIME

    0 eligible jobs

    blocked jobs-----------------------
    JOBID       USERNAME      STATE  PROCS     WCLIMIT            QUEUETIME

    0 blocked jobs

    Total job: 1
Submit a job to the preemptor QoS, test2, that requests the same 128 processors:

    [john@g06]$ echo sleep 120 | msub -l procs=128,walltime=120 -l qos=test2
The showq output now shows Moab.7 suspended and Moab.8 running in its place:
    Moab.8
    [john@g06]# showq

    active jobs------------------------
    JOBID       USERNAME      STATE  PROCS   REMAINING            STARTTIME

    Moab.7          john  Suspended    128    00:01:56  Thu Nov 10 12:28:44
    Moab.8          john    Running    128    00:02:00  Thu Nov 10 12:28:48

    2 active jobs   128 of 128 processors in use by local jobs (100.00%)
                    2 of 2 nodes active (100.00%)

    eligible jobs----------------------
    JOBID       USERNAME      STATE  PROCS     WCLIMIT            QUEUETIME

    0 eligible jobs

    blocked jobs-----------------------
    JOBID       USERNAME      STATE  PROCS     WCLIMIT            QUEUETIME

    0 blocked jobs

    Total jobs: 2
When a job is suspended, it remains listed among the active jobs in showq, as in the example above. This is normal behavior for a suspended job. Moab should suspend a given job only once.
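checkjob output for the preemptee job in QoS test1 (these checkjob examples come from a separate run of the test, so the job IDs differ from those in the showq output above):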
    [john@g06]$ checkjob Moab.9
    job Moab.9

    State: Suspended
    Creds:  user:john  group:john  qos:test1
    WallTime:   00:00:02 of 00:02:00
    SubmitTime: Thu Nov 10 12:36:29
      (Time Queued  Total: 00:00:07  Eligible: 00:00:00)

    Total Requested Tasks: 128

    Req[0]  TaskCount: 128  Partition: licenses
    NodeCount:  2

    Allocated Nodes:
    node[01-02]*64

    IWD:            /opt/native
    SubmitDir:      /opt/native
    Executable:     /opt/native/spool/moab.job.UFe8sQ

    StartCount:     1
    Flags:          RESTARTABLE,SUSPENDABLE,PREEMPTEE,GLOBALQUEUE,PROCSPECIFIED
    Attr:           PREEMPTEE
    StartPriority:  100
    job cannot be resumed: preemption required but job is conditional preemptor with no targets
    BLOCK MSG: non-idle state 'Running' (recorded at last scheduling iteration)
checkjob output for the preemptor job in QoS test2:
    [john@g06]$ checkjob Moab.10
    job Moab.10

    State: Running
    Creds:  user:john  group:john  qos:test2
    WallTime:   00:00:00 of 00:02:00
    SubmitTime: Thu Nov 10 12:36:31
      (Time Queued  Total: 00:00:00  Eligible: 00:00:00)

    StartTime: Thu Nov 10 12:36:31
    Total Requested Tasks: 128

    Req[0]  TaskCount: 128  Partition: licenses

    Allocated Nodes:
    node[01-02]*64

    IWD:            /opt/native
    SubmitDir:      /opt/native
    Executable:     /opt/native/spool/moab.job.CZavjU

    StartCount:     1
    Flags:          HASPREEMPTED,PREEMPTOR,GLOBALQUEUE,PROCSPECIFIED
    StartPriority:  10000
    Reservation 'Moab.10' (-00:00:07 -> 00:01:53  Duration: 00:02:00)
In rare cases, Moab keeps a suspended job from resuming, holding it in the suspended state for a long time, if it determines that the job cannot resume. For example, if a job could perform I/O before it was suspended but no longer can, Moab recognizes that the job is unable to resume and leaves it suspended.
Copyright © 2012 Adaptive Computing Enterprises, Inc.®