23.1.4 Suspending jobs with preemption

You must mark a job as SUSPENDABLE if you want it to suspend. If you do not, the job will be requeued or canceled when it is preempted.

If supported by the resource manager, you can set the job SUSPENDABLE flag when submitting the job by using the msub -r option. Otherwise, use the JOBFLAGS attribute of the associated class or QoS credential, as in this example:

CLASSCFG[low] JOBFLAGS=SUSPENDABLE

For more information, see Job Flags.
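The same JOBFLAGS attribute can be set on a QoS instead of a class. The fragment below is a minimal sketch that reuses the QOSCFG syntax from the example later in this section; the QoS name low is a hypothetical placeholder.

```
# moab.cfg (sketch): mark every job in the hypothetical "low" QoS as suspendable
QOSCFG[low] JOBFLAGS=SUSPENDABLE
```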

To preempt jobs using SUSPEND

When you use SUSPEND, you should increase JOBRETRYTIME from its default of 60 seconds. The recommended value with SUSPEND is 300 seconds (5 minutes).
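A minimal moab.cfg sketch of this recommendation. It assumes that JOBRETRYTIME accepts an HH:MM:SS time value; check the parameter reference for your Moab version before using it.

```
# moab.cfg (sketch): raise JOBRETRYTIME from the 60-second default to the
# 300 seconds (5 minutes) recommended for SUSPEND-based preemption
JOBRETRYTIME 00:05:00
```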

  1. Make the following changes to the moab.cfg file:
    1. Set GUARANTEEDPREEMPTION to TRUE. (This causes Moab to lock PREEMPTOR jobs until JOBRETRYTIME expires.)
    2. Make sure that JOBNODEMATCHPOLICY is not set to EXACTNODE, which is not currently supported for preemption (for more information, see Testing and troubleshooting preemption).
    3. Set PREEMPTPOLICY to SUSPEND (for more information, see PREEMPTPOLICY types).
    4. On the credential used by PREEMPTEE jobs (the test1 QoS in the example below), set JOBFLAGS=RESTARTABLE,SUSPENDABLE.
    5. Make sure that the PREEMPTEE job has a lower priority than the PREEMPTOR job (for more information, see Preemption flags).

    For example:

    GUARANTEEDPREEMPTION TRUE
    PREEMPTPOLICY SUSPEND

    QOSCFG[test1] QFLAGS=PREEMPTEE JOBFLAGS=RESTARTABLE,SUSPENDABLE MEMBERULIST=john PRIORITY=100
    QOSCFG[test2] QFLAGS=PREEMPTOR MEMBERULIST=john PRIORITY=10000

  2. Submit a job to the preemptee QoS (test1). For example:

    [john@g06]$ echo sleep 120 | msub -l procs=128,walltime=120 -l qos=test1
    Moab.7

    (Optional) Examine the output of showq:

    [john@g06]$ showq

    active jobs------------------------
    JOBID     USERNAME    STATE      PROCS     REMAINING     STARTTIME
    Moab.7    john        Running    128       00:01:59      Thu Nov 10 12:28:44

    1 active job     128 of 128 processors in use by local jobs (100.00%)
                     2 of 2 nodes active (100.00%)

    eligible jobs----------------------
    JOBID     USERNAME    STATE     PROCS      WCLIMIT       QUEUETIME

    0 eligible jobs

    blocked jobs-----------------------
    JOBID     USERNAME    STATE     PROCS      WCLIMIT       QUEUETIME

    0 blocked jobs

    Total job: 1

  3. Submit a job to the preemptor QoS (test2). For example:

    [john@g06]$ echo sleep 120 | msub -l procs=128,walltime=120 -l qos=test2
    Moab.8

    (Optional) Examine the output of showq. Moab.7 is now Suspended and Moab.8 is Running:

    [john@g06]$ showq

    active jobs------------------------
    JOBID     USERNAME    STATE      PROCS     REMAINING     STARTTIME
    Moab.7    john        Suspended  128       00:01:56      Thu Nov 10 12:28:44
    Moab.8    john        Running    128       00:02:00      Thu Nov 10 12:28:48

    2 active jobs 128 of 128 processors in use by local jobs (100.00%)
                  2 of 2 nodes active (100.00%)

    eligible jobs----------------------
    JOBID     USERNAME    STATE     PROCS      WCLIMIT       QUEUETIME

    0 eligible jobs

    blocked jobs-----------------------
    JOBID     USERNAME    STATE     PROCS      WCLIMIT       QUEUETIME

    0 blocked jobs

    Total jobs: 2

    A suspended job remains in the active jobs section of the showq output; this is normal behavior. Moab should suspend a given job only once.

  4. (Optional) Examine the checkjob output for both jobs. In this sample output, the preemptee job is Moab.9 (qos:test1) and the preemptor job is Moab.10 (qos:test2), rather than Moab.7 and Moab.8.

    [john@g06]$ checkjob Moab.9
    job Moab.9
     
    State: Suspended
    Creds: user:john group:john qos:test1
    WallTime: 00:00:02 of 00:02:00
    SubmitTime: Thu Nov 10 12:36:29
    (Time Queued Total: 00:00:07 Eligible: 00:00:00)
     
    Total Requested Tasks: 128
     
    Req[0] TaskCount: 128 Partition: licenses
    NodeCount: 2
     
    Allocated Nodes:
    node[01-02]*64
     
     
    IWD: /opt/native
    SubmitDir: /opt/native
    Executable: /opt/native/spool/moab.job.UFe8sQ
     
    StartCount: 1
    Flags: RESTARTABLE,SUSPENDABLE,PREEMPTEE,GLOBALQUEUE,PROCSPECIFIED
    Attr: PREEMPTEE
    StartPriority: 100
    job cannot be resumed: preemption required but job is conditional preemptor with no targets
    BLOCK MSG: non-idle state 'Running' (recorded at last scheduling iteration)
    [john@g06]$ checkjob Moab.10
    job Moab.10
     
    State: Running
    Creds: user:john group:john qos:test2
    WallTime: 00:00:00 of 00:02:00
    SubmitTime: Thu Nov 10 12:36:31
    (Time Queued Total: 00:00:00 Eligible: 00:00:00)
     
    StartTime: Thu Nov 10 12:36:31
    Total Requested Tasks: 128
     
    Req[0] TaskCount: 128 Partition: licenses
     
    Allocated Nodes:
    node[01-02]*64
     
     
    IWD: /opt/native
    SubmitDir: /opt/native
    Executable: /opt/native/spool/moab.job.CZavjU
     
    StartCount: 1
    Flags: HASPREEMPTED,PREEMPTOR,GLOBALQUEUE,PROCSPECIFIED
    StartPriority: 10000
    Reservation 'Moab.10' (-00:00:07 -> 00:01:53 Duration: 00:02:00)

Occasionally, Moab keeps a suspended job from resuming, and the job can remain suspended for a long time, if Moab determines that the job cannot resume. For example, if a job could perform I/O before it was suspended but no longer can, Moab recognizes that the job is unable to resume and leaves it suspended.
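Because a job can stay suspended for a long time, it may help to check for suspended jobs periodically. The following is a minimal POSIX shell sketch. The function names suspended_jobs and job_state are hypothetical helpers, not Moab commands, and they assume the output layouts shown in the samples above: in showq, the job ID is the first column and the state is the third; in checkjob, the state appears on a "State:" line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Moab) for spotting suspended jobs.
# They only parse text, so they work on live output or on saved copies.

# suspended_jobs: read showq output on stdin and print the ID of every
# job whose STATE column (third field) is "Suspended". Header and
# summary lines never have "Suspended" in that column, so they are skipped.
suspended_jobs() {
  awk '$3 == "Suspended" { print $1 }'
}

# job_state: read checkjob output on stdin and print the value of the
# first "State:" field; awk's default field splitting ignores leading spaces.
job_state() {
  awk '$1 == "State:" { print $2; exit }'
}

# Example usage on a live system:
#   showq | suspended_jobs
#   checkjob Moab.9 | job_state
```

Parsing by whitespace-separated fields keeps the sketch independent of exact column widths, but it will need adjusting if your Moab version prints a different showq layout.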

© 2014 Adaptive Computing