8.4.3.2 Using CHECKPOINT

For systems that allow checkpointing, the CHECKPOINT attribute for PREEMPTPOLICY allows a job to save its current state and either terminate or continue running. A checkpointed job may restart at any time and resume execution from its most recent checkpoint.

You can tune checkpointing behavior on a per-resource-manager basis by setting the CHECKPOINTSIG and CHECKPOINTTIMEOUT attributes of the RMCFG parameter.
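
As a minimal sketch of that tuning, the fragment below sets both attributes on a single resource manager. The resource manager name "base", the signal SIGUSR1, and the five-minute timeout are illustrative assumptions, not recommended values; use the signal your checkpointing mechanism expects. The timeout is written on the assumption that it accepts Moab's usual [[[DD:]HH:]MM:]SS time format.

```
# Illustrative values only: "base", SIGUSR1, and 5:00 are assumptions.
# CHECKPOINTSIG:     signal sent through the resource manager when a job is checkpointed
# CHECKPOINTTIMEOUT: how long to wait for the checkpoint to complete
RMCFG[base] CHECKPOINTSIG=SIGUSR1 CHECKPOINTTIMEOUT=5:00
```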

The following outlines some benefits of using CHECKPOINT and also lists some things you should be aware of if you choose to use it.

Advantages:

Cautions:

To use CHECKPOINT

Make the following configuration changes in the moab.cfg file:

GUARANTEEDPREEMPTION TRUE
PREEMPTPOLICY CHECKPOINT

QOSCFG[test1] QFLAGS=PREEMPTEE MEMBERULIST=john PRIORITY=100 
QOSCFG[test2] QFLAGS=PREEMPTOR MEMBERULIST=john PRIORITY=10000
  1. Set GUARANTEEDPREEMPTION to TRUE. (This locks the job on a node and keeps trying to preempt.)
  2. Make sure that JOBNODEMATCHPOLICY is not set to EXACTNODE.
  3. Set PREEMPTPOLICY to CHECKPOINT.

     PREEMPTPOLICY CHECKPOINT

  4. Make sure that the PREEMPTEE job has a lower priority than the PREEMPTOR job.

     QOSCFG[test1] QFLAGS=PREEMPTEE MEMBERULIST=john PRIORITY=100
     QOSCFG[test2] QFLAGS=PREEMPTOR MEMBERULIST=john PRIORITY=10000

See Also

Copyright © 2012 Adaptive Computing Enterprises, Inc.®