5.670 Job Preemption

Torque supports job preemption by allowing authorized users to suspend and resume jobs, using one of two methods. If the node supports OS-level preemption, Torque detects this during the configure process and enables it. Otherwise, the MOM can be configured to launch a custom checkpoint script to preempt a job. With a custom checkpoint script, the job itself must know how to resume from a checkpoint after the preemption occurs.
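As a rough illustration of what "resuming from a checkpoint" can mean for a job, the sketch below shows a job script that records its progress in a state file and continues from the last recorded step when it is run again. The STATE_FILE and TOTAL_STEPS variables, the run_job function, and the loop body are all hypothetical and not part of Torque. A real job would save whatever state its application needs.

```shell
#!/bin/sh
# Hypothetical restartable job: progress is kept in a state file so that a
# rerun after preemption continues instead of starting over.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/restartable_job.state}"   # assumed location
TOTAL_STEPS="${TOTAL_STEPS:-5}"                                     # assumed amount of work

run_job() {
    step=0
    # Resume from the last recorded step if a state file exists.
    if [ -f "$STATE_FILE" ]; then
        step=$(cat "$STATE_FILE")
    fi
    while [ "$step" -lt "$TOTAL_STEPS" ]; do
        step=$((step + 1))
        # ... one unit of real work would go here ...
        echo "$step" > "$STATE_FILE"    # record progress after each step
    done
    rm -f "$STATE_FILE"                 # work is complete; clear the checkpoint
    echo "finished at step $step"
}

run_job
```

If the script is interrupted after step 3, the state file contains 3 and the next run performs only steps 4 and 5.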

5.670.1 Configuring a Checkpoint Script on a MOM

To configure the MOM to support a checkpoint script, the $checkpoint_script parameter must be set in the MOM's configuration file found in TORQUE_HOME/mom_priv/config. The checkpoint script should have execute permissions set. A typical configuration file might look as follows:

mom_priv/config:

$pbsserver          node06
$logevent           255
$restricted         *.mycluster.org
$checkpoint_script  /opt/moab/tools/mom-checkpoint.sh
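Because the checkpoint script needs execute permissions, it can be worth checking the configuration before restarting the MOM. The following is a minimal sketch of such a check. The check_checkpoint_script function is a hypothetical helper rather than a Torque tool, and it assumes the configuration file uses the whitespace-separated "parameter value" layout shown above.

```shell
#!/bin/sh
# Sketch: confirm that the script named by $checkpoint_script exists and is
# executable. check_checkpoint_script is a hypothetical helper, not part of Torque.
check_checkpoint_script() {
    config="$1"
    # Take the second field of the first line whose first field is $checkpoint_script.
    script=$(awk '$1 == "$checkpoint_script" { print $2; exit }' "$config")
    if [ -z "$script" ]; then
        echo "no \$checkpoint_script entry in $config"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "$script is missing or not executable"
        return 1
    fi
    echo "$script is executable"
}

# Run the check only when a config file path is given on the command line.
if [ "$#" -gt 0 ]; then
    check_checkpoint_script "$1"
fi
```

Saved under a name of your choosing, the script would be run with the path to the MOM configuration file (TORQUE_HOME/mom_priv/config) as its only argument.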

The second step in enabling the checkpoint script is to set MOM_CHECKPOINT to 1 in /src/include/pbs_config.h. (In some instances, MOM_CHECKPOINT may already be defined as 1.) Because this is a compile-time definition, Torque must be rebuilt for a change to take effect. The line should read as follows:

/src/include/pbs_config.h:

#define MOM_CHECKPOINT 1
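To see whether the header already carries this definition before editing it, a small check along the following lines may help. This is only a sketch: check_mom_checkpoint is a hypothetical helper, and the header path passed to it depends on where the Torque source tree was unpacked.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) only if the given header defines
# MOM_CHECKPOINT as 1. check_mom_checkpoint is a hypothetical helper.
check_mom_checkpoint() {
    header="$1"
    grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+MOM_CHECKPOINT[[:space:]]+1([[:space:]]|$)' "$header"
}

# Run the check only when a header path is given on the command line.
if [ "$#" -gt 0 ]; then
    if check_mom_checkpoint "$1"; then
        echo "MOM_CHECKPOINT is defined as 1"
    else
        echo "MOM_CHECKPOINT is not defined as 1"
    fi
fi
```

The script takes the path to pbs_config.h inside the source tree as its only argument.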

© 2016 Adaptive Computing