Introduction
This test determines whether a job can be restarted from a previous checkpoint image.
Test steps
Submit the job with the option -c enabled,periodic,interval=1 and watch the checkpoint directory; a new checkpoint image should be generated about every minute. Place a hold on the job with qhold to stop it. Use the qalter command to set the checkpoint_name attribute to the name of a previous checkpoint image. Then release the job with qrls to restart it.
> qsub -c enabled,periodic,interval=1 test.sh
999.xxx.yyy
> qhold 999
> qalter -W checkpoint_name=ckpt.999.xxx.yyy.1234567 999
> qrls 999
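To choose a value for checkpoint_name, it can help to list the images that exist for the job. The following is a minimal sketch, not part of the product: the function name list_checkpoints is hypothetical, the checkpoint directory is passed in as an argument because its location depends on the installation, and the file naming (ckpt.<job id>.<timestamp>) is assumed from the example above.

```shell
# Hypothetical helper: print the checkpoint image names for one job.
# $1 = checkpoint directory (site-specific, supplied by the caller)
# $2 = full job id, for example 999.xxx.yyy
# Assumes images are named ckpt.<job id>.<timestamp>, as in the example.
list_checkpoints() {
    for f in "$1"/ckpt."$2".*; do
        # Skip the unexpanded pattern when no image matches.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}
```

The names are printed in the shell's glob order, so when the timestamp suffixes have the same number of digits the oldest image comes first, and piping the output through head -n 1 selects it.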
Successful results
After the job restarts, its output file should be truncated back to an earlier point, and the count should resume from an earlier number than the one reached before the hold.
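The check described above can be scripted. The following is a rough sketch under stated assumptions: the contents of test.sh are not shown here, so it assumes the job writes one integer count per line to its output file, and the function names last_count and count_rolled_back are hypothetical.

```shell
# Hypothetical helpers for checking the result.
# Assumes the job output file holds one integer count per line.

# Print the most recent count written to the output file ($1).
last_count() {
    tail -n 1 "$1"
}

# Succeed if the output file ($2) now ends at a lower count than
# the value recorded before the hold ($1).
count_rolled_back() {
    [ "$(last_count "$2")" -lt "$1" ]
}
```

One way to use this is to record the result of last_count just before running qhold, then call count_rolled_back with that value shortly after qrls. The check only makes sense soon after the restart, before the resumed job has counted past the recorded value again.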
© 2012 Adaptive Computing