4.317 Keeping Completed Jobs

Torque can report the status of completed jobs for a configurable period after they finish. To enable this, set the keep_completed attribute on the job execution queue or the keep_completed parameter on the server to the number of seconds that completed jobs should be held in the queue. If you set keep_completed on the job execution queue, completed jobs are reported in the C state and the exit status appears in the exit_status job attribute.
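As a minimal sketch of the settings described above, the following shell helper wraps the two qmgr commands that set keep_completed on the server or on a queue. The function name set_keep_completed, the queue name batch, and the retention values in the usage comments are assumptions chosen for illustration. The function is not part of Torque, and qmgr generally has to be run by a user with Torque manager privileges.

```shell
# Hypothetical helper (not shipped with Torque): set keep_completed via qmgr.
# Usage: set_keep_completed server 300        # server-wide, 300 seconds
#        set_keep_completed queue 120 batch   # "batch" is an assumed queue name
set_keep_completed() {
    case $1 in
        server)
            qmgr -c "set server keep_completed = $2"
            ;;
        queue)
            # A queue name is required for the queue-level attribute.
            if [ -z "${3:-}" ]; then
                echo "usage: set_keep_completed queue SECONDS QUEUE" >&2
                return 2
            fi
            qmgr -c "set queue $3 keep_completed = $2"
            ;;
        *)
            echo "usage: set_keep_completed server|queue SECONDS [QUEUE]" >&2
            return 2
            ;;
    esac
}
```

After the setting takes effect, running qstat -f with a finished job's ID should show the job in the C state together with its exit_status until the retention time expires.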

If the Mother Superior and the Torque server are on the same host, expect the following behavior:

  • When keep_completed is set, the job spool files will be deleted when the specified time arrives and Torque purges the job from memory.
  • When keep_completed is not set, Torque deletes the job spool files upon job completion.
  • If you manually purge a job (qdel -p) before the job completes or before the keep_completed time expires, Torque will never delete the spool files.

Maintaining status information about completed (or canceled, failed, etc.) jobs helps administrators track failures and improve system performance. It also lets Torque better communicate job status to Moab Workload Manager, which gives Moab the ability to track specific failures and to schedule the workload around possible hazards. See NODEFAILURERESERVETIME in Moab Parameters in the Moab Workload Manager Administrator Guide for more information.
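To make the failure tracking described above more concrete, here is a minimal sketch of a shell filter that reads qstat -f output and lists completed jobs with a non-zero exit status. The function name failed_completed_jobs is hypothetical. The sketch assumes that each job record starts with a "Job Id:" line and that attributes appear as "name = value" lines. Completed jobs show up only while keep_completed retains them.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print
# "<job id> <exit status>" for each job in the C state that exited non-zero.
failed_completed_jobs() {
    awk '
        # Emit the job collected so far if it completed with a non-zero status.
        function report() {
            if (id != "" && state == "C" && status != "" && status != 0)
                print id, status
        }
        # A new record begins: flush the previous job and reset its fields.
        /^Job Id:/ { report(); id = $3; state = ""; status = "" }
        $1 == "job_state"   && $2 == "=" { state = $3 }
        $1 == "exit_status" && $2 == "=" { status = $3 }
        END { report() }
    '
}
```

A typical invocation would be qstat -f | failed_completed_jobs, run while the completed jobs are still being kept.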

© 2017 Adaptive Computing