(Rerun a batch job)
Synopsis
qrerun [{-f}] <JOBID>[ <JOBID>] ...
Description
The qrerun command directs that the specified jobs are to be rerun if possible. To rerun a job is to terminate the session leader of the job and return the job to the queued state in the execution queue in which the job currently resides.
If a job is marked as not rerunnable (see the -r option on the qsub and qalter commands), the rerun request fails for that job. If the mini-server running the job is down, or it rejects the request, the Rerun Job batch request returns a failure unless -f is used.
Using -f violates the IEEE Batch Processing Services Standard and should be reserved for exceptional circumstances and handled with great care. The best practice is to fix the problem mini-server host and let qrerun run normally. After a forced rerun, the nodes may need manual cleaning.
Options
Option | Description |
---|---|
-f | Force a rerun on a job |
> qrerun -f 15406
The qrerun all command is meant to be run if all of the compute nodes go down. If the machines have actually crashed, all of the jobs need to be restarted. If you do not run the command, the behavior depends on how you bring up the pbs_mom daemons; by default, all of the jobs are canceled.
Running the command requeues all jobs without attempting to contact the MOMs on which they should be running.
Operands
The qrerun command accepts one or more job_identifier operands of the form:
sequence_number[.server_name][@server]
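The identifier form above can be taken apart with plain POSIX parameter expansion. The following is a minimal sketch, not part of TORQUE: parse_job_id is a hypothetical helper, and the host names headnode, headnode.example.com, and other are made-up values used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TORQUE): split a job identifier of the
# form sequence_number[.server_name][@server] into its three parts and print
# them as "sequence_number|server_name|server".
parse_job_id() {
    id=$1
    server=""
    server_name=""
    # Optional "@server" suffix: everything after the last "@".
    case $id in
        *@*) server=${id##*@}; id=${id%@*} ;;
    esac
    # Optional ".server_name": everything after the first ".".
    case $id in
        *.*) server_name=${id#*.}; id=${id%%.*} ;;
    esac
    sequence_number=$id
    printf '%s|%s|%s\n' "$sequence_number" "$server_name" "$server"
}

# Illustrative identifiers only; the host names are assumptions.
parse_job_id "3233"
parse_job_id "3233.headnode"
parse_job_id "3233.headnode.example.com@other"
```

Missing optional parts are printed as empty fields, so a script can tell whether a server was named before passing the identifier to qrerun.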
Standard error
The qrerun command will write a diagnostic message to standard error for each error occurrence.
Exit status
Upon successful processing of all the operands presented to the qrerun command, the exit status will be a value of zero.
If the qrerun command fails to process any operand, the command exits with a value greater than zero.
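Because qrerun reports failure only through a nonzero exit status and messages on standard error, a calling script would typically branch on the exit status. The sketch below assumes a hypothetical wrapper named rerun_and_report; the QRERUN variable is also an assumption, added so the command can be replaced with a stub on a machine that has no batch server.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of TORQUE): run qrerun on the given job
# identifiers and report the outcome based on the exit status.
# QRERUN is an assumed override for the command name; it defaults to qrerun.
rerun_and_report() {
    if "${QRERUN:-qrerun}" "$@"; then
        # Exit status zero: every operand was processed successfully.
        echo "rerun requested for: $*"
    else
        # Exit status greater than zero: at least one operand failed.
        status=$?
        echo "qrerun failed (exit status $status) for one or more of: $*" >&2
        return "$status"
    fi
}
```

A call such as rerun_and_report 3233 3234 passes both identifiers to qrerun in one invocation. A nonzero status does not say which operand failed, so the diagnostic messages on standard error remain the place to look.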
Examples
> qrerun 3233
(Job 3233 will be re-run.)