If a job gets stuck in TORQUE, try these suggestions to resolve the issue:
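Force the MOM to send an obituary for the job to the server:
> qsig -s 0 <JOBID>
Clear the stale job with momctl on the compute node where it is still listed (in this example, job 58925 on host compute-5-20):
> momctl -c 58925 -h compute-5-20
Setting the server parameter mom_job_sync to True may help prevent jobs from hanging:
> qmgr -c "set server mom_job_sync = True"
To check whether this is already set, print the server configuration:
> qmgr -c "p s"
If none of the suggestions above removes the job, forcibly purge it from the server. Use this only as a last resort:
> qdel -p <JOBID>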
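The mom_job_sync check can be scripted rather than read by eye. The following is a minimal sketch, assuming the server configuration is printed as `set server <name> = <value>` lines; the function name `mom_job_sync_enabled` is hypothetical and not part of TORQUE.

```shell
# Hypothetical helper (not a TORQUE command): reads server configuration
# text on stdin and succeeds only if mom_job_sync is set to True.
# Assumes the `set server <name> = <value>` line format.
mom_job_sync_enabled() {
    grep -Eiq '^[[:space:]]*set server mom_job_sync[[:space:]]*=[[:space:]]*true[[:space:]]*$'
}

# Example usage on a host where qmgr is available:
#   if qmgr -c "p s" | mom_job_sync_enabled; then
#       echo "mom_job_sync is already True"
#   else
#       qmgr -c "set server mom_job_sync = True"
#   fi
```

Reading from stdin keeps the helper independent of qmgr itself, so it can also be run against a saved copy of the server configuration.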
For additional troubleshooting, run tracejob on one of the stuck jobs. You can then open an online support ticket and attach the full server log covering the time period shown in the tracejob output.
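A stuck job may have been submitted several days ago, so it can help to widen the window that tracejob searches. The sketch below wraps tracejob with its -n option (number of days back to search); the wrapper name `trace_stuck_job` and its one-day default are illustrative assumptions, not something the documentation prescribes.

```shell
# Hypothetical wrapper: trace a job over the last N days of logs.
#   $1  job ID (required)
#   $2  number of days to look back (optional, defaults to 1)
trace_stuck_job() {
    if [ -z "$1" ]; then
        echo "usage: trace_stuck_job JOBID [DAYS]" >&2
        return 2
    fi
    job_id=$1
    days=${2:-1}
    tracejob -n "$days" "$job_id"
}

# Example usage, with <JOBID> replaced by the stuck job's ID:
#   trace_stuck_job <JOBID> 3
```

The time range printed by the trace indicates which portion of the server log to include in the support ticket.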
© 2012 Adaptive Computing