TORQUE includes a diagnostic script that collects the files TORQUE Support needs to troubleshoot an issue. Run it as a user who can run all TORQUE commands and access all TORQUE directories (usually root).
The script (contrib/diag/tdiag.sh) is available in TORQUE 2.3.8, TORQUE 2.4.3, and later. It collects the nodefile, the server and MOM log files, and the output of qmgr -c 'p s', and bundles them into a tar file.
The script accepts the following options (to display them on the command line, enter ./tdiag.sh -h):
USAGE: ./torque_diag [-d DATE] [-h] [-o OUTPUT_FILE] [-t TORQUE_HOME]
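As a rough illustration of the usage line above, the following wrapper runs the diagnostic script with an explicit output file and TORQUE home directory. It is only a sketch: the function name `run_tdiag`, the default paths, and the output file name are assumptions made for this example and do not come from TORQUE. Only the `-o` and `-t` options shown in the usage line are used. The `-d` option is left out because the expected DATE format is not stated here.

```shell
#!/bin/sh
# Sketch only: wrap the TORQUE diagnostic script with explicit options.
# Assumptions (hypothetical, adjust for your site):
#   TDIAG        path to contrib/diag/tdiag.sh
#   TORQUE_HOME  TORQUE home directory passed to -t
#   OUTPUT_FILE  name of the tar file passed to -o
TDIAG="${TDIAG:-./tdiag.sh}"
TORQUE_HOME="${TORQUE_HOME:-/var/spool/torque}"
OUTPUT_FILE="${OUTPUT_FILE:-/tmp/torque_diag.tar}"

run_tdiag() {
    # Refuse to continue if the script is missing or not executable.
    if [ ! -x "$TDIAG" ]; then
        echo "diagnostic script not found or not executable: $TDIAG" >&2
        return 1
    fi
    # -o and -t are taken from the usage line; nothing else is passed.
    "$TDIAG" -o "$OUTPUT_FILE" -t "$TORQUE_HOME"
}
```

After sourcing this file as root, calling `run_tdiag` invokes the script once with the configured values. Setting `TDIAG`, `TORQUE_HOME`, or `OUTPUT_FILE` in the environment beforehand overrides the assumed defaults.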
Table D-1: TORQUE error codes
| Error code name | Number | Description | 
|---|---|---|
| PBSE_NONE | 15000 | No error | 
| PBSE_UNKJOBID | 15001 | Unknown job identifier | 
| PBSE_NOATTR | 15002 | Undefined attribute | 
| PBSE_ATTRRO | 15003 | Attempt to set READ ONLY attribute | 
| PBSE_IVALREQ | 15004 | Invalid request | 
| PBSE_UNKREQ | 15005 | Unknown batch request | 
| PBSE_TOOMANY | 15006 | Too many submit retries | 
| PBSE_PERM | 15007 | No permission | 
| PBSE_BADHOST | 15008 | Access from host not allowed | 
| PBSE_JOBEXIST | 15009 | Job already exists | 
| PBSE_SYSTEM | 15010 | System error occurred | 
| PBSE_INTERNAL | 15011 | Internal server error occurred | 
| PBSE_REGROUTE | 15012 | Parent job of dependent in route queue | 
| PBSE_UNKSIG | 15013 | Unknown signal name | 
| PBSE_BADATVAL | 15014 | Bad attribute value | 
| PBSE_MODATRRUN | 15015 | Cannot modify attribute in run state | 
| PBSE_BADSTATE | 15016 | Request invalid for job state | 
| PBSE_UNKQUE | 15018 | Unknown queue name | 
| PBSE_BADCRED | 15019 | Invalid credential in request | 
| PBSE_EXPIRED | 15020 | Expired credential in request | 
| PBSE_QUNOENB | 15021 | Queue not enabled | 
| PBSE_QACESS | 15022 | No access permission for queue | 
| PBSE_BADUSER | 15023 | Bad user - no password entry | 
| PBSE_HOPCOUNT | 15024 | Max hop count exceeded | 
| PBSE_QUEEXIST | 15025 | Queue already exists | 
| PBSE_ATTRTYPE | 15026 | Incompatible queue attribute type | 
| PBSE_QUEBUSY | 15027 | Queue busy (not empty) | 
| PBSE_QUENBIG | 15028 | Queue name too long | 
| PBSE_NOSUP | 15029 | Feature/function not supported | 
| PBSE_QUENOEN | 15030 | Cannot enable queue, needs additional definition | 
| PBSE_PROTOCOL | 15031 | Protocol (ASN.1) error | 
| PBSE_BADATLST | 15032 | Bad attribute list structure | 
| PBSE_NOCONNECTS | 15033 | No free connections | 
| PBSE_NOSERVER | 15034 | No server to connect to | 
| PBSE_UNKRESC | 15035 | Unknown resource | 
| PBSE_EXCQRESC | 15036 | Job exceeds queue resource limits | 
| PBSE_QUENODFLT | 15037 | No default queue defined | 
| PBSE_NORERUN | 15038 | Job not rerunnable | 
| PBSE_ROUTEREJ | 15039 | Route rejected by all destinations | 
| PBSE_ROUTEEXPD | 15040 | Time in route queue expired | 
| PBSE_MOMREJECT | 15041 | Request to the MOM failed | 
| PBSE_BADSCRIPT | 15042 | (qsub) cannot access script file | 
| PBSE_STAGEIN | 15043 | Stage In of files failed | 
| PBSE_RESCUNAV | 15044 | Resources temporarily unavailable | 
| PBSE_BADGRP | 15045 | Bad group specified | 
| PBSE_MAXQUED | 15046 | Max number of jobs in queue | 
| PBSE_CKPBSY | 15047 | Checkpoint busy, may retry | 
| PBSE_EXLIMIT | 15048 | Limit exceeds allowable | 
| PBSE_BADACCT | 15049 | Bad account attribute value | 
| PBSE_ALRDYEXIT | 15050 | Job already in exit state | 
| PBSE_NOCOPYFILE | 15051 | Job files not copied | 
| PBSE_CLEANEDOUT | 15052 | Unknown job id after clean init | 
| PBSE_NOSYNCMSTR | 15053 | No master in Sync Set | 
| PBSE_BADDEPEND | 15054 | Invalid dependency | 
| PBSE_DUPLIST | 15055 | Duplicate entry in List | 
| PBSE_DISPROTO | 15056 | Bad DIS based request protocol | 
| PBSE_EXECTHERE | 15057 | Cannot execute there | 
| PBSE_SISREJECT | 15058 | Sister rejected | 
| PBSE_SISCOMM | 15059 | Sister could not communicate | 
| PBSE_SVRDOWN | 15060 | Request rejected - server shutting down | 
| PBSE_CKPSHORT | 15061 | Not all tasks could checkpoint | 
| PBSE_UNKNODE | 15062 | Named node is not in the list | 
| PBSE_UNKNODEATR | 15063 | Node-attribute not recognized | 
| PBSE_NONODES | 15064 | Server has no node list | 
| PBSE_NODENBIG | 15065 | Node name is too big | 
| PBSE_NODEEXIST | 15066 | Node name already exists | 
| PBSE_BADNDATVAL | 15067 | Bad node-attribute value | 
| PBSE_MUTUALEX | 15068 | State values are mutually exclusive | 
| PBSE_GMODERR | 15069 | Error(s) during global modification of nodes | 
| PBSE_NORELYMOM | 15070 | Could not contact the MOM | 
| PBSE_NOTSNODE | 15071 | No time-shared nodes |
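
When a numeric code turns up in a log or error message, a small lookup helper can save a trip to the table. The sketch below is a hypothetical shell function, `pbse_describe`, that is not part of TORQUE. It covers only a handful of rows copied from Table D-1, and you would extend the `case` list with any other codes you need.

```shell
#!/bin/sh
# Sketch only: map a few TORQUE error numbers to their names and descriptions.
# pbse_describe is a hypothetical helper; the entries are copied from Table D-1.
pbse_describe() {
    case "$1" in
        15001) echo "PBSE_UNKJOBID: Unknown job identifier" ;;
        15007) echo "PBSE_PERM: No permission" ;;
        15008) echo "PBSE_BADHOST: Access from host not allowed" ;;
        15018) echo "PBSE_UNKQUE: Unknown queue name" ;;
        15034) echo "PBSE_NOSERVER: No server to connect to" ;;
        15036) echo "PBSE_EXCQRESC: Job exceeds queue resource limits" ;;
        15041) echo "PBSE_MOMREJECT: Request to the MOM failed" ;;
        *)
            # Codes not listed above are reported on stderr with a non-zero status.
            echo "no entry for code: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `pbse_describe 15018` prints `PBSE_UNKQUE: Unknown queue name`, while a code that is not listed returns a non-zero status.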