TORQUE includes a diagnostic script that gathers the files TORQUE Support needs to troubleshoot an issue. Run it as a user who can run all TORQUE commands and access all TORQUE directories (this is usually root).
The script (contrib/diag/tdiag.sh) is available in TORQUE 2.3.8, TORQUE 2.4.3, and later. It collects the node file and the server and MOM log files, captures the output of qmgr -c 'p s', and packages everything in a tar file.
The script also accepts the following options (to display them on the command line, enter ./tdiag.sh -h):
USAGE: ./torque_diag [-d DATE] [-h] [-o OUTPUT_FILE] [-t TORQUE_HOME]
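
For sites that cannot run tdiag.sh, the collection it performs can be approximated by hand. The following is a minimal sketch, not the script's actual implementation: the function name `collect_torque_diag`, the default home `/var/spool/torque`, the output file name, and the `qmgr_print_server.txt` file name are all assumptions made for illustration. It copies the node file and the server and MOM log directories if they exist, captures `qmgr -c 'p s'` when qmgr is on the PATH, and writes a tar file.

```shell
#!/bin/sh
# Hypothetical helper, NOT the real tdiag.sh: bundles the node file, the
# server and MOM log directories, and the qmgr server configuration dump.
collect_torque_diag() {
    torque_home=${1:-/var/spool/torque}   # assumed default TORQUE home
    output=${2:-torque_diag.tar}          # hypothetical output file name
    workdir=$(mktemp -d) || return 1

    # Copy whatever exists; a MOM-only host has no server_priv, for example.
    for item in server_priv/nodes server_logs mom_logs; do
        if [ -e "$torque_home/$item" ]; then
            mkdir -p "$workdir/$(dirname "$item")"
            cp -R "$torque_home/$item" "$workdir/$item"
        fi
    done

    # Capture the server configuration only when qmgr is available.
    if command -v qmgr >/dev/null 2>&1; then
        qmgr -c 'p s' > "$workdir/qmgr_print_server.txt" 2>&1
    fi

    # Package the collected files, then clean up the scratch directory.
    tar -cf "$output" -C "$workdir" .
    status=$?
    rm -rf "$workdir"
    return $status
}
```

Calling `collect_torque_diag /var/spool/torque /tmp/diag.tar` as root would produce a tar file roughly comparable to what the paragraph above describes. Where tdiag.sh is available, it remains the preferred tool.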
Table D-1: TORQUE error codes
| Error code name | Number | Description | 
|---|---|---|
| PBSE_FLOOR | 15000 | No error | 
| PBSE_UNKJOBID | 15001 | Unknown job ID error | 
| PBSE_NOATTR | 15002 | Undefined attribute | 
| PBSE_ATTRRO | 15003 | Cannot set attribute, read only or insufficient permission | 
| PBSE_IVALREQ | 15004 | Invalid request | 
| PBSE_UNKREQ | 15005 | Unknown request | 
| PBSE_TOOMANY | 15006 | Too many submit retries | 
| PBSE_PERM | 15007 | Unauthorized Request | 
| PBSE_IFF_NOT_FOUND | 15008 | trqauthd unable to authenticate | 
| PBSE_MUNGE_NOT_FOUND | 15009 | Munge executable not found, unable to authenticate | 
| PBSE_BADHOST | 15010 | Access from host not allowed, or unknown host | 
| PBSE_JOBEXIST | 15011 | Job with requested ID already exists | 
| PBSE_SYSTEM | 15012 | System error | 
| PBSE_INTERNAL | 15013 | PBS server internal error | 
| PBSE_REGROUTE | 15014 | Dependent parent job currently in routing queue | 
| PBSE_UNKSIG | 15015 | Unknown/illegal signal name | 
| PBSE_BADATVAL | 15016 | Illegal attribute or resource value for | 
| PBSE_MODATRRUN | 15017 | Cannot modify attribute while job running | 
| PBSE_BADSTATE | 15018 | Request invalid for state of job | 
| PBSE_UNKQUE | 15020 | Unknown queue | 
| PBSE_BADCRED | 15021 | Invalid credential | 
| PBSE_EXPIRED | 15022 | Expired credential | 
| PBSE_QUNOENB | 15023 | Queue is not enabled | 
| PBSE_QACESS | 15024 | Access to queue is denied | 
| PBSE_BADUSER | 15025 | Bad UID for job execution | 
| PBSE_HOPCOUNT | 15026 | Job routing over too many hops | 
| PBSE_QUEEXIST | 15027 | Queue already exists | 
| PBSE_ATTRTYPE | 15028 | Incompatible type | 
| PBSE_QUEBUSY | 15029 | Cannot delete busy queue | 
| PBSE_QUENBIG | 15030 | Queue name too long | 
| PBSE_NOSUP | 15031 | No support for requested service | 
| PBSE_QUENOEN | 15032 | Cannot enable queue, incomplete definition | 
| PBSE_PROTOCOL | 15033 | Batch protocol error | 
| PBSE_BADATLST | 15034 | Bad attribute list structure | 
| PBSE_NOCONNECTS | 15035 | No free connections | 
| PBSE_NOSERVER | 15036 | No server specified | 
| PBSE_UNKRESC | 15037 | Unknown resource type | 
| PBSE_EXCQRESC | 15038 | Job exceeds queue resource limits | 
| PBSE_QUENODFLT | 15039 | No default queue specified | 
| PBSE_NORERUN | 15040 | Job is not rerunnable | 
| PBSE_ROUTEREJ | 15041 | Job rejected by all possible destinations (check syntax, queue resources, …) | 
| PBSE_ROUTEEXPD | 15042 | Time in Route Queue Expired | 
| PBSE_MOMREJECT | 15043 | Execution server rejected request | 
| PBSE_BADSCRIPT | 15044 | (qsub) cannot access script file | 
| PBSE_STAGEIN | 15045 | Stage-in of files failed | 
| PBSE_RESCUNAV | 15046 | Resource temporarily unavailable | 
| PBSE_BADGRP | 15047 | Bad GID for job execution | 
| PBSE_MAXQUED | 15048 | Maximum number of jobs already in queue | 
| PBSE_CKPBSY | 15049 | Checkpoint busy, may retry | 
| PBSE_EXLIMIT | 15050 | Resource limit exceeds allowable | 
| PBSE_BADACCT | 15051 | Invalid Account | 
| PBSE_ALRDYEXIT | 15052 | Job already in exit state | 
| PBSE_NOCOPYFILE | 15053 | Job files not copied | 
| PBSE_CLEANEDOUT | 15054 | Unknown job id after clean init | 
| PBSE_NOSYNCMSTR | 15055 | No master found for sync job set | 
| PBSE_BADDEPEND | 15056 | Invalid Job Dependency | 
| PBSE_DUPLIST | 15057 | Duplicate entry in list | 
| PBSE_DISPROTO | 15058 | Bad DIS based Request Protocol | 
| PBSE_EXECTHERE | 15059 | Cannot execute at specified host because of checkpoint or stagein files | 
| PBSE_SISREJECT | 15060 | Sister rejected | 
| PBSE_SISCOMM | 15061 | Sister could not communicate | 
| PBSE_SVRDOWN | 15062 | Request not allowed: Server shutting down | 
| PBSE_CKPSHORT | 15063 | Not all tasks could checkpoint | 
| PBSE_UNKNODE | 15064 | Unknown node | 
| PBSE_UNKNODEATR | 15065 | Unknown node-attribute | 
| PBSE_NONODES | 15066 | Server has no node list | 
| PBSE_NODENBIG | 15067 | Node name is too big | 
| PBSE_NODEEXIST | 15068 | Node name already exists | 
| PBSE_BADNDATVAL | 15069 | Illegal value for | 
| PBSE_MUTUALEX | 15070 | Mutually exclusive values for | 
| PBSE_GMODERR | 15071 | Modification failed for | 
| PBSE_NORELYMOM | 15072 | Server could not connect to MOM | 
| PBSE_NOTSNODE | 15073 | No time-share node available | 
| PBSE_JOBTYPE | 15074 | Wrong job type | 
| PBSE_BADACLHOST | 15075 | Bad ACL entry in host list | 
| PBSE_MAXUSERQUED | 15076 | Maximum number of jobs already in queue for user | 
| PBSE_BADDISALLOWTYPE | 15077 | Bad type in disallowed_types list | 
| PBSE_NOINTERACTIVE | 15078 | Queue does not allow interactive jobs | 
| PBSE_NOBATCH | 15079 | Queue does not allow batch jobs | 
| PBSE_NORERUNABLE | 15080 | Queue does not allow rerunable jobs | 
| PBSE_NONONRERUNABLE | 15081 | Queue does not allow nonrerunable jobs | 
| PBSE_UNKARRAYID | 15082 | Unknown Array ID | 
| PBSE_BAD_ARRAY_REQ | 15083 | Bad Job Array Request | 
| PBSE_BAD_ARRAY_DATA | 15084 | Bad data reading job array from file | 
| PBSE_TIMEOUT | 15085 | Time out | 
| PBSE_JOBNOTFOUND | 15086 | Job not found | 
| PBSE_NOFAULTTOLERANT | 15087 | Queue does not allow fault tolerant jobs | 
| PBSE_NOFAULTINTOLERANT | 15088 | Queue does not allow fault intolerant jobs | 
| PBSE_NOJOBARRAYS | 15089 | Queue does not allow job arrays | 
| PBSE_RELAYED_TO_MOM | 15090 | Request was relayed to a MOM | 
| PBSE_MEM_MALLOC | 15091 | Error allocating memory - out of memory | 
| PBSE_MUTEX | 15092 | Error allocating controlling mutex (lock/unlock) | 
| PBSE_THREADATTR | 15093 | Error setting thread attributes | 
| PBSE_THREAD | 15094 | Error creating thread | 
| PBSE_SELECT | 15095 | Error in socket select | 
| PBSE_SOCKET_FAULT | 15096 | Unable to get connection to socket | 
| PBSE_SOCKET_WRITE | 15097 | Error writing data to socket | 
| PBSE_SOCKET_READ | 15098 | Error reading data from socket | 
| PBSE_SOCKET_CLOSE | 15099 | Socket close detected | 
| PBSE_SOCKET_LISTEN | 15100 | Error listening on socket | 
| PBSE_AUTH_INVALID | 15101 | Invalid auth type in request | 
| PBSE_NOT_IMPLEMENTED | 15102 | This functionality is not yet implemented | 
| PBSE_QUENOTAVAILABLE | 15103 | Queue is currently not available | 
| PBSE_TMPDIFFOWNER | 15104 | tmpdir owned by another user | 
| PBSE_TMPNOTDIR | 15105 | tmpdir exists but is not a directory | 
| PBSE_TMPNONAME | 15106 | tmpdir cannot be named for job | 
| PBSE_CANTOPENSOCKET | 15107 | Cannot open demux sockets | 
| PBSE_CANTCONTACTSISTERS | 15108 | Cannot send join job to all sisters | 
| PBSE_CANTCREATETMPDIR | 15109 | Cannot create tmpdir for job | 
| PBSE_BADMOMSTATE | 15110 | Mom is down, cannot run job | 
| PBSE_SOCKET_INFORMATION | 15111 | Socket information is not accessible | 
| PBSE_SOCKET_DATA | 15112 | Data on socket does not process correctly | 
| PBSE_CLIENT_INVALID | 15113 | Client is not allowed/trusted | 
| PBSE_PREMATURE_EOF | 15114 | Premature End of File | 
| PBSE_CAN_NOT_SAVE_FILE | 15115 | Error saving file | 
| PBSE_CAN_NOT_OPEN_FILE | 15116 | Error opening file | 
| PBSE_CAN_NOT_WRITE_FILE | 15117 | Error writing file | 
| PBSE_JOB_FILE_CORRUPT | 15118 | Job file corrupt | 
| PBSE_JOB_RERUN | 15119 | Job can not be rerun | 
| PBSE_CONNECT | 15120 | Can not establish connection | 
| PBSE_JOBWORKDELAY | 15121 | Job function must be temporarily delayed | 
| PBSE_BAD_PARAMETER | 15122 | Parameter of function was invalid | 
| PBSE_CONTINUE | 15123 | Continue processing on job. (Not an error) | 
| PBSE_JOBSUBSTATE | 15124 | Current sub state does not allow transaction. | 
| PBSE_CAN_NOT_MOVE_FILE | 15125 | Error moving file | 
| PBSE_JOB_RECYCLED | 15126 | Job is being recycled | 
| PBSE_JOB_ALREADY_IN_QUEUE | 15127 | Job is already in destination queue. | 
| PBSE_INVALID_MUTEX | 15128 | Mutex is NULL or otherwise invalid | 
| PBSE_MUTEX_ALREADY_LOCKED | 15129 | The mutex is already locked by this object | 
| PBSE_MUTEX_ALREADY_UNLOCKED | 15130 | The mutex has already been unlocked by this object | 
| PBSE_INVALID_SYNTAX | 15131 | Command syntax invalid | 
| PBSE_NODE_DOWN | 15132 | A node is down. Check the MOM and host | 
| PBSE_SERVER_NOT_FOUND | 15133 | Could not connect to batch server | 
| PBSE_SERVER_BUSY | 15134 | Server busy. Currently no available threads |
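
When a numeric code from Table D-1 turns up in a log or a message, a small lookup can save a trip to the table. The sketch below is illustrative only: `pbse_name` is a hypothetical helper that is not shipped with TORQUE, and it reproduces just a handful of rows, so extend the `case` arms for the codes you encounter most often.

```shell
#!/bin/sh
# Hypothetical helper: map a numeric code from Table D-1 to its name.
# Only a few rows of the table are reproduced here.
pbse_name() {
    case "$1" in
        15001) echo "PBSE_UNKJOBID" ;;
        15007) echo "PBSE_PERM" ;;
        15010) echo "PBSE_BADHOST" ;;
        15020) echo "PBSE_UNKQUE" ;;
        15046) echo "PBSE_RESCUNAV" ;;
        15086) echo "PBSE_JOBNOTFOUND" ;;
        15133) echo "PBSE_SERVER_NOT_FOUND" ;;
        # Anything not listed above: report on stderr and fail.
        *) echo "unknown code: $1" >&2; return 1 ;;
    esac
}
```

For example, `pbse_name 15020` prints `PBSE_UNKQUE`, and a code outside the listed subset returns a non-zero status so callers can fall back to the full table.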