TORQUE Resource Manager

Appendix D: Diagnostics and Error Codes

D.1 TORQUE Diagnostics

TORQUE includes a diagnostic script that collects the files TORQUE Support needs to investigate an issue. Run it as a user who can run all TORQUE commands and access all TORQUE directories (this is usually root).

The script (contrib/diag/tdiag.sh) is available in TORQUE 2.3.8, TORQUE 2.4.3, and later. It collects the nodefile and the server and MOM log files, captures the output of qmgr -c 'p s', and packs everything into a tar file.
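To make the list of collected files concrete, the following is a minimal sketch of a helper that prints the paths a manual collection for one day would cover. It is not the implementation of tdiag.sh. The function name diag_paths is hypothetical, and the paths assume the standard TORQUE layout (server_priv/nodes, plus server_logs and mom_logs files named by date); your installation may differ.

```shell
#!/bin/sh
# Hypothetical helper, not part of TORQUE: prints the files that a manual
# collection for one day would gather.
# Assumes the standard layout under TORQUE_HOME; tdiag.sh itself may differ.
diag_paths() {
    torque_home=${1:-/var/spool/torque}   # default TORQUE_HOME
    log_date=${2:-$(date +%Y%m%d)}        # default: today's date, YYYYmmdd
    echo "$torque_home/server_priv/nodes"      # the nodefile
    echo "$torque_home/server_logs/$log_date"  # server log for that day
    echo "$torque_home/mom_logs/$log_date"     # MOM log (on hosts running pbs_mom)
}

# Example (as root, on a host where all three files exist):
#   qmgr -c 'p s' > /tmp/qmgr_config.txt
#   tar czf /tmp/manual_diag.tar.gz /tmp/qmgr_config.txt \
#       $(diag_paths /var/spool/torque 20091130)
```

When tdiag.sh is available, prefer it over a manual collection, since it is what TORQUE Support expects.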

The script also accepts the following options (to display them on the command line, enter ./tdiag.sh -h):

USAGE: ./torque_diag [-d DATE] [-h] [-o OUTPUT_FILE] [-t TORQUE_HOME]

DATE should be in the format YYYYmmdd. For example, 20091130 is the date for November 30th, 2009. If no date is specified, today's date is used.

OUTPUT_FILE is the optional name of the output file. The default output file is torque_diag<today's_date>.tar.gz.

TORQUE_HOME should be the path to your TORQUE directory. If no directory is specified, /var/spool/torque is the default.
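As an illustration of the options described here, the sketch below checks that a date is in YYYYmmdd form and prints a complete tdiag.sh command line without running it. The helper names is_yyyymmdd and tdiag_command and the output path used in the example are hypothetical; only the -d, -o, and -t options and the /var/spool/torque default come from the usage text.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the argument is exactly eight digits,
# the shape of a YYYYmmdd date. It does not check that the date is a real one.
is_yyyymmdd() {
    case "$1" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prints the tdiag.sh command line for a date, an output
# file, and an optional TORQUE home. It only prints; it does not run the script.
tdiag_command() {
    date_arg=$1
    out_file=$2
    torque_home=${3:-/var/spool/torque}   # documented default
    if ! is_yyyymmdd "$date_arg"; then
        echo "invalid date: $date_arg (expected YYYYmmdd)" >&2
        return 1
    fi
    echo "./tdiag.sh -d $date_arg -o $out_file -t $torque_home"
}

# Example:
#   tdiag_command 20091130 /tmp/site_diag.tar.gz
# prints:
#   ./tdiag.sh -d 20091130 -o /tmp/site_diag.tar.gz -t /var/spool/torque
```

Run the printed command from the contrib/diag directory as a user with access to all TORQUE commands and directories.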

D.2 TORQUE Error Codes

Number Description
15000 No error
15001 Unknown job identifier
15002 Undefined attribute
15003 Attempt to set READ ONLY attribute
15004 Invalid request
15005 Unknown batch request
15006 Too many submit retries
15007 No permission
15008 Access from host not allowed
15009 Job already exists
15010 System error occurred
15011 Internal server error occurred
15012 Parent job of dependent in route queue
15013 Unknown signal name
15014 Bad attribute value
15015 Cannot modify attribute in run state
15016 Request invalid for job state
15018 Unknown queue name
15019 Invalid credential in request
15020 Expired credential in request
15021 Queue not enabled
15022 No access permission for queue
15023 Bad user - no password entry
15024 Max hop count exceeded
15025 Queue already exists
15026 Incompatible queue attribute type
15027 Queue busy (not empty)
15028 Queue name too long
15029 Feature/function not supported
15030 Cannot enable queue; needs additional definition
15031 Protocol (ASN.1) error
15032 Bad attribute list structure
15033 No free connections
15034 No server to connect to
15035 Unknown resource
15036 Job exceeds queue resource limits
15037 No default queue defined
15038 Job not rerunnable
15039 Route rejected by all destinations
15040 Time in route queue expired
15041 Request to the MOM failed
15042 (qsub) cannot access script file
15043 Stage In of files failed
15044 Resources temporarily unavailable
15045 Bad group specified
15046 Max number of jobs in queue
15047 Checkpoint busy, may retry
15048 Limit exceeds allowable
15049 Bad account attribute value
15050 Job already in exit state
15051 Job files not copied
15052 Unknown job id after clean init
15053 No master in Sync Set
15054 Invalid dependency
15055 Duplicate entry in List
15056 Bad DIS based request protocol
15057 Cannot execute there
15058 Sister rejected
15059 Sister could not communicate
15060 Request rejected: server shutting down
15061 Not all tasks could checkpoint
15062 Named node is not in the list
15063 Node-attribute not recognized
15064 Server has no node list
15065 Node name is too big
15066 Node name already exists
15067 Bad node-attribute value
15068 State values are mutually exclusive
15069 Error(s) during global modification of nodes
15070 Could not contact the MOM
15071 No time-shared nodes
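
When an error number shows up in command output or a log, a small lookup can translate it back to the description in the table above. The following is a minimal sketch under stated assumptions: the function names torque_error_desc and explain_torque_codes are hypothetical, only a handful of codes are included for brevity, and the sample input line in the example is invented for illustration rather than real TORQUE output.

```shell
#!/bin/sh
# Hypothetical helper: prints the description for a TORQUE error number.
# Only a few entries from the table are included; extend the case as needed.
torque_error_desc() {
    case "$1" in
        15000) echo "No error" ;;
        15001) echo "Unknown job identifier" ;;
        15007) echo "No permission" ;;
        15018) echo "Unknown queue name" ;;
        15041) echo "Request to the MOM failed" ;;
        15070) echo "Could not contact the MOM" ;;
        *) echo "(not in this excerpt)"; return 1 ;;
    esac
}

# Hypothetical helper: reads text on stdin, splits it into runs of digits,
# and prints "number: description" for each number from 15000 to 15071.
explain_torque_codes() {
    tr -cs '0-9' '\n' | while read -r token; do
        case "$token" in
            150[0-6][0-9]|1507[01])
                printf '%s: %s\n' "$token" "$(torque_error_desc "$token")" ;;
        esac
    done
}

# Example (the input line is made up):
#   echo 'submit failed, rc=15018' | explain_torque_codes
# prints:
#   15018: Unknown queue name
```

Numbers in range that the case statement does not list are reported as "(not in this excerpt)" rather than dropped, so gaps in the lookup stay visible.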