Appendix D: Diagnostics and Error Codes

TORQUE includes a diagnostic script that gathers the files TORQUE Support needs to troubleshoot an issue. Run it as a user who can run all TORQUE commands and access all TORQUE directories (usually root).

The script (contrib/diag/tdiag.sh) is available in TORQUE 2.3.8, TORQUE 2.4.3, and later. It collects the node file and the server and MOM log files, captures the output of qmgr -c 'p s', and packs everything into a tar file.

The script also accepts the following options (to display them on the command line, run ./tdiag.sh -h):

USAGE: ./torque_diag [-d DATE] [-h] [-o OUTPUT_FILE] [-t TORQUE_HOME]
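
As a rough illustration of how the options in the usage line combine, the following Python sketch assembles an argument list for the script. The helper name build_tdiag_command, the default script path, and the sample values are hypothetical and not part of TORQUE. The accepted DATE format is not stated here, so the value is passed through unchanged; check ./tdiag.sh -h before relying on it.

```python
from typing import List, Optional


def build_tdiag_command(
    script: str = "./tdiag.sh",  # assumed location; adjust to your checkout
    date: Optional[str] = None,
    output_file: Optional[str] = None,
    torque_home: Optional[str] = None,
) -> List[str]:
    """Build an argv list using only the flags shown in the usage line."""
    cmd = [script]
    if date is not None:
        cmd += ["-d", date]  # DATE format is not documented here; passed as-is
    if output_file is not None:
        cmd += ["-o", output_file]
    if torque_home is not None:
        cmd += ["-t", torque_home]
    return cmd


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_tdiag_command(output_file="diag.tar",
                              torque_home="/var/spool/torque"))
```

The resulting list could then be handed to subprocess.run(cmd, check=True) by a suitably privileged user. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.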

Table 4-5: TORQUE error codes

Error code name Number Description
PBSE_FLOOR 15000 No error
PBSE_UNKJOBID 15001 Unknown job ID error
PBSE_NOATTR 15002 Undefined attribute
PBSE_ATTRRO 15003 Cannot set attribute, read only or insufficient permission
PBSE_IVALREQ 15004 Invalid request
PBSE_UNKREQ 15005 Unknown request
PBSE_TOOMANY 15006 Too many submit retries
PBSE_PERM 15007 Unauthorized Request
PBSE_IFF_NOT_FOUND 15008 trqauthd unable to authenticate
PBSE_MUNGE_NOT_FOUND 15009 Munge executable not found, unable to authenticate
PBSE_BADHOST 15010 Access from host not allowed, or unknown host
PBSE_JOBEXIST 15011 Job with requested ID already exists
PBSE_SYSTEM 15012 System error
PBSE_INTERNAL 15013 PBS server internal error
PBSE_REGROUTE 15014 Dependent parent job currently in routing queue
PBSE_UNKSIG 15015 Unknown/illegal signal name
PBSE_BADATVAL 15016 Illegal attribute or resource value for
PBSE_MODATRRUN 15017 Cannot modify attribute while job running
PBSE_BADSTATE 15018 Request invalid for state of job
PBSE_UNKQUE 15020 Unknown queue
PBSE_BADCRED 15021 Invalid credential
PBSE_EXPIRED 15022 Expired credential
PBSE_QUNOENB 15023 Queue is not enabled
PBSE_QACESS 15024 Access to queue is denied
PBSE_BADUSER 15025 Bad UID for job execution
PBSE_HOPCOUNT 15026 Job routing over too many hops
PBSE_QUEEXIST 15027 Queue already exists
PBSE_ATTRTYPE 15028 Incompatible type
PBSE_QUEBUSY 15029 Cannot delete busy queue
PBSE_QUENBIG 15030 Queue name too long
PBSE_NOSUP 15031 No support for requested service
PBSE_QUENOEN 15032 Cannot enable queue, incomplete definition
PBSE_PROTOCOL 15033 Batch protocol error
PBSE_BADATLST 15034 Bad attribute list structure
PBSE_NOCONNECTS 15035 No free connections
PBSE_NOSERVER 15036 No server specified
PBSE_UNKRESC 15037 Unknown resource type
PBSE_EXCQRESC 15038 Job exceeds queue resource limits
PBSE_QUENODFLT 15039 No default queue specified
PBSE_NORERUN 15040 Job is not rerunnable
PBSE_ROUTEREJ 15041 Job rejected by all possible destinations (check syntax, queue resources, …)
PBSE_ROUTEEXPD 15042 Time in Route Queue Expired
PBSE_MOMREJECT 15043 Execution server rejected request
PBSE_BADSCRIPT 15044 (qsub) cannot access script file
PBSE_STAGEIN 15045 Stage-in of files failed
PBSE_RESCUNAV 15046 Resource temporarily unavailable
PBSE_BADGRP 15047 Bad GID for job execution
PBSE_MAXQUED 15048 Maximum number of jobs already in queue
PBSE_CKPBSY 15049 Checkpoint busy, may retry
PBSE_EXLIMIT 15050 Resource limit exceeds allowable
PBSE_BADACCT 15051 Invalid Account
PBSE_ALRDYEXIT 15052 Job already in exit state
PBSE_NOCOPYFILE 15053 Job files not copied
PBSE_CLEANEDOUT 15054 Unknown job id after clean init
PBSE_NOSYNCMSTR 15055 No master found for sync job set
PBSE_BADDEPEND 15056 Invalid Job Dependency
PBSE_DUPLIST 15057 Duplicate entry in list
PBSE_DISPROTO 15058 Bad DIS based Request Protocol
PBSE_EXECTHERE 15059 Cannot execute at specified host because of checkpoint or stagein files
PBSE_SISREJECT 15060 Sister rejected
PBSE_SISCOMM 15061 Sister could not communicate
PBSE_SVRDOWN 15062 Request not allowed: Server shutting down
PBSE_CKPSHORT 15063 Not all tasks could checkpoint
PBSE_UNKNODE 15064 Unknown node
PBSE_UNKNODEATR 15065 Unknown node-attribute
PBSE_NONODES 15066 Server has no node list
PBSE_NODENBIG 15067 Node name is too big
PBSE_NODEEXIST 15068 Node name already exists
PBSE_BADNDATVAL 15069 Illegal value for
PBSE_MUTUALEX 15070 Mutually exclusive values for
PBSE_GMODERR 15071 Modification failed for
PBSE_NORELYMOM 15072 Server could not connect to MOM
PBSE_NOTSNODE 15073 No time-share node available
PBSE_JOBTYPE 15074 Wrong job type
PBSE_BADACLHOST 15075 Bad ACL entry in host list
PBSE_MAXUSERQUED 15076 Maximum number of jobs already in queue for user
PBSE_BADDISALLOWTYPE 15077 Bad type in disallowed_types list
PBSE_NOINTERACTIVE 15078 Queue does not allow interactive jobs
PBSE_NOBATCH 15079 Queue does not allow batch jobs
PBSE_NORERUNABLE 15080 Queue does not allow rerunnable jobs
PBSE_NONONRERUNABLE 15081 Queue does not allow non-rerunnable jobs
PBSE_UNKARRAYID 15082 Unknown Array ID
PBSE_BAD_ARRAY_REQ 15083 Bad Job Array Request
PBSE_BAD_ARRAY_DATA 15084 Bad data reading job array from file
PBSE_TIMEOUT 15085 Time out
PBSE_JOBNOTFOUND 15086 Job not found
PBSE_NOFAULTTOLERANT 15087 Queue does not allow fault tolerant jobs
PBSE_NOFAULTINTOLERANT 15088 Queue does not allow fault intolerant jobs
PBSE_NOJOBARRAYS 15089 Queue does not allow job arrays
PBSE_RELAYED_TO_MOM 15090 Request was relayed to a MOM
PBSE_MEM_MALLOC 15091 Error allocating memory - out of memory
PBSE_MUTEX 15092 Error allocating controlling mutex (lock/unlock)
PBSE_THREADATTR 15093 Error setting thread attributes
PBSE_THREAD 15094 Error creating thread
PBSE_SELECT 15095 Error in socket select
PBSE_SOCKET_FAULT 15096 Unable to get connection to socket
PBSE_SOCKET_WRITE 15097 Error writing data to socket
PBSE_SOCKET_READ 15098 Error reading data from socket
PBSE_SOCKET_CLOSE 15099 Socket close detected
PBSE_SOCKET_LISTEN 15100 Error listening on socket
PBSE_AUTH_INVALID 15101 Invalid auth type in request
PBSE_NOT_IMPLEMENTED 15102 This functionality is not yet implemented
PBSE_QUENOTAVAILABLE 15103 Queue is currently not available
PBSE_TMPDIFFOWNER 15104 tmpdir owned by another user
PBSE_TMPNOTDIR 15105 tmpdir exists but is not a directory
PBSE_TMPNONAME 15106 tmpdir cannot be named for job
PBSE_CANTOPENSOCKET 15107 Cannot open demux sockets
PBSE_CANTCONTACTSISTERS 15108 Cannot send join job to all sisters
PBSE_CANTCREATETMPDIR 15109 Cannot create tmpdir for job
PBSE_BADMOMSTATE 15110 Mom is down, cannot run job
PBSE_SOCKET_INFORMATION 15111 Socket information is not accessible
PBSE_SOCKET_DATA 15112 Data on socket does not process correctly
PBSE_CLIENT_INVALID 15113 Client is not allowed/trusted
PBSE_PREMATURE_EOF 15114 Premature End of File
PBSE_CAN_NOT_SAVE_FILE 15115 Error saving file
PBSE_CAN_NOT_OPEN_FILE 15116 Error opening file
PBSE_CAN_NOT_WRITE_FILE 15117 Error writing file
PBSE_JOB_FILE_CORRUPT 15118 Job file corrupt
PBSE_JOB_RERUN 15119 Job can not be rerun
PBSE_CONNECT 15120 Can not establish connection
PBSE_JOBWORKDELAY 15121 Job function must be temporarily delayed
PBSE_BAD_PARAMETER 15122 Parameter of function was invalid
PBSE_CONTINUE 15123 Continue processing on job. (Not an error)
PBSE_JOBSUBSTATE 15124 Current sub state does not allow transaction.
PBSE_CAN_NOT_MOVE_FILE 15125 Error moving file
PBSE_JOB_RECYCLED 15126 Job is being recycled
PBSE_JOB_ALREADY_IN_QUEUE 15127 Job is already in destination queue.
PBSE_INVALID_MUTEX 15128 Mutex is NULL or otherwise invalid
PBSE_MUTEX_ALREADY_LOCKED 15129 The mutex is already locked by this object
PBSE_MUTEX_ALREADY_UNLOCKED 15130 The mutex has already been unlocked by this object
PBSE_INVALID_SYNTAX 15131 Command syntax invalid
PBSE_NODE_DOWN 15132 A node is down. Check the MOM and host
PBSE_SERVER_NOT_FOUND 15133 Could not connect to batch server
PBSE_SERVER_BUSY 15134 Server busy. Currently no available threads
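
When a client or log reports only the numeric code, a small lookup can translate it back to the symbolic name and description. The Python sketch below parses rows in the same "name number description" layout as the table above. The function names and the three-row sample are illustrative only and are not part of TORQUE.

```python
from typing import Dict, Tuple

# A few rows copied from the table above; extend with the full table as needed.
SAMPLE_ROWS = """\
PBSE_FLOOR 15000 No error
PBSE_UNKJOBID 15001 Unknown job ID error
PBSE_SERVER_BUSY 15134 Server busy. Currently no available threads
"""


def parse_error_table(text: str) -> Dict[int, Tuple[str, str]]:
    """Map each numeric code to a (name, description) pair."""
    table: Dict[int, Tuple[str, str]] = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=2)
        # Skip blank lines and anything that is not "NAME NUMBER [description]".
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        name, number = parts[0], int(parts[1])
        description = parts[2] if len(parts) == 3 else ""
        table[number] = (name, description)
    return table


def describe(code: int, table: Dict[int, Tuple[str, str]]) -> str:
    """Return a one-line summary, with a placeholder for codes not in the table."""
    name, description = table.get(code, ("UNKNOWN", "code not in table"))
    return f"{code} {name}: {description}"


if __name__ == "__main__":
    codes = parse_error_table(SAMPLE_ROWS)
    print(describe(15001, codes))
```

Keying the dictionary by number rather than by name matches the usual lookup direction, since the number is typically what appears in command output.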

© 2015 Adaptive Computing