TORQUE Resource Manager
11.3 Debugging

11.3.1 Debugging Facilities

TORQUE supports a number of diagnostic and debug options, including the following:
  • PBSDEBUG environment variable - If set to 'yes', this variable prevents pbs_server, pbs_mom, and/or pbs_sched from backgrounding themselves, so they can be launched directly under a debugger. Some client commands also print additional diagnostic information when this variable is set.
  • PBSLOGLEVEL environment variable - Can be set to any value from 0 to 7 and specifies the logging verbosity level (default = 0).
  • PBSCOREDUMP environment variable - If set, causes the offending resource manager daemon to create a core file when it receives a SIGSEGV, SIGILL, SIGFPE, SIGSYS, or SIGTRAP signal. The core file is placed in the daemon's home directory ($PBSHOME/mom_priv for pbs_mom).
  • NDEBUG #define - If set at build time, causes the pbs_server and pbs_mom daemons to write additional low-level logging information to stdout.
  • tracejob reporting tool - Can be used to collect and report logging and accounting information for specific jobs.

11.3.2 TORQUE Error Codes

Error Code   Description
15000 No error
15001 Unknown job identifier
15002 Undefined attribute
15003 Attempt to set READ ONLY attribute
15004 Invalid request
15005 Unknown batch request
15006 Too many submit retries
15007 No permission
15008 Access from host not allowed
15009 Job already exists
15010 System error occurred
15011 Internal server error occurred
15012 Parent job of dependent in route queue
15013 Unknown signal name
15014 Bad attribute value
15015 Cannot modify attribute in run state
15016 Request invalid for job state
15018 Unknown queue name
15019 Invalid credential in request
15020 Expired credential in request
15021 Queue not enabled
15022 No access permission for queue
15023 Bad user - no password entry
15024 Max hop count exceeded
15025 Queue already exists
15026 Incompatible queue attribute type
15027 Queue busy (not empty)
15028 Queue name too long
15029 Feature/function not supported
15030 Cannot enable queue, needs additional definition
15031 Protocol (ASN.1) error
15032 Bad attribute list structure
15033 No free connections
15034 No server to connect to
15035 Unknown resource
15036 No default queue defined
15037 Job exceeds queue resource limits
15038 Job not rerunnable
15039 Route rejected by all destinations
15040 Time in route queue expired
15041 Request to MOM failed
15042 (qsub) Cannot access script file
15043 Stage-In of files failed
15044 Resources temporarily unavailable
15045 Bad group specified
15046 Max number of jobs in queue
15047 Checkpoint busy, may be retried
15048 Limit exceeds allowable
15049 Bad account attribute value
15050 Job already in exit state
15051 Job files not copied
15052 Unknown job id after clean init
15053 No master in sync set
15054 Invalid dependency
15055 Duplicate entry in list
15056 Bad DIS based request protocol
15057 Cannot execute there
15058 Sister rejected
15059 Sister could not communicate
15060 Request rejected - server shutting down
15061 Not all tasks could checkpoint
15062 Named node is not in the list
15063 Node-attribute not recognized
15064 Server has no node list
15065 Node name is too big
15066 Node name already exists
15067 Bad node-attribute value
15068 State values are mutually exclusive
15069 Error(s) during global modification of nodes
15070 Could not contact MOM
15071 No time-shared nodes
