11.3 Debugging
11.3.1 Debugging Facilities
TORQUE supports a number of diagnostic and debug options including the following:
- PBSDEBUG environment variable - If set to 'yes', this variable prevents pbs_server, pbs_mom, and/or pbs_sched from backgrounding themselves, so they can be launched directly under a debugger. Some client commands also print additional diagnostic information when this variable is set.
- PBSLOGLEVEL environment variable - Specifies the logging verbosity level and can be set to any value from 0 to 7 (default = 0).
- PBSCOREDUMP environment variable - If set, causes the offending resource manager daemon to create a core file when it receives a SIGSEGV, SIGILL, SIGFPE, SIGSYS, or SIGTRAP signal. The core file is placed in the daemon's home directory ($PBSHOME/mom_priv for pbs_mom).
- NDEBUG #define - If set at build time, causes the pbs_server and pbs_mom daemons to write additional low-level logging information to stdout.
- tracejob reporting tool - Collects and reports logging and accounting information for specific jobs.
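Because the first three facilities are plain environment variables, the way a daemon might interpret them is easy to sketch. The C fragment below is a hypothetical illustration, not TORQUE source code. The function names (pbsdebug_is_on, pbsloglevel_parse, pbscoredump_apply, pbs_debug_settings_report) are invented here. Clamping out-of-range PBSLOGLEVEL values into 0..7 is an assumption. Raising the RLIMIT_CORE soft limit is one common way for a process to permit core files, not necessarily the mechanism TORQUE uses.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: true when a PBSDEBUG value means "stay in the
 * foreground". The documentation above names the value 'yes'. */
int pbsdebug_is_on(const char *value)
{
    return value != NULL && strcmp(value, "yes") == 0;
}

/* Hypothetical helper: convert a PBSLOGLEVEL value to a verbosity in 0..7.
 * Unset or non-numeric input yields the documented default of 0.
 * Clamping out-of-range numbers is an ASSUMPTION, not TORQUE behavior. */
int pbsloglevel_parse(const char *value)
{
    char *end;
    long n;

    if (value == NULL || *value == '\0')
        return 0;
    n = strtol(value, &end, 10);
    if (*end != '\0')
        return 0;               /* trailing garbage: fall back to default */
    if (n < 0)
        return 0;
    if (n > 7)
        return 7;
    return (int)n;
}

/* Hypothetical helper: if PBSCOREDUMP is set (any value), raise the soft
 * core-file limit to the hard limit so a fatal signal can leave a core file.
 * Returns 0 on success or when the variable is unset, -1 on failure. */
int pbscoredump_apply(const char *value)
{
    struct rlimit rl;

    if (value == NULL)
        return 0;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;  /* unprivileged processes may do this */
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Read all three variables from the environment and report the outcome. */
void pbs_debug_settings_report(void)
{
    int foreground = pbsdebug_is_on(getenv("PBSDEBUG"));
    int level = pbsloglevel_parse(getenv("PBSLOGLEVEL"));

    if (pbscoredump_apply(getenv("PBSCOREDUMP")) != 0)
        perror("setrlimit(RLIMIT_CORE)");
    printf("foreground=%d loglevel=%d\n", foreground, level);
}
```

A daemon following this pattern would evaluate these settings early in startup, before deciding whether to fork into the background. In this sketch, a hard core-file limit of 0 still prevents core files even with PBSCOREDUMP set, because an unprivileged process cannot raise its soft limit above the hard limit.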
11.3.2 TORQUE Error Codes
Error Code Name | Number | Description
PBSE_NONE | 15000 | No error
PBSE_UNKJOBID | 15001 | Unknown job identifier
PBSE_NOATTR | 15002 | Undefined attribute
PBSE_ATTRRO | 15003 | Attempt to set READ ONLY attribute
PBSE_IVALREQ | 15004 | Invalid request
PBSE_UNKREQ | 15005 | Unknown batch request
PBSE_TOOMANY | 15006 | Too many submit retries
PBSE_PERM | 15007 | No permission
PBSE_BADHOST | 15008 | Access from host not allowed
PBSE_JOBEXIST | 15009 | Job already exists
PBSE_SYSTEM | 15010 | System error occurred
PBSE_INTERNAL | 15011 | Internal server error occurred
PBSE_REGROUTE | 15012 | Parent job of dependent in rte queue
PBSE_UNKSIG | 15013 | Unknown signal name
PBSE_BADATVAL | 15014 | Bad attribute value
PBSE_MODATRRUN | 15015 | Cannot modify attribute in run state
PBSE_BADSTATE | 15016 | Request invalid for job state
PBSE_UNKQUE | 15018 | Unknown queue name
PBSE_BADCRED | 15019 | Invalid credential in request
PBSE_EXPIRED | 15020 | Expired credential in request
PBSE_QUNOENB | 15021 | Queue not enabled
PBSE_QACESS | 15022 | No access permission for queue
PBSE_BADUSER | 15023 | Bad user - no password entry
PBSE_HOPCOUNT | 15024 | Max hop count exceeded
PBSE_QUEEXIST | 15025 | Queue already exists
PBSE_ATTRTYPE | 15026 | Incompatible queue attribute type
PBSE_QUEBUSY | 15027 | Queue busy (not empty)
PBSE_QUENBIG | 15028 | Queue name too long
PBSE_NOSUP | 15029 | Feature/function not supported
PBSE_QUENOEN | 15030 | Cannot enable queue, needs add def
PBSE_PROTOCOL | 15031 | Protocol (ASN.1) error
PBSE_BADATLST | 15032 | Bad attribute list structure
PBSE_NOCONNECTS | 15033 | No free connections
PBSE_NOSERVER | 15034 | No server to connect to
PBSE_UNKRESC | 15035 | Unknown resource
PBSE_QUENODFLT | 15036 | No default queue defined
PBSE_EXCQRESC | 15037 | Job exceeds queue resource limits
PBSE_NORERUN | 15038 | Job not rerunnable
PBSE_ROUTEREJ | 15039 | Route rejected by all destinations
PBSE_ROUTEEXPD | 15040 | Time in route queue expired
PBSE_MOMREJECT | 15041 | Request to MOM failed
PBSE_BADSCRIPT | 15042 | (qsub) Cannot access script file
PBSE_STAGEIN | 15043 | Stage-In of files failed
PBSE_RESCUNAV | 15044 | Resources temporarily unavailable
PBSE_BADGRP | 15045 | Bad group specified
PBSE_MAXQUED | 15046 | Max number of jobs in queue
PBSE_CKPBSY | 15047 | Checkpoint busy, may be retried
PBSE_EXLIMIT | 15048 | Limit exceeds allowable
PBSE_BADACCT | 15049 | Bad account attribute value
PBSE_ALRDYEXIT | 15050 | Job already in exit state
PBSE_NOCOPYFILE | 15051 | Job files not copied
PBSE_CLEANEDOUT | 15052 | Unknown job id after clean init
PBSE_NOSYNCMSTR | 15053 | No master in sync set
PBSE_BADDEPEND | 15054 | Invalid dependency
PBSE_DUPLIST | 15055 | Duplicate entry in list
PBSE_DISPROTO | 15056 | Bad DIS based request protocol
PBSE_EXECTHERE | 15057 | Cannot execute there
PBSE_SISREJECT | 15058 | Sister rejected
PBSE_SISCOMM | 15059 | Sister could not communicate
PBSE_SVRDOWN | 15060 | Requirement rejected - server shutting down
PBSE_CKPSHORT | 15061 | Not all tasks could checkpoint
PBSE_UNKNODE | 15062 | Named node is not in the list
PBSE_UNKNODEATR | 15063 | Node-attribute not recognized
PBSE_NONODES | 15064 | Server has no node list
PBSE_NODENBIG | 15065 | Node name is too big
PBSE_NODEEXIST | 15066 | Node name already exists
PBSE_BADNDATVAL | 15067 | Bad node-attribute value
PBSE_MUTUALEX | 15068 | State values are mutually exclusive
PBSE_GMODERR | 15069 | Error(s) during global modification of nodes
PBSE_NORELYMOM | 15070 | Could not contact Mom
PBSE_NOTSNODE | 15071 | No time-shared nodes
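Client tools and wrapper scripts often need to turn a numeric code from this table back into its name or description, or the reverse. The C sketch below does that with a hand-copied subset of the table. The struct and function names (pbse_entry, pbse_lookup, pbse_text, pbse_code_by_name) are hypothetical. A real program built against TORQUE would take the definitions from TORQUE's own pbs_error.h header instead of duplicating the numbers.

```c
#include <stddef.h>
#include <string.h>

/* One row of the error-code table: symbolic name, number, description. */
struct pbse_entry {
    const char *name;
    int code;
    const char *text;
};

/* Hand-copied SUBSET of the table above (hypothetical; not TORQUE's header).
 * 15017 is deliberately absent, matching the gap in the documented table. */
static const struct pbse_entry pbse_table[] = {
    { "PBSE_NONE",      15000, "No error" },
    { "PBSE_UNKJOBID",  15001, "Unknown job identifier" },
    { "PBSE_PERM",      15007, "No permission" },
    { "PBSE_BADHOST",   15008, "Access from host not allowed" },
    { "PBSE_SYSTEM",    15010, "System error occurred" },
    { "PBSE_INTERNAL",  15011, "Internal server error occurred" },
    { "PBSE_BADSTATE",  15016, "Request invalid for job state" },
    { "PBSE_UNKQUE",    15018, "Unknown queue name" },
    { "PBSE_BADUSER",   15023, "Bad user - no password entry" },
    { "PBSE_NOSERVER",  15034, "No server to connect to" },
    { "PBSE_EXCQRESC",  15037, "Job exceeds queue resource limits" },
    { "PBSE_MOMREJECT", 15041, "Request to MOM failed" },
    { "PBSE_BADGRP",    15045, "Bad group specified" },
    { "PBSE_MAXQUED",   15046, "Max number of jobs in queue" },
    { "PBSE_UNKNODE",   15062, "Named node is not in the list" },
    { "PBSE_NORELYMOM", 15070, "Could not contact Mom" },
};

#define PBSE_TABLE_LEN (sizeof pbse_table / sizeof pbse_table[0])

/* Find the entry for a numeric code; NULL if it is not in the subset. */
const struct pbse_entry *pbse_lookup(int code)
{
    size_t i;

    for (i = 0; i < PBSE_TABLE_LEN; i++) {
        if (pbse_table[i].code == code)
            return &pbse_table[i];
    }
    return NULL;
}

/* Description for a code, or a fixed fallback string for unknown codes. */
const char *pbse_text(int code)
{
    const struct pbse_entry *e = pbse_lookup(code);

    return e != NULL ? e->text : "Unrecognized PBSE error code";
}

/* Reverse lookup by symbolic name; returns -1 if the name is not listed. */
int pbse_code_by_name(const char *name)
{
    size_t i;

    for (i = 0; i < PBSE_TABLE_LEN; i++) {
        if (strcmp(pbse_table[i].name, name) == 0)
            return pbse_table[i].code;
    }
    return -1;
}
```

A linear search is adequate for a table of about 70 entries. Because the documented codes are nearly contiguous (15017 is the only gap), an array indexed by `code - 15000` would also work and give constant-time lookup.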
See Also