11.3 Debugging
11.3.1 Debugging Facilities
TORQUE supports a number of diagnostic and debug options including the following:
- PBSDEBUG environment variable - If set to 'yes', this variable prevents pbs_server, pbs_mom, and/or pbs_sched from backgrounding themselves, so they can be launched directly under a debugger.  Some client commands also provide additional diagnostic information when this value is set.
- PBSLOGLEVEL environment variable - Can be set to any value between 0 and 7 and specifies the logging verbosity level (default = 0).
- PBSCOREDUMP environment variable - If set, it will cause the offending resource manager daemon to create a core file if a SIGSEGV, SIGILL, SIGFPE, SIGSYS, or SIGTRAP signal is received.  The core dump will be placed in the daemon's home directory ($PBSHOME/mom_priv for pbs_mom).
- NDEBUG #define - If set at build time, it will cause additional low-level logging information to be output to stdout for the pbs_server and pbs_mom daemons.
- tracejob reporting tool - Can be used to collect and report logging and accounting information for specific jobs.
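
The environment variables above can be combined when preparing a foreground debug run of a daemon. The following is a minimal sketch, not part of TORQUE: the helper name `torque_debug_env` is hypothetical, and the value 'yes' for PBSCOREDUMP is an assumption (the text above only says the variable must be set).

```python
import os


def torque_debug_env(log_level=7, coredump=True, base_env=None):
    """Return a copy of the environment with TORQUE debug variables set.

    Hypothetical helper for illustration; TORQUE itself only reads the
    environment variables described above.
    """
    if not 0 <= log_level <= 7:
        raise ValueError("PBSLOGLEVEL must be between 0 and 7")
    env = dict(os.environ if base_env is None else base_env)
    env["PBSDEBUG"] = "yes"             # keep the daemon in the foreground
    env["PBSLOGLEVEL"] = str(log_level)  # logging verbosity, 0 to 7
    if coredump:
        # Assumption: any value works, since the variable only needs to be set.
        env["PBSCOREDUMP"] = "yes"
    return env


if __name__ == "__main__":
    env = torque_debug_env(log_level=7)
    for name in ("PBSDEBUG", "PBSLOGLEVEL", "PBSCOREDUMP"):
        print(f"{name}={env[name]}")
    # To launch a daemon with this environment (assumes pbs_mom is on PATH
    # and the caller has sufficient privileges):
    #   import subprocess
    #   subprocess.run(["pbs_mom"], env=env)
```

Passing the resulting dictionary as the `env` argument of `subprocess.run` keeps the settings local to the debugged process instead of changing the calling shell.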
11.3.2 TORQUE Error Codes
  
    
| Error Code Name | Number | Description |
|---|---|---|
| PBSE_NONE | 15000 | No error |
| PBSE_UNKJOBID | 15001 | Unknown job identifier |
| PBSE_NOATTR | 15002 | Undefined attribute |
| PBSE_ATTRRO | 15003 | Attempt to set READ ONLY attribute |
| PBSE_IVALREQ | 15004 | Invalid request |
| PBSE_UNKREQ | 15005 | Unknown batch request |
| PBSE_TOOMANY | 15006 | Too many submit retries |
| PBSE_PERM | 15007 | No permission |
| PBSE_BADHOST | 15008 | Access from host not allowed |
| PBSE_JOBEXIST | 15009 | Job already exists |
| PBSE_SYSTEM | 15010 | System error occurred |
| PBSE_INTERNAL | 15011 | Internal server error occurred |
| PBSE_REGROUTE | 15012 | Parent job of dependent in route queue |
| PBSE_UNKSIG | 15013 | Unknown signal name |
| PBSE_BADATVAL | 15014 | Bad attribute value |
| PBSE_MODATRRUN | 15015 | Cannot modify attribute in run state |
| PBSE_BADSTATE | 15016 | Request invalid for job state |
| PBSE_UNKQUE | 15018 | Unknown queue name |
| PBSE_BADCRED | 15019 | Invalid credential in request |
| PBSE_EXPIRED | 15020 | Expired credential in request |
| PBSE_QUNOENB | 15021 | Queue not enabled |
| PBSE_QACESS | 15022 | No access permission for queue |
| PBSE_BADUSER | 15023 | Bad user - no password entry |
| PBSE_HOPCOUNT | 15024 | Max hop count exceeded |
| PBSE_QUEEXIST | 15025 | Queue already exists |
| PBSE_ATTRTYPE | 15026 | Incompatible queue attribute type |
| PBSE_QUEBUSY | 15027 | Queue busy (not empty) |
| PBSE_QUENBIG | 15028 | Queue name too long |
| PBSE_NOSUP | 15029 | Feature/function not supported |
| PBSE_QUENOEN | 15030 | Cannot enable queue, needs add def |
| PBSE_PROTOCOL | 15031 | Protocol (ASN.1) error |
| PBSE_BADATLST | 15032 | Bad attribute list structure |
| PBSE_NOCONNECTS | 15033 | No free connections |
| PBSE_NOSERVER | 15034 | No server to connect to |
| PBSE_UNKRESC | 15035 | Unknown resource |
| PBSE_QUENODFLT | 15036 | No default queue defined |
| PBSE_EXCQRESC | 15037 | Job exceeds queue resource limits |
| PBSE_NORERUN | 15038 | Job not rerunnable |
| PBSE_ROUTEREJ | 15039 | Route rejected by all destinations |
| PBSE_ROUTEEXPD | 15040 | Time in route queue expired |
| PBSE_MOMREJECT | 15041 | Request to MOM failed |
| PBSE_BADSCRIPT | 15042 | (qsub) Cannot access script file |
| PBSE_STAGEIN | 15043 | Stage-In of files failed |
| PBSE_RESCUNAV | 15044 | Resources temporarily unavailable |
| PBSE_BADGRP | 15045 | Bad group specified |
| PBSE_MAXQUED | 15046 | Max number of jobs in queue |
| PBSE_CKPBSY | 15047 | Checkpoint busy, may retry |
| PBSE_EXLIMIT | 15048 | Limit exceeds allowable |
| PBSE_BADACCT | 15049 | Bad account attribute value |
| PBSE_ALRDYEXIT | 15050 | Job already in exit state |
| PBSE_NOCOPYFILE | 15051 | Job files not copied |
| PBSE_CLEANEDOUT | 15052 | Unknown job id after clean init |
| PBSE_NOSYNCMSTR | 15053 | No master in sync set |
| PBSE_BADDEPEND | 15054 | Invalid dependency |
| PBSE_DUPLIST | 15055 | Duplicate entry in list |
| PBSE_DISPROTO | 15056 | Bad DIS based request protocol |
| PBSE_EXECTHERE | 15057 | Cannot execute there |
| PBSE_SISREJECT | 15058 | Sister rejected |
| PBSE_SISCOMM | 15059 | Sister could not communicate |
| PBSE_SVRDOWN | 15060 | Request rejected - server shutting down |
| PBSE_CKPSHORT | 15061 | Not all tasks could checkpoint |
| PBSE_UNKNODE | 15062 | Named node is not in the list |
| PBSE_UNKNODEATR | 15063 | Node-attribute not recognized |
| PBSE_NONODES | 15064 | Server has no node list |
| PBSE_NODENBIG | 15065 | Node name is too big |
| PBSE_NODEEXIST | 15066 | Node name already exists |
| PBSE_BADNDATVAL | 15067 | Bad node-attribute value |
| PBSE_MUTUALEX | 15068 | State values are mutually exclusive |
| PBSE_GMODERR | 15069 | Error(s) during global modification of nodes |
| PBSE_NORELYMOM | 15070 | Could not contact Mom |
| PBSE_NOTSNODE | 15071 | No time-shared nodes |
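
When a client command or log line reports only a numeric code, a small lookup built from the table above can translate it back to a name and description. This is an illustrative sketch: the dictionary holds only a handful of rows copied from the table, and the names `PBS_ERRORS` and `describe_pbs_error` are hypothetical, not part of any TORQUE API.

```python
# Subset of the error code table; extend with further rows as needed.
PBS_ERRORS = {
    15001: ("PBSE_UNKJOBID", "Unknown job identifier"),
    15007: ("PBSE_PERM", "No permission"),
    15018: ("PBSE_UNKQUE", "Unknown queue name"),
    15034: ("PBSE_NOSERVER", "No server to connect to"),
    15041: ("PBSE_MOMREJECT", "Request to MOM failed"),
}


def describe_pbs_error(code):
    """Return a readable string for a numeric TORQUE error code."""
    name, text = PBS_ERRORS.get(code, ("UNKNOWN", "Code not in local table"))
    return f"{name} ({code}): {text}"


if __name__ == "__main__":
    print(describe_pbs_error(15007))
    print(describe_pbs_error(15017))  # not listed in the table
```

Codes missing from the dictionary, including 15017 (which the table does not list), fall through to the "UNKNOWN" entry instead of raising an exception.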
  
See Also