TORQUE Log Files

pbs_server and pbs_mom Log Files

pbs_server keeps a daily log of all activity in the TORQUE_HOME/server_logs directory, and pbs_mom keeps a daily log of all activity in the TORQUE_HOME/mom_logs directory. These logs record communication between the server and the MOMs, as well as each job as it enters the queue and is dispatched, run, and terminated. They are often the best starting point for diagnosing general job failures. For MOM logs, adjust the logging verbosity by setting the $loglevel parameter in the mom_priv/config file. For server logs, adjust it by setting the server log_level attribute in qmgr.
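The two settings above can be sketched as follows. This is a minimal illustration, not an official tool: log_level_settings is a hypothetical helper name, and the 0 to 7 range it enforces is an assumption carried over from the range this page gives for PBSLOGLEVEL.

```shell
# Sketch only: print the two verbosity settings for a given level.
# Assumption: both settings accept the same 0-7 range as PBSLOGLEVEL.
log_level_settings() {
    level="$1"
    case "$level" in
        [0-7]) ;;
        *) echo "log level must be between 0 and 7" >&2; return 1 ;;
    esac
    # Line to place in TORQUE_HOME/mom_priv/config (MOM side).
    printf '%s %s\n' '$loglevel' "$level"
    # Directive to hand to qmgr (server side).
    printf 'set server log_level = %s\n' "$level"
}

log_level_settings 3
```

The first line printed belongs in the mom_priv/config file. The second can be passed to qmgr with its -c option, for example qmgr -c 'set server log_level = 3'.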

For both the pbs_mom and pbs_server daemons, the log verbosity level can also be adjusted by setting the environment variable PBSLOGLEVEL to a value between 0 and 7. To change the log level of a daemon that is already running, send it SIGUSR1 to increase the active log level by one, or SIGUSR2 to decrease it by one. Signals are sent to a process using the kill command.

For example, kill -USR1 `pgrep pbs_mom` raises the pbs_mom log level by one.

The current loglevel for pbs_mom can be displayed with the command momctl -d3.
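Because each signal moves the level by only one step, reaching a specific level means sending the signal several times. The sketch below works out which signals are needed. loglevel_signals is a hypothetical helper, and the starting level has to be supplied by hand (for pbs_mom, from the momctl -d3 output).

```shell
# Print one signal name per step needed to move a daemon's log level
# from CURRENT to TARGET: USR1 raises it by one, USR2 lowers it by one.
loglevel_signals() {
    current="$1"
    target="$2"
    while [ "$current" -lt "$target" ]; do
        echo USR1
        current=$((current + 1))
    done
    while [ "$current" -gt "$target" ]; do
        echo USR2
        current=$((current - 1))
    done
}

# Dry run: going from level 2 to level 5 takes three SIGUSR1 signals.
loglevel_signals 2 5

# To apply the steps to a running pbs_mom (not executed here):
#   for sig in $(loglevel_signals 2 5); do kill -"$sig" $(pgrep pbs_mom); done
```

Printing the signals first, rather than sending them directly, makes it easy to check the plan before touching a production daemon.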

trqauthd Log Files

As of TORQUE 4.1.3, trqauthd logs its events in the $TORQUE_HOME/client_logs directory. It names the log files in the format <YYYYMMDD>, creating a new log daily as events occur.
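Given that naming scheme, the path of a particular day's log can be built from the date. The following is a small sketch. trqauthd_log_path is a hypothetical helper, and the fallback of /var/spool/torque for an unset TORQUE_HOME is an assumption about the installation.

```shell
# Build the path of the trqauthd log for a given day (default: today).
# Assumption: TORQUE_HOME falls back to /var/spool/torque when unset.
trqauthd_log_path() {
    day="${1:-$(date +%Y%m%d)}"
    echo "${TORQUE_HOME:-/var/spool/torque}/client_logs/$day"
}

trqauthd_log_path            # today's log
trqauthd_log_path 20121005   # log for 5 October 2012
```

Because a file is created only as events occur, the path for a quiet day may not exist.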

You might see some peculiar behavior if you mount the client_logs directory for shared access via network-attached storage.

When trqauthd first gets access on a particular day, it writes an "open" message to the day's log file. It also writes a "close" message to the last log file it accessed prior to that, which is usually the previous day's log file, but not always. For example, if it is Monday and no client commands were executed over the weekend, trqauthd writes the "close" message to Friday's file.

Since the various trqauthd binaries on the submit hosts (and potentially, the compute nodes) each write an "open" and "close" message on the first access of a new day, you'll see multiple (seemingly random) accesses when you have a shared log.
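One rough way to see this effect on a shared log is to count the "open" messages in a single day's file. The sketch below assumes the "open" message contains the text Log opened, as in the logging sample in this section. count_log_opens is a hypothetical helper.

```shell
# Count "open" messages in one day's trqauthd log file.
# Assumption: the "open" message contains the text "Log opened".
count_log_opens() {
    grep -c 'Log opened' "$1"
}

# Usage (the path is illustrative):
#   count_log_opens "$TORQUE_HOME/client_logs/20121005"
```

A count greater than one suggests that more than one trqauthd instance wrote to the file that day, although a restarted daemon may plausibly produce the same pattern.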

trqauthd records the following events along with the date and time of each occurrence:

Example 4-26: trqauthd logging sample

2012-10-05 15:05:51.8404 Log opened

2012-10-05 15:05:51.8405 TORQUE authd daemon started and listening on IP:port 101.0.1.0:12345

2012-10-10 14:48:05.5688 User hfrye at IP:port abc:12345 logged in
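Entries in this format are straightforward to filter with standard tools. The sketch below lists login events from a log file. It assumes every login line has exactly the nine whitespace-separated fields shown in the sample above, and trqauthd_logins is a hypothetical helper.

```shell
# List logins as: date time user address
# Assumption: login lines follow the sample format
#   <date> <time> User <name> at IP:port <address> logged in
trqauthd_logins() {
    awk 'NF == 9 && $3 == "User" && $8 == "logged" && $9 == "in" {
        print $1, $2, $4, $7
    }' "$1"
}

# Usage (the path is illustrative):
#   trqauthd_logins "$TORQUE_HOME/client_logs/20121010"
```

Lines that do not match the assumed format, including blank lines, are skipped rather than reported as errors.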

© 2015 Adaptive Computing