pbs_mom

Start a pbs batch execution mini-server.

Synopsis

pbs_mom [-a alarm] [-A alias] [-C chkdirectory] [-c config] [-d directory] [-h hostname]
[-L logfile] [-M MOMport] [-R RPPport] [-p|-q|-r] [-P purge] [-w] [-x]

Description

The pbs_mom command is located within the TORQUE_HOME directory and starts the operation of a batch Machine Oriented Mini-server (MOM) on the execution host. To ensure that the pbs_mom command is not runnable by the general user community, the server will only execute if its real and effective uid is zero.

The first function of pbs_mom is to place jobs into execution as directed by the server, establish resource usage limits, monitor the job's usage, and notify the server when the job completes. If they exist, pbs_mom will execute a prologue script before executing a job and an epilogue script after executing the job.

The second function of pbs_mom is to respond to resource monitor requests. This was done by a separate process in previous versions of PBS but has now been combined into one process. It provides information about the status of running jobs, memory available, etc.

The last function of pbs_mom is to respond to task manager requests. This involves communicating with running tasks over a TCP socket as well as communicating with other MOMs within a job (a.k.a. a "sisterhood").

pbs_mom will record a diagnostic message in a log file for any error occurrence. The log files are maintained in the mom_logs directory below the home directory of the server. If the log file cannot be opened, the diagnostic message is written to the system console.

Options

Flag Name Description
-a alarm Used to specify the alarm timeout in seconds for computing a resource. Every time a resource request is processed, an alarm is set for the given amount of time. If the request has not completed before the given time, an alarm signal is generated. The default is 5 seconds.
-A alias Used to specify this multimom's alias name. The alias name needs to be the same name used in the mom.hierarchy file. It is only needed when running multiple MOMs on the same machine. For more information, see TORQUE Multi-MOM.
-C chkdirectory Specifies the path of the directory used to hold checkpoint files. (Currently this is only valid on Cray systems.) The default directory is TORQUE_HOME/spool/checkpoint (see the -d option). The directory specified with the -C option must be owned by root and accessible (rwx) only by root to protect the security of the checkpoint files.
-c config Specifies an alternative configuration file (see Configuration file below). If this is a relative file name, it is taken relative to TORQUE_HOME/mom_priv (see the -d option). If the specified file cannot be opened, pbs_mom will abort. If the -c option is not supplied, pbs_mom will attempt to open the default configuration file "config" in TORQUE_HOME/mom_priv. If this file is not present, pbs_mom will log the fact and continue.
-d directory Specifies the path of the directory which is the home of the server's working files, TORQUE_HOME. This option is typically used along with -M when debugging MOM. The default directory is given by $PBS_SERVER_HOME which is typically /usr/spool/PBS.
-h hostname Set MOM's hostname. This can be useful on multi-homed networks.
-L logfile Specify an absolute path name for use as the log file. If not specified, MOM will open a file named for the current date in the TORQUE_HOME/mom_logs directory (see the -d option).
-M port Specifies the port number on which the mini-server (MOM) will listen for batch requests.
-p n/a Specifies the impact on jobs which were in execution when the mini-server shut down. On any restart of MOM, the new mini-server will not be the parent of any running jobs; MOM has lost control of her offspring (not a new situation for a mother). With the -p option, MOM will allow the jobs to continue to run and monitor them indirectly via polling. This flag is redundant in that this is the default behavior when starting the server. The -p option is mutually exclusive with the -r and -q options.
-P purge Specifies the impact on jobs which were in execution when the mini-server shut down. With the -P option, it is assumed that either the entire system has been restarted or the MOM has been down so long that it can no longer guarantee that the pid of any running process is the same as the recorded job process pid of a recovering job. Unlike the -p option, no attempt is made to try and preserve or recover running jobs. All jobs are terminated and removed from the queue.
-q n/a Specifies the impact on jobs which were in execution when the mini-server shut down. With the -q option, MOM will allow the processes belonging to jobs to continue to run, but will not attempt to monitor them. The -q option is mutually exclusive with the -p and -r options.
-R port Specifies the port number on which the mini-server (MOM) will listen for resource monitor requests, task manager requests and inter-MOM messages. Both a UDP and a TCP port of this number will be used.
-r n/a Specifies the impact on jobs which were in execution when the mini-server shut down. With the -r option, MOM will kill any processes belonging to jobs, mark the jobs as terminated, and notify the batch server which owns the job. The -r option is mutually exclusive with the -p and -q options.

Normally the mini-server is started from the system boot file without the -p or the -r option. The mini-server will make no attempt to signal the former session of any job which may have been running when the mini-server terminated. It is assumed that on reboot, all processes have been killed. If the -r option is used following a reboot, process IDs (pids) may be reused and MOM may kill a process that is not a batch session.

-w wait_for_server When started with -w, pbs_moms wait until they get their MOM hierarchy file from pbs_server to send their first update, or until 10 minutes pass. This reduces network traffic on startup and can bring up clusters faster.
-x n/a Disables the check for privileged port resource monitor connections. This is used mainly for testing since the privileged port is the only mechanism used to prevent any ordinary user from connecting.

Configuration file

The configuration file may be specified on the command line at program start with the -c flag. The file provides several types of run time information to pbs_mom: static resource names and values, external resources provided by a program to be run on request via a shell escape, and values to pass to internal set up functions at initialization (and re-initialization).

Each item type is on a single line with the component parts separated by white space. If the line starts with a hash mark (pound sign, #), the line is considered to be a comment and is skipped.

Static Resources

For static resource names and values, the configuration file contains a list of resource name/value pairs, one pair per line, with the name and value separated by white space. An example of static resource names and values is the number of tape drives of each type available on the host.
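A minimal sketch of such entries follows. The resource names (tape_type_a, tape_type_b), their counts, and the static_resources helper are hypothetical and chosen only for illustration; the helper merely lists the pairs the way the format is described here and is not part of TORQUE.

```shell
# Write a sample config fragment to a temporary file.
# Resource names and counts are hypothetical.
config_file="$(mktemp)"
cat > "$config_file" <<'EOF'
# static resources: name, white space, value
tape_type_a 4
tape_type_b 2
EOF

# List the name/value pairs, skipping lines that start with a hash mark.
static_resources() {
    awk '!/^#/ && NF >= 2 { print $1 "=" $2 }' "$1"
}

static_resources "$config_file"
```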

Shell Commands

If the first character of the value is an exclamation mark (!), the entire rest of the line is saved to be executed through the services of the system(3) standard library routine.

The shell escape provides a means for the resource monitor to yield arbitrary information to the scheduler. Parameter substitution is done such that the value of any qualifier sent with the query, as explained below, replaces a token with a percent sign (%) followed by the name of the qualifier. For example, here is a configuration file line which gives a resource name of "escape":

escape !echo %xxx %yyy

If a query for "escape" is sent with no qualifiers, the command executed would be echo %xxx %yyy.

If one qualifier is sent, escape[xxx=hi there], the command executed would be echo hi there %yyy.

If two qualifiers are sent, escape[xxx=hi][yyy=there], the command executed would be echo hi there.

If a qualifier is sent with no matching token in the command line, escape[zzz=snafu], an error is reported.
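The substitution rules above can be imitated with a small shell function. This is only a sketch of the described behavior, not the MOM's actual code: expand_escape is a hypothetical helper, and it does not guard against one qualifier name being a prefix of another token.

```shell
# Usage: expand_escape 'command template' [name=value ...]
# Replaces each %name token with the value of the matching qualifier.
expand_escape() {
    cmd=$1
    shift
    for qualifier in "$@"; do
        name=${qualifier%%=*}
        value=${qualifier#*=}
        case $cmd in
            *"%$name"*) ;;
            *)  # qualifier with no matching token: report an error
                echo "error: no token %$name in command" >&2
                return 1 ;;
        esac
        # Splice the value in place of the first %name token.
        cmd="${cmd%%"%$name"*}${value}${cmd#*"%$name"}"
    done
    printf '%s\n' "$cmd"
}

expand_escape 'echo %xxx %yyy'
expand_escape 'echo %xxx %yyy' 'xxx=hi there'
expand_escape 'echo %xxx %yyy' xxx=hi yyy=there
expand_escape 'echo %xxx %yyy' zzz=snafu || echo "query rejected"
```

The four calls correspond to the four cases listed above: no qualifiers, one qualifier, two qualifiers, and a qualifier without a matching token.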

size[fs=<FS>]

Specifies that the available and configured disk space in the <FS> filesystem is to be reported to the pbs_server and scheduler. To request disk space on a per job basis, specify the file resource, as in qsub -l nodes=1,file=1000kb. For example, the available and configured disk space in the /localscratch filesystem will be reported:

size[fs=/localscratch]

Initialization Value

An initialization value directive has a name which starts with a dollar sign ($) and must be known to the MOM via an internal table. The entries in this table now are:

Entry Description
pbsclient

Causes a host name to be added to the list of hosts which will be allowed to connect to the MOM as long as they are using a privileged port for the purposes of resource monitor requests. For example, here are two configuration file lines which will allow the hosts "fred" and "wilma" to connect:

$pbsclient fred

$pbsclient wilma

Two host names are always allowed to connect to pbs_mom: "localhost" and the name returned to pbs_mom by the system call gethostname(). These names need not be specified in the configuration file. The hosts listed as "clients" can issue Resource Manager (RM) requests. Other MOM nodes and servers do not need to be listed as clients.

restricted

Causes a host name to be added to the list of hosts which will be allowed to connect to the MOM without needing to use a privileged port. These names allow for wildcard matching. For example, here is a configuration file line which will allow queries from any host in the domain "ibm.com":

$restricted *.ibm.com

The restriction which applies to these connections is that only internal queries may be made. No resources from a config file will be found. This is to prevent any shell commands from being run by a non-root process. This parameter is generally not required except for some versions of OSX.

logevent

Sets the mask that determines which event types are logged by pbs_mom. For example:

$logevent 0x1ff

$logevent 255

The first example would set the log event mask to 0x1ff (511) which enables logging of all events including debug events. The second example would set the mask to 0x0ff (255) which enables all events except debug events.
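The hexadecimal and decimal forms of the two masks can be checked with shell arithmetic. This sketch only does the number conversion; that the single bit separating the two masks corresponds to debug events is inferred from the description above, not from a list of event bits.

```shell
# 0x1ff in decimal
echo $((0x1ff))                        # 511
# 255 in hexadecimal, padded to three digits
printf '0x%03x\n' 255                  # 0x0ff
# The one bit that differs between the two masks
printf '0x%03x\n' $((0x1ff ^ 0x0ff))   # 0x100
```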

cputmult

Sets a factor used to adjust cpu time used by a job. This is provided to allow adjustment of time charged and limits enforced where the job might run on systems with different cpu performance. If the MOM's system is faster than the reference system, set cputmult to a decimal value greater than 1.0. If the MOM's system is slower, set cputmult to a value between 0.0 and 1.0. For example:

$cputmult 1.5

$cputmult 0.75

usecp

Specifies which directories should be staged with cp instead of rcp/scp. If a shared filesystem is available on all hosts in a cluster, this directive is used to make these filesystems known to the MOM. For example, if /home is NFS mounted on all nodes in a cluster:

$usecp *:/home /home

wallmult Sets a factor to adjust wall time usage by a job to a common reference system. The factor is used for walltime calculations and limits in the same way that cputmult is used for cpu time.
configversion Specifies the version of the config file data, a string.
check_poll_time Specifies the MOM interval, in seconds, at which TORQUE polls the sisters for job information. The MOM checks each job for updated resource usage, exited processes, over-limit conditions, etc., once per interval. This value should be equal to or lower than pbs_server's job_stat_rate. High values result in stale information reported to pbs_server. Low values result in increased system usage by the MOM. Default is 45 seconds.
down_on_error Causes the MOM to report itself as state "down" to pbs_server in the event of a failed health check. This feature is experimental. (For more information, see Health check.)
ideal_load Ideal processor load. Represents a low water mark for the load average. A node that is currently busy will consider itself free after falling below ideal_load.
loglevel Specifies the verbosity of logging with higher numbers specifying more verbose logging. Values may range between 0 and 7.
log_file_max_size If this is set to a value > 0, then pbs_mom will roll the current log file to log-file-name.1 when its size is greater than or equal to the value of log_file_max_size. This value is interpreted as kilobytes.
log_file_roll_depth If this is set to a value >=1 and log_file_max_size is set, then pbs_mom will allow logs to be rolled up to the specified number of logs. At every roll, the oldest log will be the one to be deleted to make room for rolling. pbs_mom will continue rolling the log files to log-file-name.log_file_roll_depth.
max_load Maximum processor load. Nodes over this load average are considered busy (see ideal_load above).
enablemomrestart Enables automatic restarts of the MOM. If enabled, the MOM will check if its binary has been updated and restart itself at a safe point when no jobs are running, which makes upgrades easier. The check is made by comparing the mtime of the pbs_mom executable. Command-line args, the process name, and the PATH env variable are preserved across restarts. It is recommended that this not be enabled in the config file, but enabled when desired with momctl (see Resources for more information).
node_check_script Specifies the fully qualified pathname of the health check script to run (see Health check for more information).
node_check_interval

Specifies when to run the MOM health check. The check can be either periodic, event-driven, or both. The value starts with an integer specifying the number of MOM intervals between subsequent executions of the specified health check. After the integer is an optional comma-separated list of event names. Currently supported are "jobstart" and "jobend". This value defaults to 1 with no events, indicating the check is run every MOM interval (see Health check for more information). For example, the following lines respectively disable the check, run it only at job start, and run it every 10 MOM intervals as well as at job start and job end:

$node_check_interval 0

$node_check_interval 0,jobstart

$node_check_interval 10,jobstart,jobend

prologalarm Specifies maximum duration (in seconds) which the MOM will wait for the job prolog or job epilog to complete. This parameter defaults to 300 seconds (5 minutes).
rcpcmd

Specifies the full path and arguments to be used for remote file copies. This overrides the compile-time default found in configure. This must contain 2 words: the full path to the command and the options. The copy command must be able to recursively copy files to the remote host and accept arguments of the form "user@host:files". For example:

$rcpcmd /usr/bin/rcp -rp

$rcpcmd /usr/bin/scp -rpB

remote_checkpoint_dirs

Specifies which server checkpoint directories are remotely mounted. It tells the MOM which directories are shared with the server. Using remote checkpoint directories eliminates the need to copy the checkpoint files back and forth between the MOM and the server. All entries must be on the same line, separated by a space.

$remote_checkpoint_dirs /checkpointFiles /bigStorage /fast

This informs the MOM that the /checkpointFiles, /bigStorage, and /fast directories are remotely mounted checkpoint directories.

remote_reconfig Enables the ability to remotely reconfigure pbs_mom with a new config file. Default is disabled. This parameter accepts various forms of true, yes, and 1.
timeout Specifies the number of seconds before TCP messages will time out. TCP messages include job obituaries, and TM requests if RPP is disabled. Default is 60 seconds.
tmpdir

Sets the directory base name for a per-job temporary directory. Before job launch, the MOM appends the jobid of running jobs to the tmpdir base name and creates the directory. After the job exits, the MOM recursively deletes the directory with the jobid in its name. TORQUE creates and removes the directory as the job owner and group, so the owner must have write permissions to create the directory. The environment variable TMPDIR will be set for all prologue and epilogue scripts, the job script, and TM tasks for each job.

It is recommended that you create the tmpdir directory before running any jobs on the MOM and that you make it readable and writable to all users who will run jobs. If you do not specify a base tmpdir directory, the first job run on the MOM will create the directory and set the directory owner to its own. After that, no other users will be able to run jobs on that MOM.
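A minimal sketch of preparing such a base directory follows. The path in TMPDIR_BASE is hypothetical and must match whatever $tmpdir names in the MOM config. Mode 1777 (world-writable with the sticky bit, as commonly used for /tmp) is an assumption; the text above only requires that the directory be readable and writable by all users who run jobs.

```shell
# Hypothetical base path; substitute the directory given to $tmpdir.
TMPDIR_BASE="${TMPDIR_BASE:-/tmp/torque_tmpdir_example}"

# Create the base directory before any job runs on the MOM.
mkdir -p "$TMPDIR_BASE"

# Assumed permissions: everyone may create entries, and the sticky bit
# keeps users from removing each other's directories.
chmod 1777 "$TMPDIR_BASE"

ls -ld "$TMPDIR_BASE"
```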

status_update_time Specifies (in seconds) how often the MOM updates its status information to pbs_server. This value should correlate with the server's scheduling interval and its "node_check_rate" attribute. High values for "status_update_time" cause pbs_server to report stale information, while low values increase the load of pbs_server and the network. Default is 45 seconds.
varattr

This is similar to a shell escape above, but includes a TTL. The command will only be run every TTL seconds. A TTL of -1 will cause the command to be executed only once. A TTL of 0 will cause the command to be run every time varattr is requested. This parameter may be used multiple times, but all output will be grouped into a single "varattr" attribute in the request and status output. If the command has no output, the name will be skipped in the output.

$varattrseta

$varattrsetb

xauthpath Specifies the path to the xauth binary to enable X11 forwarding.
ignvmem If set to true, then pbs_mom will ignore vmem/pvmem limit enforcement.
ignwalltime If set to true, then pbs_mom will ignore walltime limit enforcement.
mom_host Sets the local hostname as used by pbs_mom.

Resources

Resource Manager queries can be made with momctl -q options to retrieve and set pbs_mom options. Any configured static resource may be retrieved with a request of the same name. These are resource requests not otherwise documented in the PBS ERS.

Request Description
cycle Forces an immediate MOM cycle.
status_update_time Retrieve or set the $status_update_time parameter.
check_poll_time Retrieve or set the $check_poll_time parameter.
configversion Retrieve the config version.
jobstartblocktime Retrieve or set the $jobstartblocktime parameter.
enablemomrestart Retrieve or set the $enablemomrestart parameter.
loglevel Retrieve or set the $loglevel parameter.
down_on_error Retrieve or set the EXPERIMENTAL $down_on_error parameter.
diag0 - diag4 Retrieves varied diagnostic information.
rcpcmd Retrieve or set the $rcpcmd parameter.
version Retrieves the pbs_mom version.

Health check

The health check script is executed directly by the pbs_mom daemon under the root user id. It must be accessible from the compute node and may be a script or compiled executable program. It may make any needed system calls and execute any combination of system utilities but should not execute resource manager client commands. Also, the pbs_mom daemon blocks until the health check is completed and does not possess a built-in timeout. Consequently, it is advisable to keep the launch script execution time short and verify that the script will not block even under failure conditions.

If the script detects a failure, it should return the keyword "Error" to stdout followed by an error message. The message (up to 256 characters) immediately following the Error string will be assigned to the node attribute message of the associated node.

If the script detects a failure when run from "jobstart", then the job will be rejected. You can use this behavior with an advanced scheduler, such as Moab Workload Manager, to cause the job to be routed to another node. TORQUE currently ignores Error messages by default, but you can configure an advanced scheduler to react appropriately.

If the experimental $down_on_error MOM setting is enabled, the MOM will set itself to state down and report to pbs_server. Additionally, the experimental down_on_error server attribute can be enabled, which has the same effect but moves the decision to pbs_server. It is redundant to enable both the MOM's $down_on_error setting and pbs_server's down_on_error attribute. See "down_on_error" in pbs_server_attributes(7B).
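As an illustration of the reporting convention described in this section, a health check might look like the following sketch. The check itself (a scratch directory that must exist and be writable), the SCRATCH_DIR variable, and the check_scratch function are hypothetical. The only part taken from the description above is printing the keyword "Error" followed by a message to stdout when a failure is detected.

```shell
#!/bin/sh
# Hypothetical health check for a scratch directory.
# SCRATCH_DIR is an assumed path chosen for illustration.
SCRATCH_DIR="${SCRATCH_DIR:-/tmp}"

check_scratch() {
    if [ ! -d "$1" ]; then
        # Failure: keyword "Error" followed by the message, on stdout.
        echo "Error scratch directory $1 is missing"
    elif [ ! -w "$1" ]; then
        echo "Error scratch directory $1 is not writable"
    fi
    # A healthy node prints nothing.
}

check_scratch "$SCRATCH_DIR"
```

The checks used here are quick local tests that cannot block, in keeping with the advice to keep the script short because pbs_mom waits for it to finish.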

Files

File Description
$PBS_SERVER_HOME/server_name Contains the hostname running pbs_server
$PBS_SERVER_HOME/mom_priv The default directory for configuration files, typically (/usr/spool/pbs)/mom_priv
$PBS_SERVER_HOME/mom_logs Directory for log files recorded by the server
$PBS_SERVER_HOME/mom_priv/prologue The administrative script to be run before job execution
$PBS_SERVER_HOME/mom_priv/epilogue The administrative script to be run after job execution

Signal handling

pbs_mom handles the following signals:

Signal Description
SIGHUP Causes pbs_mom to re-read its configuration file, close and reopen the log file, and reinitialize resource structures.
SIGALRM Results in a log file entry. The signal is used to limit the time taken by certain child processes, such as the prologue and epilogue.
SIGINT and SIGTERM Result in pbs_mom exiting without terminating any running jobs. This is the action for the following signals as well: SIGXCPU, SIGXFSZ, SIGCPULIM, and SIGSHUTDN.
SIGUSR1, SIGUSR2 Causes the MOM to increase and decrease logging levels, respectively.
SIGPIPE, SIGINFO Are ignored.
SIGBUS, SIGFPE, SIGILL, SIGTRAP, and SIGSYS Cause a core dump if the PBSCOREDUMP environment variable is defined.

All other signals have their default behavior installed.

Exit status

If the pbs_mom command fails to begin operation, the server exits with a value greater than zero.

© 2014 Adaptive Computing