TORQUE Resource Manager

Appendix C: Node Manager (MOM) Configuration

Under TORQUE, MOM configuration is accomplished using the mom_priv/config file located in the PBS directory on each execution server.

C.1 Parameters

arch
<STRING>
specifies the architecture of the local machine. This information is used by the scheduler only.
arch ia64
   
$alias_server_name
<STRING>
(Applicable in version 2.5.0 and later.) Allows the MOM to accept an additional pbs_server host name as a trusted address.

This feature was added to overcome a problem with UDP and RPP where alias IP addresses are used on a server. With alias IP addresses a UDP packet can be sent to the alias address but the UDP reply packet will come back on the primary IP address. RPP matches addresses from its connection table to incoming packets. If the addresses do not match an entry in the RPP table, the packet is dropped. This feature allows an additional address for the server to be added to the table so legitimate packets are not dropped.
$alias_server_name node01
   
$clienthost
<STRING>
specifies the machine running pbs_server

Note This parameter is deprecated; use $pbsserver instead.

$clienthost node01.teracluster.org
   
$check_poll_time
<STRING>
amount of time between checking running jobs, polling jobs, and trying to resend obituaries that were not sent successfully. Default is 45 seconds.
$check_poll_time 90
   
$configversion
<STRING>
specifies the version of the config file data
$configversion 113
   
$cputmult
<FLOAT>
cpu time multiplier.

Note If set to 0.0, MOM level cputime enforcement is disabled.

$cputmult 2.2
   
$exec_with_exec
<BOOLEAN>
pbs_mom uses the exec command to start the job script rather than the TORQUE default method, which is to pass the script's contents as the input to the shell. This means that if you trap signals in the job script, they will be trapped for the job. Using the default method, you would need to configure the shell to also trap the signals. Default is FALSE.
$exec_with_exec true
   
$ideal_load
<FLOAT>
ideal processor load
$ideal_load 4.0
   
$igncput
<BOOLEAN>
ignores limit violations pertaining to cpu time. Default is false.
$igncput true
   
$ignmem
<BOOLEAN>
ignores limit violations pertaining to physical memory. Default is false.
$ignmem true
   
$ignvmem
<BOOLEAN>
ignores limit violations pertaining to virtual memory. Default is false.
$ignvmem true
   
$ignwalltime
<BOOLEAN>
ignores walltime (MOM-based walltime limit enforcement is not enabled)
$ignwalltime true
   
$job_output_file_umask
<STRING>
uses the specified umask when creating job output and error files. Values can be specified in base 8, 10, or 16; leading 0 implies octal and leading 0x or 0X hexadecimal. A value of "userdefault" will use the user's default umask. This parameter is in version 2.3.0 and later.
$job_output_file_umask 027
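
The base rules above follow the usual C convention for integer literals, which bash arithmetic also applies. As a rough illustration only (the helper name show_umask is hypothetical and not part of TORQUE, and it does not handle the "userdefault" keyword), the following sketch prints a value in octal so the three notations can be compared:

```shell
#!/bin/bash
# Hypothetical helper, not part of TORQUE: print a umask value in octal.
# Bash arithmetic uses the same prefix rules described above:
# leading 0 = octal, leading 0x/0X = hexadecimal, otherwise decimal.
show_umask() {
    printf '%04o\n' "$(( $1 ))"
}

show_umask 027    # octal notation
show_umask 23     # decimal notation
show_umask 0x17   # hexadecimal notation
```

All three calls print 0027, so any of the three spellings should describe the same mask.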
   
$job_starter
<STRING>
specifies the fully qualified pathname of the job starter. If this parameter is specified, instead of executing the job command and job arguments directly, the MOM will execute the job starter, passing the job command and job arguments to it as its arguments. The job starter can be used to launch jobs within a desired environment.

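$job_starter /var/torque/mom_priv/job_starter.sh
> cat /var/torque/mom_priv/job_starter.sh
#!/bin/bash
# Set up the desired environment for the job.
export FOOHOME=/home/foo
ulimit -n 314
# Run the job command with its arguments; "$@" preserves arguments that contain spaces.
"$@"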

   
$log_directory
<STRING>
changes the log directory. Default is TORQUE_HOME/mom_logs/. TORQUE_HOME default is /var/spool/torque/ but can be changed in the ./configure script. The value is a string and should be the full path to the desired mom log directory.
$log_directory /opt/torque/mom_logs/
   
$log_file_suffix
<STRING>
optional suffix to append to log file names. If %h is the suffix, pbs_mom appends the hostname of the machine where the log files are stored, if it knows it; otherwise, it appends the hostname of the machine where the MOM is running.
$log_file_suffix %h (produces a log file name such as 20100223.mybox)
$log_file_suffix foo (produces a log file name such as 20100223.foo)
   
$logevent
<STRING>
specifies a bitmap for event types to log
$logevent 255
   
$loglevel
<INTEGER>
specifies the verbosity of logging with higher numbers specifying more verbose logging. Values may range between 0 and 7.
$loglevel 4
   
$log_file_max_size
<INTEGER>
Soft limit for log file size in kilobytes. Checked every 5 minutes. If the log file is found to be greater than or equal to log_file_max_size the current log file will be moved from X to X.1 and a new empty file will be opened.
$log_file_max_size = 100
   
$log_file_roll_depth
<INTEGER>
specifies how many times a log file will be rolled before it is deleted.
$log_file_roll_depth = 7
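
Taken together, $log_file_max_size and $log_file_roll_depth describe a conventional numbered rotation. The sketch below only illustrates that scheme in bash; it is not pbs_mom's actual implementation, and the function name roll_log and its arguments are hypothetical.

```shell
#!/bin/bash
# Hypothetical illustration of numbered log rolling; pbs_mom does this internally.
# usage: roll_log <logfile> <max_size_kb> <roll_depth>
roll_log() {
    local log=$1 max_kb=$2 depth=$3 size_kb i
    if [ ! -f "$log" ]; then
        return 0
    fi
    # Current size in whole kilobytes.
    size_kb=$(( $(wc -c < "$log") / 1024 ))
    if [ "$size_kb" -lt "$max_kb" ]; then
        return 0    # below the soft limit: nothing to do
    fi
    # Delete the oldest copy, then shift X.i to X.(i+1).
    rm -f "$log.$depth"
    for (( i = depth - 1; i >= 1; i-- )); do
        if [ -f "$log.$i" ]; then
            mv "$log.$i" "$log.$((i + 1))"
        fi
    done
    mv "$log" "$log.1"   # X becomes X.1
    : > "$log"           # start a new, empty log file
}
```

Under these assumptions, a call such as roll_log /var/spool/torque/mom_logs/20100223 100 7 would mirror the example settings above.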
   
$log_keep_days
<INTEGER>
Specifies how many days to keep log files. pbs_mom deletes log files older than the specified number of days. If not specified, pbs_mom won't delete log files based on their age.
$log_keep_days 10
   
$max_load
<FLOAT>
maximum processor load
$max_load 4.0
   
$memory_pressure_duration
<INTEGER>
(Applicable in version 3.0 and later.) Memory pressure duration sets a limit to the number of times the value of memory_pressure_threshold can be exceeded before a process is terminated. This can only be used with $memory_pressure_threshold.
$memory_pressure_duration 5
   
$memory_pressure_threshold
<INTEGER>
(Applicable in version 3.0 and later.) The memory_pressure of a cpuset provides a simple per-cpuset running average of the rate that the processes in a cpuset are attempting to free up in-use memory on the nodes of the cpuset to satisfy additional memory requests. The $memory_pressure_threshold is an integer number used to compare against the reclaim rate provided by the memory_pressure file. If the threshold is exceeded and $memory_pressure_duration is set, then the process terminates after exceeding the threshold by the number of times set in $memory_pressure_duration. If $memory_pressure_duration is not set, then a warning is logged and the process continues. $memory_pressure_threshold is only valid with memory_pressure enabled in the root cpuset. To enable, log in as the super user and execute the command echo 1 >> /dev/cpuset/memory_pressure_enabled. See the cpuset man page for more information concerning memory pressure.
$memory_pressure_threshold 1000
   
$node_check_script
<STRING>
specifies the fully qualified pathname of the health check script to run. (see Health Check for more information)
$node_check_script /opt/batch_tools/nodecheck.pl
   
$node_check_interval
<INTEGER>

specifies the number of MOM intervals between subsequent executions of the specified health check. This value defaults to 1, indicating the check is run every MOM interval (see Health Check for more information).

$node_check_interval has two special strings that can be set:

  • jobstart - makes the node health script run when a job is started.
  • jobend - makes the node health script run after each job has completed on a node.

$node_check_interval 5
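
A health check script is an ordinary executable. As described in the Health Check section of the TORQUE documentation, a script signals a problem by printing a line that begins with ERROR followed by a message. The sketch below is a hypothetical example of such a script: the function name check_free_kb, the /tmp path, and the 1 GB threshold are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical node health check: report an error if a filesystem is low on space.
# check_free_kb <path> <min_free_kb> prints an ERROR line on failure, nothing otherwise.
check_free_kb() {
    local path=$1 min_kb=$2 free_kb
    # In POSIX df output, column 4 of the second line is the available space in 1K blocks.
    free_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    if [ "$free_kb" -lt "$min_kb" ]; then
        echo "ERROR $path has only ${free_kb} KB free (minimum ${min_kb} KB)"
    fi
}

check_free_kb /tmp 1048576   # assumed threshold: 1 GB
```

A script like this could be saved to the path given in $node_check_script and made executable.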
   
$nodefile_suffix
<STRING>
Specifies the suffix to append to a host name to denote the data channel network adapter in a multihomed compute node.
$nodefile_suffix i

With the suffix of 'i' and the control channel adapter with the name node01, the data channel would have a hostname of node01i.

   
$nospool_dir_list
<STRING>

If this is configured, the job's output is spooled in the working directory of the job or the specified output directory.

Specify the list in full paths, delimited by commas. If the job's working directory (or specified output directory) is in one of the paths in the list (or a subdirectory of one of the paths in the list), the job is spooled directly to the output location. $nospool_dir_list * is accepted.

The user that submits the job must have write permission on the folder where the job is written, and read permission on the folder where the file is spooled.

Alternatively, you can use the $spool_as_final_name parameter to force the job to spool directly to the final output.

Note This should generally be used only when the job can run on the same machine as where the output file goes, or if there is a shared filesystem. If not, this parameter can slow down the system or fail to create the output file.
$nospool_dir_list /home/mike/jobs/,/var/tmp/spool/
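
The matching rule described above (a listed directory, any of its subdirectories, or the * wildcard) can be sketched as a small bash function. This is only an illustration of the rule as stated here, not pbs_mom's implementation; the function name in_nospool_list is hypothetical.

```shell
#!/bin/bash
# Hypothetical check, for illustration only: would <dir> match a $nospool_dir_list value?
# usage: in_nospool_list <dir> <comma-separated list>
in_nospool_list() {
    local dir=${1%/}/ entry
    local -a entries
    # Split on commas with read, which avoids filename expansion of "*".
    IFS=, read -r -a entries <<< "$2"
    for entry in "${entries[@]}"; do
        if [ "$entry" = "*" ]; then
            return 0                 # wildcard: every directory matches
        fi
        entry=${entry%/}/            # normalize to exactly one trailing slash
        case $dir in
            "$entry"*) return 0 ;;   # the listed directory or one of its subdirectories
        esac
    done
    return 1
}

in_nospool_list /home/mike/jobs/run1 '/home/mike/jobs/,/var/tmp/spool/' && echo "spooled directly"
```

Normalizing both paths to end in a slash keeps /home/mike/jobs2 from matching the entry /home/mike/jobs/.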
   
opsys
<STRING>
specifies the operating system of the local machine. This information is used by the scheduler only.
opsys RHEL3
   
$pbsclient
<STRING>
specifies machines which the MOM daemon will trust to run resource manager commands via momctl. This may include machines where monitors, schedulers, or admins require the use of this command.
$pbsclient node01.teracluster.org
   
$pbsserver
<STRING>
specifies the machine running pbs_server

Note This parameter replaces the deprecated parameter $clienthost.

$pbsserver node01.teracluster.org
   
$prologalarm
<INTEGER>
Specifies the maximum duration (in seconds) that the MOM will wait for the job prologue or job epilogue to complete. This parameter defaults to 300 seconds (5 minutes).
$prologalarm 60
   
$rcpcmd
<STRING>
specifies the full path and optional additional command line args to use to perform remote copies
mom_priv/config:
$rcpcmd /usr/local/bin/scp -i /etc/sshauth.dat
   
$remote_reconfig
<STRING>
Enables the ability to remotely reconfigure pbs_mom with a new config file. Default is disabled. This parameter accepts various forms of true, yes, and 1. For more information on how to reconfigure MOMs, see momctl -r.
$remote_reconfig true
   
$reduce_prolog_checks
<STRING>
If enabled, TORQUE will only check if the file is a regular file and is executable, instead of the normal checks listed on the prologue and epilogue page. Default is false.
$reduce_prolog_checks true
   
$restricted
<STRING>
Specifies hosts which can be trusted to access mom services as non-root. By default, no hosts are trusted to access mom services as non-root.
$restricted *.teracluster.org
   
$rpp_throttle
<INTEGER>
This integer is in microseconds and causes a sleep after every RPP packet is sent. It is for systems that experience job failures because of incomplete data.
$rpp_throttle 100 (will cause a 100 microsecond sleep)
   
size[fs=<FS>]
N/A
Specifies that the available and configured disk space in the <FS> filesystem is to be reported to the pbs_server and scheduler.

Note To request disk space on a per job basis, specify the file resource as in 'qsub -l nodes=1,file=1000kb'.

Note Unlike most mom config options, the size parameter is not preceded by a '$' character.

size[fs=/localscratch]

the available and configured disk space in the /localscratch filesystem will be reported.

   
$source_login_batch
<STRING>
Specifies whether or not the MOM will source /etc/profile and similar login files for batch jobs. Parameter accepts various forms of true, false, yes, no, 1 and 0. Default is True. This parameter is in version 2.3.1 and later.
$source_login_batch False

The MOM will bypass the sourcing of /etc/profile and similar login files.

   
$source_login_interactive
<STRING>
Specifies whether or not the MOM will source /etc/profile and similar login files for interactive jobs. Parameter accepts various forms of true, false, yes, no, 1 and 0. Default is True. This parameter is in version 2.3.1 and later.
$source_login_interactive False

The MOM will bypass the sourcing of /etc/profile and similar login files.

   
$spool_as_final_name
<BOOLEAN>
This will spool the job under the final name that the output and error files will receive, instead of having an intermediate file and then copying the result to the final file when the job has completed. This allows users easier access to the file if they want to watch the job's output as it runs.
$spool_as_final_name true
   
$status_update_time
<INTEGER>
Specifies the number of seconds between subsequent mom-to-server update reports. Default is 45 seconds.
status_update_time:
$status_update_time 120
mom will send server update reports every 120 seconds.
   
$thread_unlink_calls
<BOOLEAN>
Threads calls to unlink when deleting a job. Default is false. If it is set to TRUE, pbs_mom will use a thread to delete the job's files.
thread_unlink_calls:
$thread_unlink_calls true
   
$timeout
<INTEGER>
Specifies the number of seconds before mom-to-mom messages will time out if RPP is disabled. Default is 60 seconds.
$timeout 120

mom-to-mom communication will allow up to 120 seconds before timing out.

   
$tmpdir
<STRING>
Specifies a directory to create job-specific scratch space (see Creating Per-Job Temporary Directories).
$tmpdir /localscratch
   
$usecp
<HOST>:<SRCDIR> <DSTDIR>
Specifies which directories should be staged (see TORQUE Data Management)
$usecp *.fte.com:/data /usr/local/data
   
$use_smt
<BOOLEAN>
Indicates that the user would like to use SMT. If set, each logical core inside of a physical core will be used as a normal core for cpusets.

Note If SMT is used, you will need to set the np attribute so that each logical processor is counted.

$use_smt false
   
varattr
<INTEGER> <STRING>

Provides a way to keep track of dynamic attributes on nodes.

<INTEGER> is how many seconds should go by between calls to the script to update the dynamic values. If set to -1, the script is read only one time.

<STRING> is the script path. This script should check for whatever dynamic attributes are desired, and then output lines in this format:

    name=value

Include any arguments after the script's full path.

These features are visible in the output of pbsnodes -a:

varattr=Matlab=7.1;Octave=1.0.

varattr 25 /usr/local/scripts/nodeProperties.pl arg1 arg2 arg3
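
A varattr script only needs to print name=value lines on standard output. The following bash sketch shows one possible shape for such a script; the helper name report_attr and the attributes it reports (Kernel, Shell) are assumptions chosen for illustration, not values TORQUE requires.

```shell
#!/bin/bash
# Hypothetical varattr script: print one name=value line per dynamic attribute.
# The attributes probed here are assumptions chosen for illustration.
report_attr() {
    local name=$1 value=$2
    # Skip attributes with no value so that incomplete lines are never emitted.
    if [ -n "$value" ]; then
        printf '%s=%s\n' "$name" "$value"
    fi
}

report_attr Kernel "$(uname -r)"
report_attr Shell "$BASH_VERSION"
```

Any arguments listed after the script path in the varattr line would arrive as ordinary positional parameters ($1, $2, and so on).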
   
$wallmult
<FLOAT>
Sets a factor to adjust walltime usage by multiplying a default job time to a common reference system. It modifies real walltime on a per-MOM basis (MOM configuration parameters). The factor is used for walltime calculations and limits in the same way that cputmult is used for cpu time.

Note If set to 0.0, MOM level walltime enforcement is disabled.

$wallmult 2.2
   

C.2 Node Features and Generic Consumable Resource Specification

Node features (a.k.a. node properties) are opaque labels which can be applied to a node. They are not consumable and cannot be associated with a value (use the generic resources described below for these purposes). Node features are configured within the global nodes file on the pbs_server head node and are not specified on a per node basis. This file can be used to specify an arbitrary number of node features.

Additionally, per node consumable generic resources may be specified using the format '<ATTR> <VAL>' with no leading dollar ('$') character. When specified, this information is routed to the scheduler and can be used in scheduling decisions. For example, to indicate that a given host has two tape drives and one node-locked matlab license available for batch jobs, the following could be specified:

mom_priv/config:
$clienthost 241.13.153.7

tape    2
matlab  1

Dynamic Consumable Generic Resources

Dynamic consumable resource information can be routed in by specifying a value preceded by an exclamation point (i.e., '!') as in the example below. If the resource value is configured in this manner, the specified file will be periodically executed to load the effective resource value. (See section 2.5.3 of the 'PBS Administrator Guide' for more information.)

mom_priv/config:
$clienthost 241.13.153.7

tape    !/opt/rm/gettapecount.pl
matlab  !/opt/tools/getlicensecount.pl
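
The scripts named above are site-specific and are not shown in this appendix. As a hedged sketch, a bash stand-in for something like gettapecount.pl might simply print the current count on standard output; the assumption that the value is read from a single line of output, the function name count_matches, and the /dev/st[0-9]* device pattern are all illustrative.

```shell
#!/bin/bash
# Hypothetical stand-in for a script such as gettapecount.pl: print the number
# of entries matching a device glob. The /dev/st[0-9]* pattern is an assumption.
count_matches() {
    local n=0 f
    for f in "$@"; do
        # An unmatched glob is passed through literally, so test for existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

count_matches /dev/st[0-9]*
```

On a host with no matching devices, the script prints 0.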

C.3 Command-line Arguments

Below is a table of pbs_mom command-line startup flags.

Flag  Description
-a    Alarm time in seconds.
-c    Config file path.
-C    Checkpoint path.
-d    Home directory.
-L    Logfile.
-M    MOM port to listen on.
-p    Perform 'poll' based job recovery on restart (jobs persist until associated processes terminate).
-P    On restart, deletes all jobs that were running on MOM (Available in 2.4.X and later).
-q    On restart, requeues all jobs that were running on MOM (Available in 2.4.X and later).
-r    On restart, kills all processes associated with jobs that were running on MOM, and then requeues the jobs.
-R    MOM 'RM' port to listen on.
-S    pbs_server port to connect to.
-v    Display version information and exit.
-x    Disable use of privileged port.
-?    Show usage information and exit.
*For more details on these command-line options, see the pbs_mom man page.

See Also