5.119
Parameters
These parameters go in the mom_priv/config file. They control various behaviors for the MOMs.
arch |
Format |
<STRING> |
Description |
Specifies the architecture of the local machine. This information is used by the scheduler only. |
Example |
arch ia64
|
$attempt_to_make_dir |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, specifies that you want Torque to attempt to create the output directories for jobs if they do not already exist.
Torque uses this parameter to make the directory as the user and not as root. Torque will create the directory (or directories) ONLY if the user has permissions to do so.
|
Example |
$attempt_to_make_dir true
|
$clienthost |
Format |
<STRING> |
Description |
Specifies the machine running pbs_server.
|
Example |
$clienthost node01.teracluster.org
|
$check_poll_time |
Format |
<STRING> |
Default |
45
|
Description |
Amount of time (in seconds) between checking running jobs, polling jobs, and trying to resend obituaries for jobs that haven't sent successfully. |
Example |
$check_poll_time 90
|
$configversion |
Format |
<STRING> |
Description |
Specifies the version of the config file data. |
Example |
$configversion 113
|
$cputmult |
Format |
<FLOAT> |
Description |
CPU time multiplier.
If set to 0.0, MOM level cputime enforcement is disabled.
|
Example |
$cputmult 2.2
|
$cray_check_rur |
Format |
<BOOLEAN> |
Default |
TRUE
|
Description |
When set to FALSE, login MOMs (Cray only) will not look at the energy resource information used for each job. Bypassing Resource Utilization Reporting (RUR) checking may improve performance. |
Example |
$cray_check_rur false
|
$cuda_visible_devices |
Format |
<BOOLEAN> |
Default |
TRUE
|
Description |
When set to TRUE, the MOM will set the CUDA_VISIBLE_DEVICES environment variable for jobs using NVIDIA GPUs. If set to FALSE, the MOM will not set CUDA_VISIBLE_DEVICES for any jobs. |
Example |
$cuda_visible_devices true
|
$down_on_error |
Format |
<BOOLEAN> |
Default |
TRUE
|
Description |
Causes the MOM to report itself as state "down" to pbs_server in the event of a failed health check. See 5.405.6 Health check for more information.
|
Example |
$down_on_error true
|
$enablemomrestart |
Format |
<BOOLEAN> |
Description |
Enables automatic restarts of the MOM. If enabled, the MOM checks whether its binary has been updated and restarts itself at a safe point when no jobs are running, which makes upgrades easier. The check compares the mtime of the pbs_mom executable. Command-line arguments, the process name, and the PATH environment variable are preserved across restarts. It is recommended that this not be enabled in the config file, but enabled when desired with momctl (see 5.405.5 Resources for more information).
|
Example |
$enablemomrestart true
|
$exec_with_exec |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, pbs_mom uses the exec command to start the job script rather than the Torque default method, which is to pass the script's contents as the input to the shell. This means that if you trap signals in the job script, they will be trapped for the job. Using the default method, you would need to configure the shell to also trap the signals. |
Example |
$exec_with_exec true
|
$ext_pwd_retry |
Format |
<INTEGER> |
Default |
3
|
Description |
(Available in Torque 2.5.10, 3.0.4, and later.) Specifies the number of times to retry checking the password. Useful in cases where external password validation is used, such as with LDAP.
|
Example |
$ext_pwd_retry = 5
|
$force_overwrite |
Format |
<BOOLEAN> |
Description |
(Available in Torque 6.0.3 and later.) When set to true, forces the output files to be overwritten each time a job is started.
|
Example |
$force_overwrite true
|
$ideal_load |
Format |
<FLOAT> |
Description |
Ideal processor load. |
Example |
$ideal_load 4.0
|
$igncput |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
Ignores limit violations pertaining to CPU time. |
Example |
$igncput true
|
$ignmem |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
Ignores limit violations pertaining to physical memory. |
Example |
$ignmem true
|
$ignvmem |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
Ignores limit violations pertaining to virtual memory. |
Example |
$ignvmem true
|
$ignwalltime |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
Ignores walltime (does not enable MOM-based walltime limit enforcement). |
Example |
$ignwalltime true
|
$jobdirectory_sticky |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When this option is set (TRUE), the job directory on the MOM can have a sticky bit set. |
Example |
$jobdirectory_sticky true
|
$job_exit_wait_time |
Format |
<INTEGER> |
Default |
600
|
Description |
This is the timeout (in seconds) to clean up parallel jobs after one of the sister nodes for the parallel job goes down or is otherwise unresponsive. The MOM sends out all of its kill job requests to sisters and marks the time. Additionally, the job is placed in the substate JOB_SUBSTATE_EXIT_WAIT. The MOM then periodically checks jobs in this state and if they are in this state for more than the specified time, death is assumed and the job gets cleaned up. Default is 600 seconds (10 minutes). |
Example |
$job_exit_wait_time 300
|
$job_output_file_umask |
Format |
<STRING> |
Description |
Uses the specified umask when creating job output and error files. Values can be specified in base 8, 10, or 16; leading 0 implies octal and leading 0x or 0X hexadecimal. A value of "userdefault" will use the user's default umask. This parameter is in version 2.3.0 and later. |
Example |
$job_output_file_umask 027
|
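The base rules for $job_output_file_umask can be sketched in Python. This is only an illustration of the parsing described above and of how a umask masks permission bits; the helper names are hypothetical, the requested mode of 0666 is an assumption, and the special value "userdefault" is not handled.

```python
def parse_umask(text):
    """Parse a umask string: leading 0x/0X is hex, leading 0 is octal, else decimal."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    if lowered.startswith("0") and len(lowered) > 1:
        return int(lowered[1:], 8)
    return int(lowered, 10)


def resulting_mode(requested_mode, umask):
    """Permission bits that remain after the umask is applied."""
    return requested_mode & ~umask & 0o777


umask = parse_umask("027")
print(oct(umask))                         # 0o27
# Assumption: the output file is requested with mode 0666.
print(oct(resulting_mode(0o666, umask)))  # 0o640
```

With a umask of 027, group members lose write access and other users lose all access to the output and error files.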
$job_starter |
Format |
<STRING> |
Description |
Specifies the fully qualified pathname of the job starter. If this parameter is specified, instead of executing the job command and job arguments directly, the MOM will execute the job starter, passing the job command and job arguments to it as its arguments. The job starter can be used to launch jobs within a desired environment. |
Example |
$job_starter /var/torque/mom_priv/job_starter.sh
> cat /var/torque/mom_priv/job_starter.sh
#!/bin/bash
export FOOHOME=/home/foo
ulimit -n 314
"$@"
|
$job_starter_run_privileged |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, specifies that you want Torque to execute the $job_starter script with elevated privileges. |
Example |
$job_starter_run_privileged true
|
$log_directory |
Format |
<STRING> |
Default |
TORQUE_HOME/mom_logs/
|
Description |
Changes the log directory. TORQUE_HOME default is /var/spool/torque/ but can be changed in the ./configure script. The value is a string and should be the full path to the desired MOM log directory. |
Example |
$log_directory /opt/torque/mom_logs/
|
$log_file_suffix |
Format |
<STRING> |
Description |
Optional suffix to append to log file names. If %h is the suffix, pbs_mom appends the hostname for where the log files are stored if it knows it, otherwise it will append the hostname where the MOM is running. |
Example |
$log_file_suffix %h = 20100223.mybox
$log_file_suffix foo = 20100223.foo
|
$logevent |
Format |
<INTEGER> |
Description |
Creates an event mask enumerating which log events will be recorded in the MOM logs. By default all events are logged.
These are the events which can be chosen:
ERROR 0x0001 internal errors
SYSTEM 0x0002 system (server) & (trqauthd) events
ADMIN 0x0004 admin events
JOB 0x0008 job related events
JOB_USAGE 0x0010 End of Job accounting
SECURITY 0x0020 security violation events
SCHED 0x0040 scheduler events
DEBUG 0x0080 common debug messages
DEBUG2 0x0100 less needed debug messages
CLIENTAUTH 0x0200 TRQAUTHD login events
SYSLOG 0x0400 pass this event to the syslog as well
The listed events are shown here with hexadecimal values; however, a decimal value must be used when setting $logevent.
|
Example |
$logevent 1039 will log ERROR, SYSTEM, ADMIN, JOB and SYSLOG events. This has a hexadecimal value of 0x40F. |
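Because $logevent takes a decimal value while the events are listed in hexadecimal, a small script can help compute the mask. The sketch below uses the event values from the $logevent description; the helper names are hypothetical and are not part of Torque.

```python
# Event bits as listed in the $logevent description.
LOG_EVENTS = {
    "ERROR": 0x0001,
    "SYSTEM": 0x0002,
    "ADMIN": 0x0004,
    "JOB": 0x0008,
    "JOB_USAGE": 0x0010,
    "SECURITY": 0x0020,
    "SCHED": 0x0040,
    "DEBUG": 0x0080,
    "DEBUG2": 0x0100,
    "CLIENTAUTH": 0x0200,
    "SYSLOG": 0x0400,
}


def logevent_mask(names):
    """Combine the named events into one mask (hypothetical helper)."""
    mask = 0
    for name in names:
        mask |= LOG_EVENTS[name]
    return mask


def events_in_mask(mask):
    """List the event names whose bits are set in a mask."""
    return [name for name, bit in LOG_EVENTS.items() if mask & bit]


mask = logevent_mask(["ERROR", "SYSTEM", "ADMIN", "JOB", "SYSLOG"])
print(f"$logevent {mask}")  # $logevent 1039
print(hex(mask))            # 0x40f
```

The first printed line is the decimal form to place in mom_priv/config; events_in_mask can be used to decode an existing setting.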
$loglevel |
Format |
<INTEGER> |
Description |
Specifies the verbosity of logging with higher numbers specifying more verbose logging. Values may range between 0 and 7. |
Example |
$loglevel 4
|
$log_file_max_size |
Format |
<INTEGER> |
Description |
Soft limit for log file size in kilobytes. Checked every 5 minutes. If the log file is found to be greater than or equal to log_file_max_size the current log file will be moved from X to X.1 and a new empty file will be opened. |
Example |
$log_file_max_size = 100
|
$log_file_roll_depth |
Format |
<INTEGER> |
Description |
Specifies how many times a log file will be rolled before it is deleted. |
Example |
$log_file_roll_depth = 7
|
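The interaction between $log_file_max_size and $log_file_roll_depth can be sketched as follows. This is a simplified model that works on file names only, not the pbs_mom implementation: the move from X to X.1 comes from the description above, while the shifting of older files (X.1 to X.2, and so on) and the function names are assumptions.

```python
def needs_roll(size_kb, max_size_kb):
    """A log is rolled once it reaches the soft limit (in kilobytes)."""
    return size_kb >= max_size_kb


def roll_logs(files, base, depth):
    """Return the names for log `base` that exist after one roll.

    `files` is the set of names before the roll. base.N moves to
    base.N+1 (assumed), base moves to base.1, and any file that would
    pass `depth` is dropped.
    """
    after = set()
    for n in range(depth - 1, 0, -1):
        if f"{base}.{n}" in files:
            after.add(f"{base}.{n + 1}")
    if base in files:
        after.add(f"{base}.1")
    after.add(base)  # a new empty log file is opened
    return after


if needs_roll(size_kb=100, max_size_kb=100):
    print(sorted(roll_logs({"20100223"}, "20100223", depth=7)))
    # ['20100223', '20100223.1']
```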
$log_keep_days |
Format |
<INTEGER> |
Description |
Specifies how many days to keep log files. pbs_mom deletes log files older than the specified number of days. If not specified, pbs_mom won't delete log files based on their age. |
Example |
$log_keep_days 10
|
$max_conn_timeout_micro_sec |
Format |
<INTEGER> |
Default |
10000
|
Description |
Specifies how long (in microseconds) pbs_mom should wait for a connection to be made. Default value is 10,000 (.01 sec). |
Example |
$max_conn_timeout_micro_sec 30000
This sets the connection timeout on the MOM to .03 seconds.
|
$max_join_job_wait_time |
Format |
<INTEGER>
|
Default |
600
|
Description |
The interval to wait (in seconds) for jobs stuck in a prerun state before deleting them from the MOMs and requeueing them on the server. Default is 600 seconds (10 minutes).
If a MOM is completely idle, it can take as long as the next MOM-to-server update time to requeue a failed job.
|
Example |
$max_join_job_wait_time 300
|
$max_load
|
Format |
<FLOAT>
|
Description |
Maximum processor load.
|
Example |
$max_load 4.0
|
$max_physical_memory |
Format |
<INTEGER>
<unit> |
Description |
Restrict the amount of memory available to jobs on this node to the specified amount, which may not exceed the amount of memory on the machine and must be greater than 0. Default is to use all available memory on the host.
When cgroups are enabled, this limits the machine as a whole and doesn't specifically limit each socket or NUMA node. If you have 2 NUMA nodes and 32 GB of memory, but you limit the machine to 30 GB, it won't force a job requesting 16 GB to span NUMA nodes, but once that job starts, only 14 GB would remain available for other jobs.
If you are using this setting, availmem (as reported in pbsnodes) is no longer accurate, as we do not know what portion of used memory and swap are by jobs and what portion are from the operating system. Since availmem is no longer accurate, you need to set NODEAVAILABILITYPOLICY to DEDICATED if you are using Moab or Maui.
|
Example |
$max_physical_memory 30gb
|
$max_swap_memory |
Format |
<INTEGER>
<unit> |
Description |
Restrict the amount of swap available to jobs on this node to the specified amount, which may not exceed the amount of swap on the machine and must be greater than 0. If you wish to disallow swap, this must be set to a very low value instead of 0. Default is to use all available swap on the host.
If you are using this setting, availmem (as reported in pbsnodes) is no longer accurate, as we do not know what portion of used memory and swap are by jobs and what portion are from the operating system. Since availmem is no longer accurate, you need to set NODEAVAILABILITYPOLICY to DEDICATED if you are using Moab or Maui.
|
Example |
$max_swap_memory 5gb
|
$memory_pressure_duration |
Format |
<INTEGER> |
Description |
(Applicable in version 3.0 and later.) Memory pressure duration sets a limit to the number of times the value of memory_pressure_threshold can be exceeded before a process is terminated. This can only be used with $memory_pressure_threshold. |
Example |
$memory_pressure_duration 5
|
$memory_pressure_threshold |
Format |
<INTEGER> |
Description |
(Applicable in version 3.0 and later.) The memory_pressure of a cpuset provides a simple per-cpuset running average of the rate at which the processes in a cpuset are attempting to free up in-use memory on the nodes of the cpuset to satisfy additional memory requests. $memory_pressure_threshold is an integer that is compared against the reclaim rate reported in the memory_pressure file. If the threshold is exceeded and $memory_pressure_duration is set, the process is terminated after exceeding the threshold the number of times set in $memory_pressure_duration. If $memory_pressure_duration is not set, a warning is logged and the process continues. $memory_pressure_threshold is only valid with memory_pressure enabled in the root cpuset.
To enable, log in as the super user and execute the command echo 1 >> /dev/cpuset/memory_pressure_enabled. See the cpuset man page for more information concerning memory pressure.
|
Example |
$memory_pressure_threshold 1000
|
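The relationship between $memory_pressure_threshold and $memory_pressure_duration can be illustrated with a short Python sketch. It is a reading of the descriptions above rather than the MOM's actual logic: it assumes the count of exceeded readings is cumulative (not consecutive) and that termination happens when the count reaches the duration. The function name and the sample readings are hypothetical.

```python
def evaluate_pressure(samples, threshold, duration=None):
    """Walk a series of memory_pressure readings and report the outcome.

    Returns ("terminate", index) once the threshold has been exceeded
    `duration` times, ("warn", count) if it was exceeded but no duration
    is set, and ("ok", count) otherwise.
    """
    exceeded = 0
    for index, value in enumerate(samples):
        if value > threshold:
            exceeded += 1
            # Assumption: reaching the configured count triggers termination.
            if duration is not None and exceeded >= duration:
                return ("terminate", index)
    if exceeded and duration is None:
        return ("warn", exceeded)
    return ("ok", exceeded)


readings = [1200, 900, 1500, 1100]
print(evaluate_pressure(readings, threshold=1000, duration=3))
print(evaluate_pressure(readings, threshold=1000))
```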
$mom_hierarchy_retry_time |
Format |
<SECONDS> |
Default |
90
|
Description |
Specifies the amount of time (in seconds) that a MOM waits to retry a node in the hierarchy path after a failed connection to that node. |
Example |
$mom_hierarchy_retry_time 30
|
$mom_host |
Format |
<STRING> |
Description |
Sets the local hostname as used by pbs_mom. |
Example |
$mom_host node42
|
$node_check_script |
Format |
<STRING> |
Description |
Specifies the fully qualified pathname of the health check script to run (see Compute Node Health Check for more information). |
Example |
$node_check_script /opt/batch_tools/nodecheck.pl
|
$node_check_interval |
Format |
<STRING> |
Description |
Specifies the number of MOM intervals between subsequent executions of the specified health check. This value defaults to 1 indicating the check is run every MOM interval (see Compute Node Health Check for more information).
$node_check_interval has two special strings that can be set:
- jobstart – makes the node health script run when a job is started (before the prologue script).
- jobend – makes the node health script run after each job has completed on a node (after the epilogue script).
The node health check may be configured to run before or after the job with the "jobstart" and/or "jobend" options. However, the job environment variables do not get passed to node health check script, so it has no access to those variables at any time.
|
Example |
$node_check_interval 5
|
$nodefile_suffix |
Format |
<STRING> |
Description |
Specifies the suffix to append to a host name to denote the data channel network adapter in a multi-homed compute node. |
Example |
$nodefile_suffix i
With the suffix "i" and a control channel adapter named node01, the data channel would have the hostname node01i.
|
$nospool_dir_list |
Format |
<STRING> |
Description |
If this is configured, the job's output is spooled in the working directory of the job or the specified output directory.
Specify the list in full paths, delimited by commas. If the job's working directory (or specified output directory) is in one of the paths in the list (or a subdirectory of one of the paths in the list), the job is spooled directly to the output location. $nospool_dir_list * is accepted.
The user that submits the job must have write permission on the folder where the job is written, and read permission on the folder where the file is spooled.
Alternatively, you can use the $spool_as_final_name parameter to force the job to spool directly to the final output.
This should generally be used only when the job can run on the same machine as where the output file goes, or if there is a shared filesystem. If not, this parameter can slow down the system or fail to create the output file.
|
Example |
$nospool_dir_list /home/mike/jobs/,/var/tmp/spool/
|
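The directory test described for $nospool_dir_list (the output directory is one of the listed paths or a subdirectory of one) can be sketched like this. It is an illustration only: the function name is hypothetical, paths are assumed to be absolute, and treating * as "match every directory" is an assumption about what the wildcard means.

```python
import posixpath


def spools_directly(output_dir, nospool_dirs):
    """True if output_dir is one of the listed paths or lies beneath one."""
    if "*" in nospool_dirs:
        return True  # assumption: * matches every directory
    target = posixpath.normpath(output_dir)
    for entry in nospool_dirs:
        root = posixpath.normpath(entry)
        # commonpath compares whole components, so /home/mikey does not
        # match /home/mike.
        if posixpath.commonpath([target, root]) == root:
            return True
    return False


dirs = "/home/mike/jobs/,/var/tmp/spool/".split(",")
print(spools_directly("/home/mike/jobs/run1", dirs))  # True
print(spools_directly("/home/mikey/jobs", dirs))      # False
```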
opsys |
Format |
<STRING> |
Description |
Specifies the operating system of the local machine. This information is used by the scheduler only. |
Example |
opsys RHEL3
|
$pbsclient |
Format |
<STRING> |
Description |
Specifies machines which the MOM daemon will trust to run resource manager commands via momctl. This may include machines where monitors, schedulers, or admins require the use of this command. |
Example |
$pbsclient node01.teracluster.org
|
$pbsserver |
Format |
<STRING> |
Description |
Specifies the machine running pbs_server.
This parameter replaces the deprecated parameter $clienthost.
|
Example |
$pbsserver node01.teracluster.org
|
$presetup_prologue |
Format |
<STRING> |
Description |
A full path to the presetup prologue for all jobs on this node. If set, this script executes before any setup for the job occurs (such as becoming the user, creating the output files, or changing directories). As a result, no output from this script will appear in the job's output.
|
Example |
$presetup_prologue /opt/kerberos_integration.sh
|
$prologalarm |
Format |
<INTEGER> |
Description |
Specifies maximum duration (in seconds) which the MOM will wait for the job prologue or job epilogue to complete. The default value is 300 seconds (5 minutes). When running parallel jobs, this is also the maximum time a sister node will wait for a job to start. |
Example |
$prologalarm 60
|
$rcpcmd |
Format |
<STRING> |
Description |
Specifies the full path and optional additional command line args to use to perform remote copies. |
Example |
mom_priv/config:
$rcpcmd /usr/local/bin/scp -i /etc/sshauth.dat
|
$remote_reconfig |
Format |
<STRING> |
Description |
Enables the ability to remotely reconfigure pbs_mom with a new config file. Default is disabled. This parameter accepts various forms of true, yes, and 1. For more information on how to reconfigure MOMs, see momctl -r. |
Example |
$remote_reconfig true
|
$remote_checkpoint_dirs |
Format |
<STRING> |
Description |
Specifies which server checkpoint directories are remotely
mounted. It tells the MOM which directories are shared with
the server. Using remote checkpoint directories eliminates the
need to copy the checkpoint files back and forth between the
MOM and the server. All entries must be on the same line, separated by a space. |
Example |
$remote_checkpoint_dirs /checkpointFiles /bigStorage /fast
This informs the MOM that the /checkpointFiles, /bigStorage, and /fast directories are remotely mounted checkpoint directories.
|
$reduce_prolog_checks |
Format |
<BOOLEAN> |
Description |
If enabled, Torque will only check if the file is a regular file and is executable, instead of the normal checks listed on the prologue and epilogue page. Default is FALSE. |
Example |
$reduce_prolog_checks true
|
$reject_job_submission
|
Format |
<BOOLEAN>
|
Description |
If set to TRUE, jobs will be rejected and the user will receive the message, "Jobs cannot be run on mom %s." Default is FALSE.
|
Example |
$reject_job_submission true
|
$resend_join_job_wait_time |
Format |
<INTEGER> |
Description |
This is the timeout for the Mother Superior to re-send the join job request if it didn't get a reply from all the sister MOMs. The resend happens only once. Default is 5 minutes. |
Example |
$resend_join_job_wait_time 120
|
$restricted |
Format |
<STRING> |
Description |
Specifies hosts which can be trusted to access MOM services as non-root. By default, no hosts are trusted to access MOM services as non-root. |
Example |
$restricted *.teracluster.org
|
size[fs=<FS>] |
Format |
N/A |
Description |
Specifies that the available and configured disk space in the <FS> filesystem is to be reported to the pbs_server and scheduler.
To request disk space on a per job basis, specify the file resource as in qsub -l nodes=1,file=1000kb.
Unlike most MOM config options, the size parameter is not preceded by a "$" character.
|
Example |
size[fs=/localscratch]
The available and configured disk space in the /localscratch filesystem will be reported.
|
$source_login_batch |
Format |
<BOOLEAN> |
Description |
Specifies whether or not MOM will source the /etc/profile, etc. type files for batch jobs. Parameter accepts various forms of true, false, yes, no, 1 and 0. Default is TRUE. This parameter is in version 2.3.1 and later. |
Example |
$source_login_batch False
MOM will bypass the sourcing of /etc/profile, etc. type files.
|
$source_login_interactive |
Format |
<BOOLEAN> |
Description |
Specifies whether or not MOM will source the /etc/profile, etc. type files for interactive jobs. Parameter accepts various forms of true, false, yes, no, 1 and 0. Default is TRUE. This parameter is in version 2.3.1 and later. |
Example |
$source_login_interactive False
MOM will bypass the sourcing of /etc/profile, etc. type files.
|
$spool_as_final_name |
Format |
<BOOLEAN> |
Description |
This makes the job write directly to its output destination instead of a spool directory. This allows users easier access to the file if they want to watch the job's output as it runs. |
Example |
$spool_as_final_name true
|
$status_update_time |
Format |
<INTEGER> |
Description |
Specifies the number of seconds between subsequent MOM-to-server update reports. Default is 45 seconds. |
Example |
status_update_time:
$status_update_time 120
MOM will send server update reports every 120 seconds.
|
$thread_unlink_calls |
Format |
<BOOLEAN> |
Description |
Threads calls to unlink when deleting a job. Default is false. If it is set to TRUE, pbs_mom will use a thread to delete the job's files. |
Example |
thread_unlink_calls:
$thread_unlink_calls true
|
$timeout |
Format |
<INTEGER> |
Description |
Specifies the number of seconds before a TCP connection on the MOM will timeout. Default is 300 seconds.
|
Example |
$timeout 120
A TCP connection will wait up to 120 seconds before timing out.
For 3.x and earlier, MOM-to-MOM communication will allow up to 120 seconds before timing out.
|
$tmpdir |
Format |
<STRING> |
Description |
Specifies a directory to create job-specific scratch space. |
Example |
$tmpdir /localscratch
|
$usecp |
Format |
<HOST>:<SRCDIR> <DSTDIR> |
Description |
Specifies which directories should be staged (see NFS and Other Networked Filesystems) |
Example |
$usecp *.fte.com:/data /usr/local/data
|
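One way to picture a $usecp rule in the <HOST>:<SRCDIR> <DSTDIR> format is as a mapping from a remote destination to a local path. The sketch below assumes that a destination whose host matches <HOST> and whose path lies under <SRCDIR> is treated as locally reachable under <DSTDIR>; that interpretation, the function name, and the sample host are assumptions for illustration, so check the NFS and Other Networked Filesystems section for the actual behavior.

```python
from fnmatch import fnmatchcase


def map_usecp(destination, rules):
    """Map "host:/path" to a local path using (host_pattern, src, dst) rules.

    Returns None when no rule applies.
    """
    host, _, path = destination.partition(":")
    for host_pattern, src_dir, dst_dir in rules:
        src = src_dir.rstrip("/")
        under_src = path == src or path.startswith(src + "/")
        if fnmatchcase(host, host_pattern) and under_src:
            return dst_dir.rstrip("/") + path[len(src):]
    return None


# Rule taken from the example: $usecp *.fte.com:/data /usr/local/data
rules = [("*.fte.com", "/data", "/usr/local/data")]
print(map_usecp("node01.fte.com:/data/out.txt", rules))
# /usr/local/data/out.txt
```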
$use_smt |
Format |
<BOOLEAN> |
Default |
TRUE
|
Description |
Indicates that the user would like to use SMT. If set, each logical core inside of a physical core will be used as a normal core for cpusets. This parameter is on by default.
$use_smt is deprecated. Please use the -L NUMA Resource Request syntax to control whether or not threads or cores are used.
If you use SMT, you will need to set the np attribute so that each logical processor is counted.
|
Example |
$use_smt false
|
$varattr |
Format |
<INTEGER> <STRING> |
Description |
Provides a way to keep track of dynamic attributes on nodes.
<INTEGER> is how many seconds should go by between calls to the script to update the dynamic values. If set to -1, the script is read only one time.
<STRING> is the script path. This script should check for whatever dynamic attributes are desired, and then output lines in this format:
name=value
Include any arguments after the script's full path. These features are visible in the output of pbsnodes -a:
varattr=Matlab=7.1;Octave=1.0.
For information about using $varattr to request dynamic features in Moab, see REQATTR in the Moab Workload Manager Administrator Guide.
|
Example |
$varattr 25 /usr/local/scripts/nodeProperties.pl arg1 arg2 arg3
|
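A $varattr script only needs to print one name=value line per dynamic attribute. The example above uses a Perl script; the sketch below shows the same output contract in Python, assuming any executable that prints these lines is acceptable. The attribute values are hard-coded placeholders, and collect_attributes is a hypothetical stand-in for whatever checks a site actually needs.

```python
def collect_attributes():
    """Hypothetical probe; a real script would query the node here."""
    return {"Matlab": "7.1", "Octave": "1.0"}


def format_attributes(attributes):
    """Render attributes as the name=value lines described for $varattr."""
    return [f"{name}={value}" for name, value in attributes.items()]


if __name__ == "__main__":
    for line in format_attributes(collect_attributes()):
        print(line)
```

Run directly, the script prints Matlab=7.1 and Octave=1.0 on separate lines.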
$wallmult |
Format |
<FLOAT> |
Description |
Sets a factor to adjust walltime usage by multiplying a default job time to a common reference system. It modifies real walltime on a per-MOM basis (MOM configuration parameters). The factor is used for walltime calculations and limits in the same way that cputmult is used for cpu time.
If set to 0.0, MOM level walltime enforcement is disabled.
|
Example |
$wallmult 2.2
|
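The effect of a multiplier such as $wallmult can be sketched as scaling the measured time before it is compared with the job's limit. This is a simplified model and not Torque's enforcement code: the function name, the comparison, and the default multiplier of 1.0 are assumptions; only the "0.0 disables MOM-level enforcement" rule comes from the description above.

```python
def exceeds_walltime(elapsed_seconds, limit_seconds, wallmult=1.0):
    """Report whether scaled walltime passes the job's limit."""
    if wallmult == 0.0:
        return False  # MOM-level walltime enforcement is disabled
    return elapsed_seconds * wallmult > limit_seconds


# 1000 s of real time counts as 2200 s with $wallmult 2.2.
print(exceeds_walltime(1000, 2000, wallmult=2.2))  # True
print(exceeds_walltime(1000, 2000))                # False
```

The same scaling idea applies to $cputmult for CPU time.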
$xauthpath |
Format |
<STRING> |
Description |
Specifies the path to the xauth binary to enable X11 forwarding.
|
Example |
$xauthpath /opt/bin/xauth
|
Related Topics