Appendix B: Server parameters
TORQUE server parameters are specified using the qmgr command. The set subcommand is used to modify the server object. For example:
> qmgr -c 'set server default_queue=batch'
Parameters
acl_hosts |
Format |
<HOST>[,<HOST>]... or <HOST>[range] or <HOST*> where the asterisk (*) can appear anywhere in the host name |
Default |
(Only the host running pbs_server may submit jobs.) |
Description |
Specifies a list of hosts from which jobs may be submitted. Hosts in the server nodes file located at $TORQUE/server_priv/nodes cannot be added to the list using the acl_hosts parameter (see Server node file configuration). To submit batch or interactive jobs (see Server configuration) through hosts that are specified in the server nodes file, use the submit_hosts parameter.
Qmgr: set queue batch acl_hosts = "hostA,hostB"
Qmgr: set queue batch acl_hosts += "hostE,hostF,hostG"
In version 2.5 and later, the wildcard (*) character can appear anywhere in the host name, and ranges are supported; these specifications also work for managers and operators.
Qmgr: set server acl_hosts = "galaxy*.tom.org"
Qmgr: set server acl_hosts += "galaxy[0-50].tom.org"
Qmgr: set server managers+=tom@galaxy[0-50].tom.org
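The wildcard and range forms can be illustrated with a small matcher. This is only a sketch of the documented pattern syntax, not the pbs_server implementation: the function names are hypothetical, and it assumes a range such as [0-50] matches unpadded decimal integers only.

```python
import re


def host_pattern_to_regex(pattern):
    """Translate an acl_hosts-style pattern into a compiled regex.

    '*' matches any run of characters and '[lo-hi]' matches an integer
    in that inclusive range; all other text is matched literally.
    """
    parts = []
    pos = 0
    for m in re.finditer(r"\*|\[(\d+)-(\d+)\]", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        if m.group(0) == "*":
            parts.append(".*")
        else:
            lo, hi = int(m.group(1)), int(m.group(2))
            # Assumption: ranges expand to unpadded integers (7, not 07).
            parts.append("(?:" + "|".join(str(n) for n in range(lo, hi + 1)) + ")")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def host_allowed(host, patterns):
    """Return True if the host matches at least one pattern in the list."""
    return any(host_pattern_to_regex(p).fullmatch(host) for p in patterns)


if __name__ == "__main__":
    acl = ["galaxy*.tom.org", "galaxy[0-50].tom.org"]
    print(host_allowed("galaxy17.tom.org", acl))
    print(host_allowed("nebula1.tom.org", acl))
```

A sketch like this can be handy for checking a planned pattern list against real host names before applying it with qmgr.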
|
acl_host_enable |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, specifies that the acl_hosts value is enabled. |
acl_logic_or |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, the user and group queue ACLs are logically OR'd. When set to FALSE, they are AND'd. |
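The effect of acl_logic_or can be written as a truth function. This is a minimal sketch that assumes both the user ACL and the group ACL are enabled on the queue; the function name and arguments are hypothetical.

```python
def queue_acl_permits(user_ok, group_ok, acl_logic_or=False):
    """Combine the user and group ACL results the way acl_logic_or describes.

    user_ok / group_ok: whether the submitter passes each ACL on its own.
    """
    if acl_logic_or:
        return user_ok or group_ok   # passing either ACL is enough
    return user_ok and group_ok      # both ACLs must be passed


if __name__ == "__main__":
    # A user who is listed in the user ACL but not in any permitted group:
    print(queue_acl_permits(True, False, acl_logic_or=True))
    print(queue_acl_permits(True, False, acl_logic_or=False))
```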
allow_node_submit |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, specifies that users can submit jobs directly from any trusted compute host or from within batch jobs (see Configuring job submission hosts).
When you enable allow_node_submit, you must also enable the allow_proxy_user parameter to allow user proxying when submitting and running jobs.
|
allow_proxy_user |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, specifies that users can proxy from one user to another. Proxy requests will be either validated by ruserok() or by the scheduler (see Job submission). |
auto_node_np |
Format |
<BOOLEAN> |
Default |
DISABLED
|
Description |
When set to TRUE, automatically configures a node's np (number of processors) value based on the ncpus value from the status update. Requires full manager privilege to set or alter. |
automatic_requeue_exit_code |
Format |
<LONG> |
Default |
--- |
Description |
This is an exit code, defined by the admin, that tells pbs_server to requeue the job instead of considering it as completed. This allows the user to add some additional checks that the job can run meaningfully, and if not, then the job script exits with the specified code to be requeued. |
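As an illustration, a job script could run its own pre-run checks and exit with the configured code when one fails. The sketch below is written in Python; the code 99, the check names, and the helper function are all hypothetical, and the value must match whatever the administrator set for automatic_requeue_exit_code.

```python
import sys

REQUEUE_EXIT_CODE = 99  # hypothetical; must match the server setting


def exit_code_for(checks, requeue_exit_code=REQUEUE_EXIT_CODE):
    """Return the requeue code if any named pre-run check failed, else 0."""
    failed = [name for name, ok in checks.items() if not ok]
    for name in failed:
        print(f"pre-run check failed: {name}", file=sys.stderr)
    return requeue_exit_code if failed else 0


if __name__ == "__main__":
    # Placeholder results; a real job would compute these on the node.
    checks = {"scratch_dir_writable": True, "input_file_present": True}
    code = exit_code_for(checks)
    if code:
        sys.exit(code)  # pbs_server requeues the job instead of completing it
    # ... the real work of the job would start here ...
```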
checkpoint_defaults |
Format |
<STRING> |
Default |
--- |
Description |
Specifies for a queue the default checkpoint values for a job that does not have checkpointing specified. The checkpoint_defaults parameter only takes effect on execution queues.
set queue batch checkpoint_defaults="enabled, periodic, interval=5"
|
clone_batch_delay |
Format |
<INTEGER> |
Default |
1 |
Description |
Specifies the delay (in seconds) between clone batches (see clone_batch_size). |
clone_batch_size |
Format |
<INTEGER> |
Default |
256 |
Description |
Job arrays are created in batches of size X. X jobs are created, and after the clone_batch_delay, X more are created. This repeats until all are created. |
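The interaction between clone_batch_size and clone_batch_delay can be sketched as a simple schedule. This is an illustration only: it assumes the first batch is created immediately and that creating a batch takes no measurable time, and the function name is hypothetical.

```python
def clone_schedule(array_size, clone_batch_size=256, clone_batch_delay=1):
    """Return (offset_seconds, jobs_created) pairs for one job array."""
    schedule = []
    remaining = array_size
    offset = 0
    while remaining > 0:
        created = min(clone_batch_size, remaining)
        schedule.append((offset, created))
        remaining -= created
        offset += clone_batch_delay  # wait before the next batch
    return schedule


if __name__ == "__main__":
    for offset, created in clone_schedule(600):
        print(f"t+{offset}s: create {created} jobs")
```

With the defaults, a 600-job array is created in three batches under these assumptions.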
cray_enabled |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, specifies that this instance of pbs_server has Cray hardware that reports to it. See Installation Notes for Moab and TORQUE for Cray in the Moab Workload Manager documentation. |
default_queue |
Format |
<STRING> |
Default |
--- |
Description |
Indicates the queue to assign to a job if no queue is explicitly specified by the submitter. |
disable_server_id_check |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, the user who owns the job does not have to exist on the server. The user must still exist on all the compute nodes or the job will fail when it tries to execute.
If you have disable_server_id_check set to TRUE, a user could request a group to which they do not belong. Setting VALIDATEGROUP to TRUE in the torque.cfg file prevents such a scenario (see "torque.cfg" configuration file).
|
display_job_server_suffix |
Format |
<BOOLEAN> |
Default |
TRUE
|
Description |
When set to TRUE, TORQUE will display both the job ID and the host name. When set to FALSE, only the job ID will be displayed.
If set to FALSE, the environment variable NO_SERVER_SUFFIX must be set to TRUE for pbs_track to work as expected.
|
interactive_jobs_can_roam |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
By default, interactive jobs run from the login node from which they were submitted. When set to TRUE, interactive jobs may run on login nodes other than the one from which they were submitted. See "Installation Notes for Moab and TORQUE for Cray" in the Moab Workload Manager Administrator Guide for more information. |
job_force_cancel_time |
Format |
<INTEGER> |
Default |
Disabled |
Description |
If a job has been deleted and is still in the system after x seconds, the job will be purged from the system. This is mostly useful when a job is running on a large number of nodes and one node goes down. The job cannot be deleted because the MOM cannot be contacted, so the qdel fails and none of the other nodes can be reused. This parameter can be used to remedy such situations. |
job_log_file_max_size |
Format |
<INTEGER> |
Default |
--- |
Description |
This specifies a soft limit (in kilobytes) for the job log's maximum size. The file size is checked every five minutes and if the current day file size is greater than or equal to this value, it is rolled from <filename> to <filename.1> and a new empty log is opened. If the current day file size exceeds the maximum size a second time, the <filename.1> log file is rolled to <filename.2>, the current log is rolled to <filename.1>, and a new empty log is opened. Each new log causes all other logs to roll to an extension that is one greater than its current number. Any value less than 0 is ignored by pbs_server (meaning the log will not be rolled). |
job_log_file_roll_depth |
Format |
<INTEGER> |
Default |
--- |
Description |
This sets the maximum number of new log files that are kept in a day if the job_log_file_max_size parameter is set. For example, if the roll depth is set to 3, no file can roll higher than <filename.3>. If a file is already at the specified depth, such as <filename.3>, the file is deleted so it can be replaced by the incoming file roll, <filename.2>. |
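The rolling behavior described for job_log_file_max_size and job_log_file_roll_depth can be modeled with a list. This is a sketch of the documented behavior rather than the server's code; the list representation and the function name are assumptions made for illustration.

```python
def roll_logs(logs, roll_depth, new_log):
    """Model one log roll.

    logs[0] stands for <filename>, logs[1] for <filename.1>, and so on.
    Every file moves up by one extension, anything that would pass
    roll_depth is deleted, and a new empty log becomes <filename>.
    """
    return [new_log] + logs[:roll_depth]


if __name__ == "__main__":
    logs = ["a"]
    for label in ["b", "c", "d", "e"]:
        logs = roll_logs(logs, roll_depth=3, new_log=label)
        print(logs)
```

With a roll depth of 3, the oldest entry is dropped on the fourth roll, which corresponds to <filename.3> being deleted to make room for the incoming <filename.2>.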
job_log_keep_days |
Format |
<INTEGER> |
Default |
--- |
Description |
Keeps logs for the designated number of days. If set to 4, any log file more than 4 days old is deleted. |
job_nanny |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, enables the experimental "job deletion nanny" feature. All job cancels will create a repeating task that will resend KILL signals if the initial job cancel failed. Further job cancels will be rejected with the message "job cancel in progress." This is useful for temporary failures with a job's execution node during a job delete request. |
job_stat_rate |
Format |
<INTEGER> |
Default |
45 (30 in TORQUE 1.2.0p5 and earlier) |
Description |
Specifies the maximum age (in seconds) of MOM-level job data that is allowed when servicing a qstat request. If data is older than this value, the pbs_server daemon will contact the MOMs with stale data to request an update.
For large systems, this value should be increased to 5 minutes or higher.
|
job_start_timeout |
Format |
<INTEGER> |
Default |
--- |
Description |
Specifies the pbs_server to pbs_mom TCP socket timeout in seconds that is used when the pbs_server sends a job start to the pbs_mom. It is useful when the MOM has extra overhead involved in starting jobs. If not specified, then the tcp_timeout parameter is used. |
job_sync_timeout |
Format |
<INTEGER> |
Default |
60 |
Description |
When a stray job is reported on multiple nodes, the server sends a kill signal to one node at a time. This timeout determines how long the server waits between kills if the job is still being reported on any nodes. |
keep_completed |
Format |
<INTEGER> |
Default |
---
If you ran torque.setup during TORQUE installation, the default is 300.
|
Description |
The amount of time (in seconds) a job will be kept in the queue after it has entered the completed state. keep_completed must be set for job dependencies to work.
For more information, see Keeping completed jobs.
|
lock_file |
Format |
<STRING> |
Default |
torque/server_priv/server.lock
|
Description |
Specifies the name and location of the lock file used to determine which high availability server should be active.
If a full path is specified, it is used verbatim by TORQUE. If a relative path is specified, TORQUE will prefix it with torque/server_priv.
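The path rule can be expressed in a few lines. This is a sketch of the rule as stated above, with the prefix written exactly as it appears in this section; the function name is hypothetical.

```python
import posixpath


def resolve_lock_file(value, server_priv="torque/server_priv"):
    """Apply the documented rule for the lock_file parameter.

    A full (absolute) path is used verbatim; a relative path is
    prefixed with the server_priv directory.
    """
    if posixpath.isabs(value):
        return value
    return posixpath.join(server_priv, value)


if __name__ == "__main__":
    print(resolve_lock_file("/var/ha/server.lock"))
    print(resolve_lock_file("server.lock"))
```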
|
lock_file_update_time |
Format |
<INTEGER> |
Default |
3 |
Description |
Specifies how often (in seconds) the thread will update the lock file. |
lock_file_check_time |
Format |
<INTEGER> |
Default |
9 |
Description |
Specifies how often (in seconds) a high availability server will check to see if it should become active. |
log_events |
Format |
Bitmap |
Default |
--- |
Description |
By default, all events are logged. However, you can customize things so that only certain events show up in the log file. These are the bitmaps for the different kinds of logs:
#define PBSEVENT_ERROR 0x0001 /* internal errors */
#define PBSEVENT_SYSTEM 0x0002 /* system (server) events */
#define PBSEVENT_ADMIN 0x0004 /* admin events */
#define PBSEVENT_JOB 0x0008 /* job related events */
#define PBSEVENT_JOB_USAGE 0x0010 /* End of Job accounting */
#define PBSEVENT_SECURITY 0x0020 /* security violation events */
#define PBSEVENT_SCHED 0x0040 /* scheduler events */
#define PBSEVENT_DEBUG 0x0080 /* common debug messages */
#define PBSEVENT_DEBUG2 0x0100 /* less needed debug messages */
#define PBSEVENT_FORCE 0x8000 /* set to force a message */
If you want to log only error, system, and job information, use qmgr to set log_events to 11:
set server log_events = 11
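The value 11 is the bitwise OR of the three selected event classes (0x0001, 0x0002, and 0x0008). The sketch below reproduces that arithmetic with the constants listed above; the helper names are hypothetical and exist only for illustration.

```python
# Event bits copied from the PBSEVENT_* definitions listed above.
PBSEVENT = {
    "ERROR": 0x0001,
    "SYSTEM": 0x0002,
    "ADMIN": 0x0004,
    "JOB": 0x0008,
    "JOB_USAGE": 0x0010,
    "SECURITY": 0x0020,
    "SCHED": 0x0040,
    "DEBUG": 0x0080,
    "DEBUG2": 0x0100,
    "FORCE": 0x8000,
}


def log_events_mask(names):
    """OR together the bits for the named event classes."""
    mask = 0
    for name in names:
        mask |= PBSEVENT[name]
    return mask


def decode_log_events(mask):
    """List the event classes whose bit is set in a log_events value."""
    return [name for name, bit in PBSEVENT.items() if mask & bit]


if __name__ == "__main__":
    mask = log_events_mask(["ERROR", "SYSTEM", "JOB"])
    print(mask, decode_log_events(mask))
```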
|
log_file_max_size |
Format |
<INTEGER> |
Default |
0 |
Description |
Specifies a soft limit, in kilobytes, for the server's log file. The file size is checked every 5 minutes, and if the current day file size is greater than or equal to this value then it will be rolled from X to X.1 and a new empty log will be opened. Any value less than or equal to 0 will be ignored by pbs_server (the log will not be rolled). |
log_file_roll_depth |
Format |
<INTEGER> |
Default |
1 |
Description |
Controls how deep the current day log files will be rolled, if log_file_max_size is set, before they are deleted. |
log_keep_days |
Format |
<INTEGER> |
Default |
0 |
Description |
Specifies how long (in days) a server or MOM log should be kept. |
log_level |
Format |
<INTEGER> |
Default |
0 |
Description |
Specifies the pbs_server logging verbosity. Maximum value is 7. |
mail_body_fmt |
Format |
A printf-like format string |
Default |
PBS Job Id: %i Job Name: %j Exec host: %h %m %d |
Description |
Override the default format for the body of outgoing mail messages. A number of printf-like format specifiers and escape sequences can be used:
\n    new line
\t    tab
\\    backslash
\'    single quote
\"    double quote
%d    details concerning the message
%h    PBS host name
%i    PBS job identifier
%j    PBS job name
%m    long reason for message
%r    short reason for message
%%    a single %
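To preview what a format string would produce, the specifiers and escape sequences can be expanded with a small function. This is an illustrative sketch, not the expansion code used by pbs_server: the function name and the sample field values are hypothetical, and it assumes unknown sequences are left unchanged.

```python
import re

# Escape sequences from the list above, keyed by the character after '\'.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", "'": "'", '"': '"'}


def expand_mail_format(fmt, fields):
    """Expand escape sequences and %-specifiers in a mail format string.

    fields maps a specifier letter (d, h, i, j, m, r) to its value.
    Assumption: sequences not in the list are left as written.
    """
    def repl(match):
        kind, ch = match.group(1), match.group(2)
        if kind == "\\":
            return ESCAPES.get(ch, match.group(0))
        if ch == "%":
            return "%"
        return str(fields.get(ch, match.group(0)))

    return re.sub(r"([\\%])(.)", repl, fmt, flags=re.DOTALL)


if __name__ == "__main__":
    sample = {"i": "123.host", "j": "sim", "h": "node01"}  # made-up values
    print(expand_mail_format(r"PBS Job Id: %i\nJob Name: %j\nExec host: %h", sample))
```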
|
mail_domain |
Format |
<STRING> |
Default |
--- |
Description |
Override the default domain for outgoing mail messages. If set, emails will be addressed to <user>@<hostdomain>. If unset, the job's Job_Owner attribute will be used. If set to never, TORQUE will never send emails. |
mail_from |
Format |
<STRING> |
Default |
adm |
Description |
Specify the name of the sender when TORQUE sends emails. |
mail_subject_fmt |
Format |
A printf-like format string |
Default |
PBS JOB %i |
Description |
Override the default format for the subject of outgoing mail messages. A number of printf-like format specifiers and escape sequences can be used:
\n    new line
\t    tab
\\    backslash
\'    single quote
\"    double quote
%d    details concerning the message
%h    PBS host name
%i    PBS job identifier
%j    PBS job name
%m    long reason for message
%r    short reason for message
%%    a single %
|
managers |
Format |
<user>@<host.sub.domain>[,<user>@<host.sub.domain>...]
|
Default |
root@localhost |
Description |
List of users granted batch administrator privileges. The host, sub-domain, or domain name may be wildcarded by the use of an asterisk character (*). Requires full manager privilege to set or alter. |
max_job_array_size |
Format |
<INTEGER> |
Default |
Unlimited |
Description |
Sets the maximum number of jobs that can be in a single job array. |
max_slot_limit |
Format |
<INTEGER> |
Default |
Unlimited |
Description |
This is the maximum number of jobs that can run concurrently in any job array. A slot limit can be applied at submission time with qsub or modified later with qalter.
qmgr -c 'set server max_slot_limit=10'
No array can request a slot limit greater than 10. Any array that does not request a slot limit receives a slot limit of 10. Using the example above, slot requests greater than 10 are rejected with the message: "Requested slot limit is too large, limit is 10."
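The rules in this example can be summarized as a small decision function. It is a sketch of the behavior described above rather than server code; the function name is hypothetical, and None stands for "not set" or "not requested".

```python
def effective_slot_limit(requested, max_slot_limit=None):
    """Return the slot limit an array ends up with.

    requested: slot limit asked for by the array, or None if none was given.
    max_slot_limit: server-wide limit, or None if the parameter is unset.
    """
    if max_slot_limit is None:
        return requested  # no server-wide cap applies
    if requested is None:
        return max_slot_limit  # arrays without a request inherit the cap
    if requested > max_slot_limit:
        raise ValueError(
            f"Requested slot limit is too large, limit is {max_slot_limit}"
        )
    return requested


if __name__ == "__main__":
    print(effective_slot_limit(None, 10))
    print(effective_slot_limit(5, 10))
```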
|
max_threads |
Format |
<INTEGER> |
Default |
The value of min_threads, which is (2 * the number of procs listed in /proc/cpuinfo) + 1, multiplied by 10 |
Description |
This is the maximum number of threads that should exist in the thread pool at any time. |
max_user_queuable |
Format |
<INTEGER> |
Default |
Unlimited |
Description |
When set, max_user_queuable places a system-wide limit on the number of jobs that an individual user can queue.
qmgr -c 'set server max_user_queuable=500'
|
min_threads |
Format |
<INTEGER> |
Default |
(2 * the number of procs listed in /proc/cpuinfo) + 1. If TORQUE is unable to read /proc/cpuinfo, the default is 10. |
Description |
This is the minimum number of threads that should exist in the thread pool at any time. |
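The default thread pool bounds can be worked out from the processor count. The sketch below assumes that the max_threads default is the min_threads default multiplied by 10, which is one reading of the max_threads entry; the function name is hypothetical.

```python
def default_thread_limits(num_procs=None):
    """Return (min_threads, max_threads) defaults for a processor count.

    num_procs: number of procs listed in /proc/cpuinfo, or None if the
    file could not be read.
    Assumption: max_threads defaults to min_threads * 10.
    """
    if num_procs is None:
        min_threads = 10  # documented fallback when /proc/cpuinfo is unreadable
    else:
        min_threads = 2 * num_procs + 1
    return min_threads, min_threads * 10


if __name__ == "__main__":
    print(default_thread_limits(8))
    print(default_thread_limits(None))
```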
moab_array_compatible |
Format |
<BOOLEAN> |
Default |
TRUE |
Description |
This parameter places a hold on jobs that exceed the slot limit in a job array. When one of the active jobs is completed or deleted, one of the held jobs goes to a queued state. |
mom_job_sync |
Format |
<BOOLEAN> |
Default |
TRUE
|
Description |
When set to TRUE, specifies that the pbs_server will synchronize its view of the job queue and resource allocation with compute nodes as they come online. If a job exists on a compute node, it will be automatically cleaned up and purged. (Enabled by default in TORQUE 2.2.0 and higher.)
Jobs that are no longer reported by the mother superior are automatically purged by pbs_server.
Jobs that pbs_server instructs the MOM to cancel have their processes killed in addition to being deleted (instead of leaving them running as in versions of TORQUE prior to 4.1.1).
|
next_job_number |
Format |
<INTEGER> |
Default |
--- |
Description |
Specifies the ID number of the next job. If you set your job number too low and TORQUE repeats a job number that it has already used, the job will fail. Before setting next_job_number to a number lower than any number that TORQUE has already used, you must clear out your .e and .o files.
If you use Moab Workload Manager and have configured it to synchronize job IDs with TORQUE, Moab generates the job ID and next_job_number has no effect on it. See Synchronizing Job IDs in TORQUE and Moab in the Moab Workload Manager Administrator Guide for more information.
|
node_check_rate |
Format |
<INTEGER> |
Default |
600 |
Description |
Specifies the minimum duration (in seconds) that a node can be unresponsive to server queries before being marked down by the pbs_server daemon. |
node_pack |
Format |
<BOOLEAN> |
Default |
--- |
Description |
Controls how multiple-processor nodes are allocated to jobs. If this attribute is set to TRUE, jobs are assigned to the multiple-processor nodes with the fewest free processors. This packs jobs into the fewest possible nodes, leaving multiple-processor nodes free for jobs that need many processors on a node. If set to FALSE, jobs are scattered across nodes, reducing conflicts over memory between jobs. If unset (the default), jobs are packed on nodes in the order in which the nodes are declared to the server (in the nodes file). |
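The three node_pack settings can be compared with a toy selection function. This is only an illustration of the policies described above, not the allocator in pbs_server: the function name and data layout are hypothetical, and "scattering" is modeled here as choosing the node with the most free processors, which is an assumption.

```python
def pick_node(nodes, procs_needed, node_pack=None):
    """Choose a node for a job that needs procs_needed processors.

    nodes: list of (name, free_procs) tuples in nodes-file order.
    node_pack: True (pack), False (scatter), or None (unset).
    """
    candidates = [n for n in nodes if n[1] >= procs_needed]
    if not candidates:
        return None
    if node_pack is True:
        # Pack: the node with the fewest free processors that still fits.
        return min(candidates, key=lambda n: n[1])[0]
    if node_pack is False:
        # Scatter (assumed policy): the node with the most free processors.
        return max(candidates, key=lambda n: n[1])[0]
    # Unset: first fitting node in declaration order.
    return candidates[0][0]


if __name__ == "__main__":
    nodes = [("n1", 8), ("n2", 2), ("n3", 4)]
    for setting in (True, False, None):
        print(setting, pick_node(nodes, 2, node_pack=setting))
```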
node_ping_rate |
Format |
<INTEGER> |
Default |
300 |
Description |
Specifies the maximum interval (in seconds) between successive "pings" sent from the pbs_server daemon to the pbs_mom daemon to determine node/daemon health. |
no_mail_force |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, eliminates all e-mails when mail_options (see qsub) is set to "n". The job owner won't receive e-mails when a job is deleted by a different user or a job failure occurs. If no_mail_force is unset or is FALSE, then the job owner receives e-mails when a job is deleted by a different user or a job failure occurs. |
np_default |
Format |
<INTEGER> |
Default |
--- |
Description |
Allows the administrator to unify the number of processors (np) on all nodes. The value can be dynamically changed. A value of 0 tells pbs_server to use the value of np found in the nodes file. The maximum value is 32767. |
operators |
Format |
<user>@<host.sub.domain>[,<user>@<host.sub.domain>...]
|
Default |
root@localhost |
Description |
List of users granted batch operator privileges. Requires full manager privilege to set or alter. |
poll_jobs |
Format |
<BOOLEAN> |
Default |
TRUE (FALSE in TORQUE 1.2.0p5 and earlier) |
Description |
If set to TRUE, pbs_server will poll job info from MOMs over time and will not block on handling requests which require this job information. If set to FALSE, no polling will occur and if requested job information is stale, pbs_server may block while it attempts to update this information. For large systems, this value should be set to TRUE. |
query_other_jobs |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
When set to TRUE, non-admin users may view jobs they do not own. |
record_job_info |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
This must be set to TRUE in order for job logging to be enabled. |
record_job_script |
Format |
<BOOLEAN> |
Default |
FALSE
|
Description |
If set to TRUE, this adds the contents of the script executed by a job to the log.
|
resources_available |
Format |
<STRING> |
Default |
--- |
Description |
Allows overriding of detected resource quantity limits (see Assigning queue resource limits). pbs_server must be restarted for changes to take effect. Also, resources_available is constrained by the smaller of queue.resources_available and server.resources_available. |
scheduling |
Format |
<BOOLEAN> |
Default |
--- |
Description |
Allows pbs_server to be scheduled. When FALSE, pbs_server is a resource manager that works on its own. When TRUE, TORQUE allows a scheduler, such as Moab or Maui, to dictate what pbs_server should do. |
submit_hosts |
Format |
"<HOSTNAME>[,<HOSTNAME>]..." |
Default |
--- |
Description |
Indicates which hosts included in the server nodes file located at $TORQUE/server_priv/nodes (see Server node file configuration) can submit batch or interactive jobs (see Configuring job submission hosts). For more information on adding hosts that are not included in the server nodes file, see the acl_hosts parameter. |
tcp_timeout |
Format |
<INTEGER> |
Default |
300 |
Description |
Specifies the timeout (in seconds) for idle TCP connections. If no communication is received by the server on the connection after the timeout, the server closes the connection. There is an exception for connections made to the server on port 15001 (the default port): timeout events are ignored on the server for such connections established by a client utility or scheduler, and responsibility rests with the client to close the connection first. See Large cluster considerations for additional information.
If you use Moab Workload Manager, prevent communication errors by giving tcp_timeout at least twice the value of the Moab RMPOLLINTERVAL.
|
thread_idle_seconds |
Format |
<INTEGER> |
Default |
300 |
Description |
This is the number of seconds a thread can be idle in the thread pool before it is deleted. If threads should not be deleted, set to -1. TORQUE will always maintain at least min_threads number of threads, even if all are idle. |