Torque server parameters are specified using the qmgr command. The set subcommand is used to modify the server object. For example:
> qmgr -c 'set server default_queue=batch'
acl_host_enable | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to TRUE, hosts not in the pbs_server nodes file must be added to the acl_hosts list in order to get access to pbs_server. |
acl_logic_or | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to TRUE, the user and group queue ACLs are logically OR'd. When set to FALSE, they are AND'd. |
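
To make the AND/OR distinction concrete, the sketch below models only how the two ACL results are combined. It assumes both the user ACL and the group ACL are enabled on the queue. The function name acl_decision and its yes/no arguments are hypothetical; this is not how pbs_server evaluates ACL entries internally.

```sh
#!/bin/sh
# Hypothetical model of how acl_logic_or combines a queue's user ACL and
# group ACL results (assumes both ACLs are enabled on the queue).
# $1: user ACL passed (yes/no), $2: group ACL passed (yes/no)
# $3: value of acl_logic_or (TRUE/FALSE)
acl_decision() {
  local user_ok="$1" group_ok="$2" logic_or="$3"
  if [ "$logic_or" = TRUE ]; then
    # OR'd: passing either ACL is enough
    if [ "$user_ok" = yes ] || [ "$group_ok" = yes ]; then echo allow; else echo deny; fi
  else
    # AND'd (the default): the job must pass both ACLs
    if [ "$user_ok" = yes ] && [ "$group_ok" = yes ]; then echo allow; else echo deny; fi
  fi
}

acl_decision yes no FALSE   # deny
acl_decision yes no TRUE    # allow
```
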
allow_node_submit | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to TRUE, allows all hosts in the PBSHOME/server_priv/nodes file (MOM nodes) to submit jobs to pbs_server. To only allow qsub from a subset of all MOMs, use submit_hosts. |
allow_proxy_user | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to TRUE, specifies that users can proxy from one user to another. Proxy requests are validated either by ruserok() or by the scheduler (see Job Submission). |
cgroup_per_task | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to FALSE, jobs submitted with the -L syntax will have one cgroup created per host unless they specify otherwise at submission time. This behavior is similar to the pre-6.0 cpuset implementation. When set to TRUE, jobs submitted with the -L syntax will have one cgroup created per task unless they specify otherwise at submission time. Some MPI implementations are not compatible with using one cgroup per task. See -L NUMA Resource Request for more information. |
clone_batch_delay | |
---|---|
Format | <INTEGER> |
Default | 1 |
Description | Specifies the delay (in seconds) between clone batches (see clone_batch_size). |
clone_batch_size | |
---|---|
Format | <INTEGER> |
Default | 256 |
Description | Job arrays are created in batches of size X. X jobs are created, and after the clone_batch_delay, X more are created. This repeats until all are created. |
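
As a rough worked example of how clone_batch_size and clone_batch_delay interact, the sketch below computes how many batches an array needs and the total delay spent between them. It assumes the delay is applied only between consecutive batches and ignores the time taken to create the jobs themselves; the function name clone_plan is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: estimate array cloning from the two server settings.
# $1: number of sub jobs in the array (assumed to be at least 1)
# $2: clone_batch_size (default 256), $3: clone_batch_delay in seconds (default 1)
clone_plan() {
  local jobs="$1" size="${2:-256}" delay="${3:-1}"
  # Ceiling division: a partial final batch still counts as a batch
  local batches=$(( (jobs + size - 1) / size ))
  # One delay between consecutive batches, none after the last one
  local wait_s=$(( (batches - 1) * delay ))
  echo "batches=$batches delay_seconds=$wait_s"
}

clone_plan 1000          # batches=4 delay_seconds=3
clone_plan 1000 100 2    # batches=10 delay_seconds=18
```
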
cray_enabled | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to TRUE, specifies that this instance of pbs_server has Cray hardware that reports to it. See Installation Notes for Moab and Torque for Cray in the Moab Workload Manager Administrator Guide. |
default_queue | |
---|---|
Format | <STRING> |
Default | --- |
Description | Indicates the queue to assign to a job if no queue is explicitly specified by the submitter. |
disable_server_id_check | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to TRUE, the user for the job does not have to exist on the server. The user must still exist on all the compute nodes, or the job will fail when it tries to execute. If disable_server_id_check is set to TRUE, a user could request a group to which they do not belong; setting VALIDATEGROUP to TRUE in the torque.cfg file prevents such a scenario (see "torque.cfg" Configuration File). |
dont_write_nodes_file | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to TRUE, the nodes file cannot be overwritten for any reason; qmgr commands to edit nodes will be rejected. |
down_on_error | |
---|---|
Format | <BOOLEAN> |
Default | TRUE |
Description | When set to TRUE, nodes that report an error from their node health check to pbs_server will be marked down and unavailable to run jobs. |
idle_slot_limit | |
---|---|
Format | <INTEGER> |
Default | 300 |
Description | Sets a default idle slot limit that will be applied to all arrays submitted after it is set. The idle slot limit is the maximum number of sub jobs from an array that will be instantiated at once. For example, if this is set to 2 and an array with 1000 sub jobs is submitted, then only two will ever be idle (queued) at a time. Whenever an idle sub job runs or is deleted, a new sub job is instantiated, until the array has no remaining sub jobs. If this parameter is set and a user requests an idle slot limit that exceeds it at submission time (using qsub -i), that array will be rejected. See also the qsub -i option. |
Example | qmgr -c 'set server idle_slot_limit = 50' |
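
The submission-time rule above can be modeled as a small function. This is a hypothetical sketch, not pbs_server code: it assumes an array submitted without qsub -i inherits the server's idle_slot_limit, that a smaller request is honored as-is, and it reports how many sub jobs would be instantiated right away. The function name idle_slots is made up for illustration.

```sh
#!/bin/sh
# Hypothetical model of idle_slot_limit handling at array submission.
# $1: server idle_slot_limit, $2: number of sub jobs in the array
# $3: idle slot limit requested with qsub -i (empty if none was requested)
idle_slots() {
  local server_limit="$1" array_size="$2" requested="$3"
  local limit="$server_limit"
  if [ -n "$requested" ]; then
    if [ "$requested" -gt "$server_limit" ]; then
      echo rejected              # request exceeds the server setting
      return 0
    fi
    limit="$requested"
  fi
  # Sub jobs instantiated at once: the limit, or the whole array if smaller
  if [ "$array_size" -lt "$limit" ]; then limit="$array_size"; fi
  echo "$limit"
}

idle_slots 2 1000        # 2
idle_slots 50 1000 80    # rejected
```
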
interactive_jobs_can_roam | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | By default, interactive jobs run from the login node from which they were submitted. When set to TRUE, interactive jobs may run on login nodes other than the one from which they were submitted. See Installation Notes for Moab and Torque for Cray in the Moab Workload Manager Administrator Guide. With interactive_jobs_can_roam enabled, jobs will only go to nodes with the alps_login property set in the nodes file. |
job_log_file_roll_depth | |
---|---|
Format | <INTEGER> |
Default | --- |
Description | This sets the maximum number of new log files that are kept in a day if the job_log_file_max_size parameter is set. For example, if the roll depth is set to 3, no file can roll higher than <filename.3>. If a file is already at the specified depth, such as <filename.3>, the file is deleted so it can be replaced by the incoming file roll, <filename.2>. |
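
The roll-over scheme described here (and, for the server log, by log_file_roll_depth) can be illustrated by shifting numbered files in a scratch directory. This only models the naming behavior described above, not Torque's implementation; the function name roll_log and the date-style file name are hypothetical.

```sh
#!/bin/sh
# Hypothetical model of log rolling with a fixed roll depth.
# $1: path of the active log file, $2: roll depth
roll_log() {
  local file="$1" depth="$2" i
  # The file already at the maximum depth is deleted to make room
  rm -f "$file.$depth"
  # Shift file.(N-1) -> file.N, starting from the deepest level
  i=$(( depth - 1 ))
  while [ "$i" -ge 1 ]; do
    if [ -e "$file.$i" ]; then mv "$file.$i" "$file.$(( i + 1 ))"; fi
    i=$(( i - 1 ))
  done
  # The active file becomes file.1 and a new, empty active file is started
  if [ -e "$file" ]; then mv "$file" "$file.1"; fi
  : > "$file"
}

dir=$(mktemp -d)
for n in 1 2 3 4 5; do
  echo "entry $n" > "$dir/20240101"   # hypothetical log file name
  roll_log "$dir/20240101" 3
done
ls "$dir"   # 20240101 and 20240101.1 through 20240101.3; nothing deeper
```

After five rolls with a depth of 3, the two oldest entries have been deleted, and 20240101.1 holds the most recent rolled contents.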
job_log_keep_days | |
---|---|
Format | <INTEGER> |
Default | --- |
Description | Specifies how many days job log files are kept. If set to 4, any log file older than 4 days is deleted. |
job_start_timeout | |
---|---|
Format | <INTEGER> |
Default | --- |
Description | Specifies the pbs_server to pbs_mom TCP socket timeout in seconds that is used when the pbs_server sends a job start to the pbs_mom. It is useful when the MOM has extra overhead involved in starting jobs. If not specified, then the tcp_timeout parameter is used. |
keep_completed | |
---|---|
Format | <INTEGER> |
Default | 300 |
Description | The amount of time (in seconds) a job will be kept in the queue after it has entered the completed state. keep_completed must be set for job dependencies to work. For more information, see Keeping Completed Jobs. |
kill_delay | |
---|---|
Format | <INTEGER> |
Default | 2 seconds when using qdel; 0 (no wait) when using qrerun |
Description | Specifies the number of seconds between sending a SIGTERM and a SIGKILL to a job you want to cancel. It is possible that the job script, and any child processes it spawns, can receive several SIGTERM signals before the SIGKILL signal is received. All MOMs must be configured with $exec_with_exec true in order for kill_delay to work, even when relying on default kill_delay settings. If kill_delay is set for a queue, the queue setting overrides the server setting. See kill_delay in 5.645 Queue Attributes. |
Example | qmgr -c "set server kill_delay=30" |
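
The TERM-then-KILL sequence that kill_delay controls can be sketched as a shell function. This only models the timing described above; pbs_mom performs the real signaling, and the function name term_then_kill is hypothetical.

```sh
#!/bin/sh
# Hypothetical model of kill_delay: send SIGTERM, give the process up to
# $2 seconds to exit, then send SIGKILL if it is still present.
# $1: process ID, $2: kill_delay in seconds
term_then_kill() {
  local pid="$1" delay="$2" waited=0
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone
  while [ "$waited" -lt "$delay" ]; do
    kill -0 "$pid" 2>/dev/null || return 0    # exited within the grace period
    sleep 1
    waited=$(( waited + 1 ))
  done
  kill -KILL "$pid" 2>/dev/null || true       # grace period expired
}

# A process that ignores SIGTERM is only stopped by the final SIGKILL
( trap '' TERM; sleep 5 ) &
pid=$!
sleep 1                       # let the child install its TERM handler
term_then_kill "$pid" 2
status=0
wait "$pid" || status=$?
echo "exit status: $status"   # 137 (128 + 9) means SIGKILL ended it
```

A job script that traps SIGTERM to clean up gets roughly kill_delay seconds to finish before the SIGKILL arrives.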
lock_file_update_time | |
---|---|
Format | <INTEGER> |
Default | 3 |
Description | Specifies how often (in seconds) the thread will update the lock file. |
lock_file_check_time | |
---|---|
Format | <INTEGER> |
Default | 9 |
Description | Specifies how often (in seconds) a high availability server will check to see if it should become active. |
log_file_roll_depth | |
---|---|
Format | <INTEGER> |
Default | 1 |
Description | Controls how deep the current day log files will be rolled, if log_file_max_size is set, before they are deleted. |
log_keep_days | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Specifies how long (in days) a server or MOM log should be kept. |
log_level | |
---|---|
Format | <INTEGER> |
Default | 0 |
Description | Specifies the pbs_server logging verbosity. Maximum value is 7. |
mail_from | |
---|---|
Format | <STRING> |
Default | adm |
Description | Specifies the name of the sender when Torque sends emails. |
max_job_array_size | |
---|---|
Format | <INTEGER> |
Default | Unlimited |
Description | Sets the maximum number of jobs that can be in a single job array. |
max_slot_limit | |
---|---|
Format | <INTEGER> |
Default | Unlimited |
Description | This is the maximum number of jobs that can run concurrently in any job array. Slot limits can be applied at submission time with qsub or modified afterward with qalter. For example, with max_slot_limit set to 10 (see the Example row), no array can request a slot limit greater than 10, and any array that does not request a slot limit receives a slot limit of 10. Slot requests greater than 10 are rejected with the message: "Requested slot limit is too large, limit is 10." |
Example | qmgr -c 'set server max_slot_limit=10' |
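
The three cases in the max_slot_limit description can be modeled as a small function. The function name slot_limit_for is hypothetical, the rejection text is the message quoted above, and the sketch does not cover slot limits changed later with qalter. A request here corresponds to a slot limit supplied at submission, for example with an array range such as qsub -t 0-99%5.

```sh
#!/bin/sh
# Hypothetical model of max_slot_limit checks at submission time.
# $1: server max_slot_limit, $2: slot limit requested (empty if none)
slot_limit_for() {
  local max="$1" requested="$2"
  if [ -z "$requested" ]; then
    echo "$max"      # no request: the server maximum becomes the slot limit
  elif [ "$requested" -gt "$max" ]; then
    echo "Requested slot limit is too large, limit is $max."
  else
    echo "$requested"
  fi
}

slot_limit_for 10        # 10
slot_limit_for 10 4      # 4
slot_limit_for 10 25     # Requested slot limit is too large, limit is 10.
```
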
max_user_run | |
---|---|
Format | <INTEGER> |
Default | Unlimited |
Description | This limits the maximum number of jobs a user can have running for the given server. |
Example | qmgr -c "set server max_user_run=5" |
max_threads | |
---|---|
Format | <INTEGER> |
Default | 20 times the value of min_threads, that is, ((2 * the number of procs listed in /proc/cpuinfo) + 1) * 20 |
Description | This is the maximum number of threads that should exist in the thread pool at any time. See Setting min_threads and max_threads for more information. |
min_threads | |
---|---|
Format | <INTEGER> |
Default | (2 * the number of procs listed in /proc/cpuinfo) + 1. If Torque is unable to read /proc/cpuinfo, the default is 10. |
Description | This is the minimum number of threads that should exist in the thread pool at any time. See Setting min_threads and max_threads for more information. |
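
The default formulas for min_threads and max_threads can be checked with a short script. It assumes that "the number of procs listed in /proc/cpuinfo" means the count of processor entries, and that max_threads stays at 20 times min_threads even when the fallback of 10 is used; the function name thread_defaults is hypothetical.

```sh
#!/bin/sh
# Sketch: compute the documented min_threads and max_threads defaults.
# $1: processor count (optional; counted from /proc/cpuinfo when omitted)
thread_defaults() {
  local procs="$1" min max
  if [ -z "$procs" ]; then
    # Count "processor" entries; empty output if /proc/cpuinfo is unreadable
    procs=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null || true)
  fi
  if [ -n "$procs" ] && [ "$procs" -gt 0 ]; then
    min=$(( 2 * procs + 1 ))
  else
    min=10        # documented fallback when /proc/cpuinfo cannot be read
  fi
  max=$(( min * 20 ))   # assumes max_threads = 20 * min_threads
  echo "min_threads=$min max_threads=$max"
}

thread_defaults 8   # min_threads=17 max_threads=340
thread_defaults     # uses this host's /proc/cpuinfo
```
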
moab_array_compatible | |
---|---|
Format | <BOOLEAN> |
Default | TRUE |
Description | This parameter places a hold on jobs that exceed the slot limit in a job array. When one of the active jobs is completed or deleted, one of the held jobs goes to a queued state. |
next_job_number | |
---|---|
Format | <INTEGER> |
Default | --- |
Description | Specifies the ID number of the next job. If you set your job number too low and Torque repeats a job number that it has already used, the job will fail. Before setting next_job_number to a number lower than any number that Torque has already used, you must clear out your .e and .o files. If you use Moab Workload Manager (and have configured it to synchronize job IDs with Torque), then Moab will generate the job ID and next_job_number will have no effect on the job ID. See Resource Manager Configuration in the Moab Workload Manager Administrator Guide for more information. |
node_check_rate | |
---|---|
Format | <INTEGER> |
Default | 600 |
Description | Specifies the minimum duration (in seconds) that a node can fail to send a status update before being marked down by the pbs_server daemon. |
node_pack | |
---|---|
Description | This is deprecated. |
node_submit_exceptions | |
---|---|
Format | <STRING> |
Default | --- |
Description | When set in conjunction with allow_node_submit, the nodes listed here will not be allowed to submit jobs. |
no_mail_force | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to TRUE, eliminates all e-mails when mail_options (see qsub) is set to "n". The job owner won't receive e-mails when a job is deleted by a different user or a job failure occurs. If no_mail_force is unset or is FALSE, then the job owner receives e-mails when a job is deleted by a different user or a job failure occurs. |
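
The interaction between no_mail_force and mail_options "n" can be summarized as a decision function. This is a hypothetical model covering only jobs submitted with mail_options set to "n"; the event names and the function name mail_with_n are made up for illustration, and the sketch assumes "n" suppresses every other kind of mail.

```sh
#!/bin/sh
# Hypothetical model: does the job owner get mail when mail_options is "n"?
# $1: event (deleted_by_other | job_failure | other)
# $2: no_mail_force (TRUE, FALSE, or empty when unset)
mail_with_n() {
  local event="$1" force="$2"
  if [ "$force" = TRUE ]; then
    echo no          # all e-mail is suppressed
    return 0
  fi
  case "$event" in
    deleted_by_other|job_failure) echo yes ;;  # still sent when unset or FALSE
    *) echo no ;;
  esac
}

mail_with_n job_failure FALSE   # yes
mail_with_n job_failure TRUE    # no
```
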
pass_cpuclock | |
---|---|
Format | <BOOLEAN> |
Default | TRUE |
Description | If set to TRUE, the pbs_server daemon passes the cpuclock option and its value to the pbs_mom daemons for direct implementation, making the CPU frequency adjustable as part of a job's resource request. If set to FALSE, the pbs_server daemon instead creates a PBS_CPUCLOCK job environment variable containing the value of the cpuclock attribute from the job's resource request, and passes it to the pbs_mom daemons; the CPU frequencies on the MOMs are not adjusted. The environment variable is intended for use by prologue and epilogue scripts, enabling administrators to log and research when users make cpuclock requests, and enabling researchers and developers to change CPU clock frequencies by a method other than the one employed by the Torque pbs_mom daemons. |
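
With pass_cpuclock set to FALSE, a prologue could record cpuclock requests along these lines. This is a hypothetical sketch: it assumes, as described above, that PBS_CPUCLOCK is visible to the prologue and that the job ID arrives as the prologue's first argument. The function name, the log path, and the CPUCLOCK_LOG override variable are all made up for illustration.

```sh
#!/bin/sh
# Hypothetical prologue fragment for pass_cpuclock = FALSE: record any
# cpuclock request found in PBS_CPUCLOCK.
# $1: job ID (assumed to be the prologue's first argument)
# CPUCLOCK_LOG: hypothetical override for the assumed log path
log_cpuclock_request() {
  local jobid="$1" logfile="${CPUCLOCK_LOG:-/var/log/torque_cpuclock.log}"
  if [ -n "$PBS_CPUCLOCK" ]; then
    printf '%s job=%s cpuclock=%s\n' "$(date +%s)" "$jobid" "$PBS_CPUCLOCK" >> "$logfile"
  fi
  return 0   # a non-zero prologue exit status would affect the job
}

log_cpuclock_request "$1"
```
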
query_other_jobs | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | When set to TRUE, non-admin users may view jobs they do not own. |
record_job_info | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | This must be set to TRUE in order for job logging to be enabled. |
record_job_script | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | If set to TRUE, this adds the contents of the script executed by a job to the log. For record_job_script to take effect, record_job_info must be set to TRUE. |
resources_available | |
---|---|
Format | <STRING> |
Default | --- |
Description | Allows overriding of detected resource quantities (see Assigning Queue Resource Limits). pbs_server must be restarted for changes to take effect. Also, resources_available is constrained by the smaller of queue.resources_available and server.resources_available. |
submit_hosts | |
---|---|
Format | <HOSTNAME>[,<HOSTNAME>]... |
Default | Not set. |
Description | Hosts in this list are able to submit jobs. This applies to any node whether within the cluster or outside of the cluster. If acl_host_enable is set to TRUE and the host is not in the PBSHOME/server_priv/nodes file, then the host must also be in the acl_hosts list. To allow qsub from all compute nodes instead of just a subset of nodes, use allow_node_submit. |
tcp_incoming_timeout | |
---|---|
Format | <INTEGER> |
Default | 600 |
Description | Specifies the timeout for incoming TCP connections to pbs_server. Functions exactly the same as tcp_timeout, but governs incoming connections while tcp_timeout governs only outgoing connections (or connections initiated by pbs_server). If you use Moab Workload Manager, prevent communication errors by giving tcp_incoming_timeout at least twice the value of the Moab RMPOLLINTERVAL. See RMPOLLINTERVAL for more information. |
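
The "at least twice RMPOLLINTERVAL" guideline is simple arithmetic once the interval is in seconds. The sketch assumes RMPOLLINTERVAL is written as SS, MM:SS, or HH:MM:SS (day fields are not handled); the function name min_incoming_timeout is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: smallest tcp_incoming_timeout that is at least twice
# RMPOLLINTERVAL.
# $1: RMPOLLINTERVAL as SS, MM:SS or HH:MM:SS (day fields not handled)
min_incoming_timeout() {
  local rest="$1:" total=0 field
  while [ -n "$rest" ]; do
    field=${rest%%:*}                  # next colon-separated field
    rest=${rest#*:}
    field=${field#"${field%%[!0]*}"}   # strip leading zeros so "08" is not octal
    total=$(( total * 60 + ${field:-0} ))
  done
  echo $(( total * 2 ))
}

min_incoming_timeout 00:00:30   # 60
min_incoming_timeout 45         # 90
```

With the default tcp_incoming_timeout of 600 seconds, the guideline holds for RMPOLLINTERVAL values up to 300 seconds.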
tcp_timeout | |
---|---|
Format | <INTEGER> |
Default | 300 |
Description | Specifies the timeout for idle outbound TCP connections. If no communication is received by the server on the connection after the timeout, the server closes the connection. There is an exception for connections made to the server on port 15001 (default); timeout events are ignored on the server for such connections established by a client utility or scheduler. Responsibility rests with the client to close the connection first (see Large Cluster Considerations for additional information). Use tcp_incoming_timeout to specify the timeout for idle inbound TCP connections. |
thread_idle_seconds | |
---|---|
Format | <INTEGER> |
Default | 300 |
Description | This is the number of seconds a thread can be idle in the thread pool before it is deleted. If threads should not be deleted, set to -1. Torque will always maintain at least min_threads number of threads, even if all are idle. |
timeout_for_job_delete | |
---|---|
Format | <INTEGER> (seconds) |
Default | 120 |
Description | The specific timeout used when deleting jobs because the node they are executing on is being deleted. |
timeout_for_job_requeue | |
---|---|
Format | <INTEGER> (seconds) |
Default | 120 |
Description | The specific timeout used when requeuing jobs because the node they are executing on is being deleted. |