You can now run TORQUE in a redundant or high availability mode. This means that there can be multiple instances of the server running and waiting to take over processing in the event that the currently running server fails.
The high availability feature is available in the 2.3 and later versions of TORQUE. TORQUE 2.4 includes several enhancements to high availability (see Enhanced high availability).
Redundant server host machines
High availability enables TORQUE to continue running even if pbs_server is brought down. This is done by running multiple copies of pbs_server which have their torque/server_priv directory mounted on a shared file system. The torque/server_name must include the host names of all nodes that run pbs_server. All MOM nodes also must include the host names of all nodes running pbs_server in their torque/server_name file. The syntax of the torque/server_name is a comma delimited list of host names.
For example:
host1,host2,host3
When configuring high availability, do not use $pbsserver to specify the host names. You must use the $TORQUEHOMEDIR/server_name file.
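As a minimal sketch of the comma-delimited syntax, the shell function below splits a server_name file into one host name per line. The name list_pbs_servers is hypothetical and is not a TORQUE command; the path to the file is passed in as an argument.

```shell
#!/bin/sh
# list_pbs_servers is a hypothetical helper, not a TORQUE command.
# $1: path to a comma-delimited server_name file.
# Prints one host name per line.
list_pbs_servers() {
    # Turn commas into newlines, strip stray blanks and carriage returns,
    # then drop any empty lines that remain.
    tr ',' '\n' < "$1" | tr -d ' \t\r' | sed '/^$/d'
}
```

Run against a file that contains host1,host2,host3, the function prints host1, host2, and host3 on separate lines.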
All instances of pbs_server need to be started with the --ha command line option that allows the servers to run at the same time. Only the first server to start will complete the full startup. The second server to start will block very early in the startup when it tries to lock the file torque/server_priv/server.lock. When the second server cannot obtain the lock, it will spin in a loop and wait for the lock to clear. The sleep time between checks of the lock file is one second.
Notice that not only can the servers run on independent server hardware, there can also be multiple instances of the pbs_server running on the same machine. This was not possible before as the second one to start would always write an error and quit when it could not obtain the lock.
Because the file server_priv/serverdb is created in a way which is not compatible between hardware architectures, the machines that are running pbs_server in high-availability mode must be of similar architecture. For example, a 32-bit machine is unable to read the server_priv/serverdb file of a 64-bit machine. Therefore, when choosing hardware, verify all servers are of the same architecture.
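One way to check this before deployment is to compare the architecture string reported by each candidate server. The sketch below assumes that the output of uname -m is an adequate proxy for serverdb compatibility, which the text above does not state; all_same_arch is a hypothetical helper name.

```shell
#!/bin/sh
# all_same_arch is a hypothetical helper, not part of TORQUE.
# Arguments: one architecture string per server, for example the output
# of `uname -m` collected from each host.
# Returns 0 only if at least one string is given and all are identical.
all_same_arch() {
    [ "$#" -gt 0 ] || return 1
    first=$1
    for arch in "$@"; do
        [ "$arch" = "$first" ] || return 1
    done
    return 0
}
```

For example, all_same_arch "$(uname -m)" "$(ssh server2 uname -m)" succeeds only when the local host and server2 report the same architecture. The use of ssh to reach server2 is an assumption about your environment.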
The default high availability configuration of TORQUE 2.4 is backward compatible with version 2.3, but an enhanced high availability option is available with version 2.4. The enhanced version in 2.4 fixes some shortcomings in the default configuration and is more robust. The lock file mechanism used to trigger a fail-over in TORQUE 2.3 works correctly only if the primary pbs_server is taken down gracefully, and releases the lock on the file being used as the semaphore. If the server crashes, the lock stays in place and the backup server will not start unless the lock is manually removed by the administrator. With 2.4 enhanced high availability the reliance on the file system is bypassed with a much more reliable mechanism.
In order to use enhanced high availability with TORQUE 2.4, TORQUE must be configured using the --enable-high-availability option (in addition to all other configuration options you specify).
> ./configure --prefix=/usr/var/torque --enable-high-availability
In the above example, TORQUE installs to the /usr/var/torque directory and is configured to use the high availability features.
This configuration option is not necessary in TORQUE 4.0 because all high availability in TORQUE 4.0 is enhanced high availability.
Once TORQUE has been compiled and installed, it is launched the same way as with TORQUE 2.3; start each instance of pbs_server with the --ha option.
In addition to the new fail-over mechanism, three server options have been added to help manage enhanced high availability in TORQUE 2.4. The server parameters are lock_file, lock_file_update_time, and lock_file_check_time.
The lock_file option allows the administrator to change the location of the lock file. The default location is torque/server_priv. If the lock_file option is used, the new location must be on the shared partition so all servers have access.
The lock_file_update_time and lock_file_check_time parameters are used by the servers to determine if the primary server is active. The primary pbs_server will update the lock file based on the lock_file_update_time (default value of 3 seconds). All backup pbs_servers will check the lock file as indicated by the lock_file_check_time parameter (default value of 9 seconds). The lock_file_update_time must be less than the lock_file_check_time. When a failure occurs, the backup pbs_server takes up to the lock_file_check_time value to take over.
> qmgr -c "set server lock_file_check_time=5"
In the above example, after the primary pbs_server goes down, the backup pbs_server takes up to 5 seconds to take over. It takes additional time for all MOMs to switch over to the new pbs_server.
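Because lock_file_update_time must be less than lock_file_check_time, it can help to check a pair of values before passing them to qmgr. The following is a minimal sketch of such a check; valid_lock_times is a hypothetical helper, and the requirement that both values be whole numbers of seconds is an assumption of this sketch.

```shell
#!/bin/sh
# valid_lock_times is a hypothetical helper, not a TORQUE command.
# $1: proposed lock_file_update_time in seconds
# $2: proposed lock_file_check_time in seconds
# Returns 0 when both are non-negative integers and update < check.
valid_lock_times() {
    # Reject anything that is not made up of digits only.
    case "$1$2" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Reject a missing argument.
    [ -n "$1" ] && [ -n "$2" ] || return 1
    [ "$1" -lt "$2" ]
}
```

For example, valid_lock_times 3 5 && qmgr -c "set server lock_file_check_time=5" applies the new check time only when it is compatible with an update time of 3 seconds.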
The clock on the primary and redundant servers must be synchronized in order for high availability to work. Use a utility such as NTP to ensure your servers have a synchronized time.
Enhanced high availability with Moab
When TORQUE is run with an external scheduler such as Moab, and the pbs_server is not running on the same host as Moab, pbs_server needs to know where to find the scheduler. To do this, use the following syntax (the port is required and the default is 15004):
> pbs_server --ha -l <moabhost:port>
If Moab is running in HA mode, add a -l option for each redundant server.
> pbs_server --ha -l <moabhost1:port> -l <moabhost2:port>
The root user of each Moab host must be added to the operators and managers lists of the server. This enables Moab to execute root level operations in TORQUE.
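When the list of Moab hosts is kept in a script or configuration file, the pbs_server command line can be assembled from it. The sketch below only builds and prints the command; ha_server_command is a hypothetical helper name, and the host names in the example are placeholders.

```shell
#!/bin/sh
# ha_server_command is a hypothetical helper, not part of TORQUE.
# Arguments: one host:port pair per Moab host.
# Prints a pbs_server command line with one -l option per Moab host.
ha_server_command() {
    cmd="pbs_server --ha"
    for target in "$@"; do
        cmd="$cmd -l $target"
    done
    printf '%s\n' "$cmd"
}
```

Calling ha_server_command moabhost1:15004 moabhost2:15004 prints pbs_server --ha -l moabhost1:15004 -l moabhost2:15004, which matches the form shown above for Moab in HA mode.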
How commands select the correct server host
The various commands that send messages to pbs_server usually have an option of specifying the server name on the command line, or if none is specified will use the default server name. The default server name comes either from the environment variable PBS_DEFAULT or from the file torque/server_name.
When a command is executed and no explicit server is mentioned, an attempt is made to connect to the first server name in the list of hosts from PBS_DEFAULT or torque/server_name. If this fails, the next server name is tried. If all servers in the list are unreachable, an error is returned and the command fails.
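The selection order described above can be illustrated with a short shell sketch. This is not how the TORQUE client commands are implemented; first_reachable_server is a hypothetical helper, and the probe passed to it stands in for whatever connection attempt a real client makes.

```shell
#!/bin/sh
# first_reachable_server is a hypothetical helper that mimics the selection
# order described in the text; it is not TORQUE code.
# $1: comma-delimited server list, as in PBS_DEFAULT or torque/server_name
# $2: name of a command or function that returns 0 if the given host answers
first_reachable_server() {
    list=$1
    probe=$2
    # Try each host in list order and stop at the first one that answers.
    for host in $(printf '%s' "$list" | tr ',' ' '); do
        if "$probe" "$host"; then
            printf '%s\n' "$host"
            return 0
        fi
    done
    # Every host failed the probe, so report an error as the commands do.
    echo "no server in the list is reachable" >&2
    return 1
}
```

With a probe that succeeds only for host2, first_reachable_server host1,host2,host3 prints host2. If no host answers, the function prints an error and returns a non-zero status.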
Note that there is a period of time after the failure of the current server during which the new server is starting up where it is unable to process commands. The new server must read the existing configuration and job information from the disk, so the length of time that commands cannot be received varies. Commands issued during this period of time might fail due to timeouts expiring.
One aspect of this enhancement is in the construction of job names. Job names normally contain the name of the host machine where pbs_server is running. When job names are constructed, only the first name from the server specification list is used in building the job name.
Persistence of the pbs_server process
The system administrator must ensure that pbs_server continues to run on the server nodes. This could be as simple as a cron job that counts the number of pbs_server processes in the process table and starts more if needed.
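A minimal sketch of such a cron-driven check follows. The helper name ensure_pbs_server and the PBS_SERVER_START override variable are assumptions made for illustration; the default start command is the --ha invocation described earlier.

```shell
#!/bin/sh
# ensure_pbs_server is a hypothetical watchdog helper, not part of TORQUE.
# $1: number of pbs_server processes currently running on this host.
# When the count is zero it runs the start command; otherwise it does nothing.
# PBS_SERVER_START is an assumed override hook for the start command.
ensure_pbs_server() {
    if [ "$1" -eq 0 ]; then
        # Left unquoted on purpose so the command and its options are split.
        ${PBS_SERVER_START:-pbs_server --ha}
    fi
}
```

A cron job could call it as ensure_pbs_server "$(pgrep -x pbs_server | wc -l)", assuming pgrep is available on the server host.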
High availability of the NFS server
One consideration of this implementation is that it depends on the NFS file system also being redundant. NFS can be set up as a redundant service. See the following.
There are also other ways to set up a shared file system. See the following:
Installing TORQUE in high availability mode
The following procedure demonstrates a TORQUE installation in high availability (HA) mode.
These systems can be CentOS 5.7 or higher, RHEL 5.7 or higher, or SLES 6.3 or higher.
To install TORQUE in HA mode
> service iptables stop
> chkconfig iptables off
If you are unable to stop the firewall due to infrastructure restriction, open the following ports:
> vi /etc/sysconfig/selinux
SELINUX=disabled
# TORQUE
export TORQUEHOME=/var/spool/torque
# Library Path
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${TORQUEHOME}/lib
# Update system paths
export PATH=${TORQUEHOME}/bin:${TORQUEHOME}/sbin:${PATH}
fileServer# mkdir -m 0755 /var/spool/torque
fileServer# mkdir -m 0750 /var/spool/torque/server_priv
/var/spool/torque/server_priv 192.168.0.0/255.255.255.0(rw,sync,no_root_squash)
fileServer# exportfs -r
> systemctl restart rpcbind.service
> systemctl start nfs-server.service
> systemctl start nfs-lock.service
> systemctl start nfs-idmap.service
server1# mkdir /var/spool/torque/server_priv
Repeat this process for server2.
fileServer:/var/spool/torque/server_priv /var/spool/torque/server_priv nfs rsize=8192,wsize=8192,timeo=14,intr
Repeat this step for server2.
server1# wget http://github.com/adaptivecomputing/torque/branches/4.1.4/torque-4.1.4.tar.gz
server1# tar -xvzf torque-4.1.4.tar.gz
server1# ./configure --enable-high-availability --with-tcp-retry-limit=3
server1# make
server1# make install
server1# make packages
server1# make install
If the installation directory is not shared, repeat step 8a-b (downloading and installing TORQUE) on server2.
server1# /etc/init.d/trqauthd start
List the host names of all nodes that run pbs_server in the torque/server_name file. You must also include the host names of all nodes running pbs_server in the torque/server_name file of each MOM node. The syntax of torque/server_name is a comma-delimited list of host names.
server1,server2
server1# pbs_server -t create
server1# qmgr -c "set server scheduling=true"
server1# qmgr -c "create queue batch queue_type=execution"
server1# qmgr -c "set queue batch started=true"
server1# qmgr -c "set queue batch enabled=true"
server1# qmgr -c "set queue batch resources_default.nodes=1"
server1# qmgr -c "set queue batch resources_default.walltime=3600"
server1# qmgr -c "set server default_queue=batch"
Because server_priv/* is a shared drive, you do not need to repeat this step on server2.
server1# qmgr -c "set server managers += root@server1"
server1# qmgr -c "set server managers += root@server2"
server1# qmgr -c "set server operators += root@server1"
server1# qmgr -c "set server operators += root@server2"
Because server_priv/* is a shared drive, you do not need to repeat this step on server2.
You must update the lock file mechanism for TORQUE in order to determine which server is the primary. To do so, use the lock_file_update_time and lock_file_check_time parameters. The primary pbs_server will update the lock file based on the specified lock_file_update_time (default value of 3 seconds). All backup pbs_servers will check the lock file as indicated by the lock_file_check_time parameter (default value of 9 seconds). The lock_file_update_time must be less than the lock_file_check_time. When a failure occurs, the backup pbs_server takes up to the lock_file_check_time value to take over.
server1# qmgr -c "set server lock_file_check_time=5"
server1# qmgr -c "set server lock_file_update_time=3"
Because server_priv/* is a shared drive, you do not need to repeat this step on server2.
server1# qmgr -c "set server acl_hosts += server1"
server1# qmgr -c "set server acl_hosts += server2"
Because server_priv/* is a shared drive, you do not need to repeat this step on server2.
server1# qterm
server1# pbs_server --ha -l server2:port
server2# pbs_server --ha -l server1:port
server1# qmgr -c "p s"
server2# qmgr -c "p s"
The commands above return all settings from the active TORQUE server from either node.
Drop one of the pbs_servers to verify that the secondary server picks up the request.
server1# qterm
server2# qmgr -c "p s"
Stop the pbs_server on server2 and restart pbs_server on server1 to verify that both nodes can handle a request from the other.
node1# torque-package-mom-linux-x86_64.sh --install
node2# torque-package-clients-linux-x86_64.sh --install
Repeat this for each compute node. Verify that the /var/spool/torque/server_name file on each compute node lists all of your pbs_server hosts.
node1 np=2
node2 np=2
Change the np flag to reflect the number of available processors on that node.
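The entries in the node list follow a simple "host np=count" pattern, so they can be generated rather than typed by hand. The sketch below formats one entry; nodes_entry is a hypothetical helper, and how the processor count is obtained is left to the caller.

```shell
#!/bin/sh
# nodes_entry is a hypothetical helper, not part of TORQUE.
# $1: compute node host name
# $2: number of available processors on that node
# Prints one line in the "host np=count" form shown above.
nodes_entry() {
    printf '%s np=%s\n' "$1" "$2"
}
```

For example, nodes_entry node1 2 prints node1 np=2. On a compute node, nodes_entry "$(hostname)" "$(getconf _NPROCESSORS_ONLN)" produces an entry for the local host, assuming getconf reports the online processor count on your platform.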
server1# qterm; pbs_server --ha -l server2:port
server2# qterm; pbs_server --ha -l server1:port
node1# pbs_mom
node2# pbs_mom
Installing TORQUE in high availability mode on headless nodes
The following procedure demonstrates a TORQUE installation in high availability (HA) mode on nodes with no local hard drive.
These systems can be CentOS 5.7 or higher, RHEL 5.7 or higher, or SLES 6.3 or higher.
To install TORQUE in HA mode on a node with no local hard drive
> service iptables stop
> chkconfig iptables off
If you are unable to stop the firewall due to infrastructure restriction, open the following ports:
> vi /etc/sysconfig/selinux
SELINUX=disabled
# TORQUE
export TORQUEHOME=/var/spool/torque
# Library Path
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${TORQUEHOME}/lib
# Update system paths
export PATH=${TORQUEHOME}/bin:${TORQUEHOME}/sbin:${PATH}
fileServer# mkdir -m 0755 /var/spool/torque
/var/spool/torque/ 192.168.0.0/255.255.255.0(rw,sync,no_root_squash)
fileServer# exportfs -r
> systemctl restart rpcbind.service
> systemctl start nfs-server.service
> systemctl start nfs-lock.service
> systemctl start nfs-idmap.service
server1# mkdir /var/spool/torque
Repeat this process for server2.
fileServer:/var/spool/torque/server_priv /var/spool/torque/server_priv nfs rsize=8192,wsize=8192,timeo=14,intr
Repeat this step for server2.
server1# wget http://github.com/adaptivecomputing/torque/branches/4.1.4/torque-4.1.4.tar.gz
server1# tar -xvzf torque-4.1.4.tar.gz
server1# ./configure --enable-high-availability --with-tcp-retry-limit=3 --prefix=/var/spool/torque
server1# make
server1# make install
server1# make packages
server1# make install
If the installation directory is not shared, repeat step 8a-b (downloading and installing TORQUE) on server2.
server1# /etc/init.d/trqauthd start
List the host names of all nodes that run pbs_server in the torque/server_name file. You must also include the host names of all nodes running pbs_server in the torque/server_name file of each MOM node. The syntax of torque/server_name is a comma-delimited list of host names.
server1,server2
server1# pbs_server -t create
server1# qmgr -c "set server scheduling=true"
server1# qmgr -c "create queue batch queue_type=execution"
server1# qmgr -c "set queue batch started=true"
server1# qmgr -c "set queue batch enabled=true"
server1# qmgr -c "set queue batch resources_default.nodes=1"
server1# qmgr -c "set queue batch resources_default.walltime=3600"
server1# qmgr -c "set server default_queue=batch"
Because TORQUEHOME is a shared drive, you do not need to repeat this step on server2.
server1# qmgr -c "set server managers += root@server1"
server1# qmgr -c "set server managers += root@server2"
server1# qmgr -c "set server operators += root@server1"
server1# qmgr -c "set server operators += root@server2"
Because TORQUEHOME is a shared drive, you do not need to repeat this step on server2.
You must update the lock file mechanism for TORQUE in order to determine which server is the primary. To do so, use the lock_file_update_time and lock_file_check_time parameters. The primary pbs_server will update the lock file based on the specified lock_file_update_time (default value of 3 seconds). All backup pbs_servers will check the lock file as indicated by the lock_file_check_time parameter (default value of 9 seconds). The lock_file_update_time must be less than the lock_file_check_time. When a failure occurs, the backup pbs_server takes up to the lock_file_check_time value to take over.
server1# qmgr -c "set server lock_file_check_time=5"
server1# qmgr -c "set server lock_file_update_time=3"
Because TORQUEHOME is a shared drive, you do not need to repeat this step on server2.
server1# qmgr -c "set server acl_hosts += server1"
server1# qmgr -c "set server acl_hosts += server2"
Because TORQUEHOME is a shared drive, you do not need to repeat this step on server2.
server1# qterm
server1# pbs_server --ha -l server2:port
server2# pbs_server --ha -l server1:port
server1# qmgr -c "p s"
server2# qmgr -c "p s"
The commands above return all settings from the active TORQUE server from either node.
Drop one of the pbs_servers to verify that the secondary server picks up the request.
server1# qterm
server2# qmgr -c "p s"
Stop the pbs_server on server2 and restart pbs_server on server1 to verify that both nodes can handle a request from the other.
node1 np=2
node2 np=2
Change the np flag to reflect the number of available processors on that node.
server1# qterm; pbs_server --ha -l server2:port
server2# qterm; pbs_server --ha -l server1:port
server1# pbs_mom -d <mom-server1>
server2# pbs_mom -d <mom-server2>
Example setup of high availability
# List of all servers running pbs_server
server1,server2
> qmgr -c "set server acl_hosts += server1"
> qmgr -c "set server acl_hosts += server2"
[root@server1]$ pbs_server --ha
[root@server2]$ pbs_server --ha