Server high availability

You can now run TORQUE in a redundant or high availability mode. This means that there can be multiple instances of the server running and waiting to take over processing in the event that the currently running server fails.

The high availability feature is available in the 2.3 and later versions of TORQUE. TORQUE 2.4 includes several enhancements to high availability (see Enhanced high availability).

For more details, see these sections:

Redundant server host machines
Enhanced high availability
Enhanced high availability with Moab
How commands select the correct server host
Job names
Persistence of the pbs_server process
High availability of the NFS server
Installing TORQUE in high availability mode
Installing TORQUE in high availability mode on headless nodes
Example setup of high availability

Redundant server host machines

High availability enables TORQUE to continue running even if pbs_server is brought down. This is done by running multiple copies of pbs_server that have their torque/server_priv directory mounted on a shared file system. The torque/server_name file must include the host names of all nodes that run pbs_server. The torque/server_name file on every MOM node must also include the host names of all nodes running pbs_server. The syntax of torque/server_name is a comma-delimited list of host names.

For example:

host1,host2,host3
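A malformed server_name line is easy to overlook. The following is a minimal sketch of a sanity check, not part of TORQUE: the function name list_server_hosts is hypothetical, and rejecting any whitespace is an assumption made here to keep the check strict.

```shell
# Hypothetical helper: print each host from a comma-delimited server_name
# line, one per line. Fails (prints nothing, returns 1) if the line is
# empty, has an empty entry, or contains whitespace.
list_server_hosts() {
    line=$1
    case "$line" in
        ''|,*|*,|*,,*|*[[:space:]]*) return 1 ;;
    esac
    printf '%s\n' "$line" | tr ',' '\n'
}
```

It could be run against the first line of the file, for example list_server_hosts "$(head -n 1 /var/spool/torque/server_name)", assuming TORQUE is installed under /var/spool/torque.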

When configuring high availability, do not use $pbsserver to specify the host names. You must use the $TORQUEHOMEDIR/server_name file.

All instances of pbs_server need to be started with the --ha command line option that allows the servers to run at the same time. Only the first server to start will complete the full startup. The second server to start will block very early in the startup when it tries to lock the file torque/server_priv/server.lock. When the second server cannot obtain the lock, it will spin in a loop and wait for the lock to clear. The sleep time between checks of the lock file is one second.

Note that the servers can run on independent server hardware, and multiple instances of pbs_server can also run on the same machine. This was not possible before, because the second instance to start would always write an error and quit when it could not obtain the lock.

Because the file server_priv/serverdb is created in a way which is not compatible between hardware architectures, the machines that are running pbs_server in high-availability mode must be of similar architecture. For example, a 32-bit machine is unable to read the server_priv/serverdb file of a 64-bit machine. Therefore, when choosing hardware, verify all servers are of the same architecture.

Enhanced high availability

The default high availability configuration of TORQUE 2.4 is backward compatible with version 2.3, but an enhanced high availability option is available with version 2.4. The enhanced version in 2.4 fixes some shortcomings in the default configuration and is more robust. The lock file mechanism used to trigger a fail-over in TORQUE 2.3 works correctly only if the primary pbs_server is taken down gracefully, and releases the lock on the file being used as the semaphore. If the server crashes, the lock stays in place and the backup server will not start unless the lock is manually removed by the administrator. With 2.4 enhanced high availability the reliance on the file system is bypassed with a much more reliable mechanism.

In order to use enhanced high availability with TORQUE 2.4, TORQUE must be configured using the --enable-high-availability option (in addition to all other configuration options you specify).

> ./configure --prefix=/usr/var/torque --enable-high-availability

In the above example, TORQUE installs to the /usr/var/torque directory and is configured to use the high availability features.

This configuration option is not necessary in TORQUE 4.0, because high availability in TORQUE 4.0 is always enhanced high availability.

Once TORQUE has been compiled and installed, it is launched the same way as with TORQUE 2.3; start each instance of pbs_server with the --ha option.

In addition to the new fail-over mechanism, three server options have been added to help manage enhanced high availability in TORQUE 2.4. The server parameters are lock_file, lock_file_update_time, and lock_file_check_time.

The lock_file option allows the administrator to change the location of the lock file. The default location is torque/server_priv. If the lock_file option is used, the new location must be on the shared partition so all servers have access.

The lock_file_update_time and lock_file_check_time parameters are used by the servers to determine if the primary server is active. The primary pbs_server will update the lock file based on the lock_file_update_time (default value of 3 seconds). All backup pbs_servers will check the lock file as indicated by the lock_file_check_time parameter (default value of 9 seconds). The lock_file_update_time must be less than the lock_file_check_time. When a failure occurs, the backup pbs_server takes up to the lock_file_check_time value to take over.

> qmgr -c "set server lock_file_check_time=5"

In the above example, after the primary pbs_server goes down, the backup pbs_server takes up to 5 seconds to take over. It takes additional time for all MOMs to switch over to the new pbs_server.
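Because lock_file_update_time must be less than lock_file_check_time, it can help to check a pair of values before passing them to qmgr. The sketch below is illustrative only: valid_lock_times is a hypothetical name, and requiring whole, positive numbers of seconds is an assumption rather than a documented rule.

```shell
# Hypothetical check: succeed only if both values are whole numbers and the
# update interval is positive and strictly smaller than the check interval.
valid_lock_times() {
    update=$1
    check=$2
    # Reject empty or non-numeric input.
    case "$update" in ''|*[!0-9]*) return 1 ;; esac
    case "$check" in ''|*[!0-9]*) return 1 ;; esac
    [ "$update" -gt 0 ] && [ "$update" -lt "$check" ]
}
```

With the defaults (3 and 9) or the values from the example (3 and 5) the check succeeds. Swapped or equal values fail.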

The clocks on the primary and redundant servers must be synchronized in order for high availability to work. Use a utility such as NTP to keep the server clocks synchronized.

Enhanced high availability with Moab

When TORQUE is run with an external scheduler such as Moab, and the pbs_server is not running on the same host as Moab, pbs_server needs to know where to find the scheduler. To do this, use the following syntax (the port is required and the default is 15004):

> pbs_server --ha -l <moabhost:port>

If Moab is running in HA mode, add a -l option for each redundant server.

> pbs_server --ha -l <moabhost1:port> -l <moabhost2:port>

The root user of each Moab host must be added to the operators and managers lists of the server. This enables Moab to execute root level operations in TORQUE.

How commands select the correct server host

The various commands that send messages to pbs_server usually have an option of specifying the server name on the command line, or if none is specified will use the default server name. The default server name comes either from the environment variable PBS_DEFAULT or from the file torque/server_name.

When a command is executed and no explicit server is mentioned, an attempt is made to connect to the first server name in the list of hosts from PBS_DEFAULT or torque/server_name. If this fails, the next server name is tried. If all servers in the list are unreachable, an error is returned and the command fails.
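The selection order described above can be sketched as follows. This is not the client library's code. The function names are hypothetical, "probe" stands in for the real connection attempt, the /var/spool/torque path is an assumed install location, and giving PBS_DEFAULT precedence over the file is an assumption of this sketch.

```shell
# Try each host in a comma-delimited list, in order, and print the first one
# for which the probe command succeeds. "probe" is the name of any command
# or function that returns 0 when the given host is reachable.
select_server() {
    list=$1
    probe=$2
    for host in $(printf '%s\n' "$list" | tr ',' ' '); do
        if "$probe" "$host"; then
            printf '%s\n' "$host"
            return 0
        fi
    done
    echo "no server in '$list' is reachable" >&2
    return 1
}

# Print the default server list: PBS_DEFAULT when it is set, otherwise the
# first line of the server_name file (path given as $1, assumed default below).
default_server_list() {
    file=${1:-/var/spool/torque/server_name}
    if [ -n "${PBS_DEFAULT:-}" ]; then
        printf '%s\n' "$PBS_DEFAULT"
    else
        head -n 1 "$file"
    fi
}
```

For example, with a probe that only accepts host2, select_server "host1,host2,host3" skips host1 and prints host2. If no host passes the probe, it returns a non-zero status, which mirrors the command failing.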

Note that there is a period of time after the failure of the current server during which the new server is starting up where it is unable to process commands. The new server must read the existing configuration and job information from the disk, so the length of time that commands cannot be received varies. Commands issued during this period of time might fail due to timeouts expiring.

Job names

One aspect of this enhancement is in the construction of job names. Job names normally contain the name of the host machine where pbs_server is running. When job names are constructed, only the first name from the server specification list is used in building the job name.
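As an illustration of that rule, the sketch below takes the first entry of the server list and combines it with a sequence number. The function name job_id_for and the "sequence.host" layout are assumptions made for this example. The text above only states that the first name in the list is used.

```shell
# Hypothetical sketch: build a job name from a sequence number and the first
# host in a comma-delimited server list.
job_id_for() {
    seq_num=$1
    server_list=$2
    # Strip everything from the first comma onward, leaving the first host.
    first_host=${server_list%%,*}
    printf '%s.%s\n' "$seq_num" "$first_host"
}
```

Under these assumptions, job_id_for 42 "host1,host2,host3" produces a name based on host1 no matter which server is currently active.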

Persistence of the pbs_server process

The system administrator must ensure that pbs_server continues to run on the server nodes. This could be as simple as a cron job that counts the number of pbs_server processes in the process table and starts more if needed.
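A minimal sketch of such a watchdog is shown below. It assumes that pgrep is available, that pbs_server is on the PATH of the cron environment, and that one instance per host is wanted. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical watchdog, intended to be called from cron.

# Count processes whose name is exactly pbs_server.
count_pbs_servers() {
    pgrep -x pbs_server | wc -l | tr -d ' '
}

# Succeed when fewer than the wanted number of instances are running.
needs_start() {
    running=$1
    wanted=${2:-1}
    [ "$running" -lt "$wanted" ]
}

# Start one instance in HA mode if none is running on this host.
ensure_pbs_server() {
    if needs_start "$(count_pbs_servers)" 1; then
        pbs_server --ha
    fi
}
```

A script that defines these functions and then calls ensure_pbs_server could be scheduled every minute from root's crontab. Because a second instance started with --ha waits on the lock rather than exiting, an occasional extra start on a backup host should be harmless.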

High availability of the NFS server

This implementation depends on the NFS file system also being redundant. NFS can be set up as a redundant service, and there are also other ways to set up a shared file system.

Installing TORQUE in high availability mode

The following procedure demonstrates a TORQUE installation in high availability (HA) mode.

To install TORQUE in HA mode

Stop all firewalls or update your firewall to allow traffic from TORQUE services.

> service iptables stop

> chkconfig iptables off

If you are unable to stop the firewall due to infrastructure restrictions, open the following ports:

15001[tcp,udp]
15002[tcp,udp]
15003[tcp,udp]
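If the firewall stays enabled, one way to open these ports with iptables is sketched below. The function only prints the commands so they can be reviewed before being run as root. Its name is hypothetical, and inserting into the INPUT chain with -I is an assumption about the local rule layout.

```shell
# Print (do not run) one iptables rule per TORQUE port and protocol.
print_torque_fw_rules() {
    for port in 15001 15002 15003; do
        for proto in tcp udp; do
            printf 'iptables -I INPUT -p %s --dport %s -j ACCEPT\n' "$proto" "$port"
        done
    done
}
```

Rules added this way do not survive a reboot unless they are saved with the distribution's usual mechanism.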

Disable SELinux.

> vi /etc/sysconfig/selinux

SELINUX=disabled

Update your main ~/.bashrc profile to ensure you are always referencing the applications to be installed on all servers.

# TORQUE

export TORQUEHOME=/var/spool/torque

# Library Path

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${TORQUEHOME}/lib

# Update system paths

export PATH=${TORQUEHOME}/bin:${TORQUEHOME}/sbin:${PATH}

Verify that server1 and server2 are resolvable, either through DNS or through an entry in the /etc/hosts file.

Configure the NFS mounts by following these steps:

Create mount point folders on fileServer.

fileServer# mkdir -m 0755 /var/spool/torque

fileServer# mkdir -m 0750 /var/spool/torque/server_priv

Update /etc/exports on fileServer. The IP address range should cover server1 and server2.

/var/spool/torque/server_priv 192.168.0.0/255.255.255.0(rw,sync,no_root_squash)

Update the list of NFS exported file systems.

fileServer# exportfs -r

If the NFS daemons are not already running on fileServer, start them.

> systemctl restart rpcbind.service

> systemctl start nfs-server.service

> systemctl start nfs-lock.service

> systemctl start nfs-idmap.service

Mount the exported file systems on server1 by following these steps:

Create the directory reference and mount them.

server1# mkdir /var/spool/torque/server_priv

Repeat this process for server2.

Update /etc/fstab on server1 to ensure that NFS mount is performed on startup.

fileServer:/var/spool/torque/server_priv /var/spool/torque/server_priv nfs rsize=8192,wsize=8192,timeo=14,intr

Repeat this step for server2.

Install TORQUE by following these steps:

Download and extract TORQUE 4.1.4 on server1.

server1# wget http://github.com/adaptivecomputing/torque/branches/4.1.4/torque-4.1.4.tar.gz

server1# tar -xvzf torque-4.1.4.tar.gz

Navigate to the TORQUE directory and compile TORQUE with the HA flags on server1.

server1# ./configure --enable-high-availability --with-tcp-retry-limit=3

server1# make

server1# make install

server1# make packages

If the installation directory is shared on both head nodes, then run make install on server1.

server1# make install

If the installation directory is not shared, repeat step 8a-b (downloading and installing TORQUE) on server2.

Start trqauthd.

server1# /etc/init.d/trqauthd start

Configure TORQUE for HA.

List the host names of all nodes that run pbs_server in the torque/server_name file. You must also include the host names of all nodes running pbs_server in the torque/server_name file of each MOM node. The syntax of torque/server_name is a comma-delimited list of host names.

server1,server2

Create a simple queue configuration for TORQUE job queues on server1.

server1# pbs_server -t create

server1# qmgr -c "set server scheduling=true"

server1# qmgr -c "create queue batch queue_type=execution"

server1# qmgr -c "set queue batch started=true"

server1# qmgr -c "set queue batch enabled=true"

server1# qmgr -c "set queue batch resources_default.nodes=1"

server1# qmgr -c "set queue batch resources_default.walltime=3600"

server1# qmgr -c "set server default_queue=batch"

Because server_priv/* is a shared drive, you do not need to repeat this step on server2.

Add the root users of TORQUE to the TORQUE configuration as an operator and manager.

server1# qmgr -c "set server managers += root@server1"

server1# qmgr -c "set server managers += root@server2"

server1# qmgr -c "set server operators += root@server1"

server1# qmgr -c "set server operators += root@server2"

Because server_priv/* is a shared drive, you do not need to repeat this step on server2.

You must update the lock file mechanism for TORQUE in order to determine which server is the primary. To do so, use the lock_file_update_time and lock_file_check_time parameters. The primary pbs_server will update the lock file based on the specified lock_file_update_time (default value of 3 seconds). All backup pbs_servers will check the lock file as indicated by the lock_file_check_time parameter (default value of 9 seconds). The lock_file_update_time must be less than the lock_file_check_time. When a failure occurs, the backup pbs_server takes up to the lock_file_check_time value to take over.

server1# qmgr -c "set server lock_file_check_time=5"

server1# qmgr -c "set server lock_file_update_time=3"

Because server_priv/* is a shared drive, you do not need to repeat this step on server2.

List the servers running pbs_server in the TORQUE acl_hosts list.

server1# qmgr -c "set server acl_hosts += server1"

server1# qmgr -c "set server acl_hosts += server2"

Because server_priv/* is a shared drive, you do not need to repeat this step on server2.

Restart the running pbs_server in HA mode.

server1# qterm

Start the pbs_server on the secondary server.

server1# pbs_server --ha -l server2:port

server2# pbs_server --ha -l server1:port

Check the status of TORQUE in HA mode.

server1# qmgr -c "p s"

server2# qmgr -c "p s"

The commands above return all settings of the active TORQUE server, regardless of which node they are run on.

Drop one of the pbs_servers to verify that the secondary server picks up the request.

server1# qterm

server2# qmgr -c "p s"

Stop the pbs_server on server2 and restart pbs_server on server1 to verify that both nodes can handle a request from the other.

Install a pbs_mom on the compute nodes.

Copy the install scripts to the compute nodes and install.

Navigate to the shared source directory of TORQUE and run the following:

node1# torque-package-mom-linux-x86_64.sh --install

node2# torque-package-clients-linux-x86_64.sh --install

Repeat this for each compute node. Verify that the /var/spool/torque/server_name file on each compute node lists all hosts that run pbs_server.

On server1 or server2, configure the nodes file to identify all available MOMs. To do so, edit the /var/spool/torque/server_priv/nodes file.

node1 np=2

node2 np=2

Change the np value to reflect the number of available processors on that node.
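To avoid counting processors by hand, a nodes file entry can be generated on each compute node. This is a small sketch under stated assumptions: nodes_entry is a hypothetical helper, and the nproc utility from GNU coreutils is assumed to be installed on the node.

```shell
# Hypothetical helper: format one nodes file entry from a host name and a
# processor count.
nodes_entry() {
    printf '%s np=%s\n' "$1" "$2"
}

# Example, run on a compute node:
#   nodes_entry "$(hostname -s)" "$(nproc)"
```

The printed line can then be copied into /var/spool/torque/server_priv/nodes on the server.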

Recycle the pbs_servers to verify that they pick up the MOM configuration.

server1# qterm; pbs_server --ha -l server2:port

server2# qterm; pbs_server --ha -l server1:port

Start the pbs_mom on each execution node.

node1# pbs_mom

node2# pbs_mom

Installing TORQUE in high availability mode on headless nodes

The following procedure demonstrates a TORQUE installation in high availability (HA) mode on nodes with no local hard drive.

To install TORQUE in HA mode on a node with no local hard drive

Stop all firewalls or update your firewall to allow traffic from TORQUE services.

> service iptables stop

> chkconfig iptables off

If you are unable to stop the firewall due to infrastructure restrictions, open the following ports:

15001[tcp,udp]
15002[tcp,udp]
15003[tcp,udp]

Disable SELinux.

> vi /etc/sysconfig/selinux

SELINUX=disabled

Update your main ~/.bashrc profile to ensure you are always referencing the applications to be installed on all servers.

# TORQUE

export TORQUEHOME=/var/spool/torque

# Library Path

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${TORQUEHOME}/lib

# Update system paths

export PATH=${TORQUEHOME}/bin:${TORQUEHOME}/sbin:${PATH}

Verify that server1 and server2 are resolvable, either through DNS or through an entry in the /etc/hosts file.

Configure the NFS mounts by following these steps:

Create mount point folders on fileServer.

fileServer# mkdir -m 0755 /var/spool/torque

Update /etc/exports on fileServer. The IP address range should cover server1 and server2.

/var/spool/torque/ 192.168.0.0/255.255.255.0(rw,sync,no_root_squash)

Update the list of NFS exported file systems.

fileServer# exportfs -r

If the NFS daemons are not already running on fileServer, start them.

> systemctl restart rpcbind.service

> systemctl start nfs-server.service

> systemctl start nfs-lock.service

> systemctl start nfs-idmap.service

Mount the exported file systems on server1 by following these steps:

Create the directory reference and mount them.

server1# mkdir /var/spool/torque

Repeat this process for server2.

Update /etc/fstab on server1 to ensure that NFS mount is performed on startup.

fileServer:/var/spool/torque/server_priv /var/spool/torque/server_priv nfs rsize=8192,wsize=8192,timeo=14,intr

Repeat this step for server2.

Install TORQUE by following these steps:

Download and extract TORQUE 4.1.4 on server1.

server1# wget http://github.com/adaptivecomputing/torque/branches/4.1.4/torque-4.1.4.tar.gz

server1# tar -xvzf torque-4.1.4.tar.gz

Navigate to the TORQUE directory and compile TORQUE with the HA flags on server1.

server1# ./configure --enable-high-availability --with-tcp-retry-limit=3 --prefix=/var/spool/torque

server1# make

server1# make install

server1# make packages

If the installation directory is shared on both head nodes, then run make install on server1.

server1# make install

If the installation directory is not shared, repeat step 8a-b (downloading and installing TORQUE) on server2.

Start trqauthd.

server1# /etc/init.d/trqauthd start

Configure TORQUE for HA.

List the host names of all nodes that run pbs_server in the torque/server_name file. You must also include the host names of all nodes running pbs_server in the torque/server_name file of each MOM node. The syntax of torque/server_name is a comma-delimited list of host names.

server1,server2

Create a simple queue configuration for TORQUE job queues on server1.

server1# pbs_server -t create

server1# qmgr -c "set server scheduling=true"

server1# qmgr -c "create queue batch queue_type=execution"

server1# qmgr -c "set queue batch started=true"

server1# qmgr -c "set queue batch enabled=true"

server1# qmgr -c "set queue batch resources_default.nodes=1"

server1# qmgr -c "set queue batch resources_default.walltime=3600"

server1# qmgr -c "set server default_queue=batch"

Because TORQUEHOME is a shared drive, you do not need to repeat this step on server2.

Add the root users of TORQUE to the TORQUE configuration as an operator and manager.

server1# qmgr -c "set server managers += root@server1"

server1# qmgr -c "set server managers += root@server2"

server1# qmgr -c "set server operators += root@server1"

server1# qmgr -c "set server operators += root@server2"

Because TORQUEHOME is a shared drive, you do not need to repeat this step on server2.

You must update the lock file mechanism for TORQUE in order to determine which server is the primary. To do so, use the lock_file_update_time and lock_file_check_time parameters. The primary pbs_server will update the lock file based on the specified lock_file_update_time (default value of 3 seconds). All backup pbs_servers will check the lock file as indicated by the lock_file_check_time parameter (default value of 9 seconds). The lock_file_update_time must be less than the lock_file_check_time. When a failure occurs, the backup pbs_server takes up to the lock_file_check_time value to take over.

server1# qmgr -c "set server lock_file_check_time=5"

server1# qmgr -c "set server lock_file_update_time=3"

Because TORQUEHOME is a shared drive, you do not need to repeat this step on server2.

List the servers running pbs_server in the TORQUE acl_hosts list.

server1# qmgr -c "set server acl_hosts += server1"

server1# qmgr -c "set server acl_hosts += server2"

Because TORQUEHOME is a shared drive, you do not need to repeat this step on server2.

Restart the running pbs_server in HA mode.

server1# qterm

Start the pbs_server on the secondary server.

server1# pbs_server --ha -l server2:port

server2# pbs_server --ha -l server1:port

Check the status of TORQUE in HA mode.

server1# qmgr -c "p s"

server2# qmgr -c "p s"

The commands above return all settings of the active TORQUE server, regardless of which node they are run on.

Drop one of the pbs_servers to verify that the secondary server picks up the request.

server1# qterm

server2# qmgr -c "p s"

Stop the pbs_server on server2 and restart pbs_server on server1 to verify that both nodes can handle a request from the other.

Install a pbs_mom on the compute nodes.

On server1 or server2, configure the nodes file to identify all available MOMs. To do so, edit the /var/spool/torque/server_priv/nodes file.

node1 np=2

node2 np=2

Change the np value to reflect the number of available processors on that node.

Recycle the pbs_servers to verify that they pick up the MOM configuration.

server1# qterm; pbs_server --ha -l server2:port

server2# qterm; pbs_server --ha -l server1:port

Start the pbs_mom on each execution node.

server1# pbs_mom -d <mom-server1>

server2# pbs_mom -d <mom-server2>

Example setup of high availability

The machines running pbs_server must have access to a shared server_priv/ directory (usually an NFS share on a MOM).

All MOMs must have the same content in their server_name file. This can be done manually or via an NFS share. The server_name file contains a comma-delimited list of the hosts that run pbs_server.

# List of all servers running pbs_server

server1,server2

The machines running pbs_server must be listed in acl_hosts.

> qmgr -c "set server acl_hosts += server1"

> qmgr -c "set server acl_hosts += server2"

Start pbs_server with the --ha option.

[root@server1]$ pbs_server --ha

[root@server2]$ pbs_server --ha