Installing Moab and TORQUE in high availability mode

The following procedure demonstrates how to install Moab and TORQUE in high availability (HA) mode.

To install Moab and TORQUE in HA mode

  1. Stop all firewalls or update your firewall to allow traffic from Moab and TORQUE services.

    > service iptables stop

    > chkconfig iptables off

    If you are unable to stop the firewall due to infrastructure restrictions, open the following ports:

    TORQUE

    • 15001[tcp,udp]
    • 15002[tcp,udp]
    • 15003[tcp,udp]

    Moab

    • 42559[tcp]
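
    For example, on a system that uses iptables (as in the service commands above), rules similar to the following open the listed ports. This is a sketch only; adapt it to your site's firewall policy and remember to save the rules:

    > iptables -I INPUT -p tcp -m multiport --dports 15001,15002,15003,42559 -j ACCEPT

    > iptables -I INPUT -p udp -m multiport --dports 15001,15002,15003 -j ACCEPT

    > service iptables save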
  2. Disable SELinux.

    > vi /etc/sysconfig/selinux

    SELINUX=disabled
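
    The edit above makes the change persistent across reboots. As a convenience (not a required step), you can also disable enforcement on the running system immediately and confirm the result:

    > setenforce 0

    > getenforce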

  3. Update your main ~/.bashrc profile on all servers so that you are always referencing the applications to be installed.

    # Moab
    export MOABHOMEDIR=/opt/moab

    # TORQUE
    export TORQUEHOME=/var/spool/torque

    # Library path
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${MOABHOMEDIR}/lib:${TORQUEHOME}/lib

    # Update system paths
    export PATH=${MOABHOMEDIR}/sbin:${MOABHOMEDIR}/bin:${TORQUEHOME}/bin:${TORQUEHOME}/sbin:${PATH}
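
    To pick up the new settings in your current shell and confirm they are in place, you can, for example, reload the profile and echo the variables:

    > source ~/.bashrc

    > echo $MOABHOMEDIR $TORQUEHOME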

  4. Verify that server1 and server2 are resolvable, either via DNS or via entries in the /etc/hosts file.
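
    For example, /etc/hosts entries similar to the following make the head nodes and the file server resolvable; the IP addresses shown are placeholders for your own:

    192.168.0.10    server1
    192.168.0.11    server2
    192.168.0.12    fileServer
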
  5. Configure the NFS mounts by following these steps:
    a. Create the mount point folders on fileServer.

      fileServer# mkdir -m 0755 /var/spool/torque

      fileServer# mkdir -m 0750 /var/spool/torque/server_priv

      fileServer# mkdir -m 0755 /opt/moab

    b. Update /etc/exports on fileServer. The addresses must permit access from server1 and server2 (the example below allows the entire 192.168.0.0/24 subnet).

      /opt/moab                       192.168.0.0/255.255.255.0(rw,sync,no_root_squash)

      /var/spool/torque/server_priv   192.168.0.0/255.255.255.0(rw,sync,no_root_squash)

    c. Update the list of NFS exported file systems.

      fileServer# exportfs -r
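
      To confirm that the file systems are exported as expected, you can query the export list from one of the head nodes, for example:

      server1# showmount -e fileServer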

  6. If the NFS daemons are not already running on fileServer, start them.

    > systemctl restart rpcbind.service

    > systemctl start nfs-server.service

    > systemctl start nfs-lock.service

    > systemctl start nfs-idmap.service

  7. Mount the exported file systems on server1 by following these steps:
    a. Create the directory references and mount them (example mount commands follow this step).

      server1# mkdir /opt/moab

      server1# mkdir /var/spool/torque/server_priv

      Repeat this process for server2.

    b. Update /etc/fstab on server1 to ensure that the NFS mounts are performed on startup.

      fileServer:/opt/moab /opt/moab nfs rsize=8192,wsize=8192,timeo=14,intr

      fileServer:/var/spool/torque/server_priv /var/spool/torque/server_priv nfs rsize=8192,wsize=8192,timeo=14,intr

      Repeat this step for server2.
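
      Once /etc/fstab contains the entries above, you can mount the shares immediately rather than rebooting; for example, on each server:

      server1# mount /opt/moab

      server1# mount /var/spool/torque/server_priv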

  8. Install TORQUE by following these steps:
    a. Download and extract TORQUE 4.1.4 on server1.

      server1# wget http://github.com/adaptivecomputing/torque/branches/4.1.4/torque-4.1.4.tar.gz

      server1# tar -xvzf torque-4.1.4.tar.gz

    b. Navigate to the TORQUE directory and compile TORQUE with the HA flags on server1.

      server1# ./configure --enable-high-availability --with-tcp-retry-limit=3

      server1# make

      server1# make install

      server1# make packages

    c. If the installation directory is shared on both head nodes, then run make install on server1.

      server1# make install

      If the installation directory is not shared, repeat steps 8a-b (downloading and installing TORQUE) on server2.
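
      Because server_priv is on the NFS share created earlier, it can help to confirm at this point that both head nodes see the same directory; for example:

      server1# ls -ld /var/spool/torque/server_priv

      server2# ls -ld /var/spool/torque/server_priv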

  9. Start trqauthd.

    server1# /etc/init.d/trqauthd start

  10. Configure TORQUE for HA.
    a. List the host names of all nodes that run pbs_server in the torque/server_name file. You must also include these host names in the torque/server_name file of each MOM node. The syntax of torque/server_name is a comma-delimited list of host names.

      server1,server2

    b. Create a simple queue configuration for TORQUE job queues on server1.

      server1# pbs_server -t create

      server1# qmgr -c "set server scheduling=true"

      server1# qmgr -c "create queue batch queue_type=execution"

      server1# qmgr -c "set queue batch started=true"

      server1# qmgr -c "set queue batch enabled=true"

      server1# qmgr -c "set queue batch resources_default.nodes=1"

      server1# qmgr -c "set queue batch resources_default.walltime=3600"

      server1# qmgr -c "set server default_queue=batch"

      Because server_priv/* is a shared drive, you do not need to repeat this step on server2.
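
      To confirm that the batch queue was created with the intended defaults, you can list the queues; for example:

      server1# qstat -q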

    c. Add the root users of Moab and TORQUE to the TORQUE configuration as an operator and manager.

      server1# qmgr -c "set server managers += root@server1"

      server1# qmgr -c "set server managers += root@server2"

      server1# qmgr -c "set server operators += root@server1"

      server1# qmgr -c "set server operators += root@server2"

      Because server_priv/* is a shared drive, you do not need to repeat this step on server2.

    d. Update the lock file mechanism for TORQUE in order to determine which server is the primary. To do so, use the lock_file_update_time and lock_file_check_time parameters. The primary pbs_server updates the lock file at the interval specified by lock_file_update_time (default: 3 seconds). All backup pbs_servers check the lock file at the interval specified by lock_file_check_time (default: 9 seconds). The lock_file_update_time must be less than the lock_file_check_time. When a failure occurs, the backup pbs_server can take up to the lock_file_check_time value to take over.

      server1# qmgr -c "set server lock_file_check_time=5"

      server1# qmgr -c "set server lock_file_update_time=3"

      Because server_priv/* is a shared drive, you do not need to repeat this step on server2.

    e. List the servers running pbs_server in the TORQUE acl_hosts file.

      server1# qmgr -c "set server acl_hosts += server1"

      server1# qmgr -c "set server acl_hosts += server2"

      Because server_priv/* is a shared drive, you do not need to repeat this step on server2.

    f. Restart the running pbs_server in HA mode.

      server1# qterm

      server1# pbs_server --ha -l server2:port

    g. Start the pbs_server on the secondary server.

      server2# pbs_server --ha -l server1:port

      Specify the Moab hosts and ports only if Moab HA is configured on a remote server. Otherwise, run pbs_server --ha. For example, if Moab is running on server1 and you wish to start TORQUE in HA, you only need to make TORQUE aware of server2:

      > pbs_server --ha -l server2:<port>
  11. Check the status of TORQUE in HA mode.

    server1# qmgr -c "p s"

    server2# qmgr -c "p s"

    The commands above return all settings of the active TORQUE server from either node.

    Drop one of the pbs_servers to verify that the secondary server picks up the request.

    server1# qterm

    server2# qmgr -c "p s"

    Stop the pbs_server on server2 and restart pbs_server on server1 to verify that both nodes can handle a request from the other.

  12. Install a pbs_mom on the compute nodes.
    a. Copy the install scripts to the compute nodes and install them. Navigate to the shared source directory of TORQUE and run the following:

      node1# torque-package-mom-linux-x86_64.sh --install

      node1# torque-package-clients-linux-x86_64.sh --install

      Repeat this for each compute node. Verify that the /var/spool/torque/server_name file on each compute node lists both pbs_server hosts (server1 and server2).

    b. On server1 or server2, configure the nodes file to identify all available MOMs. To do so, edit the /var/spool/torque/server_priv/nodes file.

      node1 np=2

      node2 np=2

      Change the np flag to reflect the number of available processors on that node.

    c. Recycle the pbs_servers to verify that they pick up the MOM configuration.

      server1# qterm; pbs_server --ha -l server2:port

      server2# qterm; pbs_server --ha -l server1:port

      Again, if Moab HA is configured on a remote server, run pbs_server --ha -l <moabHost1:port> -l <moabHost2:port>.

    d. Start the pbs_mom on each execution node.

      node1# pbs_mom

      node2# pbs_mom
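
      Once the MOMs are running, you can verify from either head node that they have registered with the active pbs_server; for example:

      server1# pbsnodes -a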

  13. Download Moab 7.2 (ODBC + TORQUE). Extract and install the package.

    server1# tar -xvzf moab-7.2.0-linux-x86_64-torque-odbc.tar.gz

    Navigate to the moab directory.

    server1# cd moab-7.2.0

    Begin the Moab installation.

    server1# ./configure

    server1# make install
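
    To confirm which version was installed and how it was built, you can use Moab's informational flag; for example (flag support may vary by build):

    server1# moab --about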

  14. Configure Moab by editing the /opt/moab/etc/moab.cfg file.

    SCHEDCFG[Moab] SERVER=server1:42559

    SCHEDCFG[Moab] FBSERVER=server2

    SCHEDCFG[Moab] FLAGS=filelockha

    SCHEDCFG[Moab] HALOCKFILE=/opt/moab/.moab_lock

    ADMINCFG[1] USERS=root

    TOOLSDIR /opt/moab/tools

    LOGLEVEL 3

    ...

    RMCFG[moabha] TYPE=PBS

    RMCFG[moabha] SUBMITCMD=/usr/local/bin/qsub

  15. Install your Moab license file moab.lic into the directory /opt/moab. You must have a license that permits HA by allowing Moab to run on both server1 and server2.
  16. Because /opt/moab is an NFS share mounted on both servers, and you have already set the system paths for your bash shell in ~/.bashrc (see the earlier step), you can now start your Moab instance on both servers.

    server1# moab

    server2# moab

  17. Run showq to make sure everything is working correctly.

    server1# showq

    server2# showq

    Query the available MOMs via TORQUE and check their status. If everything is working correctly, the MOMs you configured in the server_priv/nodes file should be returned as available.

    server1# mdiag -n

    server2# mdiag -n

  18. Verify that your setup is working correctly. To do so:
    a. Switch to a non-root user, make sure that user has the paths defined earlier in ~/.bashrc, and then run the following:

      server1# echo "sleep 60" | msub

      Verify that the job is running.

      server1# showq

    b. Submit jobs from the secondary server to double-check that it is working there.

      server2# echo "sleep 60" | msub

      Verify that the job is running.

      server2# showq
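
      You can also inspect an individual job in more detail; for example, replacing <jobid> with the ID returned by msub:

      server2# checkjob <jobid>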

    c. While the jobs are running, kill one of the pbs_servers and Moab on the same node to simulate a disaster. Allow about 10 seconds; you should see all traffic being handled by the remaining active server.
