6.39 Installing Torque Resource Manager

If you intend to use Torque Resource Manager 6.0.3 with Moab Workload Manager, you must run Moab version 8.0 or later. However, some Torque 6.0 functionality requires Moab 9.0 or later.

This topic contains instructions on how to install, configure, and start Torque Resource Manager (Torque).

For Cray systems, Adaptive Computing recommends that you install Moab and Torque Servers (head nodes) on commodity hardware (not on Cray compute/service/login nodes).

However, you must install the Torque pbs_mom daemon and the Torque client commands on Cray login and "mom" service nodes, because the pbs_mom must run on a Cray service node within the Cray system so that it has access to the Cray ALPS subsystem.

See Installation Notes for Moab and Torque for Cray in the Moab Workload Manager Administrator Guide for instructions on installing Moab and Torque on a non-Cray server.

6.39.1 Prerequisites

6.39.1.A Open Necessary Ports

Torque requires certain ports to be open for essential communication.

For more information on how to configure the ports that Torque uses for communication, see Configuring Ports in the Torque Resource Manager Administrator Guide.

If you have a firewall enabled, do the following:

  1. On the Torque Server Host (run the commands for the firewall your distribution uses):
    # If using iptables (e.g., RHEL/CentOS 6):
    [root]# iptables-save > /tmp/iptables.mod
    [root]# vi /tmp/iptables.mod
    
    # Add the following line immediately *before* the line matching
    # "-A INPUT -j REJECT --reject-with icmp-host-prohibited"
    
    -A INPUT -p tcp --dport 15001 -j ACCEPT
    
    [root]# iptables-restore < /tmp/iptables.mod
    [root]# service iptables save
    
    # If using firewalld (e.g., RHEL/CentOS 7):
    [root]# firewall-cmd --add-port=15001/tcp --permanent
    [root]# firewall-cmd --reload
    
    # If using SuSEfirewall2 (SLES):
    [root]# vi /etc/sysconfig/SuSEfirewall2
    
    # Add the following port to the FW_SERVICES_EXT_TCP parameter
    FW_SERVICES_EXT_TCP="15001"
    
    [root]# service SuSEfirewall2 restart
  2. On each Torque MOM Host (Compute Hosts), again using the firewall that applies to your distribution:
    # If using iptables (e.g., RHEL/CentOS 6):
    [root]# iptables-save > /tmp/iptables.mod
    [root]# vi /tmp/iptables.mod
    
    # Add the following line immediately *before* the line matching
    # "-A INPUT -j REJECT --reject-with icmp-host-prohibited"
    
    -A INPUT -p tcp --dport 15002:15003 -j ACCEPT
    
    [root]# iptables-restore < /tmp/iptables.mod
    [root]# service iptables save
    
    # If using firewalld (e.g., RHEL/CentOS 7):
    [root]# firewall-cmd --add-port=15002-15003/tcp --permanent
    [root]# firewall-cmd --reload
    
    # If using SuSEfirewall2 (SLES):
    [root]# vi /etc/sysconfig/SuSEfirewall2
    
    # Add the following ports to the FW_SERVICES_EXT_TCP parameter
    FW_SERVICES_EXT_TCP="15002 15003"
    
    [root]# service SuSEfirewall2 restart
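
After reloading the firewall, you can spot-check that the rules are in place. For example (run the command that matches the firewall you configured; the ports shown are the defaults opened above):

    # iptables-based systems
    [root]# iptables -nL INPUT | grep 1500
    
    # firewalld-based systems
    [root]# firewall-cmd --list-ports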

6.39.1.B Verify the Hostname

On the Torque Server Host, confirm your host (with the correct IP address) is in your /etc/hosts file. To verify that the hostname resolves correctly, make sure that hostname and hostname -f report the correct name for the host.
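
For example, a minimal check (torque-server, torque-server.example.com, and 10.1.1.101 are placeholders for your actual short hostname, fully qualified hostname, and IP address):

    [root]# hostname
    torque-server
    [root]# hostname -f
    torque-server.example.com
    [root]# grep torque-server /etc/hosts
    10.1.1.101   torque-server.example.com torque-server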

6.39.2 Install Torque Server

You must complete the prerequisite tasks earlier in this topic before installing the Torque Server. See 6.39.1 Prerequisites.

On the Torque Server Host, do the following:

  1. If you are installing the Torque Server on its own host (recommended) and not on the same host where you installed another server (such as Moab Server), verify you completed the steps to prepare the host. See 6.31 Preparing the Host – Typical Method or 6.30 Preparing the Host – Offline Method.
  2. Install the Torque Server RPM.
    # RHEL/CentOS:
    [root]# yum install moab-torque-server
    
    # SLES:
    [root]# zypper install moab-torque-server
  3. Source the following file to add the Torque executable directories to your current shell $PATH environment.
    [root]# . /etc/profile.d/torque.sh
  4. Add the hostnames of your Torque MOMs (commonly all of your compute nodes) to the /var/spool/torque/server_priv/nodes file. You can remove the hostname entry for the Torque server node unless you will be running a Torque MOM daemon on this host. See Managing Nodes in the Torque Resource Manager Administrator Guide for information on syntax and options for specifying compute nodes.

    Example:

    [root]# vi /var/spool/torque/server_priv/nodes
    
    node01 np=16
    node02 np=16
    ...
  5. Start the Torque server. A quick verification example follows this list.
    # On init.d-based systems:
    [root]# service pbs_server start
    [root]# service trqauthd start
    
    # On systemd-based systems:
    [root]# systemctl start pbs_server.service
    [root]# systemctl start trqauthd.service
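
Once pbs_server and trqauthd are running, you can verify that the server answers client commands. A quick check (output varies by site):

    [root]# qstat -B
    [root]# pbsnodes -a

qstat -B prints a one-line summary of the server's state, and pbsnodes -a lists the nodes you defined in the nodes file. The nodes will report state = down until their MOMs are installed and started in the next section.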

6.39.3 Install Torque MOMs

In most installations, you will install a Torque MOM on each of your compute nodes.

Do the following:

  1. From the Torque Server Host, copy the moab-torque-common and moab-torque-mom RPM files to each MOM node. It is also recommended that you copy and install the moab-torque-client RPM so you can use client commands and submit jobs from compute nodes.
    [root]# scp RPMs/moab-torque-common-*.rpm <torque-mom-host>:
    [root]# scp RPMs/moab-torque-mom-*.rpm <torque-mom-host>:
    [root]# scp RPMs/moab-torque-client-*.rpm <torque-mom-host>:
  2. On each Torque MOM Host, install the RPMs. The moab-torque-common RPM must be installed first; installing all of the RPMs in a single command, as shown, handles this.
    [root]# ssh root@<torque-mom-host>
    
    # RHEL/CentOS:
    [root]# yum install moab-torque-common-*.rpm moab-torque-mom-*.rpm moab-torque-client-*.rpm
    
    # SLES:
    [root]# zypper install moab-torque-common-*.rpm moab-torque-mom-*.rpm moab-torque-client-*.rpm
  3. On each Torque MOM Host, create or edit the /var/spool/torque/server_name file so that it contains the hostname of the Torque server.
    [root]# echo <torque_server_hostname> > /var/spool/torque/server_name
  4. On each Torque MOM Host, edit the /var/spool/torque/mom_priv/config file. This file is identical for all compute nodes and can be created on the Torque Server Host and distributed in parallel to all systems.
    [root]# vi /var/spool/torque/mom_priv/config
    
    $pbsserver     <torque_server_hostname>   # hostname running pbs server
    $logevent      225                        # bitmap of which events to log
  5. On each Torque MOM Host, start the pbs_mom daemon.
    # On init.d-based systems:
    [root]# service pbs_mom start
    
    # On systemd-based systems:
    [root]# systemctl start pbs_mom.service
  6. If you installed the Torque Client RPM on the MOMs, then on each Torque MOM Host, start the trqauthd daemon. A verification example follows this list.
    # On init.d-based systems:
    [root]# service trqauthd start
    
    # On systemd-based systems:
    [root]# systemctl start trqauthd.service
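
With the MOM daemons started, you can confirm they have registered with the server. For example, from the Torque Server Host (node names are placeholders):

    [root]# pbsnodes -a
    
    node01
         state = free
         np = 16
         ...

A node that remains in state = down usually points to a hostname-resolution or firewall problem between the server and that MOM. Running momctl -d 1 on a MOM host shows that MOM's view of its connection to the server.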

6.39.4 Configure Data Management

When a batch job completes, stdout and stderr files are generated and placed in the spool directory on the master Torque MOM Host for the job instead of the submit host. You can configure the Torque batch environment to copy the stdout and stderr files back to the submit host. See Configuring Data Management in the Torque Resource Manager Administrator Guide for more information.
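
For example, if users' home directories are NFS-mounted on every host, a common approach (a sketch, assuming /home is the shared mount) is to add a $usecp directive to each MOM's /var/spool/torque/mom_priv/config so output files are delivered with a local copy rather than over the network:

    # Map any host's /home to the local /home (shared via NFS), so job
    # stdout/stderr files are copied back with a local cp.
    $usecp *:/home  /home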
