4.4 Installing Torque Resource Manager

If you intend to use Torque Resource Manager 6.0.3 with Moab Workload Manager, you must run Moab version 8.0 or later. However, some Torque 6.0 functionality requires Moab 9.0 or later.

This topic contains instructions on how to install and start Torque Resource Manager (Torque).

For Cray systems, Adaptive Computing recommends that you install Moab and Torque Servers (head nodes) on commodity hardware (not on Cray compute/service/login nodes).

However, you must install the Torque pbs_mom daemon and Torque client commands on Cray login and "mom" service nodes, because pbs_mom must run on a Cray service node within the Cray system to have access to the Cray ALPS subsystem.

See Installation Notes for Moab and Torque for Cray in the Moab Workload Manager Administrator Guide for instructions on installing Moab and Torque on a non-Cray server.

4.4.1 Prerequisites

4.4.1.A Open Necessary Ports

Torque requires certain ports to be open for essential communication.

For more information on how to configure the ports that Torque uses for communication, see Configuring Ports.

If you have a firewall enabled, do the following:

  1. On the Torque Server Host, run the commands for your distribution:
    • Red Hat 6-based systems (iptables)
      [root]# iptables-save > /tmp/iptables.mod
      [root]# vi /tmp/iptables.mod

      # Add the following line immediately *before* the line matching
      # "-A INPUT -j REJECT --reject-with icmp-host-prohibited"

      -A INPUT -p tcp --dport 15001 -j ACCEPT

      [root]# iptables-restore < /tmp/iptables.mod
      [root]# service iptables save
    • Red Hat 7-based systems (firewalld)
      [root]# firewall-cmd --add-port=15001/tcp --permanent
      [root]# firewall-cmd --reload
    • SUSE 11-based systems (SuSEfirewall2)
      [root]# vi /etc/sysconfig/SuSEfirewall2

      # Add the following port to the FW_SERVICES_EXT_TCP parameter
      FW_SERVICES_EXT_TCP="15001"

      [root]# service SuSEfirewall2_setup restart
    • SUSE 12-based systems (SuSEfirewall2)
      [root]# vi /etc/sysconfig/SuSEfirewall2

      # Add the following port to the FW_SERVICES_EXT_TCP parameter
      FW_SERVICES_EXT_TCP="15001"

      [root]# service SuSEfirewall2 restart
  2. On the Torque MOM Hosts (compute nodes), run the commands for your distribution:
    • Red Hat 6-based systems (iptables)
      [root]# iptables-save > /tmp/iptables.mod
      [root]# vi /tmp/iptables.mod

      # Add the following line immediately *before* the line matching
      # "-A INPUT -j REJECT --reject-with icmp-host-prohibited"

      -A INPUT -p tcp --dport 15002:15003 -j ACCEPT

      [root]# iptables-restore < /tmp/iptables.mod
      [root]# service iptables save
    • Red Hat 7-based systems (firewalld)
      [root]# firewall-cmd --add-port=15002-15003/tcp --permanent
      [root]# firewall-cmd --reload
    • SUSE 11-based systems (SuSEfirewall2)
      [root]# vi /etc/sysconfig/SuSEfirewall2

      # Add the following ports to the FW_SERVICES_EXT_TCP parameter
      FW_SERVICES_EXT_TCP="15002 15003"

      [root]# service SuSEfirewall2_setup restart
    • SUSE 12-based systems (SuSEfirewall2)
      [root]# vi /etc/sysconfig/SuSEfirewall2

      # Add the following ports to the FW_SERVICES_EXT_TCP parameter
      FW_SERVICES_EXT_TCP="15002 15003"

      [root]# service SuSEfirewall2 restart
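
On systems that use iptables, the manual vi edit can be scripted. The following is a minimal sketch, not part of Torque: add_accept_rule is a hypothetical helper that prints an iptables-save dump with the ACCEPT rule placed immediately before the first "-A INPUT -j REJECT" line. It assumes the dump contains such a line; if it does not, the dump is printed unchanged, so review the result before restoring it.

```shell
# Hypothetical helper (not shipped with Torque).
# Usage: add_accept_rule <dump-file> <port-or-range>
# Prints the dump with an ACCEPT rule for the given TCP port (or
# range, such as 15002:15003) inserted before the first REJECT line.
add_accept_rule() {
    awk -v rule="-A INPUT -p tcp --dport $2 -j ACCEPT" '
        # Emit the new rule once, just ahead of the first REJECT line.
        !inserted && /^-A INPUT -j REJECT/ { print rule; inserted = 1 }
        { print }
    ' "$1"
}

# Possible use on the Torque Server Host (run as root):
#   iptables-save > /tmp/iptables.mod
#   add_accept_rule /tmp/iptables.mod 15001 > /tmp/iptables.new
#   iptables-restore < /tmp/iptables.new
#   service iptables save
```

Writing the result to a second file, rather than editing in place, leaves the original dump available for comparison or rollback.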

4.4.1.B Verify the hostname

On the Torque Server Host, confirm your host (with the correct IP address) is in your /etc/hosts file. To verify that the hostname resolves correctly, make sure that hostname and hostname -f report the correct name for the host.
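
The check described above can be sketched as a short script. The helper names host_in_file and check_local_hostnames are hypothetical, and the sketch only confirms that the names appear in the hosts file; it does not verify that the IP address next to them is the correct one.

```shell
# Hypothetical helper: succeed if <name> appears as a hostname or alias
# on a non-comment line of the given hosts file.
# Usage: host_in_file <name> [hosts-file]   (default: /etc/hosts)
host_in_file() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # strip comments; fields are re-split
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/etc/hosts}"
}

# Compare the short and fully qualified names against /etc/hosts.
check_local_hostnames() {
    for n in "$(hostname)" "$(hostname -f)"; do
        host_in_file "$n" || echo "WARNING: $n is not listed in /etc/hosts"
    done
}
```

Running check_local_hostnames on the Torque Server Host prints a warning for each name that is missing and prints nothing when both names are present.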

4.4.2 Install Dependencies, Packages, or Clients

4.4.2.A Install Packages

On the Torque Server Host, install the libxml2-devel, openssl-devel, and boost-devel packages, along with the build tools, using the command for your distribution.

  • Red Hat-based systems
    [root]# yum install libtool openssl-devel libxml2-devel boost-devel gcc gcc-c++
  • SUSE-based systems
    [root]# zypper install libopenssl-devel libtool libxml2-devel boost-devel gcc gcc-c++ make gmake automake
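
After the packages are installed, a quick way to confirm that the build tools are on the PATH is sketched below. missing_commands is a hypothetical helper; it checks only for executables, so it cannot tell whether the -devel library packages are present.

```shell
# Hypothetical helper: print each named command that is not found on
# the PATH; print nothing if all of them are present.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Possible use before running ./configure:
#   missing_commands gcc g++ make libtool
```

An empty result suggests the compiler toolchain is ready; any name printed points to a package that still needs to be installed.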

4.4.3 Install Torque Server

You must complete the prerequisite tasks and the tasks to install the dependencies, packages, or clients before installing Torque Server. See 4.4.1 Prerequisites and 4.4.2 Install Dependencies, Packages, or Clients.

On the Torque Server Host, do the following:

  1. Download the latest 6.0.3 build from the Adaptive Computing website. You can also download it from the command line, either by cloning the GitHub repository or by fetching the tarball distribution.
    • Clone the source from GitHub.

      If git is not installed, install it with the command for your distribution:

      [root]# yum install git       # Red Hat-based systems
      [root]# zypper install git    # SUSE-based systems

      Then clone the 6.0.3 source and generate the configure script:

      [root]# git clone https://github.com/adaptivecomputing/torque.git -b 6.0.3 6.0.3
      [root]# cd 6.0.3
      [root]# ./autogen.sh
    • Get the tarball source distribution.

      If wget is not installed, install it with the command for your distribution:

      [root]# yum install wget      # Red Hat-based systems
      [root]# zypper install wget   # SUSE-based systems

      Then download and extract the tarball:

      [root]# wget http://www.adaptivecomputing.com/download/torque/torque-6.0.3-<filename>.tar.gz -O torque-6.0.3.tar.gz
      [root]# tar -xzvf torque-6.0.3.tar.gz
      [root]# cd torque-6.0.3/
  2. Run each of the following commands in order.
    [root]# ./configure
    [root]# make
    [root]# make install

    See Customizing the Install for information on which options are available to customize the ./configure command.

  3. Verify that the /var/spool/torque/server_name file exists and contains the correct name of the server. If it does not, set it:
    [root]# echo <torque_server_hostname> > /var/spool/torque/server_name
  4. Configure the trqauthd daemon to start automatically at system boot, using the commands for your distribution.
    • Red Hat 6-based systems
      [root]# cp contrib/init.d/trqauthd /etc/init.d/
      [root]# chkconfig --add trqauthd
      [root]# echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
      [root]# ldconfig
      [root]# service trqauthd start
    • Red Hat 7-based systems
      [root]# cp contrib/systemd/trqauthd.service /usr/lib/systemd/system/
      [root]# systemctl enable trqauthd.service
      [root]# echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
      [root]# ldconfig
      [root]# systemctl start trqauthd.service
    • SUSE 11-based systems
      [root]# cp contrib/init.d/suse.trqauthd /etc/init.d/trqauthd
      [root]# chkconfig --add trqauthd
      [root]# echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
      [root]# ldconfig
      [root]# service trqauthd start
    • SUSE 12-based systems
      [root]# cp contrib/systemd/trqauthd.service /usr/lib/systemd/system/
      [root]# systemctl enable trqauthd.service
      [root]# echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
      [root]# ldconfig
      [root]# systemctl start trqauthd.service
  5. By default, Torque installs all binary files to /usr/local/bin and /usr/local/sbin. Make sure the PATH environment variable includes these directories for both the installation user and the root user.
    [root]# export PATH=/usr/local/bin/:/usr/local/sbin/:$PATH
  6. Initialize serverdb by executing the torque.setup script.
    [root]# ./torque.setup root
  7. Add nodes to the /var/spool/torque/server_priv/nodes file. See Specifying Compute Nodes for information on syntax and options for specifying compute nodes.
  8. Configure pbs_server to start automatically at system boot, and then start the daemon, using the commands for your distribution.
    • Red Hat 6-based systems
      [root]# cp contrib/init.d/pbs_server /etc/init.d
      [root]# chkconfig --add pbs_server
      [root]# service pbs_server restart
    • Red Hat 7-based systems
      [root]# qterm
      [root]# cp contrib/systemd/pbs_server.service /usr/lib/systemd/system/
      [root]# systemctl enable pbs_server.service
      [root]# systemctl start pbs_server.service
    • SUSE 11-based systems
      [root]# cp contrib/init.d/suse.pbs_server /etc/init.d/pbs_server
      [root]# chkconfig --add pbs_server
      [root]# service pbs_server restart
    • SUSE 12-based systems
      [root]# qterm
      [root]# cp contrib/systemd/pbs_server.service /usr/lib/systemd/system/
      [root]# systemctl enable pbs_server.service
      [root]# systemctl start pbs_server.service
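
Two of the steps above ask you to verify state rather than change it: the contents of the server_name file and the directories on the PATH. A minimal sketch of both checks follows. The function names are hypothetical and are not part of Torque.

```shell
# Hypothetical helper: succeed if the file exists and its first line
# equals the expected server name.
# Usage: server_name_matches <expected-name> <server_name-file>
server_name_matches() {
    [ -f "$2" ] && [ "$(head -n 1 "$2")" = "$1" ]
}

# Hypothetical helper: succeed if directory $1 is an entry in the
# colon-separated list $2, with or without a trailing slash.
path_contains() {
    case ":$2:" in
        *":${1%/}:"*|*":${1%/}/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use on the Torque Server Host:
#   server_name_matches <torque_server_hostname> /var/spool/torque/server_name
#   path_contains /usr/local/bin "$PATH" && path_contains /usr/local/sbin "$PATH"
```

Both functions only report a status, so they are safe to run repeatedly, for example for the installation user and again for root.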

4.4.4 Install Torque MOMs

In most installations, you will install a Torque MOM on each of your compute nodes.

Do the following:

  1. On the Torque Server Host, do the following:
    1. Create the self-extracting packages that are copied and executed on your nodes.
      [root]# make packages
      Building ./torque-package-clients-linux-x86_64.sh ...
      Building ./torque-package-mom-linux-x86_64.sh ...
      Building ./torque-package-server-linux-x86_64.sh ...
      Building ./torque-package-gui-linux-x86_64.sh ...
      Building ./torque-package-devel-linux-x86_64.sh ...
      Done.
      
      The package files are self-extracting packages that can be copied and executed on your production machines.  Use --help for options.
    2. Copy the self-extracting packages to each Torque MOM Host.

      Adaptive Computing recommends that you use a remote shell, such as SSH, to install packages on remote systems. Set up shared SSH keys if you do not want to supply a password for each Torque MOM Host.

      The only required package for the compute node is mom-linux. Additional packages are recommended so you can use client commands and submit jobs from compute nodes.

      [root]# scp torque-package-mom-linux-x86_64.sh <mom-node>:
      [root]# scp torque-package-clients-linux-x86_64.sh <mom-node>:
    3. Copy the pbs_mom startup script to each Torque MOM Host, using the command for the distribution on that host.
      • Red Hat 6-based systems
        [root]# scp contrib/init.d/pbs_mom <mom-node>:/etc/init.d
      • Red Hat 7-based or SUSE 12-based systems
        [root]# scp contrib/systemd/pbs_mom.service <mom-node>:/usr/lib/systemd/system/
      • SUSE 11-based systems
        [root]# scp contrib/init.d/suse.pbs_mom <mom-node>:/etc/init.d/pbs_mom
    4. Not all sites see an inherited ulimit, but those that do can change the ulimit in the pbs_mom init script, which is responsible for starting and stopping the pbs_mom process.

  2. On each Torque MOM Host, do the following:
    1. Install the self-extracting packages and run ldconfig.
      [root]# ssh root@<mom-node>
      [root]# ./torque-package-mom-linux-x86_64.sh --install
      [root]# ./torque-package-clients-linux-x86_64.sh --install
      [root]# echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
      [root]# ldconfig
    2. Configure pbs_mom to start at system boot, and then start the daemon.
      • Red Hat 6-based or SUSE 11-based systems
        [root]# chkconfig --add pbs_mom
        [root]# service pbs_mom start
      • Red Hat 7-based or SUSE 12-based systems
        [root]# systemctl enable pbs_mom.service
        [root]# systemctl start pbs_mom.service
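
With many compute nodes, the copy and install commands above are usually repeated per host. The sketch below generates those commands for one node without running them, so they can be reviewed first. mom_deploy_commands is a hypothetical helper; it assumes you work as root from the Torque build directory and that SSH access to each node as root is already set up.

```shell
# Hypothetical helper: print (do not run) the copy and install
# commands for one Torque MOM Host.
# Usage: mom_deploy_commands <mom-node>
mom_deploy_commands() {
    node=$1
    # Copy the mom and clients packages to root's home directory.
    for pkg in mom clients; do
        echo "scp torque-package-$pkg-linux-x86_64.sh root@$node:"
    done
    # Install both packages, then refresh the shared library cache.
    for pkg in mom clients; do
        echo "ssh root@$node ./torque-package-$pkg-linux-x86_64.sh --install"
    done
    echo "ssh root@$node 'echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf && ldconfig'"
}

# Possible use, with nodes.txt as a hypothetical one-host-per-line file:
#   while read -r n; do mom_deploy_commands "$n"; done < nodes.txt > deploy.sh
#   sh -x deploy.sh
```

Writing the commands to a file and running that file keeps ssh from consuming the node list on standard input. Copying the startup script and enabling pbs_mom are left out because they differ between init systems.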

4.4.5 Install Torque Clients

If you want to have the Torque client commands installed on hosts other than the Torque Server Host (such as the compute nodes or separate login nodes), do the following:

  1. On the Torque Server Host, do the following:
    1. Copy the self-extracting client package to each Torque Client Host.

      Adaptive Computing recommends that you use a remote shell, such as SSH, to install packages on remote systems. Set up shared SSH keys if you do not want to supply a password for each Torque Client Host.

      [root]# scp torque-package-clients-linux-x86_64.sh <torque-client-host>:
    2. Copy the trqauthd startup script to each Torque Client Host, using the command for the distribution on that host.
      • Red Hat 6-based systems
        [root]# scp contrib/init.d/trqauthd <torque-client-host>:/etc/init.d
      • Red Hat 7-based or SUSE 12-based systems
        [root]# scp contrib/systemd/trqauthd.service <torque-client-host>:/usr/lib/systemd/system/
      • SUSE 11-based systems
        [root]# scp contrib/init.d/suse.trqauthd <torque-client-host>:/etc/init.d/trqauthd
  2. On each Torque Client Host, do the following:

    Many of these steps can be done from the Torque server from a remote shell, such as SSH. Set up shared SSH keys if you do not want to supply a password for each Torque Client Host.

    1. Install the self-extracting client package.
      [root]# ./torque-package-clients-linux-x86_64.sh --install
      [root]# echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
      [root]# ldconfig
    2. Enable and start the trqauthd service.
      • Red Hat 6-based or SUSE 11-based systems
        [root]# chkconfig --add trqauthd
        [root]# service trqauthd start
      • Red Hat 7-based or SUSE 12-based systems
        [root]# systemctl enable trqauthd.service
        [root]# systemctl start trqauthd.service
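
Throughout this topic, services are enabled with chkconfig and service on hosts that use SysV init scripts and with systemctl on hosts that use systemd. The sketch below shows one way to choose between the two forms in a script. service_commands is a hypothetical helper, and the detection relies on the assumption that the /run/systemd/system directory exists only when systemd is the running init system.

```shell
# Hypothetical helper: print the enable and start commands for a
# Torque service.
# $1 = service name (trqauthd, pbs_mom, or pbs_server)
# $2 = init system, "systemd" or "sysv"; detected when omitted
service_commands() {
    svc=$1
    init=${2:-}
    if [ -z "$init" ]; then
        # Assumption: this directory is present only under systemd.
        if [ -d /run/systemd/system ]; then init=systemd; else init=sysv; fi
    fi
    case $init in
        systemd)
            echo "systemctl enable $svc.service"
            echo "systemctl start $svc.service" ;;
        sysv)
            echo "chkconfig --add $svc"
            echo "service $svc start" ;;
        *)
            echo "unknown init system: $init" >&2
            return 1 ;;
    esac
}

# Possible use on a Torque Client Host:
#   service_commands trqauthd
```

The function prints commands instead of running them, which makes it easy to inspect the result on an unfamiliar host before applying it.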

4.4.6 Configure Data Management

When a batch job completes, stdout and stderr files are generated and placed in the spool directory on the master Torque MOM Host for the job instead of the submit host. You can configure the Torque batch environment to copy the stdout and stderr files back to the submit host. See Configuring Data Management for more information.

© 2017 Adaptive Computing