If you intend to use Torque Resource Manager 6.1.0 with Moab Workload Manager, you must run Moab version 8.0 or later. However, some Torque functionality may not be available. See Compatibility Requirements in the Moab HPC Suite Release Notes for more information.
This topic contains instructions on how to install and start Torque Resource Manager (Torque).
For Cray systems, Adaptive Computing recommends that you install Moab and Torque Servers (head nodes) on commodity hardware (not on Cray compute/service/login nodes).
However, you must install the Torque pbs_mom daemon and Torque client commands on Cray login and "mom" service nodes since the pbs_mom must run on a Cray service node within the Cray system so it has access to the Cray ALPS subsystem.
See Installation Notes for Moab and Torque for Cray in the Moab Workload Manager Administrator Guide for instructions on installing Moab and Torque on a non-Cray server.
2.2.1.A Supported Operating Systems
cgroups are supported and cpusets are handled by the cgroup cpuset subsystem.
It is recommended that you use --enable-cgroups instead of --enable-cpuset. The --enable-cpuset option is deprecated, and no new features will be added to it.
Red Hat 6-based systems come packaged with Boost 1.41.0, and Red Hat 7-based systems come packaged with Boost 1.53.0. If needed, use the --with-boost-path=DIR option to build against a different Boost version.
Torque requires certain ports to be open for essential communication.
If your site is running firewall software on its hosts, you will need to configure the firewall to allow connections to the necessary ports.
Location | Ports | Functions | When Needed |
---|---|---|---|
Torque Server Host | 15001 | Torque Client and MOM communication to Torque Server | Always |
Torque MOM Host (Compute Nodes) | 15002 | Torque Server communication to Torque MOMs | Always |
Torque MOM Host (Compute Nodes) | 15003 | Torque MOM communication to other Torque MOMs | Always |
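If your hosts use firewalld, the rules implied by the table above could be produced by a helper like the one below. This is a minimal dry-run sketch that only prints commands for review. It assumes firewalld is the firewall in use and that the ports are TCP; the function name `torque_fw_cmds` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints firewalld commands that open the
# Torque ports from the table above for one host role.
# Assumes firewalld and TCP; review the output before running it as root.
torque_fw_cmds() {
  local role="$1" ports port
  case "$role" in
    server) ports="15001" ;;        # Torque Server Host
    mom)    ports="15002 15003" ;;  # Torque MOM Host (compute node)
    *) echo "usage: torque_fw_cmds server|mom" >&2; return 1 ;;
  esac
  for port in $ports; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  echo "firewall-cmd --reload"
}

torque_fw_cmds mom
```

A host that runs both pbs_server and pbs_mom would need the ports for both roles. Printing rather than executing keeps the sketch safe to run without root.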
2.2.3 Install Dependencies, Packages, or Clients
On the Torque Server Host, use the following commands to install the libtool, OpenSSL, libxml2, and Boost development packages, along with the compilers and build tools.
# Red Hat 6-based or Red Hat 7-based systems
[root]# yum install libtool openssl-devel libxml2-devel boost-devel gcc gcc-c++
# SUSE 11-based or SUSE 12-based systems
[root]# zypper install libopenssl-devel libtool libxml2-devel boost-devel gcc gcc-c++ make gmake
When cgroups are enabled (recommended), hwloc version 1.9.1 or later is required. NVIDIA K80 requires libhwloc 1.11.0.
The following instructions are for installing version 1.9.1.
Do the following:
# Red Hat 6-based or Red Hat 7-based systems
[root]# yum install gcc make
[root]# tar -xzvf hwloc-1.9.1.tar.gz
[root]# cd hwloc-1.9.1
[root]# ./configure
[root]# make
[root]# make install
# SUSE 11-based or SUSE 12-based systems
[root]# zypper install gcc make
[root]# tar -xzvf hwloc-1.9.1.tar.gz
[root]# cd hwloc-1.9.1
[root]# ./configure
[root]# make
[root]# make install
[root]# echo /usr/local/lib >/etc/ld.so.conf.d/hwloc.conf
[root]# ldconfig
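To confirm that the hwloc you just built meets the minimum version, you could compare version strings as in the sketch below. It assumes GNU `sort -V` is available and that `lstopo --version` prints the program name followed by the version (for example `lstopo 1.9.1`); `version_at_least` and `MIN_HWLOC` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: check that the installed hwloc meets the minimum version.
# 1.9.1 is the minimum with cgroups enabled; set MIN_HWLOC=1.11.0 for NVIDIA K80.
MIN_HWLOC="${MIN_HWLOC:-1.9.1}"

# Succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V to order version strings numerically.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: "lstopo --version" prints something like "lstopo 1.9.1".
if command -v lstopo >/dev/null 2>&1; then
  installed=$(lstopo --version | awk '{print $NF}')
  if version_at_least "$installed" "$MIN_HWLOC"; then
    echo "hwloc $installed meets the minimum ($MIN_HWLOC)"
  else
    echo "hwloc $installed is older than $MIN_HWLOC" >&2
  fi
else
  echo "lstopo not found; is /usr/local/bin in PATH?" >&2
fi
```

If an older distribution-packaged lstopo appears earlier in PATH than /usr/local/bin, this check would report that copy instead of the one you built.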
You must complete the prerequisite tasks and the tasks to install the dependencies, packages, or clients before installing Torque Server. See 2.2 Installing Torque Resource Manager and 2.2.3 Install Dependencies, Packages, or Clients.
On the Torque Server Host, do the following:
If git is not installed:
# Red Hat 6-based or Red Hat 7-based systems
[root]# yum install git
# SUSE 11-based or SUSE 12-based systems
[root]# zypper install git
[root]# zypper install autotools automake pkg-config
[root]# git clone https://github.com/adaptivecomputing/torque.git -b 6.1.0 6.1.0
[root]# cd 6.1.0
[root]# ./autogen.sh
# Red Hat 6-based or Red Hat 7-based systems
[root]# yum install wget
[root]# wget http://www.adaptivecomputing.com/download/torque/torque-6.1.0.tar.gz -O torque-6.1.0.tar.gz
[root]# tar -xzvf torque-6.1.0.tar.gz
[root]# cd torque-6.1.0/
# SUSE 11-based or SUSE 12-based systems
[root]# zypper install wget
[root]# wget http://www.adaptivecomputing.com/download/torque/torque-6.1.0.tar.gz -O torque-6.1.0.tar.gz
[root]# tar -xzvf torque-6.1.0.tar.gz
[root]# cd torque-6.1.0/
Depending on your system configuration, you will need to add ./configure command options.
At a minimum, add the --enable-cgroups and --with-hwloc-path=/usr/local options.
These instructions assume you are using cgroups. When cgroups are supported, cpusets are handled by the cgroup cpuset subsystem. If you are not using cgroups, use --enable-cpuset instead.
$ cd /usr/lib64
$ ln -s libXext.so.6.4.0 libXext.so
$ ln -s libXss.so.1 libXss.so
When finished, cd back to your install directory.
[root]# ./configure --enable-cgroups --with-hwloc-path=/usr/local # add any other specified options
[root]# make
[root]# make install
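The minimum options from this step can be captured in a small helper so that cgroups and non-cgroups builds differ in only one argument. This is a sketch: `torque_configure_args` is a hypothetical name, and it assumes the --with-hwloc-path value stays the same whether or not cgroups are used.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the minimum ./configure options from
# this section. First argument: "yes" to use cgroups (recommended), anything
# else for the deprecated cpuset support. Second argument: hwloc prefix.
torque_configure_args() {
  local use_cgroups="${1:-yes}" hwloc_prefix="${2:-/usr/local}"
  local cpu_opt="--enable-cgroups"
  if [ "$use_cgroups" != "yes" ]; then
    cpu_opt="--enable-cpuset"   # deprecated; no new features
  fi
  printf '%s\n' "$cpu_opt --with-hwloc-path=$hwloc_prefix"
}

torque_configure_args
# Example use from the Torque source directory (left unquoted on purpose so
# the options split into separate arguments):
#   ./configure $(torque_configure_args yes /usr/local)
```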
[root]# . /etc/profile.d/torque.sh
[root]# ./torque.setup root
# Red Hat 6-based or SUSE 11-based systems
[root]# chkconfig --add pbs_server
[root]# service pbs_server restart
# Red Hat 7-based or SUSE 12-based systems
[root]# qterm
[root]# systemctl enable pbs_server.service
[root]# systemctl start pbs_server.service
In most installations, you will install a Torque MOM on each of your compute nodes.
See Specifying Compute Nodes or Configuring Torque on Compute Nodes for more information.
Do the following:
[root]# make packages
Building ./torque-package-clients-linux-x86_64.sh ...
Building ./torque-package-mom-linux-x86_64.sh ...
Building ./torque-package-server-linux-x86_64.sh ...
Building ./torque-package-gui-linux-x86_64.sh ...
Building ./torque-package-devel-linux-x86_64.sh ...
Done.
The package files are self-extracting packages that can be copied and executed on your production machines. Use --help for options.
Adaptive Computing recommends that you use a remote shell, such as SSH, to install packages on remote systems. Set up shared SSH keys if you do not want to supply a password for each Torque MOM Host.
[root]# scp torque-package-mom-linux-x86_64.sh <mom-node>:
# Red Hat 6-based systems
[root]# scp contrib/init.d/pbs_mom <mom-node>:/etc/init.d
# SUSE 11-based systems
[root]# scp contrib/init.d/suse.pbs_mom <mom-node>:/etc/init.d/pbs_mom
# Red Hat 7-based or SUSE 12-based systems
[root]# scp contrib/systemd/pbs_mom.service <mom-node>:/usr/lib/systemd/system/
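When there are many compute nodes, the copy step can be looped over a host list. In the sketch below the node names are hypothetical, and `DRY_RUN=1` (the default) only prints the `scp` commands; set `DRY_RUN=0` to execute them once the list is correct. It assumes shared SSH keys are already in place, as recommended above.

```shell
#!/usr/bin/env bash
# Sketch: copy the MOM package to several compute nodes over SSH.
# The node names are hypothetical placeholders for your own host list.
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only
PKG="torque-package-mom-linux-x86_64.sh"
NODES=(node01 node02 node03)

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

for node in "${NODES[@]}"; do
  run scp "$PKG" "${node}:"
done
```

The init script or systemd unit copy for your operating system could be added to the same loop.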
Not all sites see an inherited ulimit, but those that do can change the ulimit in the pbs_mom init script. The pbs_mom init script is responsible for starting and stopping the pbs_mom process.
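To see whether pbs_mom actually inherited a restrictive limit, you can read the limits of the running process from /proc. This Linux-only sketch uses `pgrep` from procps; `show_limits` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: show the resource limits a running process actually has, to see
# whether pbs_mom inherited a restrictive ulimit. Reads /proc/<pid>/limits.
show_limits() {
  local pid="$1"
  if [ -r "/proc/${pid}/limits" ]; then
    cat "/proc/${pid}/limits"
  else
    echo "cannot read limits for PID ${pid}" >&2
    return 1
  fi
}

# pgrep -o picks the oldest exactly matching process; pbs_mom may not be
# running yet, in which case nothing is printed.
if pid=$(pgrep -o -x pbs_mom); then
  show_limits "$pid"
fi
```

If a limit shown there is lower than your jobs need, that is the value to raise with a `ulimit` call in the pbs_mom init script before the daemon starts; which limit matters depends on your site.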
ns
perf_event
net_prio
cpuset /cgroup/cpuset
cpu /cgroup/cpu
cpuacct /cgroup/cpuacct
memory /cgroup/memory
devices /cgroup/devices
freezer /cgroup/freezer
net_cls /cgroup/net_cls
blkio /cgroup/blkio
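The listing above pairs each cgroup subsystem with its mount point. A similar check can be scripted by reading /proc/mounts, as in this sketch. It assumes cgroup v1 mounts (filesystem type `cgroup`, with subsystem names in the mount options), and `cgroup_mounted` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report where a cgroup v1 subsystem is mounted by parsing a mounts
# table (default /proc/mounts). Prints the mount point(s) and succeeds if
# found; fails if the subsystem does not appear in any cgroup mount.
cgroup_mounted() {
  local subsys="$1" mounts="${2:-/proc/mounts}"
  awk -v s="$subsys" '
    $3 == "cgroup" {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == s) { print $2; found = 1 }
    }
    END { exit !found }' "$mounts"
}

for s in cpuset cpu cpuacct memory devices; do
  if path=$(cgroup_mounted "$s"); then
    echo "$s mounted at $path"
  else
    echo "$s not mounted"
  fi
done
```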
[root]# yum install libcgroup
[root]# service cgconfig start
[root]# zypper install libcgroup1
mount {
    devices = /mnt/cgroups/devices;
    cpuset = /mnt/cgroups/cpuset;
    cpu = /mnt/cgroups/cpu;
    cpuacct = /mnt/cgroups/cpuacct;
    memory = /mnt/cgroups/memory;
}
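If you manage several nodes, you might generate this `mount` stanza rather than edit it by hand. The sketch below writes a stanza like the one above for a chosen base directory to a file you name. The function name is hypothetical; libcgroup's configuration file is commonly /etc/cgconfig.conf, but verify the location on your distribution. Because the sketch writes only the mount stanza, merge its output into an existing file rather than overwriting one that already contains group definitions.

```shell
#!/usr/bin/env bash
# Sketch: write a cgconfig "mount" stanza for the subsystems Torque uses,
# with each subsystem mounted under the given base directory.
write_cgconfig_mounts() {
  local base="$1" out="$2" s
  {
    echo "mount {"
    for s in devices cpuset cpu cpuacct memory; do
      # ${base%/} drops a trailing slash so paths do not contain "//".
      printf '    %s = %s/%s;\n' "$s" "${base%/}" "$s"
    done
    echo "}"
  } > "$out"
}

example="${TMPDIR:-/tmp}/cgconfig.conf.example"
write_cgconfig_mounts /mnt/cgroups "$example"
cat "$example"
```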
[root]# service cgconfig start
[root]# yum install libcgroup-tools
[root]# zypper install libcgroup-tools
[root]# ./torque-package-mom-linux-x86_64.sh --install
Configure pbs_mom to start at system boot, and then start the daemon.
# Red Hat 6-based or SUSE 11-based systems
[root]# chkconfig --add pbs_mom
[root]# service pbs_mom start
# Red Hat 7-based or SUSE 12-based systems
[root]# systemctl enable pbs_mom.service
[root]# systemctl start pbs_mom.service
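The two command pairs for starting pbs_mom differ only by init system. The sketch below picks between them by checking for /run/systemd/system, which exists when systemd is the running init. It only prints the commands; `torque_service_cmds` and its optional root-directory argument (used for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the commands that enable a Torque daemon at boot and start it,
# choosing systemd (Red Hat 7-based / SUSE 12-based) or SysV init
# (Red Hat 6-based / SUSE 11-based). Nothing is executed.
# The optional second argument is a root directory, used only for testing.
torque_service_cmds() {
  local daemon="$1" root="${2:-/}"
  # /run/systemd/system exists only when systemd is the running init.
  if [ -d "${root%/}/run/systemd/system" ]; then
    printf '%s\n' "systemctl enable ${daemon}.service" \
                  "systemctl start ${daemon}.service"
  else
    printf '%s\n' "chkconfig --add ${daemon}" "service ${daemon} start"
  fi
}

torque_service_cmds pbs_mom
```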
If you want to have the Torque client commands installed on hosts other than the Torque Server Host (such as the compute nodes or separate login nodes), do the following:
Adaptive Computing recommends that you use a remote shell, such as SSH, to install packages on remote systems. Set up shared SSH keys if you do not want to supply a password for each Torque Client Host.
[root]# scp torque-package-clients-linux-x86_64.sh <torque-client-host>:
# Red Hat 6-based systems
[root]# scp contrib/init.d/trqauthd <torque-client-host>:/etc/init.d
# SUSE 11-based systems
[root]# scp contrib/init.d/suse.trqauthd <torque-client-host>:/etc/init.d/trqauthd
# Red Hat 7-based or SUSE 12-based systems
[root]# scp contrib/systemd/trqauthd.service <torque-client-host>:/usr/lib/systemd/system/
[root]# ./torque-package-clients-linux-x86_64.sh --install
2.2.7 Configure Data Management
When a batch job completes, stdout and stderr files are generated and placed in the spool directory on the master Torque MOM Host for the job instead of the submit host. You can configure the Torque batch environment to copy the stdout and stderr files back to the submit host. See Configuring Data Management for more information.
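One way to have output files delivered by a local copy, when the submit host's directories are also mounted on the compute nodes (for example over NFS), is the `$usecp` directive in the MOM configuration file. The sketch below appends such a line only if it is not already present. The host pattern and paths are hypothetical; the MOM configuration file is typically `mom_priv/config` under the Torque home directory (often /var/spool/torque), and pbs_mom generally has to re-read its configuration (for example, after a restart) to pick up the change. Consult Configuring Data Management for the options your site should use.

```shell
#!/usr/bin/env bash
# Sketch: append a $usecp directive to a pbs_mom configuration file unless it
# is already there. Arguments (all illustrative): config file, host pattern,
# source directory on the submit host, destination directory on the MOM host.
add_usecp() {
  local config="$1" host_pattern="$2" src_dir="$3" dst_dir="$4"
  # The dollar sign is part of the directive name, so it is escaped here.
  local line="\$usecp ${host_pattern}:${src_dir} ${dst_dir}"
  if ! grep -qxF -- "$line" "$config" 2>/dev/null; then
    printf '%s\n' "$line" >> "$config"
  fi
}

# Example (hypothetical host pattern and paths; run as root on each MOM host):
#   add_usecp /var/spool/torque/mom_priv/config '*.example.com' /home /home
```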