2.19 Upgrading Torque Resource Manager

Torque 6.1 binaries are backward compatible with Torque 5.0 or later. However, they are not backward compatible with Torque versions earlier than 5.0. When you upgrade to Torque 6.1.1.1 from a version earlier than 5.0, you must upgrade all MOM and server daemons at the same time.

The job format is compatible between Torque 6.1 and previous versions, and any queued jobs will be upgraded to the new version. Upgrading Torque while jobs are running is not recommended.

This topic contains instructions on how to upgrade and start Torque Resource Manager (Torque).

If you need to upgrade from a Torque version earlier than 4.0, contact Adaptive Computing.

See 5.516 Considerations Before Upgrading in the Torque Resource Manager Administrator Guide for additional important information, including how to handle running jobs during an upgrade, mixed server/MOM versions, and the possibility of upgrading the MOMs without taking compute nodes offline.

2.19.1 Before You Upgrade

This section describes what you should be aware of before upgrading.

2.19.1.A Running Jobs

Before upgrading the system, all running jobs must complete. To prevent queued jobs from starting, nodes can be set to offline or all queues can be disabled (using the "started" queue attribute). See pbsnodes or Queue Attributes in the Torque Resource Manager Administrator Guide for more information.
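
Both options can be scripted. The following is a minimal sketch, assuming it runs as root on the Torque Server Host and that you pass in your own queue and node names; the function names stop_queues and offline_nodes, and the example names batch, node01, and node02, are hypothetical. It relies only on qmgr -c to set the "started" queue attribute and on pbsnodes -o to mark nodes offline.

```shell
#!/bin/sh
# Sketch: keep queued jobs from starting before a Torque upgrade.
# Function names and the example queue/node names are hypothetical.

# Set the "started" attribute to false on each named queue.
stop_queues() {
    for q in "$@"; do
        qmgr -c "set queue $q started = false" || return 1
    done
}

# Mark each named node offline so no new jobs are placed on it.
offline_nodes() {
    for n in "$@"; do
        pbsnodes -o "$n" || return 1
    done
}

# Example usage (uncomment and adjust for your site):
# stop_queues batch
# offline_nodes node01 node02
```

After the upgrade, reverse these changes with qmgr -c "set queue <queue> started = true" and pbsnodes -c <node>, which clears the offline state.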

2.19.1.B Cray Systems

To upgrade Torque to 6.1.1.1 on a Cray system, refer to the Installation Notes for Moab and Torque for Cray in Appendix G of the Moab Workload Manager Administrator Guide.

2.19.1.C hwloc

Using "zypper install hwloc" may install an older, non-supported version.

When cgroups are enabled (recommended), hwloc version 1.9.1 or later is required. NVIDIA K80 requires libhwloc 1.11.0. If cgroups are to be enabled, check the Torque Server Host to see if the required version of hwloc is installed. You can check the version number by running the following command:

  [root]# hwloc-info --version

The following instructions are for installing version 1.9.1.

If hwloc is not installed or needs to be upgraded to the required version, do the following:

1. On the Torque Server Host, each Torque MOM Host, and each Torque Client Host, do the following:
  1. Download hwloc-1.9.1.tar.gz from https://www.open-mpi.org/software/hwloc/v1.9.
  2. Run each of the following commands in order.
    [root]# zypper install gcc make
    [root]# tar -xzvf hwloc-1.9.1.tar.gz
    [root]# cd hwloc-1.9.1
    [root]# ./configure
    [root]# make
    [root]# make install
2. Run the following commands on the Torque Server Host only.
  [root]# echo /usr/local/lib > /etc/ld.so.conf.d/hwloc.conf
  [root]# ldconfig
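
Before deciding whether to build hwloc, you can compare the installed version against the minimum. The sketch below assumes that hwloc-info --version prints the version as the last word of its output (for example, "hwloc-info 1.9.1") and that GNU sort -V is available; the helper names hwloc_version and version_at_least are hypothetical.

```shell
#!/bin/sh
# Sketch: check whether the installed hwloc meets a minimum version.
# Assumes the version is the last field of "hwloc-info --version" output.

# Print the installed hwloc version, or nothing if hwloc-info is missing.
hwloc_version() {
    hwloc-info --version 2>/dev/null | awk '{print $NF}'
}

# Succeed if version $1 is at least version $2 (GNU sort -V ordering).
version_at_least() {
    [ -n "$1" ] || return 1
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# 1.9.1 is the minimum when cgroups are enabled; NVIDIA K80 needs 1.11.0.
required=1.9.1
installed=$(hwloc_version)
if version_at_least "$installed" "$required"; then
    echo "hwloc $installed is OK"
else
    echo "hwloc ${installed:-not found} is older than $required; build it as shown above"
fi
```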

2.19.2 Stop Torque Services

Do the following:

1. On the Torque Server Host, shut down the Torque server.
  [root]# systemctl stop pbs_server.service
2. On each Torque MOM Host, shut down the Torque MOM service.

  Confirm that all jobs have completed before stopping pbs_mom. To check, run "momctl -d3". If no jobs are running, the message "NOTE: no local jobs detected" appears near the bottom of the output. If jobs are still running when the MOM is shut down, you will only be able to track when each job completes; you will not be able to get its completion codes or statistics.

  [root]# systemctl stop pbs_mom.service
3. On each Torque Client Host (including the Moab Server Host, the Torque Server Host, and the Torque MOM Hosts, if applicable), shut down the trqauthd service.
  [root]# systemctl stop trqauthd.service
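
On a MOM host, the momctl check and the shutdown can be combined so that pbs_mom is stopped only when no jobs remain. This is a minimal sketch that relies only on the "NOTE: no local jobs detected" message from momctl -d3 described above; the function name stop_mom_if_idle is hypothetical, and it must run as root on the MOM host itself.

```shell
#!/bin/sh
# Sketch: stop pbs_mom only when momctl reports no local jobs.
# Relies on the "NOTE: no local jobs detected" line in "momctl -d3" output.
stop_mom_if_idle() {
    if momctl -d3 | grep -q "NOTE: no local jobs detected"; then
        systemctl stop pbs_mom.service
    else
        # Leave the MOM running so completion codes and statistics are kept.
        echo "Jobs are still running on $(uname -n); pbs_mom not stopped." >&2
        return 1
    fi
}

# Example usage:
# stop_mom_if_idle
```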

2.19.3 Upgrade the Torque Server

You must complete all the previous upgrade steps in this topic (2.19.1 Before You Upgrade and 2.19.2 Stop Torque Services) before upgrading the Torque server.

On the Torque Server Host, do the following:

1. Back up your server_priv directory.
  [root]# tar -czvf backup.tar.gz TORQUE_HOME/server_priv
2. If not already installed, install the Boost C++ headers.
  [root]# zypper install boost-devel
3. Download the latest Torque build from the Adaptive Computing website.
4. Depending on your system configuration, you will need to add ./configure command options.

  At a minimum, you must add:

  • --enable-cgroups
  • --with-hwloc-path=/usr/local (see 1.2.1 Torque for more information)

  These instructions assume you are using cgroups. When cgroups are supported, cpusets are handled by the cgroup cpuset subsystem. If you are not using cgroups, use --enable-cpusets instead of --enable-cgroups.

  If --enable-gui is part of your configuration, do the following:

  [root]# cd /usr/lib64
  [root]# ln -s libXext.so.6.4.0 libXext.so
  [root]# ln -s libXss.so.1 libXss.so

  When finished, cd back to your install directory.

  See Customizing the Install in the Torque Resource Manager Administrator Guide for more information on which options are available to customize the ./configure command.

5. Install the latest Torque tarball.
  [root]# cd /tmp
  [root]# tar xzvf torque-6.1.1.1.tar.gz
  [root]# cd torque-6.1.1.1
  [root]# ./configure --enable-cgroups --with-hwloc-path=/usr/local # add any other specified options
  [root]# make
  [root]# make install
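
The choice between cgroups and cpusets can be captured in a small helper so the same ./configure line works on either kind of system. This sketch reads "use --enable-cpusets instead" as replacing only --enable-cgroups and keeps --with-hwloc-path=/usr/local in both cases; that reading and the helper name torque_configure_opts are assumptions. Any other site-specific options still need to be appended by hand.

```shell
#!/bin/sh
# Sketch: print the minimum ./configure options for a Torque build.
# $1 = "cgroups" (recommended) or "cpusets". The helper name is hypothetical.
torque_configure_opts() {
    case "$1" in
        cgroups) mode="--enable-cgroups" ;;
        cpusets) mode="--enable-cpusets" ;;
        *) echo "usage: torque_configure_opts cgroups|cpusets" >&2; return 1 ;;
    esac
    # hwloc built from source with the default prefix lives under /usr/local.
    echo "$mode --with-hwloc-path=/usr/local"
}

# Example usage from the torque-6.1.1.1 directory:
# ./configure $(torque_configure_opts cgroups) && make && make install
```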

2.19.4 Update the Torque MOMs

Do the following:

1. On the Torque Server Host, do the following:
  1. Create the self-extracting packages that are copied and executed on your nodes.
    [root]# make packages
    Building ./torque-package-clients-linux-x86_64.sh ...
    Building ./torque-package-mom-linux-x86_64.sh ...
    Building ./torque-package-server-linux-x86_64.sh ...
    Building ./torque-package-gui-linux-x86_64.sh ...
    Building ./torque-package-devel-linux-x86_64.sh ...
    Done.

    The package files are self-extracting packages that can be copied and executed on your production machines.  Use --help for options.
  2. Copy the self-extracting MOM package to each Torque MOM Host.

    Adaptive Computing recommends that you use a remote shell, such as SSH, to install packages on remote systems. Set up shared SSH keys if you do not want to supply a password for each Torque MOM Host.

    [root]# scp torque-package-mom-linux-x86_64.sh <torque-mom-host>:
2. On each Torque MOM Host, do the following:
  1. Install cgroup-tools.
    [root]# zypper install libcgroup-tools
  2. Install the self-extracting MOM package.
    [root]# ./torque-package-mom-linux-x86_64.sh --install
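
Following the recommendation to use SSH, the copy and install steps for the MOM hosts can be run in one loop from the Torque Server Host. This is a minimal sketch, assuming passwordless root SSH to each MOM host, that it runs from the directory where make packages wrote the .sh files, and that a non-interactive zypper install is acceptable at your site; the function name deploy_mom_package and the host names are hypothetical.

```shell
#!/bin/sh
# Sketch: push and install the MOM package on each named MOM host.
# Run from the directory where "make packages" wrote the .sh files.
deploy_mom_package() {
    pkg=torque-package-mom-linux-x86_64.sh
    for host in "$@"; do
        # Copy the package to the remote user's home directory.
        scp "$pkg" "$host": || return 1
        # Install cgroup-tools, then run the self-extracting installer there.
        ssh "$host" "zypper --non-interactive install libcgroup-tools && ./$pkg --install" || return 1
    done
}

# Example usage (hypothetical host names):
# deploy_mom_package mom01 mom02
```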

2.19.5 Update the Torque Clients

This section contains instructions on updating the Torque clients on the Torque Client Hosts (including the Moab Server Host and Torque MOM Hosts, if applicable).

1. On the Torque Server Host, do the following:
  1. Copy the self-extracting client package to each Torque Client Host.

    Adaptive Computing recommends that you use a remote shell, such as SSH, to install packages on remote systems. Set up shared SSH keys if you do not want to supply a password for each Torque Client Host.

    [root]# scp torque-package-clients-linux-x86_64.sh <torque-client-host>:
  2. If Moab Workload Manager is part of your configuration, copy the self-extracting devel package to the Moab Server Host.
    [root]# scp torque-package-devel-linux-x86_64.sh <moab-server-host>:
2. On each Torque Client Host, install the self-extracting client package.

  You can perform this step from the Torque Server Host using a remote shell, such as SSH. Set up shared SSH keys if you do not want to supply a password for each Torque Client Host.

  [root]# ./torque-package-clients-linux-x86_64.sh --install
3. If Moab Workload Manager is part of your configuration, do the following on the Moab Server Host:
  [root]# ./torque-package-devel-linux-x86_64.sh --install
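
The client and devel packages can be distributed the same way. The sketch below generalizes the copy-then-install pattern to any self-extracting package; it assumes passwordless root SSH from the Torque Server Host, and the function name install_package and the host names are hypothetical.

```shell
#!/bin/sh
# Sketch: copy a self-extracting Torque package to hosts and install it.
# Usage: install_package PACKAGE HOST...
install_package() {
    pkg="$1"
    shift
    # scp places the file in the remote home directory under its base name.
    name=$(basename "$pkg")
    for host in "$@"; do
        scp "$pkg" "$host": || return 1
        ssh "$host" "./$name --install" || return 1
    done
}

# Example usage (hypothetical host names):
# install_package torque-package-clients-linux-x86_64.sh client01 client02
# install_package torque-package-devel-linux-x86_64.sh moab-server
```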

2.19.6 Start Torque Services

Do the following:

1. On each Torque Client Host (including the Moab Server Host, Torque Server Host and Torque MOM Hosts, if applicable), start up the trqauthd service.
  [root]# systemctl daemon-reload
  [root]# systemctl start trqauthd.service
2. On each Torque MOM Host, start up the Torque MOM service.
  [root]# systemctl daemon-reload
  [root]# systemctl start pbs_mom.service
3. On the Torque Server Host, start up the Torque server.
  [root]# systemctl daemon-reload
  [root]# systemctl start pbs_server.service
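
On a host that plays several roles (for example, a Torque Server Host that also runs pbs_mom), the start-up steps can be wrapped so they always run in the order shown in this section. This per-host sketch does not coordinate across hosts: cluster-wide, you would still start trqauthd everywhere first, then the MOMs, then the server. The function name and the role words "mom" and "server" are hypothetical, and it assumes every host it runs on is also a Torque Client Host.

```shell
#!/bin/sh
# Sketch: start the Torque daemons for this host's roles in the guide's order.
# Usage: start_torque_services [mom] [server]
# trqauthd is always started, assuming this host is also a Torque Client Host.
start_torque_services() {
    roles=" $* "
    systemctl daemon-reload || return 1
    systemctl start trqauthd.service || return 1
    case "$roles" in
        *" mom "*) systemctl start pbs_mom.service || return 1 ;;
    esac
    case "$roles" in
        *" server "*) systemctl start pbs_server.service || return 1 ;;
    esac
}

# Example usage on a combined server/MOM host:
# start_torque_services mom server
```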

2.19.7 Perform Status and Error Checks

On the Torque Server Host, do the following:

    © 2017 Adaptive Computing