
2.24 Torque NUMA-Aware Configuration

This topic provides instructions for enabling NUMA-aware configuration, including cgroups, and requires Torque 6.0 or later. For instructions on NUMA-support configurations, see 2.25 Torque NUMA-Support Configuration. This topic assumes you have a basic understanding of cgroups. See the Red Hat Resource Management Guide or the cgroups documentation on kernel.org for basic information on cgroups.

Torque uses cgroups to better manage CPU and memory accounting, memory enforcement, cpuset management, and the binding of jobs to devices such as MICs and GPUs.

Be aware of the following:

  • If you are building with cgroups enabled, you must have Boost version 1.41 or later.
  • The pbs_mom daemon is the binary that interacts with cgroups, but both the server and the MOM must be built with --enable-cgroups to understand all of the new structures.
  • Beginning with Torque 6.0.2, Cray-enabled Torque may be configured with cgroups. On the login node, each job will have all of the CPUs and all of the memory controllers in its cgroup.

Prerequisites

  1. Install the prerequisites found in Installing Torque Resource Manager.
  2. hwloc version 1.9.1 or later is required. Version 1.11.0 is needed if installing with NVIDIA K80 or newer GPU hardware.
    • Download hwloc-1.9.1.tar.gz from: https://www.open-mpi.org/software/hwloc/v1.9
    • Extract the archive and configure the build:
      $ tar -xzvf hwloc-1.9.1.tar.gz
      $ cd hwloc-1.9.1
      $ ./configure
    • You do not need to overwrite the default installation of hwloc. By default, hwloc installs to the /usr/local directory. You can also configure hwloc with the --prefix option to install it to a location of your choosing. If you do not install hwloc to the /usr directory, you can tell Torque where to find the version you want it to use with the --with-hwloc-path option when you configure Torque. For example:
      $ ./configure --enable-cgroups --with-hwloc-path=/usr/local
    • Build and install hwloc:
      $ make
      $ sudo make install
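
To confirm that the hwloc you installed meets the minimum stated above, a version comparison along the following lines may help. This is a minimal sketch, not part of Torque: the function names version_ge and check_hwloc are hypothetical, and it assumes GNU sort with the -V (version sort) option is available on the host.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to version B.
# Assumption: `sort -V` (GNU coreutils version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_hwloc FOUND REQUIRED: hypothetical helper that reports whether the
# hwloc version FOUND meets the minimum REQUIRED.
check_hwloc() {
    found="$1"      # the version string of the installed hwloc
    required="$2"   # 1.9.1, or 1.11.0 for NVIDIA K80 or newer GPU hardware
    if version_ge "$found" "$required"; then
        echo "hwloc $found satisfies >= $required"
    else
        echo "hwloc $found is older than $required"
        return 1
    fi
}
```

One way to obtain the installed version is pkg-config, for example check_hwloc "$(pkg-config --modversion hwloc)" 1.9.1. This assumes pkg-config can locate hwloc.pc; for an install under /usr/local you may need to add its pkgconfig directory to PKG_CONFIG_PATH.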

Installation Instructions

Do the following:

  1. Install the libcgroup package.

    Red Hat-based systems must use libcgroup version 0.40.rc1-16.el6 or later; SUSE-based systems need a comparable libcgroup version.

    • Red Hat 6-based systems
      yum install libcgroup
    • Red Hat 7-based systems
      yum install libcgroup-tools
    • SUSE 11-based systems
      zypper install libcgroup1
    • SUSE 12-based systems
      zypper install libcgroup-tools
  2. Enable Torque to access cgroups.
    $ ./configure --enable-cgroups --with-hwloc-path=/usr/local
  3. For a Red Hat 6-based system or a SUSE 11-based system, on each Torque MOM Host, confirm that cgroups have been mounted; if not, mount them.
    1. Run lssubsys -am.
    2. If the command is not found, or you do not see something similar to the following, cgroups are not mounted; continue with these instructions.
      ns
      perf_event
      net_prio
      cpuset /cgroup/cpuset
      cpu /cgroup/cpu
      cpuacct /cgroup/cpuacct
      memory /cgroup/memory
      devices /cgroup/devices
      freezer /cgroup/freezer
      net_cls /cgroup/net_cls
      blkio /cgroup/blkio
    3. For Red Hat 6-based systems, install the cgroup library package and mount cgroups.
      [root]# yum install libcgroup
      [root]# service cgconfig start
      [root]# chkconfig cgconfig on
    4. For SUSE 11-based systems, do the following:
      1. Install the cgroup library package.
        [root]# zypper install libcgroup1
      2. Edit /etc/cgconfig.conf and add the following:
        mount {
                devices = /mnt/cgroups/devices;
                cpuset = /mnt/cgroups/cpuset;
                cpu = /mnt/cgroups/cpu;
                cpuacct = /mnt/cgroups/cpuacct;
                memory = /mnt/cgroups/memory;
        }
      3. Mount cgroups.
        [root]# service cgconfig start
        [root]# chkconfig cgconfig on
    5. Run lssubsys -am again and confirm cgroups are mounted.
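
The check in steps 1 and 5 can also be scripted. The sketch below reads lssubsys -am style output on standard input and prints any of the five subsystems named in this topic (cpuset, cpuacct, cpu, memory, devices) that has no mount point. The function name missing_cgroups is hypothetical and is not a Torque or libcgroup command.

```shell
#!/bin/sh
# missing_cgroups: read `lssubsys -am`-style lines on stdin and print each of
# the five subsystems listed in this topic that has no mount point.
# Empty output means all five are mounted.
missing_cgroups() {
    awk '
        # A line with at least two fields is "<subsystem(s)> <mount point>".
        NF >= 2 {
            # Controllers mounted together appear as a comma-separated list.
            n = split($1, names, ",")
            for (i = 1; i <= n; i++) mounted[names[i]] = 1
        }
        END {
            n = split("cpuset cpuacct cpu memory devices", need, " ")
            for (i = 1; i <= n; i++)
                if (!(need[i] in mounted)) print need[i]
        }
    '
}
```

On a MOM host this would be run as lssubsys -am | missing_cgroups.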

2.24.1 Multiple cgroup Directory Configuration

If your system has more than one cgroup directory configured, you must create the trq-cgroup-paths file in the $TORQUE_HOME directory. This file lists each cgroup subsystem and its mount point, one per line, in the form <subsystem> <mount point>.

All five subsystems used by pbs_mom must be in the trq-cgroup-paths file. Torque reads this file first to determine where to look for cgroups. In the example that follows, a directory exists at /cgroup with subdirectories for each subsystem.

cpuset  /cgroup/cpuset
cpuacct /cgroup/cpuacct
cpu     /cgroup/cpu
memory  /cgroup/memory
devices /cgroup/devices
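
Because all five subsystems must be present, it can be useful to check the file before starting pbs_mom. The following is a minimal sketch under stated assumptions: check_trq_paths is a hypothetical helper, not a Torque command, and it only checks that each subsystem has a two-field line; it does not verify that the mount points exist.

```shell
#!/bin/sh
# check_trq_paths FILE: verify that FILE has a "<subsystem> <mount point>"
# line for each of the five subsystems used by pbs_mom.
# Prints one line per missing or malformed entry and returns non-zero if any.
check_trq_paths() {
    file="$1"
    status=0
    for sub in cpuset cpuacct cpu memory devices; do
        # awk exits 0 only when a two-field line for this subsystem was seen.
        if ! awk -v s="$sub" '$1 == s && NF == 2 { found = 1 } END { exit !found }' "$file"; then
            echo "missing or malformed entry: $sub"
            status=1
        fi
    done
    return $status
}
```

For example: check_trq_paths "$TORQUE_HOME/trq-cgroup-paths", assuming TORQUE_HOME is set in your shell to the Torque home directory.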

© 2017 Adaptive Computing