
2.23 Torque NUMA-Aware Configuration


This topic provides instructions for enabling NUMA-aware support, including cgroups, and requires Torque 6.0 or later. For instructions on NUMA-support configurations, see 2.24 Torque NUMA-Support Configuration.

Torque uses cgroups to better manage CPU and memory accounting, memory enforcement, cpuset management, and binding jobs to devices such as MICs and GPUs.

If you are building with cgroups enabled, you must have Boost version 1.41 or later.

The pbs_mom daemon is the only Torque binary that uses cgroups.

This topic assumes you have a basic understanding of cgroups. See the Red Hat Resource Management Guide or the cgroups documentation on kernel.org for basic information on cgroups.

Prerequisites

  1. Install the prerequisites found in Installing Torque Resource Manager.
  2. hwloc version 1.9 or later is required. Version 1.11 is needed if installing with NVIDIA K80 or newer GPU hardware.
    • download hwloc-1.9.tar.gz from: https://www.open-mpi.org/software/hwloc/v1.9
    • perform the following command line actions:
    • $ tar -xzvf hwloc-1.9.tar.gz

      $ cd hwloc-1.9

      $ sudo ./configure

    • You do not need to overwrite the default installation of hwloc. By default, hwloc installs to the /usr/local directory. You can also configure hwloc with the --prefix option to have it install to a location of your choosing. If you do not install hwloc to the /usr directory, you can tell Torque where to find the version you want it to use at configure time with the --with-hwloc-path option (see the sketch after this list). For example:
    • ./configure --enable-cgroups --with-hwloc-path=/usr/local
    • Run make
    • sudo make install
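
    The full build sequence, shown here as a sketch that assumes the hwloc-1.9.tar.gz download and the default /usr/local prefix (adjust the version and prefix for your site), is:

      $ tar -xzvf hwloc-1.9.tar.gz
      $ cd hwloc-1.9
      $ ./configure --prefix=/usr/local
      $ make
      $ sudo make install

    Torque can then be built against that installation with ./configure --enable-cgroups --with-hwloc-path=/usr/local, as shown above.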

Installation Instructions

Do the following:

  1. Install the libcgroup package.

    Red Hat-based systems must use libcgroup version 0.40.rc1-16.el6 or later; SUSE-based systems need to use a comparable libcgroup version. You can verify the installed version as shown after the package commands below.

    • Red Hat-based systems
      sudo yum install libcgroup
      sudo yum install libcgroup-tools
    • SUSE-based systems
      sudo zypper install libcgroup-tools
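
    To confirm that the installed version meets the requirement above, you can query the package database. A small sketch (rpm is available on both Red Hat-based and SUSE-based systems; libcgroup-tools exists only on releases that package the tools separately):

      rpm -q libcgroup
      rpm -q libcgroup-tools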
  2. Enable Torque to access cgroups.

    $ ./configure --enable-cgroups

  3. Run lssubsys -am to determine whether your system has mounted cgroups.
    1. If cgroups are not mounted, you will see:
      $ lssubsys -am
      
      ns
      perf_event
      net_prio
      cpuset
      cpu
      cpuacct
      memory
      devices
      freezer
      net_cls
      blkio
    2. If cgroups are mounted, you will see:
      $ lssubsys -am
      
      ns
      perf_event
      net_prio
      cpuset,cpu,cpuacct /cgroup/cpu
      memory /cgroup/memory
      devices /cgroup/devices
      freezer /cgroup/freezer
      net_cls /cgroup/net_cls
      blkio /cgroup/blkio
  4. If you determined that cgroups are not mounted, take one of the following actions; otherwise, proceed to the next step.
    • Follow the cgroup mounting instructions for your operating system.
    • Manually mount cgroups from the command line.
      mount -t cgroup -o <subsystem>[,<subsystem>,...] <name> <dir path>/<name>

      The <name> parameter is the name of the hierarchy.

      The following commands create five hierarchies, one for each subsystem. The mount point directories must already exist; see the sketch at the end of this step.

      mount -t cgroup -o cpuset cpuset /var/spool/torque/cgroup/cpuset
      mount -t cgroup -o cpu cpu /var/spool/torque/cgroup/cpu
      mount -t cgroup -o cpuacct cpuacct /var/spool/torque/cgroup/cpuacct
      mount -t cgroup -o memory memory /var/spool/torque/cgroup/memory
      mount -t cgroup -o devices devices /var/spool/torque/cgroup/devices

    Once you have mounted the cgroups, run lssubsys -am again. You should now see:

    cpuset /var/spool/torque/cgroup/cpuset
    cpu /var/spool/torque/cgroup/cpu
    cpuacct /var/spool/torque/cgroup/cpuacct
    memory /var/spool/torque/cgroup/memory
    devices /var/spool/torque/cgroup/devices
    freezer
    blkio
    perf_event
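
    As noted above, mount requires each mount point directory to exist before the corresponding subsystem can be mounted on it. A minimal sketch of creating the mount points, assuming the default Torque spool directory of /var/spool/torque:

      mkdir -p /var/spool/torque/cgroup/cpuset
      mkdir -p /var/spool/torque/cgroup/cpu
      mkdir -p /var/spool/torque/cgroup/cpuacct
      mkdir -p /var/spool/torque/cgroup/memory
      mkdir -p /var/spool/torque/cgroup/devices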

2.23.1 Multiple cgroup Directory Configuration

If your system has more than one cgroup directory configured, you must create the trq-cgroup-paths file in the $TORQUE_HOME directory. This file lists the cgroup subsystems and the mount point for each subsystem, using the syntax <subsystem> <mount point>.

All five subsystems used by pbs_mom must be listed in the trq-cgroup-paths file. In the example that follows, a directory exists at /cgroup with subdirectories for each subsystem. Torque checks this file first when determining where to look for cgroups.

cpuset  /cgroup/cpuset
cpuacct /cgroup/cpuacct
cpu     /cgroup/cpu
memory  /cgroup/memory
devices /cgroup/devices
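
A minimal sketch of creating this file from the command line, assuming the default $TORQUE_HOME of /var/spool/torque and the /cgroup mount points shown above:

cat > /var/spool/torque/trq-cgroup-paths <<'EOF'
cpuset  /cgroup/cpuset
cpuacct /cgroup/cpuacct
cpu     /cgroup/cpu
memory  /cgroup/memory
devices /cgroup/devices
EOF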

2.23.2 Change Considerations for pbs_mom

To improve performance when removing cgroup hierarchies and job files, Torque 6.0.0 added a new MOM configuration parameter, $thread_unlink_calls. This parameter moves job file cleanup onto its own thread, which improves the performance of the MOM. However, the additional threads also increase the size of pbs_mom from around 50 MB to 100 MB.

$thread_unlink_calls is true by default, which threads job deletion. If pbs_mom is too large for your configuration, set $thread_unlink_calls to false and jobs will be deleted within the main pbs_mom thread.
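
For example, to disable threaded deletion, add the following line to the MOM configuration file (by default $TORQUE_HOME/mom_priv/config) and restart pbs_mom:

$thread_unlink_calls false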
