(Click to open topic with navigation)
Linux kernel 2.6 Cpusets are logical, hierarchical groupings of CPUs and units of memory. Once created, individual processes can be placed within a cpuset. The processes will only be allowed to run/access the specified CPUs and memory. Cpusets are managed in a virtual file system mounted at /dev/cpuset. New cpusets are created by simply making new directories. Cpusets gain CPUs and memory units by simply writing the unit number to files within the cpuset.
All nodes using cpusets must have the hwloc library version 1.2 or higher installed. If you use CentOS 6, an automatic download – yum install hwloc-devel, for instance – will install the correct version; however, if you use CentOS 5 or SLES11 SP2, you will need to build it from the source using contrib/hwloc_install.sh or equivalent.
When started, pbs_mom will create an initial top-level cpuset at /dev/cpuset/torque. This cpuset contains all CPUs and memory of the host machine. If this "torqueset" already exists, it will be left unchanged to allow the administrator to override the default behavior. All subsequent cpusets are created within the torqueset.
When a job is started, the jobset is created at /dev/cpuset/torque/$jobid and populated with the CPUs listed in the exec_host job attribute. Also created are individual tasksets for each CPU within the jobset. This happens before prologue, which allows it to be easily modified, and it happens on all nodes.
The top-level batch script process is executed in the jobset. Tasks launched through the TM interface (pbsdsh and PW's mpiexec) will be executed within the appropriate taskset.
On job exit, all tasksets and the jobset are deleted.
To configure cpuset
mount -t cpuset none /dev/cpuset
Do this for each MOM that is to use cpusets.
Presently, any job can request a single CPU and proceed to use everything available in the machine. This is occasionally done to circumvent policy, but most often is simply an error on the part of the user. Cpuset support will easily constrain the processes to not interfere with other jobs.
Jobs on larger NUMA systems may see a performance boost if jobs can be intelligently assigned to specific CPUs. Jobs may perform better if striped across physical processors, or contained within the fewest number of memory controllers.
TM tasks are constrained to a single core, thus a multi-threaded process could seriously suffer.