This topic is the central reference for NUMA-aware systems. It provides basic information and links to the NUMA-aware topics found throughout the documentation.
Support for NUMA-aware systems is available only with Torque Resource Manager 6.0 and later and Moab Workload Manager 9.0 and later.
The NUMA-aware architecture is a hardware design that separates its cores into multiple clusters. Each cluster has its own local memory region, but cores in one cluster can still access all memory in the system. However, if a processor needs to use memory that is not in its own memory region, it takes longer to access that (remote) memory. For applications where performance is crucial, avoiding access to memory in other clusters is critical.
Torque uses cgroups to better manage CPU and memory accounting, memory enforcement, cpuset management, and binding jobs to devices such as MICs and GPUs. Torque tries to place jobs that request GPUs or MICs on the NUMA nodes nearest to the GPU or MIC device to be used.
PCIe devices are similar to cores in that each device is closer to the memory of one NUMA node than to that of another. Examples of PCIe devices include GPUs, NICs, and disks.
The resources of a processor chip form a hierarchy. The largest unit is a socket. A socket can contain one or more NUMA nodes, each with its own cores and memory. A NUMA node contains a set of cores and threads, along with the memory that is local to that NUMA node. A core may have zero or more threads.
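The hierarchy described above can be sketched as a small data model. This is an illustrative sketch only, not how Torque represents hardware internally. The class names, field names, memory size, and thread numbering are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Core:
    core_id: int
    # A core may have zero or more threads, so the list may be empty.
    thread_ids: List[int] = field(default_factory=list)


@dataclass
class NumaNode:
    node_id: int
    memory_mb: int  # memory local to this NUMA node (hypothetical size)
    cores: List[Core] = field(default_factory=list)


@dataclass
class Socket:
    socket_id: int
    numa_nodes: List[NumaNode] = field(default_factory=list)

    def core_count(self) -> int:
        return sum(len(node.cores) for node in self.numa_nodes)

    def thread_count(self) -> int:
        return sum(len(core.thread_ids)
                   for node in self.numa_nodes
                   for core in node.cores)


def build_example_socket() -> Socket:
    """Build one socket with two NUMA nodes, four cores per node,
    and two threads per core (all values are illustrative)."""
    nodes = []
    next_core = 0
    for node_id in range(2):
        cores = []
        for _ in range(4):
            cores.append(Core(next_core, [2 * next_core, 2 * next_core + 1]))
            next_core += 1
        nodes.append(NumaNode(node_id, memory_mb=16384, cores=cores))
    return Socket(0, nodes)
```

Calling `build_example_socket()` returns a socket whose `core_count()` and `thread_count()` can be used to check the layout level by level.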
The following image is a simple depiction of a NUMA-aware architecture. In this example, the system has two NUMA nodes with four cores per NUMA node. The cores in each NUMA node have access to their own memory region, but they can also access the memory region of the other NUMA node through the interconnect.
If the cores in NUMA node 0 need to access memory that belongs to NUMA node 1, fetching that memory incurs greater latency.
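The effect of remote access can be illustrated with a toy cost model for the two-node, four-cores-per-node example. The cost values and the core numbering (cores 0 through 3 on node 0, cores 4 through 7 on node 1) are assumptions made for this sketch. Actual latency ratios depend on the hardware.

```python
from typing import Iterable

LOCAL_COST = 1.0   # hypothetical relative cost of a local memory access
REMOTE_COST = 2.0  # hypothetical relative cost of a remote access

# Assumed layout: cores 0-3 belong to NUMA node 0, cores 4-7 to NUMA node 1.
CORE_TO_NODE = {core: core // 4 for core in range(8)}


def access_cost(core: int, memory_node: int) -> float:
    """Return the relative cost for a core to access memory on a NUMA node."""
    if CORE_TO_NODE[core] == memory_node:
        return LOCAL_COST
    return REMOTE_COST


def total_cost(core: int, accesses: Iterable[int]) -> float:
    """Sum the cost of a sequence of accesses, each given as a NUMA node id."""
    return sum(access_cost(core, node) for node in accesses)
```

Under these assumptions, a task running on core 0 pays the local cost for memory on node 0 and the higher remote cost for memory on node 1, which is why keeping a job's cores and memory on the same NUMA node matters for performance-sensitive applications.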
After you install Torque, you must perform several configuration steps.
NUMA-aware resources can be requested at the time of job submission using the qsub/msub -L parameter. In addition, the req_information_max and req_information_min queue attributes let you specify the maximum and minimum resource limits allowed for jobs submitted to a queue.
On NUMA-aware systems, job resources are tracked per task. qstat -f produces a new category of information that begins with the "req_information" keyword. Following each "req_information" keyword is another keyword giving information about how the job was allocated. When the job has completed, the output also includes the per-task resident memory used and the per-task CPU time used.
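As a rough illustration of working with this output, the following sketch collects the lines that begin with the "req_information" keyword from captured qstat -f text. The sample text, the dotted key names, and the "key = value" layout are assumptions made for the example, so check them against the output of your own installation. The sketch also does not handle values that wrap onto continuation lines.

```python
import re
from typing import Dict

# Matches lines of the assumed form "req_information.<keyword> = <value>".
REQ_LINE = re.compile(r"^\s*req_information\.(\S+)\s*=\s*(.*?)\s*$")


def parse_req_information(qstat_output: str) -> Dict[str, str]:
    """Return a mapping of the keyword that follows "req_information."
    to its value, for every matching line in the given text."""
    result: Dict[str, str] = {}
    for line in qstat_output.splitlines():
        match = REQ_LINE.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result


# Hypothetical sample; real output contains many more attributes.
SAMPLE = """\
Job Id: 100.example
    Job_Name = STDIN
    req_information.task_count.0 = 1
    req_information.lprocs.0 = 2
    req_information.hostlist.0 = node01:ppn=2
"""

if __name__ == "__main__":
    for key, value in parse_req_information(SAMPLE).items():
        print(key, "=", value)
```

Lines that do not start with the "req_information" keyword, such as Job_Name in the sample, are ignored.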
Moab does not require special configuration to support NUMA-aware systems. However, there are a few Moab-specific details that are helpful to know and understand.