Moab-NUMA Integration Guide

Scheduling a NUMA-type system requires special configuration. Moab uses NODESETs to guarantee the feasibility of large memory jobs and to enforce node topology.

Configuration

To integrate Moab and NUMA, follow these steps:

  1. Configure Moab to schedule large memory jobs. By default, Moab creates a partition for each resource manager; set the SharedMem flag on that partition.

    RMCFG[torque]  TYPE=TORQUE
    PARCFG[torque] FLAGS=SharedMem

  2. Configure NODESETs with the following settings.

    NODESETPOLICY ONEOF
    NODESETATTRIBUTE FEATURE
    NODESETISOPTIONAL FALSE
    NODESETPRIORITYTYPE FIRSTFIT

  3. For a simple setup, configure each node with an identical feature and then set NODESETLIST to that feature.

    NODESETLIST uv

    For a more advanced setup that handles topology, configure each node with features that represent the node's placement. Moab chooses the first feasible nodeset, reading NODESETLIST from left to right, to start the job.

    /var/spool/torque/server_priv/nodes:
    node0   blade1 pair1 nodeset1
    node1   blade1 pair1 nodeset1
    node2   blade2 pair1 nodeset1
    node3   blade2 pair1 nodeset1
    node4   blade3 pair2 nodeset1
    node5   blade3 pair2 nodeset1
    node6   blade4 pair2 nodeset1
    node7   blade4 pair2 nodeset1
    node8   blade5 pair3 nodeset2
    node9   blade5 pair3 nodeset2
    node10  blade6 pair3 nodeset2
    node11  blade6 pair3 nodeset2
    node12  blade7 pair4 nodeset2
    node13  blade7 pair4 nodeset2
    node14  blade8 pair4 nodeset2
    node15  blade8 pair4 nodeset2
    node16  blade9 pair5 nodeset3
    node17  blade9 pair5 nodeset3
    node18  blade10 pair5 nodeset3
    node19  blade10 pair5 nodeset3
    node20  blade11 pair6 nodeset3
    node21  blade11 pair6 nodeset3
    node22  blade12 pair6 nodeset3
    node23  blade12 pair6 nodeset3
    node24  blade13 pair7 nodeset4
    node25  blade13 pair7 nodeset4
    node26  blade14 pair7 nodeset4
    node27  blade14 pair7 nodeset4
    node28  blade15 pair8 nodeset4
    node29  blade15 pair8 nodeset4
    node30  blade16 pair8 nodeset4
    node31  blade16 pair8 nodeset4

    moab.cfg:
    NODESETLIST blade1,blade2,blade3,blade4,blade5,blade6,blade7,blade8,blade9,blade10,blade11,blade12,blade13,blade14,blade15,blade16,pair1,pair2,pair3,pair4,pair5,pair6,pair7,pair8,nodeset1,nodeset2,nodeset3,nodeset4

  4. Configure Moab to use the PRIORITY NODEALLOCATIONPOLICY. This allocation policy is used to allocate enough nodes to fulfill a job's processor and memory requirements.

    NODEALLOCATIONPOLICY PRIORITY

  5. To schedule large memory jobs correctly and efficiently, set NODEACCESSPOLICY to SINGLEJOB. This is necessary even when a job uses only the memory on a NUMA node: allowing another job to use that node's processors would severely degrade performance and invalidate the job's walltime estimate.

    NODEACCESSPOLICY SINGLEJOB
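
Writing the topology features and the matching NODESETLIST by hand is error-prone on larger systems. The following Python sketch regenerates the example layout from the advanced setup above, assuming 2 nodes per blade, 2 blades per pair, and 2 pairs per nodeset, as in that example. The function names (node_features, nodes_file_lines, nodeset_list) are hypothetical helpers and are not part of Moab or TORQUE; adjust the grouping sizes to match the actual hardware.

```python
# Sketch: generate TORQUE nodes-file lines and a matching NODESETLIST for
# the example layout. All function names here are hypothetical helpers.

def node_features(index, nodes_per_blade=2, blades_per_pair=2, pairs_per_nodeset=2):
    """Return the (blade, pair, nodeset) feature names for node `index`."""
    blade = index // nodes_per_blade + 1
    pair = (blade - 1) // blades_per_pair + 1
    nodeset = (pair - 1) // pairs_per_nodeset + 1
    return f"blade{blade}", f"pair{pair}", f"nodeset{nodeset}"


def nodes_file_lines(node_count):
    """Build one nodes-file line per node: the node name, then its features."""
    return [f"node{i} " + " ".join(node_features(i)) for i in range(node_count)]


def nodeset_list(node_count):
    """Build the NODESETLIST value: all blades, then all pairs, then all nodesets."""
    columns = ([], [], [])  # blades, pairs, nodesets in first-seen order
    for i in range(node_count):
        for column, feature in zip(columns, node_features(i)):
            if feature not in column:
                column.append(feature)
    return ",".join(columns[0] + columns[1] + columns[2])


if __name__ == "__main__":
    for line in nodes_file_lines(32):
        print(line)
    print("NODESETLIST " + nodeset_list(32))
```

Listing the smallest groupings first (blades, then pairs, then nodesets) matters because Moab takes the first feasible nodeset from left to right, so a job is placed in the tightest grouping that can hold it.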

Job Submission

Jobs can request processors and memory using the -l nodes=<number of cpus> and -l mem=<amount of memory> syntaxes. JOBNODEMATCHPOLICY EXACTNODE should not be configured on a NUMA system. Submissions should use the sharedmem job flag to force jobs to run only on a sharedmem partition. For example:

qsub -l nodes=3,mem=64gb,flags=sharedmem
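
To see how a request such as nodes=3 might map onto the topology-aware NODESETLIST from the Configuration section, the following Python sketch models first-feasible selection. It is a simplified illustration and not Moab's actual algorithm: it assumes each nodes-file entry supplies one unit of the requested count, ignores memory, and uses hypothetical helper names (build_nodesets, first_feasible).

```python
# Simplified model of left-to-right, first-feasible nodeset selection.
# Not Moab code: memory and priorities are ignored, and each node entry
# is assumed to supply one unit of the requested count.

def build_nodesets(node_count=32):
    """Map each feature to its member nodes, ordered as in the example NODESETLIST."""
    blades, pairs, nodesets = {}, {}, {}
    for i in range(node_count):
        name = f"node{i}"
        blades.setdefault(f"blade{i // 2 + 1}", []).append(name)
        pairs.setdefault(f"pair{i // 4 + 1}", []).append(name)
        nodesets.setdefault(f"nodeset{i // 8 + 1}", []).append(name)
    # Dicts keep insertion order: blades first, then pairs, then nodesets.
    return {**blades, **pairs, **nodesets}


def first_feasible(nodesets, nodes_needed, busy=()):
    """Return (nodeset name, chosen nodes) for the first set with enough idle nodes."""
    for name, members in nodesets.items():
        idle = [node for node in members if node not in busy]
        if len(idle) >= nodes_needed:
            return name, idle[:nodes_needed]
    return None  # no single nodeset can hold the request


if __name__ == "__main__":
    sets = build_nodesets()
    print(first_feasible(sets, 3))
    print(first_feasible(sets, 3, busy={"node0", "node1"}))
```

Under these assumptions, a 3-unit request skips every blade (2 nodes each) and lands in pair1; if node0 and node1 are already in use, pair1 no longer has enough idle nodes and the request moves on to pair2.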

Copyright © 2012 Adaptive Computing Enterprises, Inc.®