Because a single MOM daemon reports multiple MOM nodes, pbs_server must be made aware of how the MOM is reporting them. Configure the server_priv/nodes file with the num_node_boards and numa_gpu_node_str attributes. The num_node_boards attribute tells pbs_server how many NUMA nodes are reported by the MOM. If each NUMA node has the same number of GPUs, add the total number of GPUs to the nodes file. The following is an example of a nodes file configured with num_node_boards:
numahost gpus=12 num_node_boards=6
This line in the nodes file tells pbs_server that there is a host named numahost with 12 GPUs and 6 NUMA nodes. pbs_server divides the number of GPUs (12) by the value of num_node_boards (6) and determines there are 2 GPUs per NUMA node.
In this example, the GPUs are distributed uniformly across the node boards, but a system does not have to be configured that way. For systems with non-uniform GPU distributions, use the numa_gpu_node_str attribute to tell pbs_server where the GPUs are located in the cluster.
If there are unequal numbers of GPUs on the NUMA nodes, you can specify them with a string. For example, if there are 3 NUMA nodes and the first has 0 GPUs, the second has 3, and the third has 5, you would add this to the nodes file entry:
numa_gpu_node_str=0,3,5
In this configuration, pbs_server knows it has three MOM nodes with 0, 3, and 5 GPUs respectively. Note that the gpus attribute is not used here; it would be ignored because the number of GPUs per node is given explicitly.
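Putting these attributes together, a complete nodes file entry for such a host might look like the following (the host name numahost2 is illustrative):
numahost2 num_node_boards=3 numa_gpu_node_str=0,3,5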
In TORQUE 3.0.2 and later, qsub supports mapping -l gpus=X to -l gres=gpus:X. This allows users on NUMA systems to make requests such as -l ncpus=20,gpus=5 (or -l ncpus=20:gpus=5), indicating that they are not concerned with how the GPUs relate to the NUMA nodes they request; they only want a total of 20 cores and 5 GPUs.
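For example, the following submission (the script name job.sh is illustrative) is treated as a request for 20 cores plus -l gres=gpus:5:
qsub -l ncpus=20,gpus=5 job.sh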