The MOM hierarchy is designed for large systems. It lets you configure how status information is passed to the pbs_server.
The MOM hierarchy allows you to override the compute nodes' default behavior of reporting status updates directly to the pbs_server. Instead, you configure compute nodes so that each node sends its status update information to another compute node. The compute nodes pass the information up a tree or hierarchy until eventually the information reaches a node that will pass the information directly to pbs_server. This can significantly reduce network traffic and ease the load on the pbs_server in a large system.
Adaptive Computing recommends approximately 25 nodes per path. Paths with more nodes than this may reduce system performance.
5.31.1 MOM Hierarchy Example
The following example illustrates how information is passed to the pbs_server without and with mom_hierarchy.
The dotted lines indicate an alternate path if the hierarchy-designated node goes down.
The following is the mom_hierarchy file for the mom_hierarchy example:
<path>
  <level>hostA,hostB</level>
  <level>hostB,hostC,hostD</level>
</path>
<path>
  <level>hostE,hostF</level>
  <level>hostE,hostF,hostG</level>
</path>
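To make the structure of the example file easier to see, the following is a minimal Python sketch that reads it into nested lists (paths, then levels, then host names). The helper name parse_hierarchy is hypothetical and is not part of Torque. Because the file is only similar to XML and has no single root element, the sketch uses regular expressions rather than an XML parser; Torque's own parser may behave differently.

```python
import re

# The example hierarchy file from this section, held in a string.
EXAMPLE = """
<path>
  <level>hostA,hostB</level>
  <level>hostB,hostC,hostD</level>
</path>
<path>
  <level>hostE,hostF</level>
  <level>hostE,hostF,hostG</level>
</path>
"""


def parse_hierarchy(text):
    """Return a list of paths; each path is a list of levels;
    each level is a list of host names (hypothetical helper)."""
    paths = []
    for path_body in re.findall(r"<path>(.*?)</path>", text, flags=re.DOTALL):
        levels = []
        for level_body in re.findall(r"<level>(.*?)</level>", path_body, flags=re.DOTALL):
            # Split the comma-separated node list and drop empty entries.
            hosts = [h.strip() for h in level_body.split(",") if h.strip()]
            levels.append(hosts)
        paths.append(levels)
    return paths


paths = parse_hierarchy(EXAMPLE)
for i, levels in enumerate(paths, start=1):
    for depth, hosts in enumerate(levels, start=1):
        print(f"path {i}, level {depth}: {', '.join(hosts)}")
```

The first level of each path lists the nodes that report directly to the pbs_server.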
5.31.2 Setting Up the MOM Hierarchy
The file that contains the configuration information is named mom_hierarchy. By default, it is located in the /var/spool/torque/server_priv directory. The file uses syntax similar to XML:
<path>
  <level>comma-separated node list</level>
  <level>comma-separated node list</level>
  ...
</path>
...
The <path></path> tag pair identifies a group of compute nodes. The <level></level> tag pair contains a comma-separated list of compute node names listed by their hostnames. Multiple paths can be defined with multiple levels within each path.
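As an illustration of this layout, the following Python sketch renders a hierarchy file from a flat list of host names, splitting it into paths of at most 25 nodes. Everything beyond the tag syntax is an assumption made for the example: the function name build_hierarchy, the choice of exactly two levels per path, the two-node top level, the node names, and the reading of "25 nodes per path" as the total number of nodes in a path. It is a starting point to adapt, not a recommended topology.

```python
def build_hierarchy(nodes, nodes_per_path=25, top_level_size=2):
    """Render mom_hierarchy-style text (hypothetical helper).

    Each chunk of `nodes_per_path` hosts becomes one <path>. The first
    `top_level_size` hosts of the chunk form the top level; the rest
    form a single lower level. Both sizes are illustrative assumptions.
    """
    lines = []
    for start in range(0, len(nodes), nodes_per_path):
        chunk = nodes[start:start + nodes_per_path]
        top, rest = chunk[:top_level_size], chunk[top_level_size:]
        lines.append("<path>")
        lines.append(f"  <level>{','.join(top)}</level>")
        if rest:
            lines.append(f"  <level>{','.join(rest)}</level>")
        lines.append("</path>")
    return "\n".join(lines) + "\n"


# Hypothetical node names node01..node30.
nodes = [f"node{i:02d}" for i in range(1, 31)]
print(build_hierarchy(nodes), end="")
```

With 30 nodes and the defaults above, the sketch produces two paths: one with 25 nodes and one with the remaining 5.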
Within a <path></path> tag pair, the levels define the hierarchy. All nodes in the top level communicate directly with the server. All nodes in lower levels communicate with the first available node in the level directly above them. If the first node in the upper level goes down, the nodes in the subordinate level then communicate with the next node in the upper level. If no nodes are available in the upper level, the node communicates directly with the server.
If an upper level node has gone down and then becomes available, the lower level nodes will eventually find that the node is available and start sending their updates to that node.
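The selection rule described above can be sketched in Python as follows. This is an illustration of the documented behavior, not Torque's implementation: the function name report_target, the "pbs_server" return value, and the `down` set are hypothetical, and the sketch assumes a node that appears in more than one level is treated as a member of the highest level that lists it.

```python
def report_target(levels, node, down=frozenset()):
    """Return the host that `node` sends status updates to,
    or "pbs_server" if it reports directly (hypothetical helper).

    `levels` is one path: a list of levels, top level first.
    `down` is the set of hosts currently unavailable.
    """
    # Locate the highest level that lists the node (assumption, see lead-in).
    for depth, hosts in enumerate(levels):
        if node in hosts:
            break
    else:
        raise ValueError(f"{node} is not in this path")

    if depth == 0:
        return "pbs_server"  # top-level nodes report directly

    # Use the first available node in the level directly above.
    for candidate in levels[depth - 1]:
        if candidate not in down:
            return candidate

    return "pbs_server"  # no node in the upper level is available


path = [["hostA", "hostB"], ["hostB", "hostC", "hostD"]]
print(report_target(path, "hostC"))
print(report_target(path, "hostC", down={"hostA"}))
print(report_target(path, "hostC", down={"hostA", "hostB"}))
```

When a downed upper-level node comes back, removing it from `down` makes it the chosen target again, which mirrors the recovery behavior described above.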
If you want to specify MOMs on a different port than the default, you must list the node in the form: hostname:mom_manager_port.
For example:
<path>
  <level>hostname:mom_manager_port,...</level>
  ...
</path>
...
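To show how entries in the hostname:mom_manager_port form might be read alongside plain host names, here is a small Python sketch. The helper names parse_level_entry and parse_level are hypothetical, and the port number 30003 is made up for the example. An entry without a port is returned with None, meaning "use the default port"; the sketch does not assume what that default is.

```python
def parse_level_entry(entry):
    """Split 'hostname' or 'hostname:port' into (hostname, port).

    The port is an int when given, otherwise None (hypothetical helper).
    """
    host, sep, port = entry.strip().partition(":")
    if not sep:
        return host, None
    return host, int(port)


def parse_level(text):
    """Parse the comma-separated contents of one <level> element."""
    return [parse_level_entry(e) for e in text.split(",") if e.strip()]


# hostA uses a non-default (made-up) port; hostB uses the default.
print(parse_level("hostA:30003,hostB"))
```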
5.31.3 Putting the MOM Hierarchy on the MOMs
You can put the MOM hierarchy file directly on the MOMs. The default location is /var/spool/torque/mom_priv/mom_hierarchy. This way, the pbs_server doesn't have to send the hierarchy to all the MOMs during each pbs_server startup. The hierarchy file must still exist on the pbs_server, and if the file versions conflict, the pbs_server version overwrites the local MOM file. When using a global file system accessible from both the MOMs and the pbs_server, it is recommended that the hierarchy file be symbolically linked to the MOMs.
Once the hierarchy file exists on the MOMs, start pbs_server with the -n option, which tells pbs_server not to send the hierarchy file on startup. Instead, pbs_server waits until a MOM requests it.