Appendix R: Node Allocation Plug-in Developer Kit

R.1 Overview

Each time Moab schedules a job, it must choose the nodes on which the job will run. Moab uses the Node Allocation policy to select them from the available nodes. Because systems and cluster topologies vary so widely, you can create and use a node allocation plugin that allocates nodes based on your cluster's interconnect topology.

The plugin policy allows you to write your own algorithm to choose which nodes will be used. This algorithm is contained in a shared library that Moab loads at run time.

To obtain the Plug-in Developer Kit (PDK) with the header file and example code, contact your sales representative.

R.1.1 Writing the plugin

A plugin is a shared library that has specific functions and variables that will be called directly from Moab. The plugin conforms to a C language API. The API is specified through an include file: moab-plugin.h. This file must be included in the plugin code. The include file provides function definitions, structures and variables that will be used when communicating with Moab.

When you write the plugin, you need to ensure that the plugin code is robust. The plugin is loaded into Moab, so if the plugin crashes, Moab will crash, and if the plugin leaks memory, Moab leaks memory. You must manage your own memory appropriately. If you want to maintain logs, the plugin is responsible for its own logging.

R.1.1.1 API and Data Structures

The Application Programmer Interface (API) for the Moab Node Allocation Plugin consists of three data items and three entry points that must be supplied to Moab by the plugin.

| Plugin Supplied Data | Description |
|---|---|
| const char *PLUGIN_NAME = "Node Allocation plugin 1.1"; | This character pointer is used by Moab when logging information regarding the operation of the plugin. |
| const char *PLUGIN_TYPE = PLUGIN_TYPE_NAME_NODEALLOCATION; | This character pointer is used by Moab to verify the type of plugin. The value of this data is supplied by the moab-plugin.h source file. The plugin must set this as shown so that Moab does not attempt to use a plugin incorrectly. Moab uses this to determine whether the plugin API type is correct and to allow Moab to correctly communicate with the plugin. |
| const char *PLUGIN_VERSION = PLUGIN_API_VERSION; | This character pointer is used by Moab to verify the API version number. The value of this data is supplied by the moab-plugin.h source file. The plugin must set this as shown so that the correct version of the moab-plugin.h is supplied to Moab. Moab uses this to determine whether the API version is correct and to allow Moab to correctly communicate with the plugin. |

Load Time API

initialize()

int initialize(const char *name, void **data_handle)

The plugin must supply an initialize() entry point. This entry point is called for each use instance of the plugin. For example, if the plugin is used on two different partitions, the initialize() entry point will be called once for each partition.

  • Name — The name is the unique identifier which is used to distinguish multiple instances of the plugin and for logging. When configured globally, the name “ALL” will be given.
  • Data handle — The data_handle points to a location where the plugin should store a pointer to any internal data needed by the plugin between calls to the API. The actual format and structure of the data is up to the plugin. Moab will supply this pointer back to the plugin each time a plugin entry point is called. This data can provide context for the plugin usage instance.
Return codes

The initialize() entry point should return one of two return statuses as defined in moab-plugin.h:

#define   PLUGIN_RC_SUCCESS   0
#define   PLUGIN_RC_FAILURE   1



Gathering node info: The initialize() entry point must gather any information about system nodes, their topology, interconnection, and configuration that it needs to make correct node allocations. Since Moab does not know what information the plugin may need, the plugin must gather this information itself.

Memory considerations: The plugin may allocate memory for temporary or persistent data as needed, but must de-allocate or return the memory when finished. Not returning memory can result in memory leaks and unstable operation on the part of Moab.

Multiple access: A given loaded plugin can be used by more than one partition. This means that the plugin must maintain its internal data in such a way that calls to the plugin for separate partitions do not conflict. It is recommended that internal data be allocated and a pointer to the data be kept in the data_handle described above, as opposed to using global or static variables. Any global or static data is shared among all instances of the plugin.
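
As a minimal sketch of the per-instance pattern recommended above, the following shows an initialize() that allocates its own context and hands it to Moab through data_handle. The instance_ctx_t structure and its fields are hypothetical, and the return codes are redefined locally only so that the fragment compiles on its own. A real plugin would include moab-plugin.h instead and would gather its node information where the comment indicates.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return codes as documented for moab-plugin.h; redefined here only so the
 * sketch is self-contained. A real plugin includes moab-plugin.h instead. */
#ifndef PLUGIN_RC_SUCCESS
#define PLUGIN_RC_SUCCESS 0
#define PLUGIN_RC_FAILURE 1
#endif

/* Hypothetical per-instance context; the layout is entirely up to the plugin. */
typedef struct {
    char name[64];        /* instance name supplied by Moab, e.g. "ALL" */
    int  allocate_calls;  /* example of state kept between API calls */
} instance_ctx_t;

int initialize(const char *name, void **data_handle)
{
    instance_ctx_t *ctx;

    if (name == NULL || data_handle == NULL)
        return PLUGIN_RC_FAILURE;

    ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL)
        return PLUGIN_RC_FAILURE;

    /* snprintf always NUL-terminates and truncates over-long names. */
    snprintf(ctx->name, sizeof(ctx->name), "%s", name);

    /* A real plugin would gather node and topology information here. */

    *data_handle = ctx;   /* Moab passes this pointer back on later calls */
    return PLUGIN_RC_SUCCESS;
}
```

Because each call allocates a separate context, an instance configured globally (name "ALL") and an instance configured on a partition do not share state.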

Runtime API

node_allocate()

int node_allocate(
  void                 *data_handle,
  const char           *job_name,
  int                  container_count,
  nalloc_container_t   container[])

The plugin must provide a node_allocate() entry point. This entry point is called each time Moab needs to determine where (on what nodes) a job will eventually run. Note that this entry point can be called many times before the job is actually scheduled to run.

  • Data structures — Moab uses C data structures to pass information and lists of nodes to the plugin and receive them back from the plugin. See moab-plugin.h for the definitions of these structures and for information on how they relate to one another.
Operations

A node allocation request consists of one or more requirements. Each of these requirements is provided within a “container” structure. The container holds information about the requirement to be met, the count and list of all nodes that are available to meet the requirement, and a place to return the list of nodes that the plugin has chosen to use for the job.

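| Policy | Command | Moab: Job Task Count | Moab: Job Node Count | Moab: Job Tasks Per Node | Moab: Node CFG Procs | Moab: Node AVL Procs | Plugin: Node Mapped TC | Plugin: requirement->taskcount | Plugin: return_node_count |
|---|---|---|---|---|---|---|---|---|---|
| Non-ExactNode | -l nodes=12 | 12 | 0 | 0 | 8 | 8 | 8 | 12 | 2 |
| Non-ExactNode | -l nodes=12:ppn=2 | 24 | 0 | 2 | 8 | 8 | 8 | 24 | 3 |
| ExactNode | -l nodes=4 | 4 | 4 | 0 | 8 | 8 | 1 | 4 | 4 |
| ExactNode | -l nodes=4:ppn=2 | 8 | 4 | 2 | 8 | 8 | 2 | 8 | 4 |
| ExactNode | -l nodes=12 | 12 | 0 | 0 | 8 | 6 | 6 | 12 | 2 |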

The duty of the plugin is to use the information that it gathered during initialization to select, from the available nodes, those that will best fulfill the requirements.

The basic algorithm is to consume the task count and memory of each selected node until the consumed totals are greater than or equal to the container's task count and memory requirements.

A job's taskcount is calculated differently depending on the JOBNODEMATCHPOLICY parameter. By default, the parameter is not defined and -l nodes=# requests a number of tasks without regard to the number of nodes. In this case, the plugin should consume all the tasks of each chosen node until the consumed taskcount is greater than or equal to the container's taskcount requirement. The plugin performs node allocation, not task placement.

When JOBNODEMATCHPOLICY EXACTNODE is configured, -l nodes=# means the job wants # nodes with one task per node. In this case, each node passed to the plugin has a taskcount that is mapped down to the number of tasks the job can use on that node. The plugin should consume each node's taskcount until the summed amount is equal to the container's taskcount requirement.
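
The consumption rule described above can be sketched as a simple first-fit loop. The sketch_node_t and sketch_container_t types below are hypothetical stand-ins, not the structures defined in moab-plugin.h. The sketch also makes two simplifying assumptions: it tracks task count only (not memory), and it takes nodes in the order offered rather than ordering them by topology.

```c
#include <stddef.h>

/* Return codes as documented for moab-plugin.h; redefined here only so the
 * sketch is self-contained. */
#ifndef PLUGIN_RC_SUCCESS
#define PLUGIN_RC_SUCCESS 0
#define PLUGIN_RC_FAILURE 1
#endif

/* Hypothetical stand-in for a candidate node. */
typedef struct {
    const char *name;
    int         taskcount;  /* tasks the job may use on this node, as mapped by Moab */
} sketch_node_t;

/* Hypothetical stand-in for a container (one requirement of the request). */
typedef struct {
    int            required_taskcount; /* plays the role of requirement->taskcount */
    int            all_node_count;     /* number of candidate nodes */
    sketch_node_t *all_nodes;          /* candidate nodes */
    int            return_node_count;  /* filled in by the sketch */
    sketch_node_t *return_nodes;       /* assumed to have room for all_node_count entries */
} sketch_container_t;

/* Select nodes in the order offered, consuming each chosen node's whole
 * taskcount, until the requirement is covered. */
int sketch_allocate(sketch_container_t *c)
{
    int consumed = 0;
    int i;

    if (c == NULL || c->all_nodes == NULL || c->return_nodes == NULL)
        return PLUGIN_RC_FAILURE;

    c->return_node_count = 0;
    for (i = 0; i < c->all_node_count && consumed < c->required_taskcount; i++) {
        if (c->all_nodes[i].taskcount <= 0)
            continue;  /* nothing usable on this node */
        c->return_nodes[c->return_node_count++] = c->all_nodes[i];
        consumed += c->all_nodes[i].taskcount;
    }

    /* Fail if the candidates could not cover the requirement. */
    return consumed >= c->required_taskcount ? PLUGIN_RC_SUCCESS
                                             : PLUGIN_RC_FAILURE;
}
```

With twelve candidate nodes that each offer 8 usable tasks and a requirement of 12 tasks, this sketch selects two nodes. With a mapped taskcount of 1 per node and a requirement of 4, it selects four nodes.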

The table above shows how commands are interpreted by Moab, how they are translated for the plugin, and what is expected of the plugin.

Errors and return codes

The plugin may internally log any errors encountered and must return a success or error status as defined in moab-plugin.h:

#define    PLUGIN_RC_SUCCESS   0
#define    PLUGIN_RC_FAILURE   1

Multiple access safe: The node_allocate() entry point must support multiple access as described above.

Unload Time API

finish()

void finish(void *data_handle)

The plugin must supply a finish() entry point. This entry point is called when Moab is preparing to disable and/or unload an instance of the plugin.

Memory/resource cleanup: The plugin must de-allocate and free any resources acquired during the initialize() entry point or during any calls to the node_allocate() entry point. When finish(), the last entry point called for an instance, returns, there should be no allocated memory or other resources still in use by the plugin instance.

Multiple access safe: The finish() entry point must support multiple access as described above.
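
The cleanup requirement can be illustrated with a short sketch of finish(). The topo_ctx_t structure is hypothetical: it assumes the instance context owns one separately allocated node table, and a real plugin would release whatever its own initialize() and node_allocate() calls acquired.

```c
#include <stdlib.h>

/* Hypothetical per-instance context that owns a separately allocated table. */
typedef struct {
    int   *node_ids;    /* allocated while gathering node information */
    size_t node_count;
} topo_ctx_t;

void finish(void *data_handle)
{
    topo_ctx_t *ctx = data_handle;

    if (ctx == NULL)      /* nothing was allocated for this instance */
        return;

    free(ctx->node_ids);  /* free(NULL) is a no-op, so a missing table is fine */
    free(ctx);            /* release the context itself last */
}
```

Only the data reachable from this instance's data_handle is released, so other instances of the same plugin are unaffected.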

R.1.2 Moab configuration

Moab loads a plugin when the plugin is specified in the Moab configuration file, moab.cfg.

R.1.2.1 Moab.cfg

We recommend that you store all Moab plugins in the $MOABHOMEDIR/lib directory (e.g., /opt/moab/lib) as shared libraries (*.so). The name of the plugin's shared library file is up to the plugin developer, so you must give the correct name in the moab.cfg file for Moab to form the absolute plugin file path name.

If a plugin's specified shared library filename starts with a forward slash (/), it is an absolute file path name and Moab uses it without alteration. For example, if a plugin's specified shared library filename is /opt/moab/plugins/plugin.so, Moab will use it as the absolute plugin file path name.

If a plugin's specified shared library filename does not start with a forward slash (/), it is a plugin name, and Moab forms the plugin's absolute path name by concatenating the Moab home directory, "/lib/lib", the specified plugin name, and ".so". For example, if the $MOABHOMEDIR environment variable contains /opt/moab and the plugin name is plugin, Moab will create /opt/moab/lib/libplugin.so and use it as the absolute plugin file path name.
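
The naming rule described above can be restated as a small function. This is only an illustration of the rule as documented, not Moab's own code; the function name resolve_plugin_path and its parameters are hypothetical.

```c
#include <stdio.h>

/* Builds the absolute plugin path from the value given after "PLUGIN:".
 * Returns 0 on success, or -1 if an argument is invalid or the result
 * does not fit in the output buffer. */
int resolve_plugin_path(const char *moab_home, const char *spec,
                        char *out, size_t out_len)
{
    int n;

    if (moab_home == NULL || spec == NULL || spec[0] == '\0' ||
        out == NULL || out_len == 0)
        return -1;

    if (spec[0] == '/') {
        /* Absolute path name: used without alteration. */
        n = snprintf(out, out_len, "%s", spec);
    } else {
        /* Plugin name: home directory + "/lib/lib" + name + ".so". */
        n = snprintf(out, out_len, "%s/lib/lib%s.so", moab_home, spec);
    }

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For a home directory of /opt/moab, the plugin name plugin yields /opt/moab/lib/libplugin.so, while /opt/moab/plugins/plugin.so is returned unchanged.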

R.1.2.2 Syntax rules

In order for Moab to use a plugin for the Node Allocation policy, instead of a built-in Moab policy, you must configure the policy in the moab.cfg file with the value "PLUGIN:" followed by the plugin's shared library file name. The examples below assume the environment variable $MOABHOMEDIR has a value of /opt/moab. Note the use of relative and absolute plugin shared library file path names in the parameter value and how they affect Moab's construction of the full path name.

| Partition | Plug-in Name | moab.cfg Parameter | Moab-derived Full Path Name |
|---|---|---|---|
| global | plugin.so | NODEALLOCATIONPOLICY PLUGIN:plugin.so | /opt/moab/lib/libplugin.so |
| global | /usr/local/plugins/plugin.so | NODEALLOCATIONPOLICY PLUGIN:/usr/local/plugins/plugin.so | /usr/local/plugins/plugin.so |
| abc | plugin.so | PARCFG[abc] NODEALLOCATIONPOLICY=PLUGIN:plugin.so | /opt/moab/lib/libplugin.so |
| xyz | /usr/local/plugins/plugin.so | PARCFG[xyz] NODEALLOCATIONPOLICY=PLUGIN:/usr/local/plugins/plugin.so | /usr/local/plugins/plugin.so |

R.1.2.3 Troubleshooting

The following commands can be used to confirm that the plugin Node Allocation policy was loaded properly.

mschedctl -l

mschedctl -l prints Moab's in-memory configuration. If the plugin policy, with its full path, does not appear for the configured partition, then Moab failed to load the plugin. Note that when NODEALLOCATIONPOLICY is configured globally, it is configured on the "ALL" partition.

$ mschedctl -l -v|grep ^NODEALLOCATIONPOLICY
NODEALLOCATIONPOLICY[ALL] PLUGIN:/opt/moab/lib/libfirstavailable.so
NODEALLOCATIONPOLICY[a] PLUGIN:/opt/moab/lib/liblastavailable.so
NODEALLOCATIONPOLICY[b] CONTIGUOUS
NODEALLOCATIONPOLICY[c] PLUGIN:/opt/moab/lib/libfirstavailable.so
NODEALLOCATIONPOLICY[d] [NONE]

mdiag -C

mdiag -C is used to validate the moab.cfg configuration. With a plugin node allocation policy, Moab will validate that it can successfully load the plugin and that all of the required symbols are present.

$ mdiag -C
...
INFO: line #35 is valid: 'NODEALLOCATIONPOLICY PLUGIN:firstavailable'
INFO: line #36 is valid: 'PARCFG[a]NODEALLOCATIONPOLICY=PLUGIN:lastavailable'
INFO: line #37 is valid: 'PARCFG[b]NODEALLOCATIONPOLICY=CONTIGUOUS'
INFO: line #38 is valid: 'PARCFG[d]NODEALLOCATIONPOLICY=PLUGIN:firstavailable'