Moab-SLURM Integration Guide

Overview

Moab can be used as the scheduler for the SLURM resource manager. In this configuration, SLURM handles the job queue and the compute resources while Moab determines when, where, and how jobs should be executed according to current cluster state and site mission objectives.

The documentation below describes how to configure Moab to interface with SLURM.

For Moab-SLURM integration, Moab 6.0 or higher and SLURM 2.2 or higher are recommended. On the downloads page, select the generic version, which is the one needed for a SLURM installation.

SLURM Configuration Steps

To configure SLURM to use Moab as the scheduler, set the SchedulerType parameter in the slurm.conf configuration file, located in the SLURM etc directory (/usr/local/etc by default).

# slurm.conf

SchedulerType=sched/wiki2

The SchedulerType parameter controls the communication protocol used between Moab and SLURM. This interface can be customized using the wiki.conf configuration file, which is located in the same directory and documented further in the SLURM Admin Manual.

Note: To allow sharing of nodes, the SLURM partition should be configured with the 'Shared=yes' attribute.


Moab Configuration Steps

By default, Moab is built with WIKI interface support (which is used to interface with SLURM) when running the standard configure and make process.

To configure Moab to use SLURM, the parameter 'RMCFG' should be set to use the WIKI:SLURM protocol as in the example below.

# moab.cfg

SCHEDCFG[base] MODE=NORMAL
RMCFG[base] TYPE=WIKI:SLURM
...

Note: The RMCFG index (set to base in the example above) can be any value chosen by the site.  Also, if SLURM is running on a node other than the one on which Moab is running, then the SERVER attribute of the RMCFG parameter should be set.

Note: SLURM has a SchedulerPort parameter, which sets the port used to communicate with the scheduler. Moab auto-detects this port and communicates with SLURM with no explicit configuration required. Do NOT set Moab's SCHEDCFG[] PORT attribute to this value: that attribute controls Moab client communication, and setting it to match the SchedulerPort value will cause conflicts. With no changes, the default configuration will work fine.
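
The port rule above can be checked mechanically before restarting either daemon. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes simple `Key=Value` lines with `#` comments rather than the full grammar of either configuration file.

```python
import re

def scheduler_port(slurm_conf_text):
    """Return the SchedulerPort value from slurm.conf text, or None if unset."""
    for line in slurm_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        match = re.match(r"SchedulerPort\s*=\s*(\d+)$", line, re.IGNORECASE)
        if match:
            return int(match.group(1))
    return None

def moab_client_ports(moab_cfg_text):
    """Return every PORT=<n> value found on SCHEDCFG[...] lines of moab.cfg."""
    ports = []
    for line in moab_cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.upper().startswith("SCHEDCFG["):
            found = re.findall(r"\bPORT=(\d+)", line, re.IGNORECASE)
            ports.extend(int(port) for port in found)
    return ports

def has_port_conflict(slurm_conf_text, moab_cfg_text):
    """True when a SCHEDCFG PORT value equals SLURM's SchedulerPort."""
    port = scheduler_port(slurm_conf_text)
    return port is not None and port in moab_client_ports(moab_cfg_text)
```

Reading both files and passing their contents to has_port_conflict gives a quick sanity check. When SchedulerPort is not set explicitly, the sketch reports no conflict, because it does not know SLURM's built-in default.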

Note: If the SLURM client commands/executables are not available on the machine running Moab, SLURM partition information and certain other configuration information will not be imported automatically from SLURM and must be set up manually in Moab. In addition, the SLURM VERSION should be set as an attribute on the RMCFG parameter. If it is not set, the default is version 1.2.0. The following example shows how to set this line if SLURM v1.1.24 is running on a host named Node01 (set using the SERVER attribute).

# moab.cfg with SLURM on Host Node01

RMCFG[base] TYPE=WIKI:SLURM SERVER=Node01 VERSION=10124
...
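
The VERSION value in the example is consistent with packing the version as major*10000 + minor*100 + patch, so that v1.1.24 becomes 10124. This formula is inferred from that single example and is not stated anywhere in this guide, so treat the Python sketch below (with its hypothetical function name) as an assumption to verify against your Moab release.

```python
def encode_slurm_version(version):
    """Encode 'major.minor.patch' as major*10000 + minor*100 + patch.

    ASSUMPTION: the packing scheme is inferred from the 1.1.24 -> 10124
    example in this guide; it is not a documented formula.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if not (0 <= minor < 100 and 0 <= patch < 100):
        raise ValueError("minor and patch must each fit in two digits")
    return major * 10000 + minor * 100 + patch

print(encode_slurm_version("1.1.24"))  # 10124, matching the example above
```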

Configuration for Standby and Expedite Support

SLURM's 'Standby' and 'Expedite' options are mapped to the Moab QoS feature. When a SLURM interface is detected, Moab automatically creates a 'standby' and an 'expedite' QoS with the following defaults. The 'standby' QoS is globally accessible to all users and on all nodes and has a lower than normal priority. The 'expedite' QoS is not accessible by any user, has no node constraints, and has a higher than normal priority.

Authorizing Users to Use 'Expedite'

To allow users to request 'expedite' jobs, add them to the 'expedite' QoS. This can be accomplished using the MEMBERULIST attribute, as in the following example:

# moab.cfg

# allow josh, steve, and user c1443 to submit 'expedite' jobs
QOSCFG[expedite] MEMBERULIST=josh,steve,c1443
...

Excluding Nodes for 'Expedite' and 'Standby' Usage

Both 'expedite' and 'standby' jobs can be independently excluded from certain nodes by creating a QoS-based standing reservation.

Specifically, this is accomplished by creating a reservation with a logical-not QoS ACL and a hostlist indicating which nodes are to be exempted, as in the following example:

# moab.cfg

# block expedite jobs from reserved nodes
SRCFG[expedite-blocker] QOSLIST=!expedite
SRCFG[expedite-blocker] HOSTLIST=c001[3-7],c200
SRCFG[expedite-blocker] PERIOD=INFINITY

# block standby jobs from rack 13 
SRCFG[standby-blocker] QOSLIST=!standby
SRCFG[standby-blocker] HOSTLIST=R:r13-[0-13]
SRCFG[standby-blocker] PERIOD=INFINITY
...

Quadrics Integration

If you manage a cluster with a Quadrics high-speed network, significant performance improvement can be obtained by instructing Moab to allocate contiguous collections of nodes. This can be accomplished by setting the NODEALLOCATIONPOLICY parameter to CONTIGUOUS, as in the example below:

# moab.cfg

SCHEDCFG[cluster1]   MODE=NORMAL SERVER=head.cluster1.org
RMCFG[slurm]         TYPE=wiki:slurm
NODEALLOCATIONPOLICY CONTIGUOUS
...

Setting Up Authentication

By default, Moab will not require server authentication. However, if SLURM's wiki.conf file (default location is /usr/local/etc) contains the AuthKey parameter, or a secret key is specified via SLURM's configure using the --with-key option, Moab must be configured to honor this setting. To do so, set the resource manager AUTHTYPE attribute to CHECKSUM in moab.cfg and set the KEY value in the moab-private.cfg file to the secret key, as in the example below.

# /usr/local/etc/wiki.conf

AuthKey=4322953
...

# moab.cfg

RMCFG[slurm]         TYPE=wiki:slurm AUTHTYPE=CHECKSUM
...

# moab-private.cfg

CLIENTCFG[RM:slurm]  KEY=4322953
...

Note: For the CHECKSUM authentication method, the key value specified in the moab-private.cfg file must be a decimal, octal, or hexadecimal value; it cannot be an arbitrary non-numeric string.
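
A key can be screened for the numeric requirement before it is written to both files. This guide does not spell out the accepted notation, so the Python sketch below assumes C-style literals (a 0x prefix for hexadecimal, a leading 0 for octal, anything else decimal). The function name is hypothetical.

```python
import re

# ASSUMPTION: C-style numeric notation. The guide only says the key must be
# "decimal, octal, or hexadecimal" without defining the exact syntax.
_NUMERIC_KEY = re.compile(r"(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)")

def is_numeric_key(key):
    """Return True if key looks like a decimal, octal, or hexadecimal number."""
    return _NUMERIC_KEY.fullmatch(key.strip()) is not None

print(is_numeric_key("4322953"))  # True: the decimal key from the example
print(is_numeric_key("s3cret"))   # False: arbitrary non-numeric string
```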

Queue/Class Support

While SLURM supports the concept of classes and queues, Moab provides a flexible alternative queue interface system.  In most cases, sites can create and manage queues by defining partitions within SLURM.  Internally, these SLURM partitions are mapped to Moab classes which can then be managed and configured using Moab's CLASSCFG parameter and mdiag -c command.

Policies

By default, SLURM systems only allow tasks from a single job to utilize the resources of a compute node. Consequently, when a SLURM interface is detected, Moab will automatically set the NODEACCESSPOLICY parameter to SINGLEJOB. To allow node sharing, the SLURM partition attribute 'Shared' should be set to FORCE in slurm.conf, as in the example below:

# slurm.conf

PartitionName=batch Nodes=node[1-64] Default=YES MaxTime=INFINITE State=UP Shared=FORCE
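
To audit which partitions permit node sharing, a partition line like the one above can be split into its attributes. The Python sketch below is a minimal illustration with hypothetical function names. It assumes whitespace-separated `Key=Value` tokens on a single line, and it treats the two Shared values mentioned in this guide (yes and FORCE) as enabling sharing.

```python
def parse_partition_line(line):
    """Split a slurm.conf 'Key=Value ...' line into a dict with lower-cased keys."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key.lower()] = value
    return fields

def sharing_mode(line):
    """Return the upper-cased Shared value, or None when the attribute is absent."""
    value = parse_partition_line(line).get("shared")
    return value.upper() if value is not None else None

def allows_node_sharing(line):
    """True when Shared is one of the values this guide names for node sharing."""
    return sharing_mode(line) in {"YES", "FORCE"}

EXAMPLE = ("PartitionName=batch Nodes=node[1-64] Default=YES "
           "MaxTime=INFINITE State=UP Shared=FORCE")
print(sharing_mode(EXAMPLE))  # FORCE
```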

Moab Queue and RM Emulation

With a SLURM system, jobs can be submitted either to SLURM or to Moab. If submitted to SLURM, the standard SLURM job submission language must be used. If jobs are submitted to Moab using the msub command, then either LSF*, PBS, or LoadLeveler* job submission syntax can be used. These jobs will be translated by Moab and migrated to SLURM using its native job language.

SLURM High Availability

If SLURM high availability mode is enabled, Moab will automatically detect the presence of the SLURM BackupController and utilize it if the primary fails. To verify SLURM is properly configured, issue the SLURM command 'scontrol show config | grep Backup'. To verify Moab properly detects this information, run 'mdiag -R -v | grep FallBack'.

Note: To use SLURM high availability, the SLURM parameter StateSaveLocation must point to a shared directory which is readable and writable by both the primary and backup hosts.  See the slurm.conf man page for additional information.
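
The two verification commands above can be wrapped in a small script for routine checks. This Python sketch only reproduces the grep step: it runs a command and keeps the output lines containing a given string. The helper names are hypothetical, and nothing is assumed about the exact output format of scontrol or mdiag.

```python
import subprocess

def matching_lines(text, needle):
    """Return the lines of text that contain needle (a plain substring match)."""
    return [line for line in text.splitlines() if needle in line]

def run_and_filter(command, needle):
    """Run command (an argument list) and return stdout lines containing needle."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return matching_lines(result.stdout, needle)

def verify_high_availability():
    """Print what SLURM and Moab each report about the backup controller."""
    print(run_and_filter(["scontrol", "show", "config"], "Backup"))
    print(run_and_filter(["mdiag", "-R", "-v"], "FallBack"))
```

Calling verify_high_availability() on the Moab host prints both filtered outputs. An empty list from either command suggests the backup controller is not configured or not detected.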

