11.2.1 Defining and Configuring Resource Manager Interfaces
Moab resource manager interfaces are defined using the RMCFG parameter. This parameter allows specification of key aspects of the interface. In most cases, only the TYPE attribute needs to be specified, and Moab determines the defaults required to activate and use the selected interface. In the following example, an interface to a LoadLeveler resource manager is defined.
RMCFG[orion] TYPE=LL...
Note that the resource manager is given a label of orion. This label can be any arbitrary site-selected string and is for local usage only. For sites with multiple active resource managers, the labels can be used to distinguish between them for resource manager specific queries and commands.
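The label and attribute structure of RMCFG lines can be illustrated with a short Python sketch. It is not part of Moab: parse_rmcfg is a hypothetical helper, and it assumes that each line has the form RMCFG[<label>] NAME=VALUE ..., that # starts a comment, and that attribute names can be compared case-insensitively.

```python
import re
from collections import defaultdict

# Matches lines such as: RMCFG[orion] TYPE=LL
RMCFG_LINE = re.compile(r"^RMCFG\[(?P<label>[^\]]+)\]\s+(?P<attrs>.+)$")


def parse_rmcfg(text):
    """Return {label: {ATTRIBUTE: value}} for every RMCFG line in text."""
    managers = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = RMCFG_LINE.match(line)
        if not match:
            continue
        for token in match.group("attrs").split():
            name, sep, value = token.partition("=")
            if sep:  # ignore tokens that are not NAME=VALUE pairs
                # Assumption: attribute names are treated case-insensitively.
                managers[match.group("label")][name.upper()] = value
    return dict(managers)


config = """
RMCFG[clusterA] TYPE=PBS HOST=clusterA PORT=15003
RMCFG[clusterB] TYPE=PBS HOST=clusterB PORT=15005
"""
managers = parse_rmcfg(config)
print(sorted(managers))             # ['clusterA', 'clusterB']
print(managers["clusterB"]["PORT"])  # 15005
```

Attributes given on several lines for the same label are merged under that label, which mirrors how a site can spread one resource manager's settings across multiple RMCFG lines.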
11.2.1.A Resource Manager Attributes
The following table lists the possible resource manager attributes that can be configured.
CHECKPOINTSIG | |
---|---|
Format | One of suspend, <INTEGER>, or SIG<X> |
Description | Specifies what signal to send the resource manager when a job is checkpointed. See Checkpoint Overview. |
Example |
RMCFG[base] CHECKPOINTSIG=SIGTERM Moab routes the signal SIGTERM through the resource manager to the job when a job is checkpointed. |
CHECKPOINTTIMEOUT | |
---|---|
Format | [[[DD:]HH:]MM:]SS |
Default | 0 (no timeout) |
Description | Specifies how long Moab waits for a job to checkpoint before canceling it. If set to 0, Moab does not cancel the job if it fails to checkpoint. See Checkpoint Overview. |
Example |
RMCFG[base] CHECKPOINTTIMEOUT=5:00 Moab cancels any job that has not exited 5 minutes after receiving a checkpoint request. |
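The [[[DD:]HH:]MM:]SS format used by CHECKPOINTTIMEOUT (and by other duration attributes on this page) can be converted to seconds as in the following minimal Python sketch. parse_duration is an illustrative helper and is not part of Moab; it assumes every field is a non-negative integer.

```python
def parse_duration(spec):
    """Convert a [[[DD:]HH:]MM:]SS string to a number of seconds."""
    fields = spec.split(":")
    if not 1 <= len(fields) <= 4:
        raise ValueError(f"expected 1 to 4 colon-separated fields: {spec!r}")
    # Units from right to left: seconds, minutes, hours, days.
    units = (1, 60, 3600, 86400)
    return sum(int(field) * unit for field, unit in zip(reversed(fields), units))


print(parse_duration("5:00"))     # 300
print(parse_duration("1:00:00"))  # 3600
```

With this reading, CHECKPOINTTIMEOUT=5:00 corresponds to 300 seconds, which matches the five-minute timeout described in the example.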
CLIENT | |
---|---|
Format | <PEER> |
Default | Use name of resource manager for peer client lookup |
Description | If specified, the resource manager will use the peer value to authenticate remote connections. See configuring peers. If not specified, the resource manager will search for a CLIENTCFG[<X>] entry of RM:<RMNAME> in the moab-private.cfg file. |
Example |
RMCFG[clusterBI] CLIENT=clusterB Moab will look up and use information for peer clusterB when authenticating the clusterBI resource manager. |
CLUSTERQUERYURL | |
---|---|
Format | [file://<path> | http://<address> | <path>] If file:// is specified, Moab treats the destination as a flat text file. If http:// is specified, Moab treats the destination as a hypertext transfer protocol file. If just a path is specified, Moab treats the destination as an executable. |
Description | Specifies how Moab queries the resource manager. See Native RM, URL Notes, and interface details. |
Example |
RMCFG[base] CLUSTERQUERYURL=file:///tmp/cluster.config Moab reads /tmp/cluster.config when it queries base resource manager. |
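The way a URL value selects an access method can be sketched in Python as follows. This is an illustration of the rules stated on this page, not Moab's implementation: classify_url is a hypothetical helper, and treating the scheme case-insensitively is an assumption based on the EXEC:// example on this page.

```python
def classify_url(url):
    """Return (kind, target) for a file://, http://, or exec:// URL or a bare path."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No protocol given: the value is just a path to an executable.
        return ("executable", url)
    scheme = scheme.lower()  # assumption: EXEC:// and exec:// are equivalent
    if scheme == "file":
        return ("flat text file", rest)
    if scheme == "exec":
        return ("executable", rest)
    if scheme == "http":
        return ("http", url)
    raise ValueError(f"unsupported protocol in {url!r}")


print(classify_url("file:///tmp/cluster.config"))
# ('flat text file', '/tmp/cluster.config')
print(classify_url("/opt/moab/tools/cluster.query.pl"))
# ('executable', '/opt/moab/tools/cluster.query.pl')
```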
FBSERVER | |
---|---|
Format | <RMNAME> |
Description | Specifies the fallback server to use when talking to Moab in an HA configuration. |
Example |
RMCFG[base] TYPE=MOAB SERVER=server1 FBSERVER=server1-ha |
FLAGS | |
---|---|
Format | Comma-delimited list of zero or more resource manager flags. See 11.2.2.D Resource Manager Flags for valid values. |
Description | Specifies various attributes of the resource manager. |
Example |
RMCFG[base] FLAGS=asyncstart Moab directs the resource manager to start the job asynchronously. |
HOST | |
---|---|
Format | <STRING> |
Default | localhost |
Description | The host name of the machine on which the resource manager server is running. |
Example |
RMCFG[base] host=server1 |
IGNHNODES | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | Specifies whether to read in the PBSPro host nodes. This parameter is used in conjunction with USEVNODES. When both are set to TRUE, the host nodes are not queried. |
Example |
RMCFG[pbs] IGNHNODES=TRUE |
JOBCANCELURL | |
---|---|
Format | <protocol>://[<host>[:<port>]][<path>] |
Default | --- |
Description | Specifies how Moab cancels jobs via the resource manager. See URL Notes. |
Example |
RMCFG[base] JOBCANCELURL=exec:///opt/moab/job.cancel.lsf.pl Moab executes /opt/moab/job.cancel.lsf.pl to cancel specific jobs. |
JOBEXTENDDURATION | |
---|---|
Format | [[[DD:]HH:]MM:]SS[,[[[DD:]HH:]MM:]SS][!][<] (or <MIN TIME>[,<MAX TIME>][!]) |
Default | --- |
Description |
Specifies the minimum and maximum amount of time that can be added to a job's walltime if the job can be extended. See MINWCLIMIT. When a job runs longer than its current minimum wallclock limit (-l minwclimit, for example), Moab attempts to extend the job's limit by the minimum JOBEXTENDDURATION. This continues until the extension can no longer occur (it is blocked by a reservation or job), the maximum JOBEXTENDDURATION is reached, or the user's specified wallclock limit (-l walltime) is reached. When a job is extended, it is marked as PREEMPTIBLE unless ! is appended to the end of the configuration string. If < is at the end of the string, however, the job is extended by the maximum amount possible. JOBEXTENDDURATION and JOBEXTENDSTARTWALLTIME TRUE cannot be configured together. If both are in the same moab.cfg or are both active, JOBEXTENDDURATION is not honored. For example, comment out JOBEXTENDSTARTWALLTIME: RMCFG[base] JOBEXTENDDURATION=30,1:00:00 #JOBEXTENDSTARTWALLTIME TRUE |
Example |
RMCFG[base] JOBEXTENDDURATION=30,1:00:00 Moab extends a job's walltime by 30 seconds each time the job is about to run out of walltime until it is bound by one hour, a reservation/job, or the job's original "maximum" wallclock limit. |
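The arithmetic behind repeated extensions can be sketched in Python. This is only a model of the description above, not Moab's scheduling logic: extension_schedule is a hypothetical function, all values are in seconds, it assumes that no reservation or job ever blocks an extension, and it assumes that the maximum JOBEXTENDDURATION bounds the total time added.

```python
def extension_schedule(min_limit, walltime, min_ext, max_ext):
    """Yield the successive wallclock limits (in seconds) as a job is extended.

    min_limit: the job's minimum wallclock limit (-l minwclimit)
    walltime:  the user's specified wallclock limit (-l walltime)
    min_ext:   minimum JOBEXTENDDURATION (size of each extension)
    max_ext:   maximum JOBEXTENDDURATION (assumed cap on total time added)
    """
    limit = min_limit
    added = 0
    while limit < walltime and added < max_ext:
        # Never extend past the walltime or past the assumed total cap.
        step = min(min_ext, max_ext - added, walltime - limit)
        limit += step
        added += step
        yield limit


# JOBEXTENDDURATION=30,1:00:00 with a 10-minute minimum and a 12-minute walltime
limits = list(extension_schedule(600, 720, 30, 3600))
print(limits)  # [630, 660, 690, 720]
```

In this run the job stops being extended because it reaches the user's walltime, long before the one-hour maximum applies.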
JOBMODIFYURL | |
---|---|
Format | <protocol>://[<host>[:<port>]][<path>] |
Default | --- |
Description | Specifies how Moab modifies jobs via the resource manager. See URL Notes, and interface details. |
Example |
RMCFG[base] JOBMODIFYURL=exec://$TOOLSDIR/job.modify.dyn.pl Moab executes the job.modify.dyn.pl script in the tools directory ($TOOLSDIR) to modify specific jobs. |
JOBRSVRECREATE | |
---|---|
Format | <BOOLEAN> |
Default | TRUE |
Description | Specifies whether Moab will re-create a job reservation each time job information is updated by a resource manager. See Considerations for Large Clusters for more information. |
Example |
RMCFG[base] JOBRSVRECREATE=FALSE Moab only creates a job reservation once when the job first starts. |
JOBSTARTURL | |
---|---|
Format | <protocol>://[<host>[:<port>]][<path>] |
Default | --- |
Description | Specifies how Moab starts jobs via the resource manager. See URL Notes. |
Example |
RMCFG[base] JOBSTARTURL=http://orion.bsu.edu:1322/moab/jobstart.cgi Moab triggers the jobstart.cgi script via http to start specific jobs. |
JOBSUBMITURL | |
---|---|
Format | <protocol>://[<host>[:<port>]][<path>] |
Description | Specifies how Moab submits jobs to the resource manager. See URL Notes. |
Example |
RMCFG[base] JOBSUBMITURL=exec://$TOOLSDIR/job.submit.dyn.pl Moab executes the job.submit.dyn.pl script to submit jobs to the resource manager. |
JOBSUSPENDURL | |
---|---|
Format | <protocol>://[<host>[:<port>]][<path>] |
Description | Specifies how Moab suspends jobs via the resource manager. See URL Notes. |
Example |
RMCFG[base] JOBSUSPENDURL=EXEC://$HOME/scripts/job.suspend Moab executes the job.suspend script when jobs are suspended. |
JOBVALIDATEURL | |
---|---|
Format | <protocol>://[<host>[:<port>]][<path>] |
Description | Specifies how Moab validates newly submitted jobs. See URL Notes. If the script returns with a non-zero exit code, the job is rejected. See User Proxying/Alternate Credentials. |
Example |
RMCFG[base] JOBVALIDATEURL=exec://$TOOLS/job.validate.pl Moab executes the 'job.validate.pl' script when jobs are submitted to verify they are acceptable. |
MAXDSOP | |
---|---|
Format | <INTEGER> |
Default | -1 (unlimited) |
Description | Specifies the maximum number of data staging operations that may be simultaneously active. |
Example |
RMCFG[ds] MAXDSOP=16 |
NODEFAILURERSVPROFILE | |
---|---|
Format | <STRING> |
Description | Specifies the rsv template to use when placing a reservation onto failed nodes. See also NODEFAILURERESERVETIME. |
Example |
RMCFG[base] NODEFAILURERSVPROFILE=long The scheduler will use the long rsv profile when creating reservations over failed nodes belonging to base. |
OMAP | |
---|---|
Format | <protocol>://[<host>[:<port>]][<path>] |
Description | Specifies an object map file that is used to map credentials and other objects when using this resource manager peer. See Grid Credential Management for full details. |
Example |
moab.cfg RMCFG[peer1] OMAP=file:///opt/moab/omap.dat When communicating with the resource manager peer1, objects are mapped according to the rules defined in the /opt/moab/omap.dat file. |
RESOURCETYPE | |
---|---|
Format | {COMPUTE|FS|LICENSE|NETWORK|PROV} |
Description | Specifies which type of resource this resource manager is configured to control. See Native Resource Managers for more information. |
Example |
RMCFG[base] TYPE=NATIVE RESOURCETYPE=FS Resource manager base will function as a NATIVE resource manager and control file systems. |
SOFTTERMSIG | |
---|---|
Format | <INTEGER> or SIG<X> |
Description | Specifies what signal to send the resource manager when a job reaches its soft wallclock limit. See JOBMAXOVERRUN. |
Example |
RMCFG[base] SOFTTERMSIG=SIGUSR1 Moab routes the signal SIGUSR1 through the resource manager to the job when a job reaches its soft wallclock limit. |
SYNCJOBID | |
---|---|
Format | <BOOLEAN> |
Description |
Specifies that Moab should migrate jobs to the local resource manager with the job's Moab-assigned job ID. In a grid, the grid-head will only pass dependencies to the underlying Moab if SYNCJOBID is set. This attribute can be used with the JOBIDFORMAT attribute and PROXYJOBSUBMISSION flag in order to synchronize job IDs between Moab and the resource manager. For more information about all steps necessary to synchronize job IDs between Moab and Torque, see Synchronizing Job IDs in Torque and Moab. |
Example |
RMCFG[slurm] TYPE=wiki:slurm SYNCJOBID=TRUE |
SYSTEMMODIFYURL | |
---|---|
Format | [exec://<path> | http://<address> | <path>] If exec:// is specified, Moab treats the destination as an executable file; if http:// is specified, Moab treats the destination as a hypertext transfer protocol file. |
Description | Specifies how Moab modifies attributes of the system. This interface is used in data staging. |
Example |
RMCFG[base] SYSTEMMODIFYURL=exec:///tmp/system.modify.pl Moab executes /tmp/system.modify.pl when it modifies system attributes in conjunction with the resource manager base. |
SYSTEMQUERYURL | |
---|---|
Format | [file://<path> | http://<address> | <path>] If file:// is specified, Moab treats the destination as a flat text file; if http:// is specified, Moab treats the destination as a hypertext transfer protocol file; if just a path is specified, Moab treats the destination as an executable. |
Description | Specifies how Moab queries attributes of the system. This interface is used in data staging. |
Example |
RMCFG[base] SYSTEMQUERYURL=file:///tmp/system.query Moab reads /tmp/system.query when it queries the system in conjunction with base resource manager. |
TRIGGER | |
---|---|
Format | <TRIG_SPEC> |
Description | A trigger specification indicating behaviors to enforce in the event of certain events associated with the resource manager, including resource manager start, stop, and failure. |
Example |
RMCFG[base] TRIGGER=<X> |
TYPE | |
---|---|
Format | <RMTYPE>[:<RMSUBTYPE>] where <RMTYPE> is one of the following: Torque, NATIVE, PBS, RMS, SSS, or WIKI, and the optional <RMSUBTYPE> value is RMS. |
Default | PBS |
Description | Specifies the type of resource manager to be contacted by the scheduler. For TYPE WIKI, AUTHTYPE must be set to CHECKSUM. The <RMSUBTYPE> option is currently only used to support Compaq's RMS resource manager in conjunction with PBS. In this case, the value PBS:RMS should be specified. |
Example |
RMCFG[clusterA] TYPE=PBS HOST=clusterA PORT=15003 RMCFG[clusterB] TYPE=PBS HOST=clusterB PORT=15005 Moab interfaces to two different PBS resource managers, one located on server clusterA at port 15003 and one located on server clusterB at port 15005. |
USEVNODES | |
---|---|
Format | <BOOLEAN> |
Default | FALSE |
Description | Specifies whether to schedule on PBS virtual nodes. When set to TRUE, Moab queries PBSPro for vnodes and puts jobs on vnodes rather than hosts. In some systems, such as PBS + Altix, it may not be desirable to read in the host nodes; for such situations refer to the IGNHNODES attribute. |
Example |
RMCFG[pbs] USEVNODES=TRUE |
WORKLOADQUERYURL | |
---|---|
Format | [file://<path> |
http://<address> | <path>] If file:// is specified, Moab treats the destination as a flat text file; if http:// is specified, Moab treats the destination as a hypertext transfer protocol file; if just a path is specified, Moab treats the destination as an executable. |
Description | Specifies how Moab queries the resource manager for workload information. (See Native RM, URL Notes, and interface details.) |
Example |
RMCFG[Torque] WORKLOADQUERYURL=exec://$TOOLSDIR/job.query.dyn.pl Moab executes /opt/moab/tools/job.query.dyn.pl to obtain updated workload information from resource manager Torque. |
URL Notes
URL parameters can load files by using the file, exec, and http protocols.
For the protocol file, Moab loads the data directly from the text file pointed to by path.
RMCFG[base] SYSTEMQUERYURL=file:///tmp/system.query
For the protocol exec, Moab executes the file pointed to by path and loads the output written to STDOUT. If the script requires arguments, you can use a question mark (?) between the script name and the arguments, and an ampersand (&) for each space.
RMCFG[base] JOBVALIDATEURL=exec://$TOOLS/job.validate.pl
RMCFG[native] CLUSTERQUERYURL=exec://opt/moab/tools/cluster.query.pl?-group=group1&-arch=x86
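The argument convention for the exec protocol (a question mark before the first argument and an ampersand in place of each space) can be sketched in Python. The helpers exec_url and split_exec_url are hypothetical names introduced for illustration; they only demonstrate the string convention described above and make no claim about how Moab itself parses the value.

```python
def exec_url(script, args=()):
    """Build an exec:// URL, joining script arguments with '?' and '&'."""
    for arg in args:
        if " " in arg or "&" in arg:
            # A space or '&' inside an argument cannot be expressed in this scheme.
            raise ValueError(f"argument cannot contain a space or '&': {arg!r}")
    url = f"exec://{script}"
    if args:
        url += "?" + "&".join(args)
    return url


def split_exec_url(url):
    """Recover the script path and argument list that an exec:// URL stands for."""
    prefix = "exec://"
    if not url.lower().startswith(prefix):
        raise ValueError(f"not an exec URL: {url!r}")
    script, _, query = url[len(prefix):].partition("?")
    return [script] + (query.split("&") if query else [])


url = exec_url("/opt/moab/tools/cluster.query.pl", ["-group=group1", "-arch=x86"])
print(url)
# exec:///opt/moab/tools/cluster.query.pl?-group=group1&-arch=x86
print(split_exec_url(url))
# ['/opt/moab/tools/cluster.query.pl', '-group=group1', '-arch=x86']
```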
Synchronizing Job IDs in Torque and Moab
Unless you use an msub submit filter or you're in a grid, it is recommended that you use your RM-specific job submission command (for instance, qsub).
In order to synchronize your job IDs between Torque and Moab you must perform the following steps:
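Set SYNCJOBID to TRUE on the Torque resource manager:
RMCFG[torque] TYPE=PBS SYNCJOBID=TRUE
Set the PROXYJOBSUBMISSION flag on the same resource manager:
RMCFG[torque] TYPE=PBS SYNCJOBID=TRUE
RMCFG[torque] FLAGS=PROXYJOBSUBMISSION
Set JOBIDFORMAT to INTEGER on the internal resource manager:
RMCFG[torque] TYPE=PBS SYNCJOBID=TRUE
RMCFG[torque] FLAGS=PROXYJOBSUBMISSION
RMCFG[internal] JOBIDFORMAT=INTEGER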
11.2.2 Resource Manager Configuration Details
As with all scheduler parameters, RMCFG follows the syntax described within the Parameters Overview.
11.2.2.A Resource Manager Types
The RMCFG parameter allows the scheduler to interface with multiple types of resource managers using the TYPE or SERVER attributes. By specifying these attributes, any of the following resource managers can be supported.
Type | Resource managers | Details |
---|---|---|
Moab | Moab Workload Manager | Use the Moab peer-to-peer (grid) capabilities to enable grids and other configurations. (See Grid Configuration.) |
MWS | Moab Web Services | The MWS resource manager type is a native integration between Moab and MWS. Resource manager data is passed directly between Moab and MWS using JSON (rather than Moab's native WIKI syntax). This simplifies RM configuration for systems where one or more MWS plugins are acting as resource managers. See the "Moab Workload Manager resource manager integration" section of the MWS plugins chapter in the MWS documentation for more information. |
Native | Moab Native Interface | Used for connecting directly to scripts, files, and databases. (See Managing Resources Directly with the Native Interface.) |
PBS | Torque (all versions) | N/A |
SSS | Scalable Systems Software Project version 2.0 and higher | N/A |
WIKI | Wiki interface specification version 1.0 and higher | Used for LRM, YRM, ClubMASK, BProc, SLURM, and others. |
11.2.2.B Resource Manager Name
Moab can support more than one resource manager simultaneously. Consequently, the RMCFG parameter takes an index value such as RMCFG[clusterA]. This index value essentially names the resource manager (as done by the deprecated parameter RMNAME). The resource manager name is used by the scheduler in diagnostic displays, logging, and in reporting resource consumption to the accounting manager. For most environments, the selection of the resource manager name can be arbitrary.
11.2.2.C Resource Manager Location
The HOST, PORT, and SERVER attributes can be used to specify how the resource manager should be contacted. For many resource managers the interface correctly establishes contact using default values. These attributes need to be specified only for resource managers such as the WIKI interface (that do not include defaults) or for resource managers that can be configured to run at non-standard locations (such as PBS). In all other cases, the resource manager is automatically located.
11.2.2.D Resource Manager Flags
The FLAGS attribute can be used to modify many aspects of a resource manager's behavior.
AUTOSYNC, COLLAPSEDVIEW, HOSTINGCENTER, PRIVATE, REPORT, SHARED, and STATIC are deprecated.
Flag | Description |
---|---|
ASYNCDELETE | Moab directs the resource manager to not wait for confirmation that the job correctly cancels before the API call returns. See Large Cluster Tuning for more information. This flag is only applicable for Torque or Moab Native resource managers. |
ASYNCSTART | Jobs started on this resource manager start asynchronously. In this case, the scheduler does not wait for confirmation that the job correctly starts before proceeding. See Large Cluster Tuning for more information. This flag is only applicable for Torque or Moab Native resource managers. |
AUTOSTART | Jobs staged to this resource manager do not need to be explicitly started by the scheduler. The resource manager itself handles job launch. |
BECOMEMASTER | Nodes reported by this resource manager will transfer ownership to this resource manager if they are currently owned by another resource manager that does not have this flag set. |
CLIENT | A client resource manager object is created for diagnostic/statistical purposes or to configure Moab's interaction with this resource manager. It represents an external entity that consumes server resources or services, allows a local administrator to track this usage, and configures specific policies related to that resource manager. A client resource manager object loads no data and provides no services. |
CLOCKSKEWCHECKING | Setting CLOCKSKEWCHECKING allows you to configure clock skew adjustments. Most of the time it is sufficient to use an NTP server to keep the clocks in your system synchronized. |
DYNAMICCRED | The resource manager creates credentials within the cluster as needed to support workload. See Identity Manager Overview. |
EnableCondensedQuery | Enables the condensed workload query. Only applies if the Torque parameter job_full_report_time is used (Torque Resource Manager version 5.1.x or later). See Server Parameters in the Torque Resource Manager Administrator Guide. |
EXECUTIONSERVER | The resource manager is capable of launching and executing batch workload. |
FSISREMOTE | Add this flag if the working file system doesn't exist on the server to prevent Moab from validating files and directories at migration. |
FULLCP | Always checkpoint full job information (useful with Native resource managers). |
IgnOS | Ignore the operating system reported by the resource manager on each node and use the OS in Moab's configuration files. See OS for more information. |
IGNQUEUESTATE | The queue state reported by the resource manager should be ignored. May be used if queues must be disabled inside of a particular resource manager to allow an external scheduler to properly operate. |
IGNWORKLOADSTATE | When this flag is applied to a native resource manager, any jobs that are reported via that resource manager's "workload query URL" have their reported state ignored. For example, if an RM has the IgnWorkloadState flag and it reports that a set of jobs have a state of "Running," this state is ignored and the jobs will either have a default state set or will inherit the state from another RM reporting on that same set of jobs. This flag only changes the behavior of RMs of type NATIVE. |
LOCALWORKLOADEXPORT | When set, destination peers share information about local and remote jobs, allowing job management of different clusters at a single peer. For more information, see Workload Submission and Control. |
MIGRATEALLJOBATTRIBUTES | When set, this flag causes additional job information to be migrated to the resource manager; additional job information includes things such as node features applied via CLASSCFG[name] DEFAULT.FEATURES, the account to which the job was submitted, job walltime limit, and node exclusivity. |
NOAUTORES | If the resource manager does not report CPU usage to Moab because CPU usage is at 0%, Moab assumes full CPU usage. When set, Moab recognizes the resource manager report as 0% usage. This is only valid for PBS. |
NoCondensedQuery | Disables the condensed workload query. This is the default for Moab 9.0 and later. Only applies if the Torque parameter job_full_report_time is used (Torque Resource Manager version 5.1.x or later). See Server Parameters in the Torque Resource Manager Administrator Guide. |
NOCREATERESOURCE | To use resources discovered from this resource manager, they must be created by another resource manager first. For example, if you set NOCREATERESOURCE on RM A, which reports nodes 1 and 2, and RM B only reports node 1, then node 2 will not be created because RM B did not report it. |
PROXYJOBSUBMISSION | Enables Admin proxy job submission, which means administrators may submit jobs on behalf of other users. |
PUSHSLAVEJOBUPDATES | Enables job changes made on a grid slave to be pushed to the grid head or master. Without this flag, jobs being reported to the grid head do not show any changes made on the remote Moab server (via mjobctl and so forth). |
RECORDGPUMETRICS | Enables the recording of GPU metrics for nodes. |
RECORDMICMETRICS | Enables the recording of MIC metrics for nodes. |
THREADEDQUERIES | When this flag is set for an individual RM, the queries that Moab performs to get information from the RM are done in a separate thread from the main Moab process. This allows Moab to remain responsive during the query and ultimately reduces the time spent in a scheduling cycle. If multiple RMs are being used, the effect can be more significant because all RMs are queried in parallel. |
USEMOABDEPENDENCIES | Valid only for SLURM resource managers. Tells Moab to use its own internal job dependency rules, rather than considering SLURM the master. This is useful for multiple job name dependencies. |
USEPHYSICALMEMORY | Tells Moab to use a node's physical memory instead of the swap space. For example: If a node has 12 GB of RAM and an additional 12 GB of swap space, it has 24 GB of virtual memory. If a 4 GB job is assigned to that node, the reported available memory shows 12 GB because the job is using the swap space not the physical memory. The reported available memory doesn't decrease until the swap space is used up. When this flag is set, the 4 GB job immediately reduces the available memory to 8 GB (physical memory - used memory). |
USERSPACEISSEPARATE | Tells Moab to ignore validating the user's uid and gid in the case that information doesn't exist on the Moab server. |
Example
# resource manager 'torque' should use asynchronous job start
RMCFG[torque] FLAGS=asyncstart
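A FLAGS value can be checked against the flag table and the deprecated list above with a short Python sketch. check_flags is a hypothetical helper, not a Moab tool. Comparing flag names case-insensitively is an assumption, suggested by the lowercase asyncstart in the example.

```python
# Flags listed in the table above, uppercased for comparison.
KNOWN_FLAGS = {
    "ASYNCDELETE", "ASYNCSTART", "AUTOSTART", "BECOMEMASTER", "CLIENT",
    "CLOCKSKEWCHECKING", "DYNAMICCRED", "ENABLECONDENSEDQUERY",
    "EXECUTIONSERVER", "FSISREMOTE", "FULLCP", "IGNOS", "IGNQUEUESTATE",
    "IGNWORKLOADSTATE", "LOCALWORKLOADEXPORT", "MIGRATEALLJOBATTRIBUTES",
    "NOAUTORES", "NOCONDENSEDQUERY", "NOCREATERESOURCE", "PROXYJOBSUBMISSION",
    "PUSHSLAVEJOBUPDATES", "RECORDGPUMETRICS", "RECORDMICMETRICS",
    "THREADEDQUERIES", "USEMOABDEPENDENCIES", "USEPHYSICALMEMORY",
    "USERSPACEISSEPARATE",
}
# Flags this section names as deprecated.
DEPRECATED_FLAGS = {
    "AUTOSYNC", "COLLAPSEDVIEW", "HOSTINGCENTER", "PRIVATE", "REPORT",
    "SHARED", "STATIC",
}


def check_flags(value):
    """Sort a comma-delimited FLAGS value into valid, deprecated, and unknown."""
    result = {"valid": [], "deprecated": [], "unknown": []}
    for flag in (part.strip() for part in value.split(",")):
        if not flag:
            continue  # tolerate empty entries such as a trailing comma
        key = flag.upper()  # assumption: flag names are case-insensitive
        if key in DEPRECATED_FLAGS:
            result["deprecated"].append(key)
        elif key in KNOWN_FLAGS:
            result["valid"].append(key)
        else:
            result["unknown"].append(key)
    return result


print(check_flags("asyncstart,SHARED"))
# {'valid': ['ASYNCSTART'], 'deprecated': ['SHARED'], 'unknown': []}
```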
11.2.3 Scheduler/Resource Manager Interactions
In the simplest configuration, Moab interacts with the resource manager using the following four primary functions:
Function | Description |
---|---|
GETJOBINFO | Collect detailed state and requirement information about idle, running, and recently completed jobs. |
GETNODEINFO | Collect detailed state information about idle, busy, and defined nodes. |
STARTJOB | Immediately start a specific job on a particular set of nodes. |
CANCELJOB | Immediately cancel a specific job regardless of job state. |
Using these four simple commands, Moab enables nearly its entire suite of scheduling functions. More detailed information about resource manager specific requirements and semantics for each of these commands can be found in the specific resource manager (such as WIKI) overviews.
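The division of labor between the scheduler and the four functions can be sketched as a small Python interface. Everything here is illustrative: the class names, method signatures, and state strings are assumptions made for this sketch and do not reflect Moab's internal API or any resource manager's wire protocol.

```python
from abc import ABC, abstractmethod


class ResourceManager(ABC):
    """Hypothetical interface mirroring the four primary functions."""

    @abstractmethod
    def get_job_info(self): ...        # GETJOBINFO

    @abstractmethod
    def get_node_info(self): ...       # GETNODEINFO

    @abstractmethod
    def start_job(self, job_id, nodes): ...   # STARTJOB

    @abstractmethod
    def cancel_job(self, job_id): ...  # CANCELJOB


class InMemoryRM(ResourceManager):
    """Toy resource manager that keeps all state in dictionaries."""

    def __init__(self, nodes, jobs):
        self.nodes = {name: "Idle" for name in nodes}
        self.jobs = {job_id: {"state": "Idle", "nodes": []} for job_id in jobs}

    def get_job_info(self):
        return {job_id: dict(info) for job_id, info in self.jobs.items()}

    def get_node_info(self):
        return dict(self.nodes)

    def start_job(self, job_id, nodes):
        for node in nodes:
            self.nodes[node] = "Busy"
        self.jobs[job_id] = {"state": "Running", "nodes": list(nodes)}

    def cancel_job(self, job_id):
        for node in self.jobs[job_id]["nodes"]:
            self.nodes[node] = "Idle"
        self.jobs[job_id] = {"state": "Canceled", "nodes": []}


def schedule_once(rm):
    """One naive scheduling cycle: start each idle job on one idle node."""
    jobs = rm.get_job_info()
    idle_nodes = [n for n, state in rm.get_node_info().items() if state == "Idle"]
    for job_id, info in jobs.items():
        if info["state"] == "Idle" and idle_nodes:
            rm.start_job(job_id, [idle_nodes.pop(0)])


rm = InMemoryRM(nodes=["node1"], jobs=["job1", "job2"])
schedule_once(rm)
print(rm.get_job_info()["job1"]["state"])  # Running
print(rm.get_job_info()["job2"]["state"])  # Idle
```

The scheduling loop only ever calls the four methods, which is the point of the sketch: query jobs, query nodes, start, and cancel are enough to drive a basic scheduling cycle.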
In addition to these base commands, other commands are required to support advanced features such as suspend/resume, gang scheduling, and scheduler-initiated checkpoint restart.
Information on creating a new scheduler resource manager interface can be found in the Adding New Resource Manager Interfaces section.