Holistic Scheduling - The Native Resource Manager is a video tutorial of a Moab Con session that gives further detail on the native resource manager.
In any case, Moab provides the ability to directly query and manage resources alongside, or without, a resource manager. This interface, called the NATIVE interface, can also be used to launch, cancel, and otherwise manage jobs. The NATIVE interface offers several advantages, including the following:
However, the NATIVE interface may also have some drawbacks.
At a high level, the native interface works by launching threaded calls to perform standard resource manager activities such as managing resources and jobs. The desired calls are configured within Moab and used whenever an action or updated information is required.
RMCFG[local] TYPE=NATIVE
RMCFG[local] CLUSTERQUERYURL=exec:///tmp/query.sh
Format | Description |
---|---|
EXEC | Execute the script specified by the URL path. Use the script stdout as data. |
FILE | Load the file specified by the URL path. Use the file contents as data. |
GANGLIA | Query the Ganglia service located at the URL host and port. Directly process the query results using native Ganglia data formatting. |
HTTP | Read the data specified by the URL. Use the raw data returned. |
SQL | Load data directly from an SQL database using the FULL format described below. |
Moab considers a NativeRM script to have failed if it returns with a non-zero exit code, or if the CHILDSTDERRCHECK parameter is set and its appropriate conditions are met. In addition, the NativeRM script associated with a job submit URL will be considered as having failed if its standard output stream contains the text, "ERROR".
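The failure rules above suggest a simple contract for an exec-style query script: write data to stdout and signal failure through the exit code. The following is a minimal sketch in Python, not a script shipped with Moab. The function names are illustrative, and `collect_nodes` with its hard-coded sample values is a hypothetical placeholder for site-specific data gathering.

```python
import sys


def collect_nodes():
    """Hypothetical data source; a real script would query nodes or a monitoring service."""
    return {"n17": {"CPROC": 4, "AMEMORY": 100980, "STATE": "idle"}}


def format_node_line(name, attrs):
    """Render one node as '<NAME> <ATTR>=<VAL> [<ATTR>=<VAL>]...'."""
    fields = " ".join(f"{key}={value}" for key, value in attrs.items())
    return f"{name} {fields}"


def run(collect):
    """Print one line per node to stdout and return a process exit code."""
    try:
        nodes = collect()
    except Exception as exc:
        # Report the problem on stderr and return non-zero so the call counts as failed.
        print(f"query failed: {exc}", file=sys.stderr)
        return 1
    for name, attrs in nodes.items():
        print(format_node_line(name, attrs))
    return 0


# As a script entry point, the last line would be: sys.exit(run(collect_nodes))
```

Keeping all diagnostics on stderr leaves stdout clean for the data that Moab reads.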
The following simple example queries a file on the server for information about every node in the cluster. This differs from Moab remotely querying the status of each node individually.
RMCFG[local] TYPE=NATIVE
RMCFG[local] CLUSTERQUERYURL=file:///tmp/query.txt
describes any set of node attributes with format: <NAME> <ATTR>=<VAL> [<ATTR>=<VAL>]...
<NAME> - name of node
<ATTR> - node attribute
<VAL> - value of node attribute (See Resource Data Format)
n17 CPROC=4 AMEMORY=100980 STATE=idle
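To make the line format concrete, here is a small parsing sketch in Python. It illustrates the `<NAME> <ATTR>=<VAL>` layout only and is not Moab's own parser. It assumes values contain no whitespace and treats lines starting with `#` as comments. The function names are illustrative.

```python
def parse_node_line(line):
    """Split '<NAME> <ATTR>=<VAL> ...' into (name, {attr: value})."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    name, attrs = tokens[0], {}
    for token in tokens[1:]:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected ATTR=VAL, got {token!r}")
        attrs[key] = value
    return name, attrs


def parse_cluster_data(text):
    """Parse every line that is neither blank nor a '#' comment."""
    nodes = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            name, attrs = parse_node_line(stripped)
            nodes[name] = attrs
    return nodes
```

Values are kept as strings because the meaning of each attribute is defined by the Resource Data Format, not by this sketch.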
RMCFG[TORQUE] TYPE=pbs
RMCFG[ganglia] TYPE=NATIVE CLUSTERQUERYURL=ganglia://<NodeName>:<Port>
RMCFG[ganglia] FLAGS=SLAVEPEER NODESTATEPOLICY=OPTIMISTIC
<NodeName> is the name of a machine with Ganglia running on it, and <Port> is the XML port number used to query Ganglia. If only ganglia:// is supplied as the CLUSTERQUERYURL, Moab will query the localhost on Ganglia's default port, 8649.
If Ganglia and Moab are running on different machines, the machine running Moab needs to be specified as a trusted host in Ganglia's configuration file.
Because Ganglia is not a real resource manager (it does not manage a job queue), Moab cannot control or manage it; Moab can only read information from it. TORQUE is a real resource manager in that it reports nodes and can start jobs on those nodes. The two can run concurrently without any issue because their responsibilities do not overlap. However, if Ganglia and TORQUE report conflicting data, you will generally want to trust TORQUE over Ganglia. For this reason, give the Ganglia RM the "slave" flag. Also, Ganglia cannot report node "state", where state means "availability to run jobs."
To verify that Ganglia is correctly reporting information, issue the mdiag -R -v command or run telnet localhost 8649 and verify that appropriate XML is displayed.
The following Ganglia metrics are supported up to Ganglia version 3.1.1:
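The telnet check can also be scripted. The sketch below, in Python, reads whatever the Ganglia daemon sends on connect and summarizes the metrics per host. It is only a verification aid; Moab processes Ganglia data natively. The element and attribute names (`HOST`, `METRIC`, `NAME`, `VAL`) reflect the usual layout of Ganglia's XML dump and are an assumption to check against your own output.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything the service sends after connect and return it as bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # the peer closed the connection; the dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def host_metrics(xml_data):
    """Return {host_name: {metric_name: value_string}} from an XML dump."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return result
```

A typical check would be `host_metrics(fetch_gmond_xml())`, confirming that every expected node name appears in the result.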
Information reported by Ganglia can be used to prioritize nodes using the NODECFG[] PRIORITYF parameter in conjunction with the NODEALLOCATIONPOLICY of PRIORITY.
Moab can interface with FLEXlm to provide scheduling based on license availability. Informing Moab of license dependencies can reduce the number of costly licenses required by your cluster by allowing Moab to intelligently schedule around license limitations.
Provided with Moab in the tools directory is a Perl script, license.mon.flexLM.pl. This script queries a FLEXlm license server, gathers data about available licenses, and formats that data for Moab to read through a native interface. Any site can use this script to help facilitate FLEXlm integration; the only modification necessary is setting @FLEXlmCmd to the local command that queries FLEXlm. To make this change, edit license.mon.flexLM.pl and, near the top of the file, look for the line:
my @FLEXlmCmd = ("SETME");
Set the '@FLEXlmCmd' to the appropriate value for your system to query a license server and license file (if applicable). If lmutil is not in the PATH variable, specify its full path. Using lmutil's -a argument will cause it to report all licenses. The -c option can be used to specify an optional license file.
To test this script, run it manually. If working correctly, it will produce output similar to the following:
> ./license.mon.flexLM.pl
GLOBAL UPDATETIME=1104688300 STATE=idle ARES=autoCAD:130,idl_mpeg:160 CRES=autoCAD:200,idl_mpeg:330
If the output looks incorrect, set the $LOGLEVEL variable inside of license.mon.flexLM.pl, run it again, and address the reported failure.
Once the license interface script is properly configured, the next step is to add a license native resource manager to Moab via the moab.cfg file:
RMCFG[FLEXlm] TYPE=NATIVE RESOURCETYPE=LICENSE
RMCFG[FLEXlm] CLUSTERQUERYURL=exec://$TOOLSDIR/license.mon.flexLM.pl
...
Once this change is made, restart Moab. The command mdiag -R can be used to verify that the resource manager is properly configured and is in the state Active. Detailed information regarding configured and utilized licenses can be viewed by issuing mdiag -n. Floating licenses (non-node-locked) will be reported as belonging to the GLOBAL node.
Due to the inherent conflict with the plus sign ("+"), the provided license manager script replaces occurrences of the plus sign in license names with the underscore symbol ("_"). This replacement requires that licenses with a plus sign in their names be requested with an underscore in place of any plus signs.
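The renaming rule and the GLOBAL line layout can be illustrated with a short Python sketch. This is not the logic of license.mon.flexLM.pl; it only mirrors the sample output shown earlier. The function names and the fixed STATE=idle value are assumptions taken from that sample.

```python
def moab_license_name(name):
    """Replace every '+' with '_' so the name is safe to request in Moab."""
    return name.replace("+", "_")


def format_license_line(available, configured, update_time):
    """Build a GLOBAL line from {license: count} maps of available and configured licenses."""
    ares = ",".join(f"{moab_license_name(n)}:{c}" for n, c in available.items())
    cres = ",".join(f"{moab_license_name(n)}:{c}" for n, c in configured.items())
    return f"GLOBAL UPDATETIME={update_time} STATE=idle ARES={ares} CRES={cres}"
```

With this rule, a license that the license server reports under a hypothetical name such as `matlab+sim` would be requested in Moab as `matlab_sim`.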
Interfacing to Multiple License Managers Simultaneously
If multiple license managers are used within a cluster, Moab can interface to each of them to obtain the needed license information. In the case of FLEXlm, this can be done by making one copy of the license.mon.flexLM.pl script for each license manager and configuring each copy to point to a different license manager. Then, within Moab, create one native resource manager interface for each license manager and point it to the corresponding script as in the following example:
RMCFG[FLEXlm1] TYPE=NATIVE RESOURCETYPE=LICENSE
RMCFG[FLEXlm1] CLUSTERQUERYURL=exec://$TOOLSDIR/license.mon.flexLM1.pl
RMCFG[FLEXlm2] TYPE=NATIVE RESOURCETYPE=LICENSE
RMCFG[FLEXlm2] CLUSTERQUERYURL=exec://$TOOLSDIR/license.mon.flexLM2.pl
RMCFG[FLEXlm3] TYPE=NATIVE RESOURCETYPE=LICENSE
RMCFG[FLEXlm3] CLUSTERQUERYURL=exec://$TOOLSDIR/license.mon.flexLM3.pl
...
For an overview of license management, including job submission syntax, see Section 13.7, License Management.
It may be necessary to increase the default limit, MMAX_GRES. See Appendix D for more implementation details.
Nagios installation and configuration documentation can be found at Nagios.org.
Provided with Moab in the tools directory is a Perl script, node.query.nagios.pl. This script reads the Nagios status.dat file and gathers data about network hosts and services. This script then formats data for Moab to read through a native interface. This script can be used by any site to help facilitate Nagios integration. To customize the data that will be formatted for Moab, make the changes in this script.
You may need to customize the associated configuration file in the etc directory, config.nagios.pl. The statusFile line in this script tells Moab where the Nagios status.dat file is located. Make sure that the path name specified is correct for your site. Note that the interval at which Nagios updates the status.dat file is specified in the Nagios nagios.cfg file. Refer to Nagios documentation for further details.
To make these changes, familiarize yourself with the format of the Nagios status.dat file and make the appropriate additions to the script to include the desired Wiki Interface attributes in the Moab output.
To test this script, run it manually. If working correctly, it will produce output similar to the following:
> ./node.query.nagios.pl
gateway STATE=Running
localhost STATE=Running CPULOAD=1.22 ADISK=75332
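As a rough illustration of the kind of transformation involved, the Python sketch below turns status.dat-style text into node lines. It is not the logic of node.query.nagios.pl. It assumes that status.dat consists of `type { key=value ... }` blocks, that host entries use the block type `hoststatus` with `host_name` and `current_state` fields, and that a `current_state` of 0 means the host is up. Verify all of these against your Nagios version.

```python
def parse_status_dat(text):
    """Return a list of (block_type, {key: value}) from status.dat-style text."""
    blocks, current_type, current = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line.endswith("{"):  # a line such as 'hoststatus {' opens a block
                current_type, current = line[:-1].strip(), {}
        elif line == "}":  # a lone closing brace ends the block
            blocks.append((current_type, current))
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def host_lines(blocks):
    """Emit '<NAME> STATE=<STATE>' for each host block (assumed state mapping)."""
    lines = []
    for block_type, fields in blocks:
        if block_type == "hoststatus":
            state = "Running" if fields.get("current_state") == "0" else "Down"
            lines.append(f"{fields['host_name']} STATE={state}")
    return lines
```

Additional attributes such as CPULOAD or ADISK would come from service blocks, which this sketch reads but does not interpret.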
Once the Nagios interface script is properly configured, the next step is to add a Nagios native resource manager to Moab via the moab.cfg file:
RMCFG[nagios] TYPE=NATIVE
RMCFG[nagios] CLUSTERQUERYURL=exec://$TOOLSDIR/node.query.nagios.pl
...
Once this change is made, restart Moab. The command mdiag -R can be used to verify that the resource manager is properly configured and is in the state Active. Detailed information regarding configured Nagios node information can be viewed by issuing mdiag -n -v.
> mdiag -n -v
compute node summary
Name State Procs Memory Disk Swap Speed Opsys Arch Par Load Rsv Classes Network Features
gateway Running 0:0 0:0 0:0 0:0 1.00 - - dav 0.00 0 - - -
WARNING: node 'gateway' is busy/running but not assigned to an active job
WARNING: node 'gateway' has no configured processors
localhost Running 0:0 0:0 75343:75347 0:0 1.00 - - dav 0.48 0 - - -
WARNING: node 'localhost' is busy/running but not assigned to an active job
WARNING: node 'localhost' has no configured processors
----- --- 3:8 1956:1956 75345:75349 5309:6273
Total Nodes: 2 (Active: 2 Idle: 0 Down: 0)
RMCFG[TORQUE] TYPE=pbs
RMCFG[supermon] TYPE=NATIVE CLUSTERQUERYURL=exec://$HOME/tools/node.query.supermon.pl
To confirm that Supermon is properly connected to Moab, issue mdiag -R -v. The output should be similar to the following example; specifically, there should be no errors about the CLUSTERQUERYURL.
diagnosing resource managers
RM[TORQUE] State: Active
Type: PBS ResourceType: COMPUTE
Server: keche
Version: '2.2.0-snap.200707181818'
Job Submit URL: exec:///usr/local/bin/qsub
Objects Reported: Nodes=3 (6 procs) Jobs=0
Flags: executionServer
Partition: TORQUE
Event Management: EPORT=15004 (no events received)
Note: SSS protocol enabled
Submit Command: /usr/local/bin/qsub
DefaultClass: batch
RM Performance: AvgTime=0.26s MaxTime=1.04s (4 samples)
RM Languages: PBS
RM Sub-Languages: -
RM[supermon] State: Active
Type: NATIVE:AGFULL ResourceType: COMPUTE
Cluster Query URL: exec://$HOME/node.query.supermon.pl
Objects Reported: Nodes=3 (0 procs) Jobs=0
Partition: supermon
Event Management: (event interface disabled)
RM Performance: AvgTime=0.03s MaxTime=0.11s (4 samples)
RM Languages: NATIVE
RM Sub-Languages: -
Note: use 'mrmctl -f messages ' to clear stats/failures
Run the Perl script by itself. The script's results should look similar to this:
vm01 GMETRIC[CPULOAD]=0.571428571428571 GMETRIC[NETIN]=133 GMETRIC[NETOUT]=702 GMETRIC[NETUSAGE]=835
vm02 GMETRIC[CPULOAD]=0.428571428571429 GMETRIC[NETIN]=133 GMETRIC[NETOUT]=687 GMETRIC[NETUSAGE]=820
keche GMETRIC[CPULOAD]=31 GMETRIC[NETIN]=5353 GMETRIC[NETOUT]=4937 GMETRIC[NETUSAGE]=10290
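For sites writing their own metric script, the line layout above can be produced with a few lines of Python. This is an illustrative sketch, not the Supermon script itself. The function name is hypothetical, and computing NETUSAGE as NETIN plus NETOUT is inferred from the sample numbers rather than stated anywhere.

```python
def format_gmetric_line(node, cpu_load, net_in, net_out):
    """Render one node's generic metrics as '<NAME> GMETRIC[<METRIC>]=<VAL> ...'."""
    metrics = {
        "CPULOAD": cpu_load,
        "NETIN": net_in,
        "NETOUT": net_out,
        # Inferred from the sample output: usage is the sum of both directions.
        "NETUSAGE": net_in + net_out,
    }
    fields = " ".join(f"GMETRIC[{name}]={value}" for name, value in metrics.items())
    return f"{node} {fields}"
```

Calling `format_gmetric_line("keche", 31, 5353, 4937)` reproduces the third sample line.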
If the preceding steps functioned properly, issue a checknode command on one of the nodes for which Supermon is gathering statistics. The output should look similar to the following.
node keche
State: Idle (in current state for 00:32:43)
Configured Resources: PROCS: 2 MEM: 1003M SWAP: 3353M DISK: 1M
Utilized Resources: ---
Dedicated Resources: ---
Generic Metrics: CPULOAD=33.38,NETIN=11749.00,NETOUT=9507.00,NETUSAGE=21256.00
MTBF(longterm): INFINITY MTBF(24h): INFINITY
Opsys: linux Arch: ---
Speed: 1.00 CPULoad: 0.500
Network Load: 0.87 kB/s
Flags: rmdetected
Network: DEFAULT
Classes: [batch 2:2][interactive 2:2]
RM[TORQUE]: TYPE=PBS
EffNodeAccessPolicy: SHARED
Total Time: 2:03:27 Up: 2:03:27 (100.00%) Active: 00:00:00 (0.00%)
Reservations: ---
Native resource managers using HTTP URLs send and receive information using the standard HTTP 1.0 protocol. Information is sent using the HTTP GET method, and results are returned in the HTTP body using the format described in the Flat Cluster Query Data section. Not all available native resource manager query URLs are currently supported. The following chart shows the supported query URLs and the parameters that Moab provides in the GET request.
Query URL | Parameters |
---|---|
CLUSTERQUERYURL | none |
JOBCANCELURL | jobname=<JOB_ID> |
JOBMODIFYURL | jobname=<JOB_ID> attr=<Attribute_Name> value=<Attribute_Value> |
WORKLOADQUERYURL | none |
CGI scripts pointed to by the query URLs should always return at least one line of output on success to ensure that Moab does not treat an empty result set as a failure. For an empty result set, return an empty comment line (that is, the '#' character followed by a newline).
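The two rules above, reading the GET parameters and never returning an empty body, can be sketched in Python as follows. This is a minimal illustration of the handler logic only, not a complete CGI program. The function names are hypothetical, and the sketch assumes all three JOBMODIFYURL parameters are present in the request.

```python
from urllib.parse import parse_qs, urlsplit


def parse_job_modify(url):
    """Extract jobname, attr, and value from a JOBMODIFYURL-style GET request."""
    params = parse_qs(urlsplit(url).query)
    # Raises KeyError if a parameter is missing; a real handler would report that.
    return {key: params[key][0] for key in ("jobname", "attr", "value")}


def render_body(lines):
    """Join result lines; return a lone '#' comment line for an empty result set."""
    if not lines:
        return "#\n"
    return "".join(f"{line}\n" for line in lines)
```

Returning `"#\n"` for an empty list keeps a successful query that found nothing distinguishable from a failed one.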
It is possible to have a separate CLUSTERQUERYURL for each node. This is possible using the NODECFG parameter for each node or for the DEFAULT node. Moab will look first on the specific node for CLUSTERQUERYURL information. If no information is found on the specific node it will look for CLUSTERQUERYURL information on the Resource Manager. If the Resource Manager has no query information specified then it will use the CLUSTERQUERYURL command configured for the DEFAULT node.
The example configuration below demonstrates a possible setup.
RMCFG[local] TYPE=NATIVE
RESOURCELIST node1,node2,node3,node4,flexlm1
NODECFG[DEFAULT] CLUSTERQUERYURL=exec:///usr/local/bin/query.pl
NODECFG[flexlm1] CLUSTERQUERYURL=http://supercluster.org/usr/local/flquery.cgi
In the example above, a four-node cluster and a license manager are controlled via the native interface. The state of the four compute nodes is determined by running the /usr/local/bin/query.pl query command (remotely on the node) specified in the DEFAULT NODECFG parameter, while the license manager is queried using the flquery.cgi script at the URL specified for flexlm1. For local executable scripts, the launched script is either locally generated or taken from the library of contributed native scripts included with the distribution file.
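The lookup order described for CLUSTERQUERYURL (specific node first, then the resource manager, then the DEFAULT node) can be expressed as a short Python sketch. It models the documented precedence only and is not Moab code. The function and parameter names are illustrative.

```python
def resolve_cluster_query_url(node, node_urls, rm_url=None, default_url=None):
    """Pick the query URL for a node using the documented precedence."""
    if node in node_urls:  # 1. URL configured on the specific node
        return node_urls[node]
    if rm_url is not None:  # 2. URL configured on the resource manager
        return rm_url
    return default_url  # 3. URL configured on the DEFAULT node (may be None)
```

Applied to the example configuration, flexlm1 resolves to its own HTTP URL, while node1 through node4 fall through to the DEFAULT node's exec URL because the resource manager defines none.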
As above, an optional parameter, RESOURCELIST, may be specified to constrain which resources obtained by the native interface should be processed. By default, all resources described by the interface data are loaded. The RESOURCELIST parameter, if specified, acts as a filter eliminating either full or extension resource information from being incorporated. If an environment exists where data is not aggregated, and the native interface provides primary node information, the RESOURCELIST parameter is required to indicate to Moab which resources should be in the cluster.
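The filtering role of RESOURCELIST can be pictured with the following Python sketch. It is a simplified model that filters by resource name only. It does not reproduce the distinction between full and extension resource information, and the function name is illustrative.

```python
def filter_resources(reported, resource_list=None):
    """Keep only resources named in resource_list; keep everything if no list is given."""
    if resource_list is None:
        # Default behavior: all resources described by the interface data are loaded.
        return dict(reported)
    allowed = set(resource_list)
    return {name: attrs for name, attrs in reported.items() if name in allowed}
```

For example, if a query script reports node1 and node9 while RESOURCELIST names only node1, the sketch keeps node1 and drops node9.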
Native Resource managers can also perform special tasks when they are given a specific resource type. These types are specified using the RESOURCETYPE attribute of the RMCFG parameter.
TYPE | EXPLANATION |
---|---|
COMPUTE | normal compute resources (no special handling) |
FS | file system resource manager (see Multiple Resource Managers for an example) |
LICENSE | software license manager (see Interfacing with FLEXlm and License Management) |
NETWORK | network resource manager |
Using the scripts found in the $TOOLSDIR ($INSTDIR/tools) directory as a template, new tools can be quickly created to monitor or manage almost any resource. Each tool should be associated with a particular resource manager service and specified using one of the following resource manager URL attributes.
CLUSTERQUERYURL |
Description: | Queries resource state, configuration, and utilization information for compute nodes, networks, storage systems, software licenses, and other resources. For more details, see RM configuration. |
Input: | --- |
Output: | Node status and configuration for one or more nodes. See Resource Data Format. |
Example: | RMCFG[v-stor] CLUSTERQUERYURL=exec://$HOME/storquery.pl (Moab will execute the storquery.pl script to obtain information about 'v-stor' resources.) |
JOBMODIFYURL |
Description: | Modifies a job or application. For more details, see RM configuration. |
Input: | [-j <JOBEXPR>] [--s[et]|--c[lear]|--i[ncrement]|--d[ecrement]] <ATTR>[=<VALUE>] [<ATTR>[=<VALUE>]]... |
Output: | --- |
Example: | RMCFG[v-stor] JOBMODIFYURL=exec://$HOME/jobmodify.pl (Moab will execute the jobmodify.pl script to modify the specified job.) |
JOBRESUMEURL |
Description: | Resumes a suspended job or application. |
Input: | <JOBID> |
Output: | --- |
Example: | RMCFG[v-stor] JOBRESUMEURL=exec://$HOME/jobresume.pl (Moab will execute the jobresume.pl script to resume suspended jobs.) |
JOBSTARTURL |
Description: | Launches a job or application on a specified set of resources. |
Input: | <JOBID> <TASKLIST> <USERNAME> [ARCH=<ARCH>] [OS=<OPSYS>] [IDATA=<STAGEINFILEPATH>[,<STAGEINFILEPATH>]...] [EXEC=<EXECUTABLEPATH>] |
Output: | --- |
Example: | RMCFG[v-stor] JOBSTARTURL=exec://$HOME/jobstart.pl (Moab will execute the jobstart.pl script to execute jobs.) |
JOBSUBMITURL |
Description: | Submits a job to the resource manager, but it does not execute the job. The job executes when the JOBSTARTURL is called. |
Input: | [ACCOUNT=<ACCOUNT>] [ERROR=<ERROR>] [GATTR=<GATTR>] [GNAME=<GNAME>] [GRES=<GRES>:<Value>[,<GRES>:<Value>]*] [HOSTLIST=<HOSTLIST>] [INPUT=<INPUT>] [IWD=<IWD>] [NAME=<NAME>] [OUTPUT=<OUTPUT>] [RCLASS=<RCLASS>] [REQUEST=<REQUEST>] [RFEATURES=<RFEATURES>] [RMFLAGS=<RMFLAGS>] [SHELL=<SHELL>] [TASKLIST=<TASKLIST>] [TASKS=<TASKS>] [TEMPLATE=<TEMPLATE>] [UNAME=<UNAME>] [VARIABLE=<VARIABLE>] [WCLIMIT=<WCLIMIT>] [ARGS=<Value>[ <Value>]*] |
Output: | --- |
Example: | RMCFG[v-stor] JOBSUBMITURL=exec://$HOME/jobsubmit.pl (Moab submits the job to the jobsubmit.pl script for future job execution.) |
JOBSUSPENDURL |
Description: | Suspends an active job or application in memory. |
Input: | <JOBID> |
Output: | --- |
Example: | RMCFG[v-stor] JOBSUSPENDURL=exec://$HOME/jobsuspend.pl (Moab will execute the jobsuspend.pl script to suspend active jobs.) |
NODEMODIFYURL |
Description: | Provides a method to dynamically modify/provision compute resources including operating system, applications, queues, node features, power states, etc. |
Input: | <NODEID>[,<NODEID>] [--force] {--set <ATTR>=<VAL>|--clear <ATTR>}, where ATTR is one of the node attributes listed in Resource Data Format |
Output: | --- |
Example: | RMCFG[warewulf] NODEMODIFYURL=exec://$HOME/provision.pl (Moab will reprovision compute nodes using the provision.pl script.) |
NODEPOWERURL |
Description: | Allows Moab to issue IPMI power commands. |
Input: | <NODEID>[,<NODEID>] ON | OFF |
Output: | --- |
Example: | RMCFG[node17rm] NODEPOWERURL=exec://$TOOLSDIR/ipmi.power.pl (Moab will issue a power command contained in the ipmi.power.pl script.) |
SYSTEMMODIFYURL |
Description: | Provides a method to dynamically modify aspects of the compute environment which are directly associated with cluster resources. For more details, see RM configuration. |
SYSTEMQUERYURL |
Description: | Provides a method to dynamically query aspects of the compute environment which are directly associated with cluster resources. For more details, see RM configuration. |
Input: | default <ATTR>, where ATTR is one of: images |
Output: | <STRING> |
Example: | RMCFG[warewulf] SYSTEMQUERYURL=exec://$HOME/checkimage.pl (Moab will load the list of images available from warewulf using the checkimage.pl script.) |
WORKLOADQUERYURL |
Description: | Provides a method to dynamically query the system workload (jobs, services, etc.) of the compute environment associated with managed resources. |
Input: | --- |
Output: | <STRING> |
Example: | RMCFG[xt] WORKLOADQUERYURL=exec://$HOME/job.query.xt3.pl |
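A custom tool has to interpret the input documented for its URL attribute. As one illustration, the Python sketch below parses the JOBSTARTURL input (three positional values followed by optional KEY=VALUE pairs). It assumes the values arrive as separate command-line arguments, which should be confirmed against a real invocation. The function name is illustrative, and TASKLIST is kept as an opaque string because its internal format is not described here.

```python
def parse_jobstart_args(argv):
    """Parse '<JOBID> <TASKLIST> <USERNAME> [ARCH=..] [OS=..] [IDATA=..] [EXEC=..]'."""
    if len(argv) < 3:
        raise ValueError("expected <JOBID> <TASKLIST> <USERNAME>")
    job_id, task_list, user = argv[:3]
    options = {}
    for arg in argv[3:]:
        key, sep, value = arg.partition("=")
        if not sep or key not in {"ARCH", "OS", "IDATA", "EXEC"}:
            raise ValueError(f"unexpected argument: {arg!r}")
        # IDATA may carry several comma-separated stage-in file paths.
        options[key] = value.split(",") if key == "IDATA" else value
    return {"job_id": job_id, "task_list": task_list, "user": user, "options": options}
```

In a script, the function would be called as `parse_jobstart_args(sys.argv[1:])` after importing `sys`; rejecting unknown keys early makes a mismatch with the documented input visible immediately.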