Moab Adaptive Computing Suite Administrator's Guide 5.4

2.2 Resource Monitoring in a Utility Computing Environment

Moab provides a flexible interface for monitoring any type of compute resource. In all cases, external information is imported using the RMCFG parameter, whether the monitoring tool is a hand-populated text file, a locally created monitor script, a cluster monitoring tool such as Ganglia, or a cluster resource manager such as TORQUE or LSF.

To monitor, schedule, reserve, and allocate resources, Moab must know which resources are available, how they are configured, and what their current health and utilization are. In the following example, a locally created script is used to import information about node status.

RMCFG[core] TYPE=NATIVE CLUSTERQUERYURL=exec:///opt/hosting/bin/resquery.pl

The RMCFG line indicates that the script /opt/hosting/bin/resquery.pl should be executed to generate resource data. This example script converts the output of the xCAT hardware monitor interface to WIKI text data that describes basic node state and configuration. With all exec-based queries, Moab reads the script's stdout as source data for processing. Moab allows significant flexibility in how resource data is organized. Sample resource data output follows:

node001 STATE=Idle CPROC=2 CMEM=512
node002 STATE=Idle CPROC=2 CMEM=512
node010 STATE=Down CPROC=2 CMEM=1024
node011 STATE=Idle CPROC=2 CMEM=1024
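
A script that produces output of this shape can be very small. The following Python sketch is an illustration only and is not the resquery.pl script referenced above: the node inventory is hard-coded as an assumption, and the names NODES, format_node, and main are hypothetical. A real script would obtain the values from a hardware monitor before printing them.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for a cluster query script: prints one node per line."""
import sys

# Assumed sample inventory mirroring the sample output above.
# A real script would query a hardware monitor instead of hard-coding values.
NODES = [
    {"name": "node001", "STATE": "Idle", "CPROC": 2, "CMEM": 512},
    {"name": "node002", "STATE": "Idle", "CPROC": 2, "CMEM": 512},
    {"name": "node010", "STATE": "Down", "CPROC": 2, "CMEM": 1024},
    {"name": "node011", "STATE": "Idle", "CPROC": 2, "CMEM": 1024},
]


def format_node(node):
    """Return 'name ATTR=value ...' for one node record."""
    attrs = " ".join(
        f"{key}={value}" for key, value in node.items() if key != "name"
    )
    return f"{node['name']} {attrs}"


def main(out=sys.stdout):
    # The scheduler reads stdout, so the script simply prints one line per node.
    for node in NODES:
        print(format_node(node), file=out)


if __name__ == "__main__":
    main()
```

If such a script were installed at a path of your choosing, it could be referenced from CLUSTERQUERYURL with the exec:// form shown earlier.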

Example

In some cases, the utility computing hosting master may want to use both dynamic and static information sources simultaneously. In the following example, core compute node configuration information is maintained in a flat file, while the Ganglia resource monitor and a system-level hardware monitor are used to amend the data.

RMCFG[base] TYPE=NATIVE RESOURCETYPE=COMPUTE POLLINTERVAL=00:15:00 
RMCFG[base] CLUSTERQUERYURL=file://$HOME/resources_min.txt
RMCFG[base] DESCRIPTION='static node info'

RMCFG[ganglia]  TYPE=NATIVE FLAGS=SLAVE CLUSTERQUERYURL=ganglia://localhost
RMCFG[ganglia]  DESCRIPTION='dynamic node info'
...
Note: Resource information may be obtained from additional sources if desired, including databases, web services, and locally generated scripts and tools.
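
Because a flat file such as the one in the example above is maintained by hand, it can be useful to check it for typing errors before pointing Moab at it. The following Python sketch assumes the file uses the same "name ATTR=value" layout as the sample output shown earlier; that layout, and the function names parse_node_line and check_file, are assumptions made for illustration rather than part of Moab.

```python
#!/usr/bin/env python3
"""Hypothetical sanity check for a hand-maintained node file."""
import sys


def parse_node_line(line):
    """Split 'name KEY=value ...' into (name, {KEY: value}).

    Raises ValueError if the line is empty or an attribute is malformed.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    name, attrs = fields[0], {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed attribute {field!r} for node {name}")
        attrs[key] = value
    return name, attrs


def check_file(path):
    """Return (line_number, message) pairs for every malformed line."""
    problems = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are skipped in this sketch
            try:
                parse_node_line(line)
            except ValueError as error:
                problems.append((number, str(error)))
    return problems


if __name__ == "__main__":
    # Usage (hypothetical): check_nodes.py <path-to-node-file>
    if len(sys.argv) == 2:
        for number, message in check_file(sys.argv[1]):
            print(f"line {number}: {message}")
```

The check is deliberately strict about the KEY=value shape only; it does not know which attribute names Moab accepts.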

2.2.1 Monitoring Non-Compute Resources

In the case of license managers, storage managers, and network managers, the interface developed need only support the services required by the particular hosting center. For example, a hosting center with tight security requirements and a compute-intensive workload may not need bandwidth guarantees, but it will need basic network health monitoring and VLAN/VPN support. Consequently, the interface to the network manager may do nothing more than perform a simple health check against the router and monitor per-customer network activity. In such cases, a simple Perl or shell script of a few dozen lines may be adequate.
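
As a rough illustration of how small such a health check can be, the following Python sketch attempts a TCP connection to a router and prints a single status line. Python is used here only for consistency with the other sketches in this section. The output layout (a name followed by STATE=Idle or STATE=Down, borrowed from the node sample earlier), the function names, and the command-line arguments are all assumptions; the attributes a network manager interface actually needs to report depend on the site.

```python
#!/usr/bin/env python3
"""Hypothetical router health check that prints one status line."""
import socket
import sys


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def status_line(name, reachable):
    """Format the result; the STATE values are assumed, not mandated."""
    state = "Idle" if reachable else "Down"
    return f"{name} STATE={state}"


if __name__ == "__main__":
    # Usage (hypothetical): router_check.py <name> <host> <port>
    if len(sys.argv) == 4:
        name, host, port = sys.argv[1], sys.argv[2], int(sys.argv[3])
        print(status_line(name, tcp_reachable(host, port)))
    else:
        print("usage: router_check.py <name> <host> <port>", file=sys.stderr)
```

A TCP connection test is only one possible definition of "healthy"; a site might instead query SNMP counters or a vendor API, and monitoring per-customer activity would need additional data that this sketch does not collect.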