Moab Workload Manager

21.4 Information Services for Enterprises and Grids

Moab can be used to collect information from multiple scattered resources. Beyond information collection, Moab can also be set up to perform automated diagnostics, produce summary reports, initiate automated resource recovery, and drive event- and threshold-based reprovisioning. Managed resources can include compute clusters, network resources, storage resources, license resources, system services, applications, and even databases.

21.4.1 General Collection Infrastructure

While significant flexibility is possible, a simple approach to monitoring and managing resources involves setting up a Moab Information Daemon (minfod) to access each resource to be monitored. These minfod daemons collect configuration, state, load, and other usage information and report it back to one or more central Moab daemons. The central Moab is responsible for assembling this information, resolving conflicts, identifying critical events, generating reports, and performing various automated actions.

The minfod daemon can be configured to import information from most existing HPC information sources, including both specialized application APIs and general communication standards. These interfaces include IPMI, Ganglia, SQL, Nagios, HTTP services, Web/SOAP-based services, flat files, LSF, TORQUE/PBS, LoadLeveler, SLURM, locally developed scripts, network routers, license managers, and so forth.
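As a sketch of how a locally developed script might be wired in (the resource manager name and script path below are hypothetical), Moab's native resource manager interface can poll a script that emits node configuration and load attributes:

RMCFG[sitemon] TYPE=NATIVE
RMCFG[sitemon] CLUSTERQUERYURL=exec:///opt/moab/tools/cluster.query.pl

In the native interface, the query script typically prints one line per node carrying attribute=value pairs, which the daemon folds into its view of the resource; consult the native resource manager documentation for the exact output format.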

The information service feature takes advantage of the Moab peer-to-peer communication facility, identity management interface, generic event/metric facilities, generalized resource management infrastructure, and advanced accounting/reporting capabilities. With these technologies, solutions ranging from pure information services to more active systems that perform resource healing and automated load-balancing can be created.

With the flexibility of Moab, hybrid solutions anywhere along the active monitoring spectrum can be enabled. Services and resources associated with both open source/open standard protocols and vendor-specific protocols can be integrated and managed simultaneously by Moab. In real time, the information gathered by Moab can be exported to a database, as HTML, or through a Web service, making it immediately useful via both human-readable and machine-readable interfaces.

21.4.2 Sample Uses

Organizations use this capability for multiple purposes including the following:

  • Monitoring performance statistics of multiple independent clusters
  • Detecting and diagnosing failures from geographically distributed clusters
  • Tracking cluster, storage, network, service, and application resources
  • Generating load-balancing and resource state information for users and middleware services

21.4.3 General Configuration Guidelines

  1. Establish peer relationships between information service daemons (minfod or moab).
  2. (optional) Enable Starttime Estimation Reporting if manual or automated load-balancing is to occur.
    • Set ENABLESTARTESTIMATESTATS to generate local start estimation statistics.
    • Set REPORTPEERSTARTINFO to report start estimate information to peers.
  3. (optional) Enable Generic Event/Generic Metric Triggers if automated resource recovery or alerts are to be used.
  4. (optional) Enable automated periodic reporting.
  5. (optional) Enable automated data/job staging and environmental translation.
  6. (optional) Enable automated load/event based resource provisioning.
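A moab.cfg fragment covering steps 1 through 3 might resemble the following sketch (the peer cluster name and the hitemp generic event are hypothetical; the GEVENTCFG line uses the generic events facility to record and send notification when the event is reported):

RMCFG[clusterA]          SERVER=moab://clusterA.example.com
ENABLESTARTESTIMATESTATS TRUE
REPORTPEERSTARTINFO      TRUE
GEVENTCFG[hitemp]        ACTION=notify,record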

21.4.4 Examples

21.4.4.1 Grid Resource Availability Information Service

The objective of this project is to create a centralized service that helps users better utilize geographically distributed resources within a loosely coupled grid. In this grid, many independent clusters exist, but many jobs may only be able to use a portion of the available resources due to architectural and environmental differences from cluster to cluster. The information service must provide information to both users and services to allow improved decisions regarding job-to-resource mapping.

To address this, a centralized Moab information service is created that collects information from each of the participating clusters. On each cluster where Moab is already managing the local workload, the existing cluster-level Moab is configured to report the needed information to the central Moab daemon. On each cluster where another system manages the local workload, a Moab Information Service Daemon (minfod) is started.

Because load-balancing information is required, the Moab daemon running on each cluster is configured to report backlog and start estimate information using the REPORTPEERSTARTINFO parameter.

To make information available via a Web service, the cluster.mon.ws.pl service is started on the master Moab node, allowing Moab to receive Web service requests and report responses in XML over SOAP. To allow human-readable browser access to the same information and services, the local Web service is configured to use the moab.is.cgi script to drive the Web service interface and report results via a standard Web page.

Due to the broad array of users within the grid, many types of information are provided. This information includes the following:

  • Per cluster configuration (operating system, architecture, node count, processor count, cumulative memory)
  • Per cluster state (active, maintenance, down states)
  • Per cluster messages (local admin-specified cluster messages)
  • Per cluster usage (currently up and currently available node count, processor count, and cumulative memory)
  • Per cluster backlog (in terms of processor seconds and estimated time to completion)
  • Per cluster responsiveness matrix (job size/duration matrix of historical average queue time and xfactor)
  • Per cluster starttime estimate matrix for generic workload (job size/duration matrix of estimated absolute and relative starttime for generic jobs based on priority, policy, backlog, reservation, system efficiency, resource failures, wallclock accuracy, and other factors)
  • Per cluster starttime estimate for specific resource request (based on all factors listed plus job credentials and specific resource requests including memory, features, licenses, and so forth)
  • Per cluster estimate accuracy statistics (indicate how accurate starttime estimates have been in the past)
  • Adjusted starttime estimates (starttime estimates for both specific and generic job requests with estimate accuracy and composite estimate information integrated via an automated learning feedback algorithm)
  • Best destination matrix for generic workload request (composite matrix representing best grid value and best target cluster for each cell)
  • Prioritized best destination cluster report (list of potential destination clusters prioritized in order of best probable responsiveness first)

With these queries, users and services can obtain and process raw resource information, or can ask a question as simple as "What is the best cluster for this request?"
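
With start estimation statistics enabled, individual estimates can also be pulled interactively; for example, Moab's showstart command accepts a generic proccount@duration request (the request size below is illustrative):

> showstart 16@1:00:00

The reply reports the estimated earliest start time for a generic 16-processor, one-hour job given current policies, reservations, and backlog.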

The relevant portion of the central Moab configuration follows:

ENABLESTARTESTIMATESTATS TRUE
REPORTPEERSTARTINFO      TRUE
...

RMCFG[clusterA] SERVER=moab://clusterA.bnl.gov
RMCFG[clusterB] SERVER=moab://clusterB.qrnl.gov
RMCFG[clusterC] SERVER=moab://clusterC.ocsa.edu
RMCFG[clusterD] SERVER=moab://clusterD.ocsa.edu
...

> mdiag -t -v
Partition Status

System Partition Settings:  PList: clusterA,clusterB  

Name                    Procs

ALL                      1400
clusterA                  800
  RM=clusterA
clusterB                  600
  RM=clusterB

Partition    Configured         Up     U/C  Dedicated     D/U     Active     A/U

Nodes ----------------------------------------------------------------------------
ALL                 700        700 100.00%        650  86.67%        647  85.39%
clusterA            400        400 100.00%          0   0.00%          0   0.00%
clusterB            300        300 100.00%          1 100.00%          1 100.00%

Processors ----------------------------------------------------------------------------
ALL                1400       1400  84.21%          2  12.50%          2  12.50%
clusterA            800        800  84.21%          2  12.50%          2  12.50%
clusterB            600        600  84.21%          2  12.50%          2  12.50%
...

Backlog

             BacklogPS  BacklogDuration  AvgQTime

clusterA      13472.00         00:14:27  00:22:14 
clusterB       7196.00         00:07:55  00:07:06
...

See Also