Moab Workload Manager

13.1 Resource Manager Overview

For most installations, the Moab Workload Manager uses the services of a resource manager to obtain information about the state of compute resources (nodes) and workload (jobs). Moab also uses the resource manager to manage jobs, passing instructions regarding when, where, and how to start or otherwise manipulate jobs.

Moab can be configured to manage more than one resource manager simultaneously, even resource managers of different types. Using a local queue, jobs may even be migrated from one resource manager to another. However, there are currently limitations on jobs submitted directly to a resource manager rather than to the local queue. Such a job can run only within the bounds of the resource manager to which it was submitted.
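
The placement rule above can be sketched in a few lines of Python. This is an illustration only, not Moab code: the function name, the job dictionary layout, and the "local_queue" marker are all assumptions made for the example.

```python
def eligible_resource_managers(job, resource_managers):
    """Return the resource managers on which a job may run.

    Hypothetical model: job["submitted_to"] is either the marker
    "local_queue" or the name of one resource manager.
    """
    if job["submitted_to"] == "local_queue":
        # Jobs in the local queue may be placed on (or migrated to) any
        # configured resource manager.
        return list(resource_managers)
    # Jobs submitted directly stay bound to that one resource manager.
    return [rm for rm in resource_managers if rm == job["submitted_to"]]
```

For example, with resource managers named "pbs1" and "lsf1" (invented names), a job submitted directly to "pbs1" is eligible only there, while a local queue job is eligible on both.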


13.1.1 Scheduler/Resource Manager Interactions

Moab interacts with all resource managers using a common set of commands and objects. Each resource manager interface translates Moab concepts regarding workload and resources into native resource manager objects, attributes, and commands, and translates the information it obtains from the resource manager back into Moab terms.

Information on creating a new scheduler resource manager interface can be found in the Adding New Resource Manager Interfaces section.

13.1.1.1 Resource Manager Commands

For many environments, Moab interaction with the resource manager is limited to the following objects and functions:

Object  Function        Details
------  --------------  -------
Job     Query           Collect detailed state, requirement, and utilization information about jobs
Job     Modify          Change job state and/or attributes
Job     Start           Execute a job on a specified set of resources
Job     Cancel          Cancel an existing job
Job     Preempt/Resume  Suspend, resume, checkpoint, restart, or requeue a job
Node    Query           Collect detailed state, configuration, and utilization information about compute resources
Node    Modify          Change node state and/or attributes
Queue   Query           Collect detailed policy and configuration information from the resource manager

Using these functions, Moab is able to fully manage workload, resources, and cluster policies. More detailed information about resource manager specific capabilities and limitations for each of these functions can be found in the individual resource manager overviews (LL, PBS, LSF, SGE, BProc, or WIKI).
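
One way to picture a common command set with per-resource-manager translation is an abstract interface with one adapter per resource manager type. The Python sketch below is a minimal illustration under that assumption. It is not Moab's actual interface: the class names, method names, state codes, and the "toy-run"/"toy-del" native commands are invented, and only three of the job functions are shown.

```python
from abc import ABC, abstractmethod


class ResourceManagerInterface(ABC):
    """Hypothetical scheduler-side view of a resource manager."""

    @abstractmethod
    def query_jobs(self):
        """Return {job_id: state} using scheduler-neutral state names."""

    @abstractmethod
    def start_job(self, job_id, nodes):
        """Start a job on the given nodes; return the native request."""

    @abstractmethod
    def cancel_job(self, job_id):
        """Cancel a job; return the native request."""


class ToyInterface(ResourceManagerInterface):
    """Invented adapter for an imaginary resource manager."""

    # Imaginary native state codes mapped to neutral state names.
    STATE_MAP = {"Q": "Idle", "R": "Running", "C": "Completed"}

    def __init__(self, native_jobs):
        self.native_jobs = native_jobs  # e.g. {"42": "Q"}

    def query_jobs(self):
        # Translate native codes into the scheduler's vocabulary.
        return {job_id: self.STATE_MAP.get(code, "Unknown")
                for job_id, code in self.native_jobs.items()}

    def start_job(self, job_id, nodes):
        # Translate the neutral "start" request into a native command.
        self.native_jobs[job_id] = "R"
        return "toy-run " + job_id + " on " + "+".join(nodes)

    def cancel_job(self, job_id):
        self.native_jobs[job_id] = "C"
        return "toy-del " + job_id
```

The scheduler side of such a design would call only the methods of ResourceManagerInterface, so supporting another resource manager type means writing another adapter rather than changing scheduling logic.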

Beyond these base functions, other commands exist to support advanced features such as dynamic job support, provisioning, and cluster level resource management.

13.1.1.2 Resource Manager Flow

In general, Moab interacts with resource managers in a sequence of steps during each scheduling iteration. These steps are as follows:

  1. load global resource information
  2. load node specific information (optional)
  3. load job information
  4. load queue/policy information (optional)
  5. cancel/preempt/modify jobs according to cluster policies
  6. start jobs in accordance with available resources and policy constraints
  7. handle user commands

Typically, each step completes before the next one starts. On current systems, however, size and complexity call for a more advanced parallel approach, which improves reliability, concurrency, and responsiveness.
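
The strictly sequential form of the seven steps can be sketched as follows. This Python example is illustrative only and assumes a great deal: the StubRM class, its method names, the canned data, and the single "max_walltime" policy are all invented, and the parallel approach mentioned above is not modeled.

```python
class StubRM:
    """Invented stand-in for a resource manager; returns canned data."""

    def load_global_resources(self):
        return {"free_nodes": ["n1"]}

    def load_node_details(self):
        return {"n1": {"procs": 4}, "n2": {"procs": 4}}

    def load_jobs(self):
        return [
            {"id": "a", "state": "Running", "node": "n2", "used": 500},
            {"id": "b", "state": "Idle"},
            {"id": "c", "state": "Idle"},
            {"id": "d", "state": "Idle"},
        ]

    def load_policies(self):
        return {"max_walltime": 300}


def run_iteration(rm, pending_commands):
    """Run one scheduling pass, completing each step before the next."""
    free = list(rm.load_global_resources()["free_nodes"])  # step 1
    rm.load_node_details()                                 # step 2 (optional)
    jobs = rm.load_jobs()                                  # step 3
    policy = rm.load_policies()                            # step 4 (optional)

    cancelled, started = [], []
    for job in jobs:                                       # step 5
        if job["state"] == "Running" and job["used"] > policy["max_walltime"]:
            job["state"] = "Cancelled"
            free.append(job["node"])  # the node becomes available again
            cancelled.append(job["id"])

    for job in jobs:                                       # step 6
        if job["state"] == "Idle" and free:
            job["node"] = free.pop(0)
            job["state"] = "Running"
            started.append(job["id"])

    handled = list(pending_commands)                       # step 7 (placeholder)
    return cancelled, started, handled
```

With the canned data, job "a" exceeds the invented walltime policy and is cancelled in step 5, which frees a second node so that jobs "b" and "c" start in step 6 while "d" remains idle.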

13.1.2 Resource Manager Specific Details (Limitations/Special Features)

13.1.3 Synchronizing Conflicting Information

Moab does not trust resource manager information. Node, job, and policy information is reloaded on each iteration and discrepancies are detected. Synchronization issues and allocation conflicts are logged and handled where possible. To assist sites in minimizing stale information and conflicts, a number of policies and parameters are available.

  • Node State Synchronization Policies (see NODESYNCTIME)
  • Job State Synchronization Policies (see JOBSYNCTIME)
  • Stale Data Purging (see JOBPURGETIME)
  • Thread Management (preventing resource manager failures from affecting scheduler operation)
  • Resource Manager Poll Interval (see RMPOLLINTERVAL)
  • Node Query Refresh Rate (see NODEPOLLFREQUENCY)
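
The detection step described above, comparing what the scheduler believes with what the resource manager reports on the current iteration, can be sketched in Python. This is a minimal sketch under stated assumptions: the function name, the {job_id: state} data layout, and the message wording are hypothetical, and the tuning parameters listed above are not modeled.

```python
def find_discrepancies(scheduler_view, rm_report):
    """Compare cached job states with freshly reloaded ones.

    Both arguments are {job_id: state} dictionaries (an assumed layout).
    Returns a list of (job_id, description) pairs, sorted by job id.
    """
    issues = []
    for job_id in sorted(set(scheduler_view) | set(rm_report)):
        cached = scheduler_view.get(job_id)
        fresh = rm_report.get(job_id)
        if cached is None:
            issues.append((job_id, "new job reported by resource manager"))
        elif fresh is None:
            issues.append((job_id, "job missing from resource manager report"))
        elif cached != fresh:
            issues.append((job_id, "state mismatch: %s vs %s" % (cached, fresh)))
    return issues
```

A real scheduler would then decide, per policy, whether to log the discrepancy, wait for the two views to converge, or act on it. The sketch stops at detection.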

13.1.4 Evaluating Resource Manager Availability and Performance

Each resource manager is individually tracked and evaluated by Moab. Using the mdiag -R command, a site can determine how a resource manager is configured, how heavily it is loaded, what failures, if any, have occurred in the recent past, and how responsive it is to requests.

See Also