For most installations, the Moab Workload Manager uses the services of a resource manager to obtain information about the state of compute resources (nodes) and workload (jobs). Moab also uses the resource manager to manage jobs, passing instructions regarding when, where, and how to start or otherwise manipulate jobs.
Moab can be configured to manage more than one resource manager simultaneously, even resource managers of different types. Using a local queue, jobs may even be migrated from one resource manager to another. However, there are currently limitations regarding jobs submitted directly to a resource manager (not to the local queue). In such cases, the job is constrained to run only within the bounds of the resource manager to which it was submitted.
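The constraint on directly submitted jobs can be illustrated with a minimal sketch. The `Job` class, the `submitted_to` field, and the resource manager names below are hypothetical and are not part of Moab; the sketch only models the rule stated above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    # Name of the resource manager the job was submitted to directly,
    # or None if it was submitted to the local queue (illustrative field).
    submitted_to: Optional[str] = None


def eligible_resource_managers(job: Job, resource_managers: List[str]) -> List[str]:
    """Return the resource managers on which the job may run."""
    if job.submitted_to is None:
        # Local-queue jobs may be migrated to any configured resource manager.
        return list(resource_managers)
    # Directly submitted jobs stay bound to their original resource manager.
    return [rm for rm in resource_managers if rm == job.submitted_to]


rms = ["pbs1", "lsf1"]  # hypothetical resource manager names
print(eligible_resource_managers(Job("local.1"), rms))
print(eligible_resource_managers(Job("pbs.42", submitted_to="pbs1"), rms))
```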
Moab interacts with all resource managers using a common set of commands and objects. Each resource manager interface obtains Moab concepts regarding workload and resources and translates them into native resource manager objects, attributes, and commands.
Information on creating a new scheduler resource manager interface can be found in the Adding New Resource Manager Interfaces section.
Object | Function | Details |
---|---|---|
Job | Query | Collect detailed state, requirement, and utilization information about jobs |
Job | Modify | Change job state and/or attributes |
Job | Start | Execute a job on a specified set of resources |
Job | Cancel | Cancel an existing job |
Job | Preempt/Resume | Suspend, resume, checkpoint, restart, or requeue a job |
Node | Query | Collect detailed state, configuration, and utilization information about compute resources |
Node | Modify | Change node state and/or attributes |
Queue | Query | Collect detailed policy and configuration information from the resource manager |
Using these functions, Moab is able to fully manage workload, resources, and cluster policies. More detailed information about resource manager specific capabilities and limitations for each of these functions can be found in the individual resource manager overviews (LL, PBS, LSF, SGE, BProc, or WIKI).
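As a rough illustration of how a common set of functions can sit in front of different native resource managers, the following sketch defines an abstract interface covering a subset of the table above (job query, modify, start, and cancel, plus node query). All class and method names are assumptions made for this example and do not reflect Moab's internal API; the in-memory implementation is a toy stand-in for a real resource manager.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ResourceManagerInterface(ABC):
    """Hypothetical common interface; names are illustrative only."""

    @abstractmethod
    def query_jobs(self) -> Dict[str, dict]: ...

    @abstractmethod
    def modify_job(self, job_id: str, attrs: dict) -> None: ...

    @abstractmethod
    def start_job(self, job_id: str, nodes: List[str]) -> None: ...

    @abstractmethod
    def cancel_job(self, job_id: str) -> None: ...

    @abstractmethod
    def query_nodes(self) -> Dict[str, dict]: ...


class InMemoryResourceManager(ResourceManagerInterface):
    """Toy resource manager that keeps all state in dictionaries."""

    def __init__(self, nodes: List[str]) -> None:
        self._nodes = {n: {"state": "idle"} for n in nodes}
        self._jobs: Dict[str, dict] = {}

    def submit(self, job_id: str) -> None:
        # Helper for the example; submission is not one of the table's functions.
        self._jobs[job_id] = {"state": "queued", "nodes": []}

    def query_jobs(self) -> Dict[str, dict]:
        return {j: dict(a) for j, a in self._jobs.items()}

    def modify_job(self, job_id: str, attrs: dict) -> None:
        self._jobs[job_id].update(attrs)

    def start_job(self, job_id: str, nodes: List[str]) -> None:
        for n in nodes:
            self._nodes[n]["state"] = "busy"
        self._jobs[job_id].update(state="running", nodes=list(nodes))

    def cancel_job(self, job_id: str) -> None:
        job = self._jobs.pop(job_id)
        # Release any nodes the cancelled job was holding.
        for n in job["nodes"]:
            self._nodes[n]["state"] = "idle"

    def query_nodes(self) -> Dict[str, dict]:
        return {n: dict(a) for n, a in self._nodes.items()}


rm = InMemoryResourceManager(["node01", "node02"])
rm.submit("job.1")
rm.start_job("job.1", ["node01"])
print(rm.query_jobs())
print(rm.query_nodes())
```

A scheduler written against the abstract class does not need to know which native commands a particular resource manager uses; each concrete subclass carries that translation.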
Beyond these base functions, other commands exist to support advanced features such as dynamic job support, provisioning, and cluster-level resource management.
In general, Moab interacts with resource managers in a sequence of steps each scheduling iteration. These steps are outlined in what follows:
Typically, each step completes before the next step is started. However, the size and complexity of current systems mandate a more advanced parallel approach, which provides benefits in the areas of reliability, concurrency, and responsiveness.
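The reliability and concurrency benefits of a parallel approach can be sketched as follows. This is not Moab's implementation; `poll_all`, the loader callables, and the resource manager names are hypothetical. The sketch queries several resource managers concurrently and isolates the failure of one so that it does not block results from the others.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def poll_all(
    loaders: Dict[str, Callable[[], dict]]
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Query every resource manager concurrently.

    Returns (results, failures): results maps a resource manager name to the
    data it returned; failures maps a name to a description of the error.
    """
    results: Dict[str, dict] = {}
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(loaders))) as pool:
        # Submit all queries first so they run in parallel.
        futures = {name: pool.submit(load) for name, load in loaders.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing resource manager does not affect the others.
                failures[name] = repr(exc)
    return results, failures


def healthy() -> dict:
    return {"nodes": 4, "jobs": 10}


def broken() -> dict:
    raise ConnectionError("unreachable")


results, failures = poll_all({"pbs1": healthy, "lsf1": broken})
print(results)
print(failures)
```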
Moab does not trust resource manager information. Node, job, and policy information is reloaded on each iteration and discrepancies are detected. Synchronization issues and allocation conflicts are logged and handled where possible. To assist sites in minimizing stale information and conflicts, a number of policies and parameters are available.
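The reload-and-compare idea can be sketched in a few lines. The function below is a simplified illustration under assumed data shapes (job ID mapped to a state string), not Moab's actual synchronization logic: it compares the state the scheduler expects with the state freshly reported by a resource manager, and logs every mismatch.

```python
import logging
from typing import Dict, List

log = logging.getLogger("rm_sync")  # hypothetical logger name


def find_discrepancies(expected: Dict[str, str], reported: Dict[str, str]) -> List[str]:
    """Compare expected job states with freshly reported ones."""
    issues: List[str] = []
    for job_id, state in expected.items():
        if job_id not in reported:
            issues.append(f"{job_id}: expected {state}, not reported by resource manager")
        elif reported[job_id] != state:
            issues.append(f"{job_id}: expected {state}, reported {reported[job_id]}")
    # Jobs the resource manager knows about but the scheduler does not.
    for job_id in sorted(reported.keys() - expected.keys()):
        issues.append(f"{job_id}: unknown job reported as {reported[job_id]}")
    for issue in issues:
        log.warning(issue)
    return issues


expected = {"job.1": "running", "job.2": "idle"}
reported = {"job.1": "idle", "job.3": "running"}
for line in find_discrepancies(expected, reported):
    print(line)
```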
Each resource manager is individually tracked and evaluated by Moab. Using the mdiag -R command, a site can determine how a resource manager is configured, how heavily it is loaded, what failures, if any, have occurred in the recent past, and how responsive it is to requests.