Moab Workload Manager

13.6 Utilizing Multiple Resource Managers

13.6.1 Multi-RM Overview

A site often has different resources controlled by different resource managers. For example, a site may use one resource manager for licensing software for jobs, another for managing file systems, another for job control, and another for node monitoring. Moab can be configured to communicate with each of these resource managers, gathering all their data and incorporating it into scheduling decisions. With a more distributed approach to resource handling, failures are more contained and scheduling decisions can be more intelligent.

13.6.2 Configuring Multiple Independent Resource Manager Partitions

Moab must know how to communicate with each resource manager. In most instances, this requires only configuring a query command for that resource manager.

13.6.3 Migrating Jobs between Resource Managers

With multi-resource manager support, a job may be submitted either to a local resource manager queue or to the Moab global queue. In most cases, submitting a job to a resource manager queue constrains the job to run only within the resources controlled by that resource manager. However, if the job is submitted to the Moab global queue, it can use resources of any active resource manager. This is accomplished through job translation and staging.

When Moab evaluates resource availability, it determines the cost in terms of both data and job staging. If staging a job's executable or input data requires a significant amount of time, Moab integrates data and compute resource availability to determine a job's earliest potential start time on a per resource manager basis and makes an optimal scheduling decision accordingly. If the optimal decision requires a data stage operation, Moab reserves the required compute resources, stages the data, and then starts the job when the required data and compute resources are available.
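As a rough illustration of the reasoning above, and not Moab's actual algorithm, the following sketch picks the resource manager with the earliest start time from a list of estimates. It assumes that staging begins immediately and overlaps with the wait for compute resources, so a job can start once the later of the two has finished. The function name pick_earliest_rm, the input format, and the names clusterA and clusterB are hypothetical.

```shell
#!/bin/sh
# Toy model only; Moab's real decision logic is not documented here.
# Each input line: <rm-name> <seconds-until-compute-free> <seconds-to-stage-data>

pick_earliest_rm() {
    awk '{
        # The job can start once both the nodes and the data are ready.
        start = ($2 + 0 > $3 + 0) ? $2 + 0 : $3 + 0
        if (NR == 1 || start < best) { best = start; name = $1 }
    }
    END { if (NR > 0) print name, best }'
}

# clusterA is idle but needs 900 seconds of staging;
# clusterB is busy for 300 seconds but already holds the data.
printf '%s\n' 'clusterA 0 900' 'clusterB 300 0' | pick_earliest_rm
```

In this made-up case the busy cluster wins, because waiting 300 seconds for nodes is shorter than staging for 900 seconds.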

13.6.4 Aggregating Information into a Cohesive Node View

Using the native interface, Moab can perform most of these functions itself, without the need for an external resource manager. First, configure the native resource managers:

RESOURCELIST	node01,node02
...
RMCFG[base]	TYPE=PBS
RMCFG[network]	TYPE=NATIVE:AGFULL
RMCFG[network]	CLUSTERQUERYURL=/tmp/network.sh
RMCFG[fs]	TYPE=NATIVE:AGFULL
RMCFG[fs]	CLUSTERQUERYURL=/tmp/fs.sh

The network script can be as simple as the following:

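#!/bin/sh
# Sum the received and transmitted byte counts for eth0.
# The cut fields assume the "RX bytes:N (...)  TX bytes:N (...)" line of older ifconfig output.
_RX=`/sbin/ifconfig eth0 | grep "RX by" | cut -d: -f2 | cut -d' ' -f1`
_TX=`/sbin/ifconfig eth0 | grep "TX by" | cut -d: -f3 | cut -d' ' -f1`
echo `hostname` NETUSAGE=`echo "$_RX + $_TX" | bc`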

The preceding script would output something like the following:

node01 NETUSAGE=10928374
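On Linux systems where ifconfig is missing or prints a different layout, the same line could be produced from the kernel's per-interface counters under /sys/class/net. The following is a minimal sketch under that assumption. The IFACE variable and the helper function names are hypothetical and are not part of Moab.

```shell
#!/bin/sh
# Sketch of an alternative /tmp/network.sh that reads sysfs byte counters
# instead of parsing ifconfig output.
# IFACE is a hypothetical override; eth0 matches the example above.
IFACE=${IFACE:-eth0}

# Print the counter stored in a file, or 0 if the file cannot be read.
read_counter() {
    cat "$1" 2>/dev/null || echo 0
}

# Format one line in the "<node> NETUSAGE=<bytes>" form shown above.
format_netusage() {
    echo "$1 NETUSAGE=$(( $2 + $3 ))"
}

stats=/sys/class/net/$IFACE/statistics
format_netusage "$(uname -n)" \
    "$(read_counter "$stats/rx_bytes")" \
    "$(read_counter "$stats/tx_bytes")"
```

Keeping the formatting in its own function makes it easy to check the output format without a live interface.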

Moab grabs information from each resource manager and includes its data in the final view of the node.

> checknode node01
node node01

State:   Running  (in current state for 00:00:20)
Configured Resources: PROCS: 2  MEM: 949M  SWAP: 2000M  disk: 1000000
Utilized   Resources: SWAP: 9M
Dedicated  Resources: PROCS: 1  disk: 1000
Opsys:      Linux-2.6.5-1.358  Arch:       linux
Speed:      1.00  CPULoad:       0.320
Location:   Partition: DEFAULT  Rack/Slot:  NA
Network Load: 464.11 b/s
Network:    DEFAULT
Features:   fast
Classes:    [batch 1:2][serial 2:2]

Total Time: 00:30:39  Up: 00:30:39 (100.00%)  Active: 00:09:57 (32.46%)

Reservations:
  Job '5452'(x1)  -00:00:20 -> 00:09:40 (00:10:00)
JobList:  5452

Notice that the Network Load is now being reported along with disk usage.
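The contents of /tmp/fs.sh are not shown in this section. The following is a hypothetical sketch of what such a script might look like. It assumes that the native interface accepts CDISK (configured disk) and ADISK (available disk) attributes expressed in MB; verify the attribute names and units against the native interface documentation for your Moab version. FS_PATH and format_disk_line are made-up names.

```shell
#!/bin/sh
# Hypothetical sketch of /tmp/fs.sh; not taken from the Moab distribution.
# Assumption: CDISK and ADISK are valid native attributes, in MB.
# FS_PATH is a made-up override for the file system to report on.
FS_PATH=${FS_PATH:-/}

# Convert sizes in KB to MB and print one "<node> CDISK=... ADISK=..." line.
format_disk_line() {
    echo "$1 CDISK=$(( $2 / 1024 )) ADISK=$(( $3 / 1024 ))"
}

# df -Pk prints POSIX-format output in 1024-byte blocks:
#   Filesystem 1024-blocks Used Available Capacity Mounted-on
df -Pk "$FS_PATH" 2>/dev/null | awk 'NR == 2 { print $2, $4 }' | {
    read total avail
    format_disk_line "$(uname -n)" "${total:-0}" "${avail:-0}"
}
```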

Example File System Utilization Tracker (per user)

The following configuration can be used to track file system usage on a per-user basis:

.....
RMCFG[file]	POLLINTERVAL=24:00:00 POLLTIMEISRIGID=TRUE
RMCFG[file]	TYPE=NATIVE:AGFULL
RMCFG[file]	RESOURCETYPE=FS
RMCFG[file]	CLUSTERQUERYURL=/tmp/fs.pl
.....

Assume that /tmp/fs.pl produces output in the following format:

DEFAULT STATE=idle AFS=<fs id="user1" size="789456"></fs><fs id="user2" size="123456"></fs>

With this configuration, Moab tracks disk usage for users user1 and user2 every 24 hours, as set by POLLINTERVAL.
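The single-line format above could be generated by a script along the following lines. This is a shell sketch, not the original /tmp/fs.pl. It assumes that each user has one directory under FS_ROOT (a hypothetical variable) and passes through whatever du -sk reports, in KB. The unit that Moab expects for size is not stated here, so treat the unit as an assumption.

```shell
#!/bin/sh
# Hypothetical stand-in for /tmp/fs.pl; function and variable names are made up.
# Assumption: one directory per user directly under $FS_ROOT.
FS_ROOT=${FS_ROOT:-/home}

# Read "<user> <size>" pairs on stdin and emit the one-line format shown above.
format_fs_line() {
    awk 'BEGIN   { printf "DEFAULT STATE=idle AFS=" }
         NF >= 2 { printf "<fs id=\"%s\" size=\"%s\"></fs>", $1, $2 }
         END     { printf "\n" }'
}

# Print "<user> <size-in-KB>" for every directory under the given root.
usage_per_user() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue
        size=$(du -sk "$dir" 2>/dev/null | cut -f1)
        echo "$(basename "$dir") ${size:-0}"
    done
}

usage_per_user "$FS_ROOT" | format_fs_line
```

Because POLLINTERVAL is set to 24 hours in the configuration above, a slow du pass over large directories would run only once a day.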