Moab Workload Manager

6.4 Charging and Allocation Management

6.4.1 Charging and Allocation Management Overview

Charging is the process of assigning a value to the use of resources and tracking this usage on a per-consumer basis. Often, charging is accompanied by a corresponding assignment of resources (an allocation) to each consumer. Within Moab, charging can be quite flexible. Moab supports the following:

  1. Assignment of fixed, expirable allocations to users, groups, and projects
  2. Assignment of fixed, non-expirable allocations to users, groups, and projects
  3. Assignment of dynamic allocations available within a sliding window
  4. Specification of Quality of Service levels with distinct service targets and charging rates
  5. Management over which consumers can request/access which Quality of Service levels
  6. Ability to specify the metric of consumption for charging (for example, CPU hours, dedicated node hours, or PEs)
  7. Ability to charge by requested QoS, delivered QoS, or other factors
  8. Creation of complete persistent internal record of services delivered and resources allocated
  9. Ability to call out to external auditing/accounting systems in real-time to authorize usage
  10. Ability to call out to external auditing/accounting systems in real-time to register usage
  11. Ability to adjust charge rates according to the configuration of resources allocated (for example, processor speed or RAM installed)

6.4.2 Using an External Allocation Manager

An allocation manager (also known as an allocation bank or CPU bank) is a software system that manages resource allocations. A resource allocation grants a job the right to use a particular amount of resources. While full details of each allocation manager may be found within its respective documentation, the following brief review highlights some of the benefits of using such a system.

An allocation manager functions much like a bank in that it provides a form of currency that allows jobs to run on an HPC system. The owners of the resource (cluster or supercomputer) determine how they want the system to be used (often via an allocations committee) over a particular time frame, often a month, quarter, or year. To enforce their decisions, they distribute allocations to various projects via accounts and assign each account an account manager. These allocations can be used for particular machines or globally. They can also have activation and expiration dates associated with them. All transaction information is typically stored in a database or directory server, allowing extensive statistical and allocation tracking.

Each account manager determines how the allocations are made available to individual users within a project. Allocation managers such as Gold (from U.S. Dept of Energy) allow the account manager to dedicate portions of the overall allocation to individual users, specify some allocations as shared by all users, and hold some of the allocations in reserve for later use.

When using an allocation manager, each job must be associated with an account. To accomplish this with minimal user impact, the allocation manager could be set up to handle default accounts on a per-user basis. However, as is often the case, some users may be active on more than one project and thus have access to more than one account. In these situations, a mechanism, such as a job command file keyword, should be provided to allow a user to specify which account should be associated with the job.

The amount of each job's allocation charge is directly associated with the amount of resources (processors) used by that job and the amount of time they were used. Optionally, the allocation manager can also be configured to charge accounts varying amounts based on the QoS desired by the job, the type of compute resources used, and the time when the resources were used (both in terms of time of day and day of week).

The allocation manager interface provides near real-time allocation management, giving a great deal of flexibility and control over how available compute resources are used over the medium- and long-term, and works hand-in-hand with other job management features such as Moab's usage limit policies and fairshare mechanism.

Note: The ENFORCEACCOUNTACCESS parameter controls whether the scheduler enforces account constraints.

6.4.2.1 Configuring the Allocation Manager Interface

Moab's allocation manager interface(s) are defined using the AMCFG parameter. This parameter allows specification of key aspects of the interface as shown in the following table:

Attributes: APPENDMACHINENAME, CHARGEPOLICY, FALLBACKACCOUNT, FALLBACKQOS, FLUSHINTERVAL, FLAGS, NODECHARGEPOLICY, SERVER, SOCKETPROTOCOL, STRICTQUOTE, TIMEOUT, WIREPROTOCOL, JOBFAILUREACTION

APPENDMACHINENAME
  Format: BOOLEAN
  Default: FALSE
  Description: If specified, Moab appends the machine name to the consumer account to create a unique account name per cluster.
  Example:
    AMCFG[tg13] APPENDMACHINENAME=TRUE
    Moab appends the machine name to each account before making a debit from the allocation manager.

CHARGEPOLICY
  Format: one of DEBITALLWC, DEBITALLCPU, DEBITALLPE, DEBITALLBLOCKED, DEBITSUCCESSFULWC, DEBITSUCCESSFULCPU, DEBITSUCCESSFULPE, or DEBITSUCCESSFULBLOCKED
  Default: DEBITSUCCESSFULWC
  Description: Specifies how consumed resources should be charged against the consumer's credentials. See Charge Policy Overview for details.
  Example:
    AMCFG[bank] CHARGEPOLICY=DEBITALLCPU
    Allocation charges are based on actual CPU usage only, not dedicated CPU resources.
  Note: If the LOCALCOST flag (AMCFG[] FLAGS=LOCALCOST) is set, Moab uses the information gathered with CHARGEPOLICY to calculate charges. If LOCALCOST is not set, Moab sends this information to Gold to calculate charges.

FALLBACKACCOUNT
  Format: STRING
  Default: ---
  Description: If specified, Moab verifies adequate allocations for all new jobs. If adequate allocations are not available in the job's primary account, Moab changes the job's credentials to use the fallback account. If not specified, Moab places a hold on jobs that do not have adequate allocations in their primary account.
  Example:
    AMCFG[bank] FALLBACKACCOUNT=freecycle
    Moab assigns the account freecycle to jobs that do not have adequate allocations in their primary account.
  Note: When both FALLBACKACCOUNT and FALLBACKQOS are specified, only FALLBACKACCOUNT takes effect.

FALLBACKQOS
  Format: STRING
  Default: ---
  Description: If specified, Moab verifies adequate allocations for all new jobs. If adequate allocations are not available in the job's primary QoS, Moab changes the job's credentials to use the fallback QoS. If not specified, Moab places a hold on jobs that do not have adequate allocations in their primary QoS.
  Example:
    AMCFG[bank] FALLBACKQOS=freecycle
    Moab assigns the QoS freecycle to jobs that do not have adequate allocations in their primary QoS.
  Note: When both FALLBACKACCOUNT and FALLBACKQOS are specified, only FALLBACKACCOUNT takes effect.

FLAGS
  Format: <STRING>
  Default: ---
  Description: AMCFG flags are used to enable special services.
  Example:
    AMCFG[xxxx] FLAGS=LOCALCOST
    Moab calculates the charge for the job locally and sends that as a charge to Gold, which then charges that amount for the job. This prevents Gold from having to calculate the charge for the job itself.

FLUSHINTERVAL
  Format: [[[DD:]HH:]MM:]SS
  Default: 24:00:00
  Description: Indicates the amount of time between allocation manager debits for long-running reservation and job based charges.
  Example:
    AMCFG[bank] FLUSHINTERVAL=12:00:00
    Moab updates its charges every twelve hours for long-running jobs and reservations.

JOBFAILUREACTION
  Format: <SERVERFAILUREACTION>[,<FUNDSFAILUREACTION>] where the action is one of CANCEL, HOLD, IGNORE, or RETRY
  Default: IGNORE,HOLD
  Description: The server failure action is taken if the allocation manager is down or otherwise unresponsive. The funds failure action is taken if the allocation manager reports that insufficient allocations are available to execute the job under the given user and account. If the action is set to CANCEL, Moab cancels the job; if set to HOLD, Moab defers the job; if set to IGNORE, Moab ignores the failure and continues to start the job; if set to RETRY, Moab does not start the job on this attempt but will attempt to start the job at the next opportunity.
  Example:
    AMCFG[wg13] JOBFAILUREACTION=HOLD
    Allocation management is strictly enforced, preventing jobs from starting if the allocation manager is unavailable.

NODECHARGEPOLICY
  Format: one of AVG, MAX, or MIN
  Default: MIN
  Description: When charging for resource usage, the allocation manager will charge by node allocation according to the specified policy. For AVG, MAX, and MIN, the allocation manager will charge by the average, maximum, and minimum node charge rate of all allocated nodes, respectively. (Also see the CHARGEPOLICY attribute.)
  Example:
    NODECFG[node01]  CHARGERATE=1.5
    NODECFG[node02]  CHARGERATE=1.75
    AMCFG[wg13] NODECHARGEPOLICY=MAX
    Allocation management charges jobs by the maximum allocated node's charge rate.

SERVER
  Format: URL
  Default: N/A
  Description: Specifies the type and location of the allocation manager service. If the keyword ANY is specified instead of a URL, Moab will use the local service directory to locate the allocation manager.
  Note: The URL protocol must be one of file or gold.
  Example:
    AMCFG[bio-sys] SERVER=gold://tiny.supercluster.org:4368

SOCKETPROTOCOL
  Format: one of SUTCP, SSS-HALF, HTTP, or SSS-CHALLENGE
  Default: SSS-HALF
  Description: Specifies the socket protocol to be used for scheduler-allocation manager communication.
  Example:
    AMCFG[bank] SOCKETPROTOCOL=SSS-CHALLENGE

TIMEOUT
  Format: [[[DD:]HH:]MM:]SS
  Default: 15
  Description: Specifies the maximum delay allowed for scheduler-allocation manager communications.
  Example:
    AMCFG[bank] TIMEOUT=30

WIREPROTOCOL
  Format: one of AVP, HTML, SSS2, or XML
  Default: XML
  Description: Specifies the wire protocol to be used for scheduler-allocation manager communication.
  Example:
    AMCFG[bank] WIREPROTOCOL=SSS2
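
The FLUSHINTERVAL and TIMEOUT attributes both use the [[[DD:]HH:]MM:]SS format. As a rough illustration of how such a value maps to seconds, the following Python sketch parses that format. The function name parse_interval is hypothetical and is not part of Moab; the sketch only mirrors the format string given above.

```python
def parse_interval(text: str) -> int:
    """Convert a [[[DD:]HH:]MM:]SS string to seconds (illustrative helper, not a Moab API)."""
    fields = text.split(":")
    if not 1 <= len(fields) <= 4 or not all(f.isdigit() for f in fields):
        raise ValueError(f"not a [[[DD:]HH:]MM:]SS value: {text!r}")
    # The rightmost field is seconds; each field to its left is the next larger unit.
    multipliers = [1, 60, 3600, 86400]
    return sum(int(f) * m for f, m in zip(reversed(fields), multipliers))


if __name__ == "__main__":
    # Values taken from the TIMEOUT and FLUSHINTERVAL entries.
    for value in ("15", "30", "12:00:00", "24:00:00"):
        print(value, "->", parse_interval(value), "seconds")
```

Under this reading, the FLUSHINTERVAL default of 24:00:00 corresponds to one day and the TIMEOUT default of 15 to fifteen seconds.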

The first step to configure the allocation manager involves specifying where the allocation service can be found. This is accomplished by setting the AMCFG parameter's SERVER attribute to the appropriate URL.

After the interface URL is specified, secure communications between scheduler and allocation manager must be enabled. As with other interfaces, this is configured using the CLIENTCFG parameter within the moab-private.cfg file as described in the Security Appendix. In the case of an allocation manager, the KEY and AUTHTYPE attributes should be set to values defined during initial allocation manager build and configuration as in the following example:

CLIENTCFG[AM:bank] KEY=mysecr3t AUTHTYPE=HMAC64

6.4.2.2 AMCFG Flags

AMCFG flags can be used to enable special services and to disable default services. These services are enabled/disabled by setting the AMCFG FLAGS attribute.

Flag Name and Description

ACCOUNTFAILASFUNDS
  When this flag is set, logic failures within the Allocation Manager are treated as fund failures and are canceled. When ACCOUNTFAILASFUNDS is not set, Allocation Manager failures are treated as a server failure and the result is a job which requests an account to which the user does not have access.

LOCALCOST
  Moab calculates the charge for the job locally and sends that as a charge to Gold, which then charges the amount for the job, instead of calculating the charge in Gold. This flag has only been tested for the Gold allocation manager.

STRICTQUOTE
  Sends an estimated process count from Moab to Gold when an initial quote is requested for a newly-submitted job.

6.4.2.3 Allocation Management Policies

In most cases, the scheduler interfaces with a peer service. (If the protocol FILE is specified, the allocation manager transactions are written to the specified flat file.) With all allocation managers based on peer services, the scheduler checks with the allocation manager before starting any job. For allocation tracking to work, however, each job must specify an account to charge or the allocation manager must be set up to handle default accounts on a per-user basis.

Under this configuration, when Moab starts a job, it contacts the allocation manager and requests that an allocation reservation (or lien) be placed on the associated account. This allocation reservation is equivalent to the total amount of allocation that could be consumed by the job (based on the job's wallclock limit) and is used to prevent allocation oversubscription. Moab then starts the job. When the job completes, Moab debits the amount of allocation actually consumed by the job from the job's account and then releases the allocation reservation, or lien.
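
The reserve, debit, and release sequence described above can be sketched with a toy account model in Python. This is only an illustration under stated assumptions: the Account class, its method names, and the use of processor-seconds as the unit of allocation are all hypothetical and do not represent the data model of Gold or any other allocation manager.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Toy allocation account (hypothetical; not a real allocation manager API)."""
    balance: float
    liens: dict = field(default_factory=dict)  # job id -> reserved amount

    def available(self) -> float:
        # Funds already promised to running jobs cannot be reserved again.
        return self.balance - sum(self.liens.values())

    def reserve(self, job_id: str, procs: int, wallclock_limit_s: int) -> bool:
        """Place a lien covering the most the job could consume."""
        amount = procs * wallclock_limit_s  # assumed unit: processor-seconds
        if amount > self.available():
            return False  # insufficient allocation, so the job is not started
        self.liens[job_id] = amount
        return True

    def settle(self, job_id: str, procs: int, wallclock_used_s: int) -> float:
        """Debit what the job actually used, then release its lien."""
        charge = procs * wallclock_used_s
        self.balance -= charge
        del self.liens[job_id]
        return charge


if __name__ == "__main__":
    acct = Account(balance=100_000)
    print(acct.reserve("job.1", procs=8, wallclock_limit_s=3600))   # lien placed
    print(acct.reserve("job.2", procs=32, wallclock_limit_s=3600))  # would oversubscribe
    print(acct.settle("job.1", procs=8, wallclock_used_s=1800))     # actual usage debited
    print(acct.balance, acct.available())
```

Because the lien is sized from the wallclock limit rather than from actual usage, a second job cannot be admitted against funds that a running job might still consume.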

These steps should be transparent to users. Only when an account has insufficient allocations to run a requested job will the presence of the allocation manager be noticed. If preferred, an account may be specified for use when a job's primary account is out of allocations. This account, specified using the AMCFG parameter's FALLBACKACCOUNT attribute, is often associated with a low QoS privilege and priority, and is often configured to run only when no other jobs are present.
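
The fallback behavior can be illustrated with a short Python sketch. The function choose_account and its arguments are hypothetical; the logic follows the FALLBACKACCOUNT description (switch to the fallback account when the primary account lacks adequate allocations, otherwise place a hold). Whether the fallback account's own balance is checked is not stated in this section, so the sketch does not model it.

```python
from typing import Mapping, Optional, Tuple


def choose_account(
    primary: str,
    required: float,
    available: Mapping[str, float],
    fallback: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (decision, account), where decision is 'RUN' or 'HOLD' (illustrative only)."""
    if available.get(primary, 0.0) >= required:
        return "RUN", primary
    if fallback is not None:
        # Mirrors FALLBACKACCOUNT: the job's credentials change to the fallback account.
        return "RUN", fallback
    # No fallback configured: the job is held.
    return "HOLD", None


if __name__ == "__main__":
    funds = {"chemistry": 500.0}
    print(choose_account("chemistry", 100.0, funds))
    print(choose_account("chemistry", 900.0, funds, fallback="freecycle"))
    print(choose_account("chemistry", 900.0, funds))
```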

The scheduler can also be configured to charge for reservations. A common hesitation about dedicating resources to a particular group is that if the resources are not used by that group, they go idle and are wasted. By configuring a reservation to be chargeable, sites can charge every idle cycle of the reservation to a particular project. When the reservation is in use, the consumed resources will be associated with the account of the job using the resources. When the resources are idle, the resources will be charged to the reservation's charge account. In the case of standing reservations, this account is specified using the SRCFG parameter's CHARGEACCOUNT attribute. In the case of administrative reservations, this account is specified via a command line flag to the setres command.

Moab only interfaces to the allocation manager when running in NORMAL mode. However, this behavior can be overridden by setting the environment variable MOABAMTEST to any value. With this variable set, Moab attempts to interface to the allocation manager regardless of the scheduler's mode of operation.

Charge Metrics

The allocation manager interface allows a site to charge accounts in a number of different ways. Some sites may wish to charge for all jobs regardless of whether the job completed successfully. Sites may also want to charge based on differing usage metrics, such as dedicated wallclock time or processors actually used. Moab supports the following charge policies specified via the CHARGEPOLICY attribute:

  • DEBITALLWC - Charges all jobs regardless of job completion state using processor weighted wallclock time dedicated as the usage metric.
  • DEBITALLCPU - Charges all jobs based on processors used by job.
  • DEBITALLPE - Charges all jobs based on processor-equivalents dedicated to job.
  • DEBITALLBLOCKED - Charges all jobs based on processors dedicated and blocked according to node access policy or QoS node exclusivity.
  • DEBITREQUESTEDWC - Charges for reservations based on requested wallclock time. Only applicable when using virtual private clusters.
  • DEBITSUCCESSFULWC - Charges only jobs that successfully complete using processor weighted wallclock time dedicated as the usage metric. This is the default metric.
  • DEBITSUCCESSFULCPU - Charges only jobs that successfully complete using CPU time as the usage metric.
  • DEBITSUCCESSFULPE - Charges only jobs that successfully complete using PE weighted wallclock time dedicated as the usage metric.
  • DEBITSUCCESSFULBLOCKED - Charges only jobs that successfully complete based on processors dedicated and blocked according to node access policy or QoS node exclusivity.

Note: On systems where job wallclock limits are specified, jobs that exceed their wallclock limits and are subsequently canceled by the scheduler or resource manager are considered to have successfully completed as far as charging is concerned, even though the resource manager may report these jobs as having been removed or canceled.

Note: If machine-specific allocations are created within the allocation manager, the allocation manager machine name should be synchronized with the Moab resource manager name as specified with the RMCFG parameter, such as the name orion in RMCFG[orion] TYPE=PBS.

Note: To control how jobs are charged when heterogeneous resources are allocated and per-resource charges may vary within the job, use the NODECHARGEPOLICY attribute.

Note: When calculating the cost of the job, Moab will use the most restrictive node access policy. See NODEACCESSPOLICY for more information.
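
To make the difference between the charge policies concrete, the following Python sketch models the WC and CPU variants together with the AVG, MAX, and MIN node charge policies. It is a simplified illustration, not Moab's actual calculation: the function names are hypothetical, the PE and BLOCKED variants are not modeled, and multiplying usage by the selected node rate is an assumption made for the example.

```python
from statistics import mean
from typing import Sequence

# Policies modeled here; the PE and BLOCKED variants are left out of this sketch.
MODELED = {"DEBITALLWC", "DEBITALLCPU", "DEBITSUCCESSFULWC", "DEBITSUCCESSFULCPU"}
NODE_RATE = {"AVG": mean, "MAX": max, "MIN": min}


def usage_charged(policy: str, succeeded: bool, procs_dedicated: int,
                  wallclock_s: float, cpu_s: float) -> float:
    """Return the usage charged under one of the modeled CHARGEPOLICY values."""
    if policy not in MODELED:
        raise ValueError(f"policy not modeled in this sketch: {policy}")
    if policy.startswith("DEBITSUCCESSFUL") and not succeeded:
        return 0.0  # only successfully completed jobs are charged
    if policy.endswith("WC"):
        return procs_dedicated * wallclock_s  # processor-weighted dedicated wallclock time
    return cpu_s  # CPU time actually consumed


def node_charge_rate(policy: str, rates: Sequence[float]) -> float:
    """Select one rate from the charge rates of all allocated nodes."""
    return NODE_RATE[policy](rates)


if __name__ == "__main__":
    usage = usage_charged("DEBITSUCCESSFULWC", True, procs_dedicated=4,
                          wallclock_s=3600, cpu_s=9000.0)
    rate = node_charge_rate("MAX", [1.5, 1.75])  # rates from the NODECFG example
    # Assumption: the selected node rate scales the usage.
    print(usage, rate, usage * rate)
```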

Allocation Management Example

In the following example, Moab charges allocations according to blocked resources and records these charges in the specified file.

AMCFG[local] SERVER=file:///opt/moab/chargelog.txt  CHARGEPOLICY=DEBITALLBLOCKED

NODEACCESSPOLICY          SINGLEJOB
...

6.4.2.4 Allocation Charge Rates

By default, Moab refers the decision of how much to charge to the allocation manager itself. However, if using the FILE Allocation Manager, job and reservation charge rates can be specified on a per-QoS basis using the DEDRESCOST parameter. If using the Gold Allocation Manager, per-QoS charge rates can be configured in Gold as demonstrated in these examples.

6.4.3.1 Gold Allocation Manager

Gold is an accounting and allocation management system developed at PNNL under the DOE Scalable Systems Software (SSS) project. Gold supports a dynamic approach to allocation tracking and enforcement with reservations, quotations, and so forth. It offers more flexible controls for managing access to computational resources and exhibits a more powerful query interface. Gold supports hierarchical project nesting. Journaling allows preservation of all historical state information.

Gold is dynamically extensible. New object/record types and their fields can be dynamically created and manipulated through the regular query language turning this system into a generalized accounting and information service. This capability offers custom accounting, meta-scheduler resource-mapping, and an external persistence interface.

Gold supports strong authentication and encryption and role based access control. Gold features a powerful web-based GUI for easy remote access for users, managers and administrators. Gold supports interaction with peer accounting systems with a traceback feature enabling it to function in a meta-scheduling or grid environment.

To configure a Gold allocation manager interface, set the SERVER attribute to point to the Gold server host and port, as in the following example:

moab.cfg:

AMCFG[bank] SERVER=gold://master.ufl.edu JOBFAILUREACTION=IGNORE TIMEOUT=15
...

moab-private.cfg:

CLIENTCFG[AM:bank] KEY=mysecr3t AUTHTYPE=HMAC64
...

Create the secret key by running make auth_key during configuration.

[root]# make auth_key

Monitor Mode

Gold can be enabled in an effective monitor-only mode where resource consumption is tracked but jobs are never blocked or delayed based on allocation status. In this mode, full Gold reporting and accounting information is available.

  1. Create an account that is valid for all projects, users, and machines.

    > gmkaccount -n ANY -p ANY -u ANY -m ANY
    
    Successfully created Account 5
    

  2. Create an allocation with massive funds and no time bounds (using the account number created by the previous command).

    > gdeposit -a 5 1000000000000
    
    Successfully deposited 1000000000000 credits into account 5
    

  3. To prevent failures due to unknown users, users that don't belong to the specified projects, and so forth, edit goldd.conf to automatically create users, projects, and machines.

    user.autogen = true
    project.autogen = true
    machine.autogen = true
    

6.4.3.2 Native Allocation Manager

Note: The native allocation manager interface model has not been tested with HPC workload; it has only been tested with VPC style clouds.

The native allocation manager permits Moab to interface with a separate allocation manager to perform allocation management functions such as charging, billing, charge queries, and so forth so long as the separate allocation manager uses a native Wiki interface.

The design for the native allocation manager interface (NAMI) is different from Gold. NAMI extracts logic from Moab and places it in the native software. Moab acts as the event engine for the native software. That is, Moab sends XML that defines an object to a variety of URLs that signify events. Moab currently supports the following URLs:

  • Create
  • Delete
  • Quote
  • Reserve
  • Charge

A user runs the mshow command and Moab calls NAMI to get the QUOTE for the requested resources. If the TID is committed, Moab calls the CREATE and RESERVE URLs for each object (reservation or job). Depending on the flush interval, Moab periodically calls out to the CHARGE URL. When the object reaches its end of life, Moab calls out to CHARGE and finally DELETE. Moab keeps track of the last time the object was charged, but it does not re-create reservations when restarting, nor for intermittent charging. If Moab is down during a flush interval, Moab does not attempt to catch up; it simply charges double at the next flush interval.
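
The order of calls described above can be sketched as a small Python generator. This is a simplified model, not Moab's implementation: the function nami_events is hypothetical, it assumes the TID is committed, and it assumes one Charge call per elapsed flush interval plus a final Charge at end of life. Downtime and restarts are not modeled.

```python
from typing import Iterator


def nami_events(lifetime_s: int, flush_interval_s: int) -> Iterator[str]:
    """Yield the URL names called over the life of one object (illustrative only)."""
    if flush_interval_s <= 0:
        raise ValueError("flush interval must be positive")
    yield "Quote"    # requested when the user runs mshow
    yield "Create"   # called once the TID is committed
    yield "Reserve"
    elapsed = flush_interval_s
    while elapsed < lifetime_s:
        yield "Charge"  # periodic charge at each flush interval
        elapsed += flush_interval_s
    yield "Charge"   # final charge at end of life
    yield "Delete"


if __name__ == "__main__":
    # A three-hour object with an hourly flush interval.
    print(list(nami_events(lifetime_s=3 * 3600, flush_interval_s=3600)))
```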

The following is sample XML for the life of a particular object from Quote to Delete:

Quote:
<Reservation>
  <ObjectID>cost</ObjectID>
  <Processors>1</Processors>
  <WallDuration>12960000</WallDuration>
  <ChargeDuration>12960000</ChargeDuration>
</Reservation>

Create:
<Reservation>
  <ObjectID>host.1</ObjectID>
  <User>test</User>
  <Processors>1</Processors>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>12959999</ChargeDuration>
</Reservation>

Reserve:
<Reservation>
  <ObjectID>host.1</ObjectID>
  <User>test</User>
  <Account>blue</Account>
  <Processors>1</Processors>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>12959999</ChargeDuration>
  <Var name="VPCHOSTLIST">n05</Var>
  <Var name="VPCID">vpc.1</Var>
</Reservation>

Charge:
<Reservation>
  <ObjectID>host.2</ObjectID>
  <User>test</User>
  <Account>blue</Account>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>108</ChargeDuration>
  <Var name="blue">green</Var>
  <Var name="VPCHOSTLIST">n05,GLOBAL</Var>
  <Var name="VPCID">vpc.1</Var>
  <GRes name="storage">100</GRes>
</Reservation>

Delete:
<Reservation>
  <ObjectID>host.2</ObjectID>
  <User>test</User>
  <Account>blue</Account>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>12959999</ChargeDuration>
  <Var name="blue">green</Var>
  <Var name="VPCHOSTLIST">n05,GLOBAL</Var>
  <Var name="VPCID">vpc.1</Var>
  <GRes name="storage">100</GRes>
</Reservation>

Note that only the Quote URL should return any information, and it should return only the cost of the object: a bare number with no accompanying words.
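
As an illustration of what a Quote handler might compute, the following Python sketch parses a Reservation object and prints a bare cost. Everything about the pricing is an assumption: the rate constant, the choice of Processors and ChargeDuration as inputs, and the default of one processor when the element is absent are all hypothetical. How Moab delivers the XML to the script is not described in this section, so the sketch works on a string rather than reading from a particular input channel.

```python
import xml.etree.ElementTree as ET

RATE_PER_PROC_HOUR = 2.0  # hypothetical site-defined rate

SAMPLE = """
<Reservation>
  <ObjectID>cost</ObjectID>
  <Processors>1</Processors>
  <WallDuration>12960000</WallDuration>
  <ChargeDuration>12960000</ChargeDuration>
</Reservation>
"""


def quote(xml_text: str) -> float:
    """Compute a cost from the Reservation fields (illustrative pricing only)."""
    root = ET.fromstring(xml_text.strip())
    # Assumption: a missing Processors element counts as one processor.
    procs = int(root.findtext("Processors", default="1"))
    duration_s = int(root.findtext("ChargeDuration", default="0"))
    return procs * duration_s / 3600 * RATE_PER_PROC_HOUR


if __name__ == "__main__":
    # Print only the number, since the Quote URL should return nothing but the cost.
    print(quote(SAMPLE))
```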

The following is a representation of how you might set up the native allocation manager interface in the Moab configuration file (moab.cfg):

AMCFG[bank] TYPE=NATIVE
AMCFG[bank] ChargeURL=exec:///$HOME/tools/bank.charge.pl
AMCFG[bank] DeleteURL=exec:///$HOME/tools/bank.delete.pl
AMCFG[bank] CreateURL=exec:///$HOME/tools/bank.create.pl
AMCFG[bank] ReserveURL=exec:///$HOME/tools/bank.reserve.pl
AMCFG[bank] QuoteURL=exec:///$HOME/tools/bank.quote.pl
AMCFG[bank] FLUSHINTERVAL=hour

To view URL output, run mdiag -R -v. The following shows sample output from running the mdiag command:

AM[bank] Type: native State: 'Active'
  FlushPeriod: HOUR
  Charge URL: ChargeURL=exec:///$HOME/tools/charge.pl
  Delete URL: DeleteURL=exec:///$HOME/tools/delete.pl
  Quote URL: QuoteURL=exec:///$HOME/tools/quote.pl
  Reserve URL: ReserveURL=exec:///$HOME/tools/reserve.pl
  Create URL: CreateURL=exec:///$HOME/tools/create.pl

6.4.3.3 File Allocation Manager

The file allocation manager protocol allows a site to append job allocation records directly to a local file for batch processing by local allocation management systems. These records are line-delimited with whitespace-delimited attributes. Specifically, the file job usage record uses the following format:

WITHDRAWAL TYPE=job MACHINE=<MACHINENAME> ACCOUNT=<PROJECTNAME> USER=<USERNAME> PROCS=<PROCCOUNT> PROCCRATE=<PROCRATE> RESOURCETYPE=<NODETYPE> DURATION=<WALLDURATION> REQUESTID=<JOBID>

For example, the following record might be created:

WITHDRAWAL TYPE=job MACHINE=ia ACCOUNT=s USER=jb PROCS=64 PROCCRATE=0.93 RESOURCETYPE=ia64 DURATION=60 REQUESTID=1632
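
Because each record is a single line of whitespace-delimited KEY=VALUE attributes, batch processing can be quite simple. The following Python sketch shows one way a local tool might split a record; the function parse_record and the ACTION key are hypothetical names chosen for this example, and the units of PROCS and DURATION are not interpreted.

```python
def parse_record(line: str) -> dict:
    """Split a file allocation manager record into its action and attributes."""
    action, *pairs = line.split()
    record = {"ACTION": action}  # leading keyword, such as WITHDRAWAL
    for pair in pairs:
        key, _, value = pair.partition("=")
        record[key] = value
    return record


if __name__ == "__main__":
    sample = ("WITHDRAWAL TYPE=job MACHINE=ia ACCOUNT=s USER=jb PROCS=64 "
              "PROCCRATE=0.93 RESOURCETYPE=ia64 DURATION=60 REQUESTID=1632")
    rec = parse_record(sample)
    print(rec["ACTION"], rec["ACCOUNT"], rec["USER"], rec["PROCS"], rec["DURATION"])
```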

To configure a file allocation manager interface, set the SERVER attribute to point to the local file pathname as follows:

AMCFG[local] SERVER=file:///opt/data/alloc.txt
