6.4 Charging and Allocation Management

6.4.1 Charging and Allocation Management Overview

Note Either Moab HPC Suite 7.0 - Enterprise Edition or Moab Cloud Suite 7.0 is required to support charging and allocation management capabilities.

An allocation manager is a software system that manages resource allocations. A resource allocation grants a job a right to use a particular amount of resources. While full details of each allocation manager may be found within its respective documentation, the following brief review highlights a few of the values of using such a system.

An allocation manager functions much like a bank in that it provides a form of currency that allows jobs to run on an HPC system. The owners of the resource (cluster/supercomputer) determine how they want the system to be used (often via an allocations committee) over a particular time frame, often a month, quarter, or year. To enforce their decisions, they distribute allocations to various projects via accounts. These allocations can be used for particular clusters or globally. They can also have time frames associated with them to establish an allocation cycle. All transaction information is typically stored in a database or directory server, allowing extensive statistical and allocation tracking.

When using an allocation manager, each job must be associated with an account. To accomplish this with minimal user impact, the allocation manager could be set up to handle default accounts on a per-user basis. However, as is often the case, some users may be active on more than one project and thus have access to more than one account. In these situations, a mechanism such as a job command file keyword should be provided to allow a user to specify which account should be associated with the job.

Each job's allocation charge is directly tied to the amount of resources (processors) the job used and the length of time it used them. Optionally, the allocation manager can also be configured to charge accounts varying amounts based on the QoS desired by the job, the type of compute resources used, and the time when the resources were used.
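
As a rough illustration of the charging model described above, the following Python sketch computes a charge as processors multiplied by duration, scaled by optional multipliers. The function name, the multiplier parameters, and the rates are hypothetical and exist only for this example; the actual charge formula is defined by the allocation manager in use.

```python
def job_charge(processors, seconds, qos_rate=1.0, node_rate=1.0, time_rate=1.0):
    """Illustrative charge: processor-seconds scaled by optional multipliers.

    The three multipliers are hypothetical stand-ins for the QoS,
    resource-type, and time-of-use rates an allocation manager may apply.
    """
    if processors < 0 or seconds < 0:
        raise ValueError("processors and seconds must be non-negative")
    return processors * seconds * qos_rate * node_rate * time_rate


# A 64-processor job that ran for one hour at a hypothetical premium QoS rate.
print(job_charge(64, 3600, qos_rate=2.0))
```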

The allocation manager interface provides near real-time allocation management, giving a great deal of flexibility and control over how available compute resources are used over the medium- and long-term, and works hand-in-hand with other job management features such as Moab's usage limit policies and fairshare mechanism.

Supported allocation managers include Moab Accounting Manager (MAM), Native Allocation Manager Interface (NAMI), Gold Allocation Manager, and File.

MAM is a commercial charge-back accounting system that can be used in cloud or HPC environments. It is based on a version of Gold Allocation Manager and uses a script interface (NAMI) to communicate between Moab Workload Manager and Moab Accounting Manager, thus allowing a high level of customization. This type is a special case of the NAMI allocation manager type.

NAMI is a script-based interface that can be used to connect Moab Workload Manager to a third-party allocation manager system. Moab makes calls out to the scripts that handle the interaction with the external system.

The Gold Allocation Manager is an open-source charge system from the U.S. Department of Energy that has been used mostly in HPC environments. When configured to use Gold, Moab communicates with it directly using the Scalable Systems Software (SSS) wire protocol and message format.

The File allocation manager appends job charge records to a file. This file may then be parsed to generate usage reports or to reconcile account balances in a third-party system.

6.4.1.1 Configuring the Allocation Manager Interface

Configure Moab to use the Moab Accounting Manager by running ./configure with the applicable options when installing Moab:

The --with-am option specifies the accounting manager type that you want to use as either native, which is the default, or gold. Specifying this option will add the necessary entries into the moab.cfg file and cause the install process to copy config files, NAMI scripts, and libraries into place.

Use --with-am-dir to specify the prefix directory for Moab Accounting Manager if the native type is being used and it has been installed in a non-default location.

The --with-cloud option specifies that you are installing Moab in a cloud context (HPC is the default context) and makes some adjustments to the config files and interface scripts necessary for contextually appropriate charging behaviors. This option includes automatically setting the --with-am=native option, since the gold accounting manager type is not supported in the cloud context. If you are specifying a cloud context but do not wish to use the accounting manager, use the --without-am configure option.

The following is an example of configuring Moab charging for HPC:

./configure --with-am

The following is an example of configuring Moab charging for cloud:

./configure --with-cloud

If using a native allocation manager type, it will also be necessary to run make perldeps as root when installing so that the prerequisite bundled Perl modules are installed.

make perldeps

If you want to configure one of the other types of allocation manager, follow the instructions in the appropriate section.

Moab's allocation manager interface(s) are defined using the AMCFG parameter. This parameter allows specification of key aspects of the interface as shown in the following table:

APPENDMACHINENAME CHARGEPOLICY FALLBACKACCOUNT FALLBACKQOS
FLUSHINTERVAL FLAGS NODECHARGEPOLICY SERVER
SOCKETPROTOCOL TIMEOUT WIREPROTOCOL JOBFAILUREACTION

APPENDMACHINENAME
  Format: BOOLEAN
  Default: FALSE
  Description: If specified, Moab appends the machine name to the consumer account to create a unique account name per cluster.
  Example: AMCFG[tg13] APPENDMACHINENAME=TRUE
    Moab appends the machine name to each account before making a debit from the allocation manager.

CHARGEPOLICY
  Format: one of DEBITALLWC, DEBITALLCPU, DEBITALLPE, DEBITALLBLOCKED, DEBITSUCCESSFULWC, DEBITSUCCESSFULCPU, DEBITSUCCESSFULPE, or DEBITSUCCESSFULBLOCKED
  Default: DEBITSUCCESSFULWC
  Description: Specifies how consumed resources should be charged against the consumer's credentials. See Charge Policy Overview for details.
  Example: AMCFG[bank] CHARGEPOLICY=DEBITALLCPU
    Allocation charges are based on actual CPU usage only, not dedicated CPU resources.
  Note If the LOCALCOST flag (AMCFG[] FLAGS=LOCALCOST) is set, Moab uses the information gathered with CHARGEPOLICY to calculate charges. If LOCALCOST is not set, Moab sends this information to Gold to calculate charges.

FALLBACKACCOUNT
  Format: STRING
  Default: ---
  Description: If specified, Moab verifies adequate allocations for all new jobs. If adequate allocations are not available in the job's primary account, Moab changes the job's credentials to use the fallback account. If not specified, Moab places a hold on jobs that do not have adequate allocations in their primary account.
  Example: AMCFG[bank] FALLBACKACCOUNT=freecycle
    Moab assigns the account freecycle to jobs that do not have adequate allocations in their primary account.

FALLBACKQOS
  Format: STRING
  Default: ---
  Description: If specified, Moab verifies adequate allocations for all new jobs. If adequate allocations are not available in the job's primary QoS, Moab changes the job's credentials to use the fallback QoS. If not specified, Moab places a hold on jobs that do not have adequate allocations in their primary QoS.
  Example: AMCFG[bank] FALLBACKQOS=freecycle
    Moab assigns the QoS freecycle to jobs that do not have adequate allocations in their primary QoS.

FLAGS
  Format: <STRING>
  Default: ---
  Description: AMCFG flags are used to enable special services.
  Example: AMCFG[xxxx] FLAGS=LOCALCOST
    Moab calculates the charge for the job locally and sends that as a charge to Gold, which then charges that amount for the job. This prevents Gold from having to calculate the charge for the job itself.

FLUSHINTERVAL
  Format: [[[DD:]HH:]MM:]SS
  Default: 24:00:00
  Description: Indicates the amount of time between allocation manager debits for long-running reservation-based and job-based charges.
  Example: AMCFG[bank] FLUSHINTERVAL=12:00:00
    Moab updates its charges every twelve hours for long-running jobs and reservations.

JOBFAILUREACTION
  Format: <SERVERFAILUREACTION>[,<FUNDSFAILUREACTION>] where the action is one of CANCEL, HOLD, IGNORE, or RETRY
  Default: IGNORE,HOLD
  Description: The server failure action is taken if the allocation manager is down or otherwise unresponsive. The funds failure action is taken if the allocation manager reports that insufficient allocations are available to execute the job under the given user and account. If the action is set to CANCEL, Moab cancels the job; if set to HOLD, Moab defers the job; if set to IGNORE, Moab ignores the failure and continues to start the job; if set to RETRY, Moab does not start the job on this attempt but will attempt to start the job at the next opportunity.
  Example: AMCFG[wg13] JOBFAILUREACTION=HOLD
    Allocation management is strictly enforced, preventing jobs from starting if the allocation manager is unavailable.

NODECHARGEPOLICY
  Format: one of AVG, MAX, or MIN
  Default: MIN
  Description: When charging for resource usage, the allocation manager will charge by node allocation according to the specified policy. For AVG, MAX, and MIN, the allocation manager will charge by the average, maximum, and minimum node charge rate of all allocated nodes. (Also see CHARGEPOLICY attribute.)
  Example:
    NODECFG[node01]  CHARGERATE=1.5
    NODECFG[node02]  CHARGERATE=1.75
    AMCFG[wg13] NODECHARGEPOLICY=MAX
    Allocation management charges jobs by the maximum allocated node's charge rate.

SERVER
  Format: URL
  Default: N/A
  Description: Specifies the type and location of the allocation manager service. If the keyword ANY is specified instead of a URL, Moab will use the local service directory to locate the allocation manager.
  Note The URL protocol must be one of file, ggf, gold, or res.
  Example: AMCFG[bio-sys] SERVER=gold://tiny.supercluster.org:4368

SOCKETPROTOCOL
  Format: one of SUTCP, SSS-HALF, HTTP, or SSS-CHALLENGE
  Default: SSS-HALF
  Description: Specifies the socket protocol to be used for scheduler-allocation manager communication.
  Example: AMCFG[bank] SOCKETPROTOCOL=SSS-CHALLENGE

TIMEOUT
  Format: [[[DD:]HH:]MM:]SS
  Default: 15
  Description: Specifies the maximum delay allowed for scheduler-allocation manager communications.
  Example: AMCFG[bank] TIMEOUT=30

WIREPROTOCOL
  Format: one of AVP, HTML, SSS2, or XML
  Default: XML
  Description: Specifies the wire protocol to be used for scheduler-allocation manager communication.
  Example: AMCFG[bank] WIREPROTOCOL=SSS2
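
The FLUSHINTERVAL and TIMEOUT attributes above take values in the [[[DD:]HH:]MM:]SS form. As a minimal sketch of how such a value maps to seconds, the hypothetical helper below (not part of Moab) handles the numeric form only; it does not cover keyword values such as the `hour` setting that appears in the native allocation manager example.

```python
def parse_duration(text):
    """Convert a [[[DD:]HH:]MM:]SS string to a number of seconds."""
    parts = text.split(":")
    if not 1 <= len(parts) <= 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a [[[DD:]HH:]MM:]SS value: {text!r}")
    # Units from the rightmost field leftward: seconds, minutes, hours, days.
    units = (1, 60, 3600, 86400)
    return sum(int(p) * u for p, u in zip(reversed(parts), units))


print(parse_duration("12:00:00"))  # the FLUSHINTERVAL example value
print(parse_duration("30"))        # the TIMEOUT example value
```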

The first step to configure the allocation manager involves specifying where the allocation service can be found. This is accomplished by setting the AMCFG parameter's SERVER attribute to the appropriate URL.

In the case of the Gold allocation manager, after the interface URL is specified, secure communications between scheduler and allocation manager must be enabled. As with other interfaces, this is configured using the CLIENTCFG parameter within the moab-private.cfg file as described in the Security Appendix. The KEY and AUTHTYPE attributes should be set to values defined during initial allocation manager build and configuration as in the following example:

CLIENTCFG[AM:bank] KEY=mysecr3t AUTHTYPE=HMAC64

6.4.1.2 AMCFG Flags

AMCFG flags can be used to enable special services and to disable default services. These services are enabled/disabled by setting the AMCFG FLAGS attribute.

ACCOUNTFAILASFUNDS
  When this flag is set, logic failures within the Allocation Manager are treated as fund failures and are canceled. When ACCOUNTFAILASFUNDS is not set, Allocation Manager failures are treated as a server failure and the result is a job which requests an account to which the user does not have access.

LOCALCOST
  Moab calculates the charge for the job locally and sends that as a charge to Gold, which then charges the amount for the job, instead of calculating the charge in Gold. This flag has only been tested for the Gold allocation manager.

STRICTQUOTE
  Sends an estimated process count from Moab to Gold when an initial quote is requested for a newly-submitted job.

6.4.1.3 Allocation Management Policies

In most cases, the scheduler interfaces with a peer service. (If the protocol FILE is specified, the allocation manager transactions are written to the specified flat file.) With all allocation managers based on peer services, the scheduler checks with the allocation manager before starting any job. For allocation tracking to work, however, each job must specify an account to charge, or the allocation manager must be set up to handle default accounts on a per-user basis.

Under this configuration, when Moab starts a job, it contacts the allocation manager and requests that an allocation reservation (or lien) be placed on the associated account. This allocation reservation is equivalent to the total amount of allocation that could be consumed by the job (based on the job's wallclock limit) and is used to prevent allocation oversubscription. Moab then starts the job. When the job completes, Moab debits the amount of allocation actually consumed by the job from the job's account and then releases the allocation reservation, or lien.
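
The reserve, run, debit, and release sequence described above can be sketched with a toy in-memory account. The class, its method names, and the use of processor-seconds as the unit of allocation are all assumptions made for illustration; a real allocation manager persists this state and applies its own charge rates.

```python
class Account:
    """Toy in-memory account used only to illustrate the lien workflow."""

    def __init__(self, balance):
        self.balance = balance
        self.liens = {}  # job id -> reserved amount

    def available(self):
        # Outstanding liens reduce what new jobs may reserve.
        return self.balance - sum(self.liens.values())

    def reserve(self, job_id, processors, wallclock_limit):
        """Place a lien for the most the job could consume."""
        amount = processors * wallclock_limit
        if amount > self.available():
            return False  # insufficient allocation; the job should not start
        self.liens[job_id] = amount
        return True

    def charge(self, job_id, processors, seconds_used):
        """Debit actual usage, then release the lien."""
        self.balance -= processors * seconds_used
        self.liens.pop(job_id, None)


acct = Account(balance=100_000)
if acct.reserve("job.1", processors=4, wallclock_limit=3600):
    # The job finishes early, after 1000 of its 3600 seconds.
    acct.charge("job.1", processors=4, seconds_used=1000)
print(acct.balance, acct.available())
```

Because the lien covers the full wallclock limit, two jobs cannot both start against funds that only one of them could pay for, even though each is ultimately debited only for what it used.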

These steps should be transparent to users. Only when an account has insufficient allocations to run a requested job will the presence of the allocation manager be noticed. If preferred, an account may be specified for use when a job's primary account is out of allocations. This account, specified using the AMCFG parameter's FALLBACKACCOUNT attribute, is often associated with a low QoS privilege and priority, and is often configured to run only when no other jobs are present.
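
A minimal sketch of the fallback decision described above follows. The function, the `balances` mapping, and the "run"/"hold" labels are hypothetical; they stand in for the check Moab performs against the allocation manager and are not Moab output.

```python
def select_account(required, primary, balances, fallback=None):
    """Return (account, action) for a job needing `required` credits.

    `balances` maps account names to available credits.
    """
    if balances.get(primary, 0) >= required:
        return primary, "run"
    if fallback is not None:
        # Mirrors the documented behavior: the job's credentials are
        # switched to the fallback account.
        return fallback, "run"
    # No fallback configured: the job is held.
    return primary, "hold"


print(select_account(100, "chem", {"chem": 50}, fallback="freecycle"))
print(select_account(100, "chem", {"chem": 50}))
```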

The scheduler can also be configured to charge for reservations. A common hesitation about dedicating resources to a particular group is that if the group does not use them, they sit idle and are wasted. By configuring a reservation to be chargeable, sites can charge every idle cycle of the reservation to a particular project. When the reservation is in use, the consumed resources are associated with the account of the job using them. When the resources are idle, they are charged to the reservation's charge account. For standing reservations, this account is specified using the SRCFG parameter's CHARGEACCOUNT attribute. For administrative reservations, it is specified via a command line flag to the setres command.
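
The split between job accounts and the reservation's charge account can be illustrated as follows. This is a sketch under simplifying assumptions: usage is represented as a hypothetical list of time slots, and charges are counted in plain seconds rather than in whatever unit the allocation manager uses.

```python
def split_reservation_charges(slots, charge_account):
    """Attribute each reserved time slot to an account.

    `slots` is a list of (seconds, job_account) tuples, where job_account
    is None while the reserved resources sat idle.
    """
    totals = {}
    for seconds, job_account in slots:
        # Idle time falls back to the reservation's own charge account.
        payer = job_account if job_account is not None else charge_account
        totals[payer] = totals.get(payer, 0) + seconds
    return totals


usage = [(3600, "blue"), (1800, None), (600, "green"), (1200, None)]
print(split_reservation_charges(usage, charge_account="ops"))
```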

Moab only interfaces to the allocation manager when running in NORMAL mode.

Charge Metrics

The allocation manager interface allows a site to charge accounts in a number of different ways. Some sites may wish to charge for all jobs regardless of whether the job completed successfully. Sites may also want to charge based on differing usage metrics, such as dedicated wallclock time or processors actually used. Moab supports the following charge policies, specified via the CHARGEPOLICY attribute: DEBITALLWC, DEBITALLCPU, DEBITALLPE, DEBITALLBLOCKED, DEBITSUCCESSFULWC, DEBITSUCCESSFULCPU, DEBITSUCCESSFULPE, and DEBITSUCCESSFULBLOCKED.

Note On systems where job wallclock limits are specified, jobs that exceed their wallclock limits and are subsequently canceled by the scheduler or resource manager are considered to have successfully completed as far as charging is concerned, even though the resource manager may report these jobs as having been removed or canceled.
Note If machine-specific allocations are created within the allocation manager, the allocation manager machine name should be synchronized with the Moab resource manager name as specified with the RMCFG parameter, such as the name orion in RMCFG[orion] TYPE=PBS.
Note To control how jobs are charged when heterogeneous resources are allocated and per resource charges may vary within the job, use the NODECHARGEPOLICY attribute.
Note When calculating the cost of the job, Moab will use the most restrictive node access policy. See NODEACCESSPOLICY for more information.
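
To make the NODECHARGEPOLICY choices concrete, the sketch below reduces a list of per-node charge rates to a single rate under AVG, MAX, or MIN. The function is hypothetical and only models the selection rule; how the resulting rate enters the final charge depends on the allocation manager.

```python
def node_charge_rate(rates, policy):
    """Reduce per-node charge rates to one rate under AVG, MAX, or MIN."""
    if not rates:
        raise ValueError("at least one allocated node is required")
    policy = policy.upper()
    if policy == "AVG":
        return sum(rates) / len(rates)
    if policy == "MAX":
        return max(rates)
    if policy == "MIN":
        return min(rates)
    raise ValueError(f"unknown NODECHARGEPOLICY: {policy}")


# Rates taken from the NODECFG example for node01 and node02.
rates = [1.5, 1.75]
for policy in ("AVG", "MAX", "MIN"):
    print(policy, node_charge_rate(rates, policy))
```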

Allocation Management Example

In the following example, Moab charges allocations according to blocked resources and records these charges in the specified file.

AMCFG[local] SERVER=file:///opt/moab/chargelog.txt  CHARGEPOLICY=DEBITALLBLOCKED

NODEACCESSPOLICY          SINGLEJOB
...

6.4.1.4 Allocation Charge Rates

By default, Moab refers the decision of how much to charge to the allocation manager itself. However, if using the FILE Allocation Manager, job and reservation charge rates can be specified on a per-QoS basis using the DEDRESCOST parameter. If using the Gold Allocation Manager, per-QoS charge rates can be configured in Gold as demonstrated in these examples.

6.4.2 Allocation Manager Types

Moab supports five allocation manager types: Moab Accounting Manager (MAM), Gold Allocation Manager, File, GGF, and NAMI (for third party integration).

6.4.2.1 Moab Accounting Manager

Moab Accounting Manager is an accounting management system that provides usage tracking, charge accounting, and allocation enforcement for resource or service usage in cloud and technical computing environments. It acts much like a bank in which credits are deposited into accounts with constraints designating which entities may access the account. As resources or services are utilized, accounts are charged and usage recorded. MAM supports familiar operations such as deposits, withdrawals, transfers, and refunds and provides balance and usage feedback to users, managers, and system administrators.

To configure Moab to use MAM for allocation management, use the configure options as described in the Configuring the Allocation Manager Interface section above.

Example:

./configure --with-am ...

Consequently, make install will add the essential allocation manager entries into moab.cfg and install the bank-related scripts ($PREFIX/tools/gold/bank.*.gold.pl) and configuration files ($MOABHOMEDIR/etc/{nami.cfg,config.gold.pl}) in the correct place.

The following are typical entries in the moab.cfg file that support MAM for the HPC context:

AMCFG[mam] TYPE=NATIVE
AMCFG[mam] CHARGEURL=exec://$TOOLSDIR/gold/bank.charge.gold.pl
AMCFG[mam] RESERVEURL=exec://$TOOLSDIR/gold/bank.reserve.gold.pl
AMCFG[mam] DELETEURL=exec://$TOOLSDIR/gold/bank.delete.gold.pl
AMCFG[mam] RESERVEFAILUREACTION=IGNORE

The RESERVEFAILUREACTION parameter specifies the action that should be taken if Moab fails to secure a hold against the funds in the bank account at job start time. Valid values are IGNORE, HOLD, CANCEL and RETRY. If the action is IGNORE, the job will be allowed to run regardless of whether or not the reservation was successful. If the action is HOLD, the job will be placed on system hold if the reservation fails. If the action is CANCEL, the job will be canceled. If the action is RETRY, the job will be left in the queue to be scheduled and attempted again in subsequent iterations.

Moab Accounting Manager should be installed, started and initialized. The simplest procedure is to install it on the same server as Moab Workload Manager so that the Gold libraries and configuration files can be shared by the MAM and Moab Web Service (MWS) scripts.

Follow the discussion in the Getting Started chapter of the Moab Accounting Manager User Guide for examples of how to initialize MAM for your initial mode of operation.

6.4.2.2 Gold Allocation Manager

Gold is an accounting and allocation management system developed at PNNL under the DOE Scalable Systems Software (SSS) project. Gold supports a dynamic approach to allocation tracking and enforcement with reservations, quotations, and so forth. It offers more flexible controls for managing access to computational resources and exhibits a more powerful query interface. Gold supports hierarchical project nesting. Journaling allows preservation of all historical state information.

Gold is dynamically extensible. New object/record types and their fields can be dynamically created and manipulated through the regular query language turning this system into a generalized accounting and information service. This capability offers custom accounting, meta-scheduler resource-mapping, and an external persistence interface.

Gold supports strong authentication and encryption and role based access control. Gold features a powerful web-based GUI for easy remote access for users, managers and administrators. Gold supports interaction with peer accounting systems with a traceback feature enabling it to function in a meta-scheduling or grid environment.

To configure a Gold allocation manager interface, set the SERVER attribute to point to the Gold server host and port, as in the following example:

moab.cfg:

AMCFG[bank] SERVER=gold://master.ufl.edu JOBFAILUREACTION=IGNORE TIMEOUT=15
...

moab-private.cfg:

CLIENTCFG[AM:bank] KEY=mysecr3t AUTHTYPE=HMAC64
...

Synchronize the secret key with Gold by running make auth_key with the same key value you used during the Gold install process.

Monitor Mode

Gold can be run in an effectively monitor-only mode in which resource consumption is tracked but jobs are never blocked or delayed based on allocation status. In this mode, full Gold reporting and accounting information is available.

  1. Create an account that is valid for all projects, users, and machines.
    > gmkaccount -n ANY -p ANY -u ANY -m ANY
    Successfully created Account 5
    
  2. Create an allocation with massive funds and no time bounds (using the account number created by the previous command).
    > gdeposit -a 5 1000000000000
    Successfully deposited 1000000000000 credits into account 5
    
  3. To prevent failures due to unknown users, users that don't belong to the specified projects, and so forth, edit goldd.conf to automatically create users, projects, and machines.
    user.autogen = true
    project.autogen = true
    machine.autogen = true
    

6.4.2.3 Native Allocation Manager

The native allocation manager permits Moab to interface with a separate allocation manager to perform allocation management functions such as charging, billing, charge queries, and so forth so long as the separate allocation manager uses a native Wiki interface.

The design of the native allocation manager interface (NAMI) differs from that of Gold. NAMI extracts logic from Moab and places it in the native software. Moab acts as the event engine for the native software. That is, Moab sends XML that defines an object to a variety of URLs that signify events. Moab currently supports the following URLs: Create, Reserve, Charge, Delete, and Quote (configured with the CreateURL, ReserveURL, ChargeURL, DeleteURL, and QuoteURL attributes).

A user runs the mshow command, and Moab calls NAMI to get the QUOTE for the requested resources. If the TID is committed, Moab calls the CREATE and RESERVE URLs for each object (reservation or job). Depending on the flush interval, Moab periodically calls out to the CHARGE URL. When the object reaches its end of life, Moab calls out to CHARGE and finally DELETE. Moab keeps track of the last time the object was charged, but it does not re-create reservations when restarting, nor for intermittent charging. If Moab is down during a flush interval, it does not attempt to catch up; it simply charges double at the next flush interval.
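
The call order described above can be sketched as a simple timeline. This is an illustration only: the function is hypothetical, times are offsets in seconds from object creation, and placing the quote at offset 0 is a simplification, since the quote actually precedes the commit.

```python
def nami_call_sequence(lifetime, flush_interval):
    """List (time_offset, url) calls for one object, per the narrative above."""
    if lifetime < 0 or flush_interval <= 0:
        raise ValueError("lifetime must be >= 0 and flush_interval > 0")
    calls = [(0, "QUOTE"), (0, "CREATE"), (0, "RESERVE")]
    t = flush_interval
    while t < lifetime:
        calls.append((t, "CHARGE"))  # periodic charge at each flush
        t += flush_interval
    calls.append((lifetime, "CHARGE"))  # final charge at end of life
    calls.append((lifetime, "DELETE"))
    return calls


# A three-hour object with an hourly flush interval.
for offset, url in nami_call_sequence(10800, 3600):
    print(offset, url)
```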

The following is sample XML for the life of a particular object from Quote to Delete:

Quote
<Reservation>
  <ObjectID>cost</ObjectID>
  <Processors>1</Processors>
  <WallDuration>12960000</WallDuration>
  <ChargeDuration>12960000</ChargeDuration>
</Reservation>

Create
<Reservation>
  <ObjectID>host.1</ObjectID>
  <User>test</User>
  <Processors>1</Processors>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>12959999</ChargeDuration>
</Reservation>

Reserve
<Reservation>
  <ObjectID>host.1</ObjectID>
  <User>test</User>
  <Account>blue</Account>
  <Processors>1</Processors>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>12959999</ChargeDuration>
</Reservation>

Charge
<Reservation>
  <ObjectID>host.2</ObjectID>
  <User>test</User>
  <Account>blue</Account>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>108</ChargeDuration>
  <Var name="blue">green</Var>
  <GRes name="storage">100</GRes>
</Reservation>

Delete
<Reservation>
  <ObjectID>host.2</ObjectID>
  <User>test</User>
  <Account>blue</Account>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>12959999</ChargeDuration>
  <Var name="blue">green</Var>
  <GRes name="storage">100</GRes>
</Reservation>

Note that only the Quote URL should return any information. It should return nothing more than the cost of the object: no words, just the cost.
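
As a hedged sketch of what the logic behind a Quote URL might look like, the function below parses XML shaped like the sample above and returns only a cost. The flat rate, the cost formula, and the function name are assumptions made for illustration; this section does not specify how Moab delivers the XML to the script, so the example takes it as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rate, in credits per processor-second.
CREDITS_PER_PROC_SECOND = 1


def quote(xml_text):
    """Return the cost of an object described by NAMI-style XML."""
    root = ET.fromstring(xml_text)
    # Fall back to one processor when the element is absent.
    processors = int(root.findtext("Processors", default="1"))
    charge_duration = int(root.findtext("ChargeDuration", default="0"))
    return processors * charge_duration * CREDITS_PER_PROC_SECOND


sample = """<Reservation>
  <ObjectID>cost</ObjectID>
  <Processors>1</Processors>
  <WallDuration>12960000</WallDuration>
  <ChargeDuration>12960000</ChargeDuration>
</Reservation>"""

# Print the bare number only, matching the "just the cost" requirement.
print(quote(sample))
```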

The following is a representation of how you might set up the native allocation manager interface in the Moab configuration file (moab.cfg):

AMCFG[bank] TYPE=NATIVE
AMCFG[bank] ChargeURL=exec://$HOME/tools/bank.charge.pl
AMCFG[bank] DeleteURL=exec:///$HOME/tools/bank.delete.pl
AMCFG[bank] CreateURL=exec:///$HOME/tools/bank.create.pl
AMCFG[bank] ReserveURL=exec:///$HOME/tools/bank.reserve.pl
AMCFG[bank] QuoteURL=exec:///$HOME/tools/bank.quote.pl
AMCFG[bank] FLUSHINTERVAL=hour

To view URL output, run mdiag -R -v. The following shows sample output from running the mdiag command:

AM[bank] Type: native State: 'Active'
  FlushPeriod: HOUR
  Charge URL: ChargeURL=exec:///$HOME/tools/charge.pl
  Delete URL: DeleteURL=exec:///$HOME/tools/delete.pl
  Quote URL: QuoteURL=exec:///$HOME/tools/quote.pl
  Reserve URL: ReserveURL=exec:///$HOME/tools/reserve.pl
  Create URL: CreateURL=exec:///$HOME/tools/create.pl

6.4.2.4 File Allocation Manager

The file allocation manager protocol allows a site to append job allocation records directly to a local file for batch processing by local allocation management systems. These records are line-delimited, with whitespace-delimited attributes. Specifically, the file job usage record uses the following format:

WITHDRAWAL TYPE=job MACHINE=<MACHINENAME> ACCOUNT=<PROJECTNAME> USER=<USERNAME> PROCS=<PROCCOUNT> PROCCRATE=<PROCRATE> RESOURCETYPE=<NODETYPE> DURATION=<WALLDURATION> REQUESTID=<JOBID>

For example, the following record might be created:

WITHDRAWAL TYPE=job MACHINE=ia ACCOUNT=s USER=jb PROCS=64 PROCCRATE=0.93 RESOURCETYPE=ia64 DURATION=60 REQUESTID=1632
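
Because each record is a keyword followed by whitespace-delimited NAME=VALUE attributes, it is straightforward to parse for reporting. The following is a minimal sketch; the function name is hypothetical, and the processor-seconds figure it prints is just one example of a derived usage metric.

```python
def parse_record(line):
    """Split a file allocation manager record into (kind, attributes)."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty record")
    kind, attrs = tokens[0], {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {token!r}")
        attrs[key] = value
    return kind, attrs


record = ("WITHDRAWAL TYPE=job MACHINE=ia ACCOUNT=s USER=jb PROCS=64 "
          "PROCCRATE=0.93 RESOURCETYPE=ia64 DURATION=60 REQUESTID=1632")
kind, attrs = parse_record(record)
# Processor-seconds consumed, derived from PROCS and DURATION.
print(kind, attrs["ACCOUNT"], int(attrs["PROCS"]) * int(attrs["DURATION"]))
```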

To configure a file allocation manager interface, set the SERVER attribute to point to the local file pathname as follows:

AMCFG[local] SERVER=file:///opt/data/alloc.txt

6.4.2.5 GGF Allocation Manager

The Global Grid Forum (GGF) protocol behaves much like the file allocation manager protocol allowing a site to append job allocation records directly to a local file for batch processing by local allocation management systems. These records follow the GGF XML Usage Record format as defined by the GGF Usage Record Format Recommendation (PDF). Specifically, this interface reports the following usage record attributes:

With this interface, the usage record file is updated each time a job completes.

To configure a GGF allocation manager interface, set the SERVER attribute to point to the local file pathname as follows:

AMCFG[local] SERVER=ggf:///opt/data/alloc.txt

Copyright © 2012 Adaptive Computing Enterprises, Inc.®