6.4 Charging and Allocation Management
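Charging is the process of assigning a value to the use of resources and tracking this usage on a per-consumer basis. Charging is often accompanied by a corresponding assignment of resources (an allocation) to each consumer. Within Moab, charging is quite flexible: Moab can interface with the Gold allocation manager or with a native allocation manager, or it can record charges to a local file in either a simple record format or the GGF usage record format.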
An allocation manager (also known as an allocation bank or CPU bank) is a software system that manages resource allocations. A resource allocation grants a job the right to use a particular amount of resources. While full details of each allocation manager may be found within its respective documentation, the following brief review highlights a few of the benefits of using such a system.
An allocation manager functions much like a bank in that it provides a form of currency that allows jobs to run on an HPC system. The owners of the resource (cluster/supercomputer) determine how they want the system to be used (often via an allocations committee) over a particular time frame, often a month, quarter, or year. To enforce their decisions, they distribute allocations to various projects via accounts and assign each account an account manager. These allocations can be used for particular machines or globally. They can also have activation and expiration dates associated with them. All transaction information is typically stored in a database or directory server, allowing extensive statistical and allocation tracking.
Each account manager determines how the allocations are made available to individual users within a project. Allocation managers such as Gold (from U.S. Dept of Energy) allow the account manager to dedicate portions of the overall allocation to individual users, specify some allocations as shared by all users, and hold some of the allocations in reserve for later use.
When using an allocation manager, each job must be associated with an account. To accomplish this with minimal user impact, the allocation manager can be set up to handle default accounts on a per-user basis. However, some users are often active on more than one project and thus have access to more than one account. In these situations, a mechanism such as a job command file keyword should be provided to allow a user to specify which account should be associated with the job.
The amount of each job's allocation charge is directly tied to the amount of resources (processors) the job used and how long it used them. Optionally, the allocation manager can also be configured to charge accounts varying amounts based on the QoS requested by the job, the type of compute resources used, and when the resources were used (both time of day and day of week).
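As a rough illustration, a job's charge can be thought of as processors multiplied by wallclock seconds, optionally scaled by QoS, resource-type, and time-of-use multipliers. The sketch below assumes exactly that model; the multiplier tables, the 50% off-peak discount, and the job_charge function are hypothetical, since the real rates are configured in the allocation manager rather than in Moab.

```python
# Hypothetical rate tables: real rates are defined in the allocation manager.
QOS_MULTIPLIER = {"normal": 1.0, "high": 1.5, "low": 0.5}
NODE_TYPE_MULTIPLIER = {"standard": 1.0, "bigmem": 2.0}
OFF_PEAK_MULTIPLIER = 0.5  # hypothetical nights/weekends discount


def job_charge(procs: int, wallclock_seconds: int, qos: str = "normal",
               node_type: str = "standard", off_peak: bool = False) -> float:
    """Return a job's charge in scaled processor-seconds."""
    base = procs * wallclock_seconds          # resources used x time used
    rate = QOS_MULTIPLIER[qos] * NODE_TYPE_MULTIPLIER[node_type]
    if off_peak:
        rate *= OFF_PEAK_MULTIPLIER
    return base * rate


# A 64-processor job that ran for one hour at normal QoS on standard nodes:
print(job_charge(64, 3600))  # 230400.0
```

Keeping the base metric (processor-seconds) separate from the multipliers mirrors the split described above: Moab reports what was used, and the allocation manager decides what it costs.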
The allocation manager interface provides near real-time allocation management, giving a great deal of flexibility and control over how available compute resources are used over the medium- and long-term, and works hand-in-hand with other job management features such as Moab's usage limit policies and fairshare mechanism.
The ENFORCEACCOUNTACCESS parameter controls whether the scheduler enforces account constraints.
Moab's allocation manager interface(s) are defined using the AMCFG parameter. This parameter allows specification of key aspects of the interface as shown in the following table:
APPENDMACHINENAME, CHARGEPOLICY, FALLBACKACCOUNT, FALLBACKQOS, FLAGS, FLUSHINTERVAL, JOBFAILUREACTION, SERVER, SOCKETPROTOCOL, TIMEOUT, WIREPROTOCOL
APPENDMACHINENAME
Format: BOOLEAN
Default: FALSE
Description: If specified, Moab appends the machine name to the consumer account to create a unique account name per cluster.
Example: AMCFG[tg13] APPENDMACHINENAME=TRUE

CHARGEPOLICY
Format: one of DEBITALLWC, DEBITALLCPU, DEBITALLPE, DEBITALLBLOCKED, DEBITSUCCESSFULWC, DEBITSUCCESSFULCPU, DEBITSUCCESSFULPE, or DEBITSUCCESSFULBLOCKED
Default: DEBITSUCCESSFULWC
Description: Specifies how consumed resources should be charged against the consumer's credentials. See Charge Policy Overview for details.
Example: AMCFG[bank] CHARGEPOLICY=DEBITALLCPU

FALLBACKACCOUNT
Format: STRING
Default: ---
Description: If specified, Moab verifies adequate allocations for all new jobs. If adequate allocations are not available in the job's primary account, Moab changes the job's credentials to use the fallback account. If not specified, Moab places a hold on jobs that do not have adequate allocations in their primary account.
Example: AMCFG[bank] FALLBACKACCOUNT=freecycle

FALLBACKQOS
Format: STRING
Default: ---
Description: If specified, Moab verifies adequate allocations for all new jobs. If adequate allocations are not available in the job's primary QoS, Moab changes the job's credentials to use the fallback QoS. If not specified, Moab places a hold on jobs that do not have adequate allocations in their primary QoS.
Example: AMCFG[bank] FALLBACKQOS=freecycle

FLAGS
Format: <STRING>
Default: ---
Description: AMCFG flags are used to enable special services.
Example: AMCFG[xxxx] FLAGS=LOCALCOST

FLUSHINTERVAL
Format: [[[DD:]HH:]MM:]SS
Default: 24:00:00
Description: Specifies the time between allocation manager debits for long-running reservation-based and job-based charges.
Example: AMCFG[bank] FLUSHINTERVAL=12:00:00

JOBFAILUREACTION
Format: <SERVERFAILUREACTION>[,<FUNDSFAILUREACTION>], where each action is one of CANCEL, HOLD, IGNORE, or RETRY
Default: IGNORE,HOLD
Description: The server failure action is taken if the allocation manager is down or otherwise unresponsive. The funds failure action is taken if the allocation manager reports that insufficient allocations are available to execute the job under the given user and account. If the action is set to CANCEL, Moab cancels the job; if set to HOLD, Moab defers the job; if set to IGNORE, Moab ignores the failure and continues to start the job; if set to RETRY, Moab does not start the job on this attempt but attempts to start it at the next opportunity.
Example: AMCFG[wg13] JOBFAILUREACTION=HOLD
Allocation management is strictly enforced, preventing jobs from starting if the allocation manager is unavailable.

SERVER
Format: URL
Default: N/A
Description: Specifies the type and location of the allocation manager service. If the keyword ANY is specified instead of a URL, Moab uses the local service directory to locate the allocation manager.
Example: AMCFG[bio-sys] SERVER=gold://tiny.supercluster.org:4368

SOCKETPROTOCOL
Format: one of SUTCP, SSS-HALF, HTTP, or SSS-CHALLENGE
Default: SSS-HALF
Description: Specifies the socket protocol to be used for scheduler-allocation manager communication.
Example: AMCFG[bank] SOCKETPROTOCOL=SSS-CHALLENGE

TIMEOUT
Format: [[[DD:]HH:]MM:]SS
Default: 15 (seconds)
Description: Specifies the maximum delay allowed for scheduler-allocation manager communications.
Example: AMCFG[bank] TIMEOUT=30

WIREPROTOCOL
Format: one of AVP, HTML, SSS2, or XML
Default: XML
Description: Specifies the wire protocol to be used for scheduler-allocation manager communication.
Example: AMCFG[bank] WIREPROTOCOL=SSS2
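Both FLUSHINTERVAL and TIMEOUT take a duration in [[[DD:]HH:]MM:]SS format, where the rightmost field is always seconds and each additional field to the left adds a larger unit. The following minimal Python sketch shows how such a value maps to seconds; parse_duration is a hypothetical helper, not part of Moab, and it does not handle symbolic values.

```python
def parse_duration(value: str) -> int:
    """Convert a [[[DD:]HH:]MM:]SS string to a number of seconds.

    Fields are read right to left: seconds, then minutes, hours, and days.
    """
    fields = value.split(":")
    if not 1 <= len(fields) <= 4:
        raise ValueError(f"expected [[[DD:]HH:]MM:]SS, got {value!r}")
    unit_seconds = [1, 60, 3600, 86400]   # SS, MM, HH, DD
    return sum(int(f) * u for f, u in zip(reversed(fields), unit_seconds))


print(parse_duration("15"))        # 15     (TIMEOUT default)
print(parse_duration("12:00:00"))  # 43200  (FLUSHINTERVAL example)
print(parse_duration("24:00:00"))  # 86400  (FLUSHINTERVAL default)
```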
The first step to configure the allocation manager involves specifying where the allocation service can be found. This is accomplished by setting the AMCFG parameter's SERVER attribute to the appropriate URL.
After the interface URL is specified, secure communications between the scheduler and the allocation manager must be enabled. As with other interfaces, this is configured using the CLIENTCFG parameter within the moab-private.cfg file, as described in the Security Appendix. In the case of an allocation manager, the KEY and AUTHTYPE attributes should be set to the values defined during the initial allocation manager build and configuration, as in the following example:
CLIENTCFG[AM:bank] KEY=mysecr3t AUTHTYPE=HMAC64
AMCFG flags can be used to enable special services and to disable default services. These services are enabled/disabled by setting the AMCFG FLAGS attribute.
In most cases, the scheduler interfaces with a peer service. (If the FILE protocol is specified, allocation manager transactions are written to the specified flat file.) With all peer-service-based allocation managers, the scheduler checks with the allocation manager before starting any job. For allocation tracking to work, however, each job must specify an account to charge, or the allocation manager must be set up to handle default accounts on a per-user basis.
Under this configuration, when Moab starts a job, it contacts the allocation manager and requests that an allocation reservation (or lien) be placed on the associated account. This allocation reservation equals the total amount of allocation the job could consume (based on the job's wallclock limit) and prevents the account from being oversubscribed. Moab then starts the job. When the job completes, Moab debits the amount of allocation actually consumed by the job from the job's account and then releases the allocation reservation, or lien.
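The lien protocol can be modeled as three operations: hold the worst-case cost when the job starts, debit the actual cost when it ends, and release the hold. The sketch below assumes charges are plain processor-seconds; the Account class and its method names are hypothetical and do not correspond to any Gold or Moab API.

```python
class Account:
    """Hypothetical in-memory account illustrating lien-based accounting."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.liens = {}   # job ID -> amount currently held

    def available(self) -> int:
        # Balance not already promised to running jobs.
        return self.balance - sum(self.liens.values())

    def place_lien(self, job_id: str, procs: int, wallclock_limit: int) -> bool:
        # Hold the most the job could consume: procs x wallclock limit.
        amount = procs * wallclock_limit
        if amount > self.available():
            return False          # insufficient allocation: do not start the job
        self.liens[job_id] = amount
        return True

    def settle(self, job_id: str, procs: int, actual_wallclock: int) -> int:
        # Debit what the job actually used, then release its lien.
        charge = procs * actual_wallclock
        self.balance -= charge
        del self.liens[job_id]
        return charge


acct = Account(10_000)
print(acct.place_lien("job.1", 4, 2000))  # True: 8000 held, 2000 still available
print(acct.place_lien("job.2", 2, 1500))  # False: 3000 exceeds the 2000 available
print(acct.settle("job.1", 4, 500))       # 2000: job.1 finished early
print(acct.available())                   # 8000
```

Because the lien is sized from the wallclock limit, a second job cannot be started against funds that the first job might still consume, which is exactly the oversubscription the lien is meant to prevent.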
These steps should be transparent to users. Only when an account has insufficient allocations to run a requested job will the presence of the allocation manager be noticed. If preferred, an account may be specified for use when a job's primary account is out of allocations. This account, specified using the AMCFG parameter's FALLBACKACCOUNT attribute, is often associated with a low QoS privilege and priority, and is often configured to run only when no other jobs are present.
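The interplay of FALLBACKACCOUNT and JOBFAILUREACTION at job start can be summarized as a small decision function. This is a simplified sketch, assuming that the fallback account is consulted before the funds failure action is applied, that an unreachable allocation manager triggers the server failure action, and that an omitted funds action keeps its default of HOLD; the function and enum names are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    CANCEL = "CANCEL"
    HOLD = "HOLD"
    IGNORE = "IGNORE"
    RETRY = "RETRY"


def parse_job_failure_action(value: str) -> Tuple[Action, Action]:
    """Split <SERVERFAILUREACTION>[,<FUNDSFAILUREACTION>] into two actions."""
    server, _, funds = value.partition(",")
    # Assumption: an omitted funds action keeps the documented default, HOLD.
    return Action(server), Action(funds or "HOLD")


def start_decision(am_reachable: bool, has_funds: bool,
                   fallback_account: Optional[str] = None,
                   job_failure_action: str = "IGNORE,HOLD") -> str:
    """Return START, START:<fallback>, CANCEL, HOLD, or RETRY."""
    server_action, funds_action = parse_job_failure_action(job_failure_action)
    if not am_reachable:
        action = server_action
    elif has_funds:
        return "START"
    elif fallback_account:
        return f"START:{fallback_account}"   # job credentials switched
    else:
        action = funds_action
    # IGNORE means the failure is ignored and the job starts anyway.
    return "START" if action is Action.IGNORE else action.value
```

With the defaults, an unreachable allocation manager does not block jobs, while an account that runs out of allocation causes a hold unless a fallback account is configured.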
The scheduler can also be configured to charge for reservations. One of the main hesitations with dedicating resources to a particular group is that if the group does not use the resources, they go idle and are wasted. By configuring a reservation to be chargeable, sites can charge every idle cycle of the reservation to a particular project. When the reservation is in use, the consumed resources are associated with the account of the job using them. When the resources are idle, they are charged to the reservation's charge account. In the case of standing reservations, this account is specified using the SRCFG parameter's CHARGEACCOUNT attribute. In the case of administrative reservations, it is specified via a command-line flag to the setres command.
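The arithmetic behind a chargeable reservation is simple: whatever portion of the reserved processor-time is not consumed by jobs is billed to the reservation's charge account. A minimal sketch, assuming usage is measured in processor-seconds; reservation_charges and the account names are hypothetical.

```python
def reservation_charges(procs: int, seconds: int, job_usage: dict,
                        charge_account: str) -> dict:
    """Split a reservation's processor-seconds between job accounts and idle time.

    job_usage maps an account to the processor-seconds its jobs consumed inside
    the reservation; the unused remainder is billed to charge_account.
    """
    total = procs * seconds
    used = sum(job_usage.values())
    if used > total:
        raise ValueError("jobs cannot use more than the reservation holds")
    charges = dict(job_usage)
    charges[charge_account] = charges.get(charge_account, 0) + (total - used)
    return charges


# 16 processors reserved for one hour; two projects ran jobs inside it.
print(reservation_charges(16, 3600, {"chem": 20000, "bio": 7600}, "physics"))
# {'chem': 20000, 'bio': 7600, 'physics': 30000}
```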
Moab only interfaces to the allocation manager when running in NORMAL mode.
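The allocation manager interface allows a site to charge accounts in a number of different ways. Some sites may wish to charge for all jobs regardless of whether the job completed successfully. Sites may also want to charge based on differing usage metrics, such as dedicated wallclock time or processors actually used. Moab supports the following charge policies, specified via the CHARGEPOLICY attribute: DEBITALLWC, DEBITALLCPU, DEBITALLPE, DEBITALLBLOCKED, DEBITSUCCESSFULWC, DEBITSUCCESSFULCPU, DEBITSUCCESSFULPE, and DEBITSUCCESSFULBLOCKED. The DEBITALL policies charge every job, while the DEBITSUCCESSFUL policies charge only jobs that complete successfully; the suffix selects the usage metric: dedicated wallclock time (WC), processors actually used (CPU), processor equivalents (PE), or blocked resources (BLOCKED).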
On systems where job wallclock limits are specified, jobs that exceed their wallclock limits and are subsequently canceled by the scheduler or resource manager are considered to have successfully completed as far as charging is concerned, even though the resource manager may report these jobs as having been removed or canceled.
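The two dimensions of a charge policy, which jobs are charged and which metric is used, can be separated as in the sketch below. It models only the WC and CPU metrics; PE and BLOCKED depend on processor-equivalent and node-access calculations not described here, so the sketch rejects them. The JobUsage fields and state strings are hypothetical, and the success rule follows the note above about wallclock-limit cancellations.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    procs: int                 # processors dedicated to the job
    wallclock: int             # seconds the job held those processors
    used_proc_seconds: float   # processor-seconds actually used (assumed metric)
    state: str                 # hypothetical final state, e.g. "Completed"
    hit_wallclock_limit: bool  # True if cancelled for exceeding its limit


def is_successful(job: JobUsage) -> bool:
    # Jobs cancelled for exceeding their wallclock limit still count as successful.
    return job.state == "Completed" or job.hit_wallclock_limit


def charge(job: JobUsage, policy: str) -> float:
    """Charge one job under a WC or CPU charge policy."""
    if policy.startswith("DEBITSUCCESSFUL") and not is_successful(job):
        return 0.0
    if policy.endswith("WC"):
        return float(job.procs * job.wallclock)   # dedicated wallclock
    if policy.endswith("CPU"):
        return job.used_proc_seconds              # resources actually used
    raise NotImplementedError(f"{policy} is not modeled in this sketch")
```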
If machine-specific allocations are created within the allocation manager, the allocation manager machine name should be synchronized with the Moab resource manager name as specified with the RMCFG parameter, such as the name orion in RMCFG[orion] TYPE=PBS.
To control how jobs are charged when heterogeneous resources are allocated and per-resource charges may vary within the job, use the NODECHARGEPOLICY attribute.
When calculating the cost of the job, Moab uses the most restrictive node access policy. See NODEACCESSPOLICY for more information.
Allocation Management Example
In the following example, Moab charges allocations according to blocked resources and records these charges in the specified file.
AMCFG[local] SERVER=file:///opt/moab/chargelog.txt CHARGEPOLICY=DEBITALLBLOCKED
NODEACCESSPOLICY SINGLEJOB
...
By default, Moab refers the decision of how much to charge to the allocation manager itself. However, if using the FILE Allocation Manager, job and reservation charge rates can be specified on a per-QoS basis using the DEDRESCOST parameter. If using the Gold Allocation Manager, per-QoS charge rates can be configured in Gold as demonstrated in these examples.
Gold is an accounting and allocation management system developed at PNNL under the DOE Scalable Systems Software (SSS) project. Gold supports a dynamic approach to allocation tracking and enforcement with reservations, quotations, and so forth. It offers more flexible controls for managing access to computational resources and exhibits a more powerful query interface. Gold supports hierarchical project nesting. Journaling allows preservation of all historical state information.
Gold is dynamically extensible. New object/record types and their fields can be dynamically created and manipulated through the regular query language, turning this system into a generalized accounting and information service. This capability offers custom accounting, meta-scheduler resource mapping, and an external persistence interface.
Gold supports strong authentication, encryption, and role-based access control. Gold features a powerful web-based GUI for easy remote access for users, managers, and administrators. Gold supports interaction with peer accounting systems with a traceback feature, enabling it to function in a meta-scheduling or grid environment.
To configure a Gold allocation manager interface, set the SERVER attribute to point to the Gold server host and port, as in the following example:
moab.cfg:
AMCFG[bank] SERVER=gold://master.ufl.edu JOBFAILUREACTION=IGNORE TIMEOUT=15
...
moab-private.cfg:
CLIENTCFG[AM:bank] KEY=mysecr3t AUTHTYPE=HMAC64
...
Create the secret key by running make auth_key during configuration.
[root]# make auth_key
Monitor Mode
Gold can be run in an effective monitor-only mode, in which resource consumption is tracked but jobs are never blocked or delayed because of allocation status. In this mode, full Gold reporting and accounting information is available.
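To set this up, create an account that applies to any user, project, and machine, and deposit a very large number of credits into it:
> gmkaccount -n ANY -p ANY -u ANY -m ANY
Successfully created Account 5
> gdeposit -a 5 1000000000000
Successfully deposited 1000000000000 credits into account 5
Then enable automatic creation of users, projects, and machines in the Gold configuration:
user.autogen = true
project.autogen = true
machine.autogen = true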
The native allocation manager interface model has not been tested with HPC workloads; it has only been tested with VPC-style clouds.
The native allocation manager permits Moab to interface with a separate allocation manager to perform allocation management functions such as charging, billing, and charge queries, provided that the separate allocation manager uses a native Wiki interface.
The design of the native allocation manager interface (NAMI) differs from Gold. NAMI extracts logic from Moab and places it in the native software, with Moab acting as the event engine for that software. That is, Moab sends XML that defines an object to a set of URLs, each of which signifies an event. Moab currently supports the following URLs: Quote, Create, Reserve, Charge, and Delete (configured through the QuoteURL, CreateURL, ReserveURL, ChargeURL, and DeleteURL attributes).
When a user runs the mshow command, Moab calls NAMI to get a quote for the requested resources from the Quote URL. If the resulting transaction (TID) is committed, Moab calls the Create and Reserve URLs for each object (reservation or job). Moab then calls the Charge URL periodically, once per flush interval. When the object reaches its end of life, Moab calls Charge one final time and then Delete. Moab keeps track of the last time each object was charged, but it does not re-create reservations when restarting or for intermittent charging. If Moab is down during a flush interval, it does not attempt to catch up; it simply charges double at the next flush interval.
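The ordering of NAMI calls for a single committed object can be sketched as an event list, assuming the object runs to completion without Moab restarts and that the Quote call comes from the initial mshow. nami_calls is a hypothetical helper; each Charge entry records how many seconds it covers since the previous charge, mirroring the way Moab tracks the last charge time.

```python
def nami_calls(wall_seconds: int, flush_seconds: int) -> list:
    """Return (url, elapsed_seconds, charged_seconds) tuples for one object."""
    calls = [("Quote", 0, 0), ("Create", 0, 0), ("Reserve", 0, 0)]
    last_charged = 0
    t = flush_seconds
    while t < wall_seconds:                      # periodic flushes
        calls.append(("Charge", t, t - last_charged))
        last_charged = t
        t += flush_seconds
    # End of life: one final charge for the remainder, then delete.
    calls.append(("Charge", wall_seconds, wall_seconds - last_charged))
    calls.append(("Delete", wall_seconds, 0))
    return calls


for call in nami_calls(10_000, 3_600):
    print(call)
```

For a 10,000-second object with an hourly flush, this yields two full-hour charges followed by a final 2,800-second charge, so the charged durations add up to the object's lifetime.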
The following is sample XML for the life of a particular object from Quote to Delete:
Quote:
<Reservation>
  <ObjectID>cost</ObjectID>
  <Processors>1</Processors>
  <WallDuration>12960000</WallDuration>
  <ChargeDuration>12960000</ChargeDuration>
</Reservation>

Create:
<Reservation>
  <ObjectID>host.1</ObjectID>
  <User>test</User>
  <Processors>1</Processors>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>12959999</ChargeDuration>
</Reservation>

Reserve:
<Reservation>
  <ObjectID>host.1</ObjectID>
  <User>test</User>
  <Account>blue</Account>
  <Processors>1</Processors>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>12959999</ChargeDuration>
  <Var name="VPCHOSTLIST">n05</Var>
  <Var name="VPCID">vpc.1</Var>
</Reservation>

Charge:
<Reservation>
  <ObjectID>host.2</ObjectID>
  <User>test</User>
  <Account>blue</Account>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>108</ChargeDuration>
  <Var name="blue">green</Var>
  <Var name="VPCHOSTLIST">n05,GLOBAL</Var>
  <Var name="VPCID">vpc.1</Var>
  <GRes name="storage">100</GRes>
</Reservation>

Delete:
<Reservation>
  <ObjectID>host.2</ObjectID>
  <User>test</User>
  <Account>blue</Account>
  <WallDuration>12959999</WallDuration>
  <ChargeDuration>12959999</ChargeDuration>
  <Var name="blue">green</Var>
  <Var name="VPCHOSTLIST">n05,GLOBAL</Var>
  <Var name="VPCID">vpc.1</Var>
  <GRes name="storage">100</GRes>
</Reservation>
Note that only the Quote URL should return any information, and it should return nothing but the cost of the object: no words, just the number.
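A quote handler therefore only has to parse the object XML and print a single number. The following is a minimal Python sketch of the logic behind a script such as bank.quote.pl. The flat per-processor-second rate is hypothetical, and so is the assumption that the XML arrives on standard input; check how your Moab version passes data to exec:// URLs before relying on it. A script entry point would simply call main().

```python
import sys
import xml.etree.ElementTree as ET

RATE_PER_PROC_SECOND = 0.001   # hypothetical site rate


def quote(xml_text: str) -> str:
    """Return only the cost of a <Reservation> object, formatted as a number."""
    root = ET.fromstring(xml_text)
    procs = int(root.findtext("Processors", default="1"))
    seconds = int(root.findtext("ChargeDuration", default="0"))
    return f"{procs * seconds * RATE_PER_PROC_SECOND:.2f}"


def main() -> None:
    # Assumption: the object XML arrives on standard input.
    sys.stdout.write(quote(sys.stdin.read()) + "\n")
```

Printing the bare number, with no labels or units, matches the requirement that the Quote URL return only the cost.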
The following is a representation of how you might set up the native allocation manager interface in the Moab configuration file (moab.cfg):
AMCFG[bank] TYPE=NATIVE
AMCFG[bank] ChargeURL=exec://$HOME/tools/bank.charge.pl
AMCFG[bank] DeleteURL=exec:///$HOME/tools/bank.delete.pl
AMCFG[bank] CreateURL=exec:///$HOME/tools/bank.create.pl
AMCFG[bank] ReserveURL=exec:///$HOME/tools/bank.reserve.pl
AMCFG[bank] QuoteURL=exec:///$HOME/tools/bank.quote.pl
AMCFG[bank] FLUSHINTERVAL=hour
To view URL output, run mdiag -R -v. The following shows sample output from running the mdiag command:
AM[bank]  Type: native  State: 'Active'
  FlushPeriod: HOUR
  Charge URL: ChargeURL=exec:///$HOME/tools/charge.pl
  Delete URL: DeleteURL=exec:///$HOME/tools/delete.pl
  Quote URL: QuoteURL=exec:///$HOME/tools/quote.pl
  Reserve URL: ReserveURL=exec:///$HOME/tools/reserve.pl
  Create URL: CreateURL=exec:///$HOME/tools/create.pl
The file allocation manager protocol allows a site to append job allocation records directly to a local file for batch processing by local allocation management systems. Records are separated by newlines, and the attributes within each record are separated by whitespace. Specifically, the file job usage record uses the following format:
WITHDRAWAL TYPE=job MACHINE=<MACHINENAME> ACCOUNT=<PROJECTNAME> USER=<USERNAME> PROCS=<PROCCOUNT> PROCCRATE=<PROCRATE> RESOURCETYPE=<NODETYPE> DURATION=<WALLDURATION> REQUESTID=<JOBID>
For example, the following record might be created:
WITHDRAWAL TYPE=job MACHINE=ia ACCOUNT=s USER=jb PROCS=64 PROCCRATE=0.93 RESOURCETYPE=ia64 DURATION=60 REQUESTID=1632
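Because each record is one line of KEY=VALUE pairs, a local batch processor needs very little parsing. The sketch below reads records in the format above and totals processor-time per account. The helper names are hypothetical, the PROCCRATE and RESOURCETYPE fields are ignored, and it assumes DURATION is expressed in seconds.

```python
from collections import defaultdict


def parse_withdrawal(line: str) -> dict:
    """Parse one WITHDRAWAL record into an attribute -> value dict."""
    tokens = line.split()                      # attributes are whitespace-delimited
    if not tokens or tokens[0] != "WITHDRAWAL":
        raise ValueError(f"not a WITHDRAWAL record: {line!r}")
    record = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed attribute: {token!r}")
        record[key] = value
    return record


def proc_seconds_by_account(lines) -> dict:
    """Sum PROCS x DURATION per ACCOUNT (assumes DURATION is in seconds)."""
    totals = defaultdict(int)
    for line in lines:
        if line.strip():                       # one record per line; skip blanks
            rec = parse_withdrawal(line)
            totals[rec["ACCOUNT"]] += int(rec["PROCS"]) * int(rec["DURATION"])
    return dict(totals)
```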
To configure a file allocation manager interface, set the SERVER attribute to point to the local file pathname as follows:
AMCFG[local] SERVER=file:///opt/data/alloc.txt
The Global Grid Forum (GGF) protocol behaves much like the file allocation manager protocol, allowing a site to append job allocation records directly to a local file for batch processing by local allocation management systems. These records follow the GGF XML Usage Record format as defined by the GGF Usage Record Format Recommendation. Specifically, this interface reports the following usage record attributes:
With this interface, the usage record file is updated each time a job completes.
To configure a GGF allocation manager interface, set the SERVER attribute to point to the local file pathname as follows:
AMCFG[local] SERVER=ggf:///opt/data/alloc.txt
Copyright © 2012 Adaptive Computing Enterprises, Inc.®