Moab supports the concept of credentials, which provide a means of attributing policy and resource access to entities such as users and groups. These credentials allow specification of job ownership, tracking of resource usage, enforcement of policies, and many other features. There are five types of credentials—user, group, account, class, and QoS. While the credentials have many similarities, each plays a slightly different role.
Internally, credentials are maintained as objects. Credentials can be created, destroyed, queried, and modified. They are associated with jobs and requests providing access and privileges. Each credential type has the following attributes:
All credentials represent a form of identity, and when applied to a job, express ownership. Consequently, jobs are subject to policies and limits associated with their owners.
Each credential may be assigned a priority using the PRIORITY attribute. This priority affects a job's total credential priority factor as described in the Priority Factors section. In addition, each credential may also specify priority weight offsets, which adjust priority weights that apply to associated jobs. These priority weight offsets include FSWEIGHT, QTWEIGHT, and XFWEIGHT.
# set priority weights
CREDWEIGHT 1
USERWEIGHT 1
CLASSWEIGHT 1
SERVICEWEIGHT 1
XFACTORWEIGHT 10
QUEUETIMEWEIGHT 1000
# set credential priorities
USERCFG[john] PRIORITY=200
CLASSCFG[batch] PRIORITY=15
CLASSCFG[debug] PRIORITY=100
QOSCFG[bottomfeeder] QTWEIGHT=-50 XFWEIGHT=100
ACCOUNTCFG[topfeeder] PRIORITY=100
Usage limits constrain which jobs may run, which jobs may be considered for scheduling, and what quantity of resources each individual job may consume. With usage limits, policies such as MAXJOB, MAXNODE, and MAXMEM may be enforced against both idle and active jobs. Limits may be applied in any combination as shown in the example below where usage limits include 32 active processors per group and 12 active jobs for user john. For a job to run, it must satisfy the most limiting policies of all associated credentials. The Throttling Policy section documents credential usage limits in detail.
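The example referred to above is not reproduced in this copy. A minimal sketch of the two limits described, assuming MAXPROC is the active-processor limit attribute and that GROUPCFG[DEFAULT] applies the limit to each group individually:

```
# limit each group to 32 active processors (assumed: DEFAULT applies per group)
GROUPCFG[DEFAULT] MAXPROC=32
# limit user john to 12 active jobs
USERCFG[john] MAXJOB=12
```

Under this sketch, a job submitted by john starts only if it would exceed neither his 12-job limit nor his group's 32-processor limit.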
Credential service targets allow jobs to obtain special treatment to meet usage or response time based metrics. Additional information about service targets can be found in the Fairshare section.
You can use the ALIST, PLIST, and QLIST attributes (shown in the following table) to specify the list of credentials or partitions that a given credential may access.
|Account||ALIST (allows credential to access specified list of accounts)|
|Partition||PLIST (allows credential to access specified list of partitions)|
|QoS||QLIST (allows credential to access specified list of QoS's)|
USERCFG[bob] ALIST=jupiter,quantum
USERCFG[steve] ALIST=quantum
|Account-based access lists are only enforced if using an allocation manager or if the ENFORCEACCOUNTACCESS parameter is set to TRUE.|
Use the *DEF attribute (shown in the following table) to specify the default credential or partition for a particular credential.
|Account||ADEF (specifies default account)|
|Class||CDEF (specifies default class)|
|QoS||QDEF (specifies default QoS)|
# user bob can access accounts a2, a3, and a6. If no account is explicitly requested,
# his job will be assigned to account a3
USERCFG[bob] ALIST=a2,a3,a6 ADEF=a3
# user steve can access accounts a14, a7, a2, a6, and a1. If no account is explicitly
# requested, his job will be assigned to account a2
USERCFG[steve] ALIST=a14,a7,a2,a6,a1 ADEF=a2
As an alternative to specifying access lists, administrators may also specify membership lists. This allows a credential to specify who can access it rather than allowing each credential to specify which credentials it can access. Membership lists are controlled using the MEMBERULIST, EXCLUDEUSERLIST, and REQUIREDUSERLIST attributes, shown in the following table:
|Account, Group, QoS||MEMBERULIST|
|Class||EXCLUDEUSERLIST and REQUIREDUSERLIST|
# account omega3 can only be accessed by users johnh, stevek, jenp
ACCOUNTCFG[omega3] MEMBERULIST=johnh,stevek,jenp
Example 1: Controlling Partition Access on a Per User Basis
A site may specify the user john may access partitions atlas, pluto, and zeus and will default to partition pluto. To do this, include the following line in the configuration file:
USERCFG[john] PLIST=atlas,pluto,zeus PDEF=pluto
Example 2: Controlling QoS Access on a Per Group Basis
A site may also choose to allow everyone in the group staff to access QoS standard and special with a default QoS of standard. To do this, include the following line in the configuration file:
GROUPCFG[staff] QLIST=standard,special QDEF=standard
Example 3: Controlling Resource Access on a Per Account Basis
An organization wants to allow everyone in the account omega3 to access nodes 20 through 24. To do this, include the following in the configuration file:
Full statistics are maintained for each credential instance. These statistics record current and historical resource usage, level of service delivered, accuracy of requests, and many other aspects of workload. Note, though, that you must explicitly enable credential statistics as they are not tracked by default. You can enable credential statistics by including the following in the configuration file:
USERCFG[DEFAULT] ENABLEPROFILING=TRUE
GROUPCFG[DEFAULT] ENABLEPROFILING=TRUE
ACCOUNTCFG[DEFAULT] ENABLEPROFILING=TRUE
CLASSCFG[DEFAULT] ENABLEPROFILING=TRUE
QOSCFG[DEFAULT] ENABLEPROFILING=TRUE
Credentials may apply defaults and force job configuration settings via the following parameters:
|Description:||Associates a comment string with the target credential.|
USERCFG[steve] COMMENT='works for boss, provides good service'
CLASSCFG[i3] COMMENT='queue for I/O intensive workload'
|Description:||Specifies a hold should be placed on all jobs associated with the target credential.|
|Description:||Assigns the specified job flag to all jobs with the associated credential.|
CLASSCFG[batch] JOBFLAGS=suspendable
QOSCFG[special] JOBFLAGS=restartable
|Description:||Specifies whether jobs associated with this credential can be submitted using msub.|
ACCOUNTCFG[general] NOSUBMIT=TRUE
CLASSCFG[special] NOSUBMIT=TRUE
|Description:||Specifies the amount of time a job may exceed its wallclock limit before being terminated. (Only applies to user and class credentials.)|
|Description:||Specifies attribute-value pairs associated with the specified credential. These variables may be used in triggers and other interfaces to modify system behavior.|
Credentials may carry additional configuration information. They may specify that detailed statistical profiling should occur, that submitted jobs should be held, or that corresponding jobs should be marked as preemptible.
The user credential is the fundamental credential within a workload manager; each job requires an association with exactly one user. In fact, the user credential is the only required credential in Moab; all others are optional. In most cases, the job's user credential is configured within or managed by the operating system itself, although Moab may be configured to obtain this information from an independent security and identity management service.
As the fundamental credential, the user credential has a number of unique attributes.
Moab supports role-based authorization, mapping particular roles to collections of specific users. See the Security section for more information.
Facilities exist to allow user notification in the event of job or system failures or under other general conditions. This attribute allows these notifications to be mailed directly to the target user.
You can disable Moab email notifications for a specific user.
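A sketch of both notification settings, assuming the user attributes are named EMAILADDRESS and NOEMAIL. Neither name appears in the text above, so verify them against your Moab release. The user names and the address are placeholders.

```
# mail notifications for user sally to an explicit address (assumed attribute name)
USERCFG[sally] EMAILADDRESS=sally@example.com
# suppress Moab email notifications for user john (assumed attribute name)
USERCFG[john] NOEMAIL=TRUE
```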
The group credential represents an aggregation of users. User-to-group mappings are often specified by the operating system or resource manager and typically map to a user's Unix group ID. However, user-to-group mappings may also be provided by a security and identity management service, or you can specify such directly within Moab.
With many resource managers such as TORQUE, PBSPro, and LSF, the group associated with a job is either the user's active primary group as specified within the operating system or a group that is explicitly requested at job submission time. When a secondary group is requested, the user's default group and associated policies are not taken into account. Also note that a job may only run under one group. If more constraining policies are required for these systems, an alternate aggregation scheme such as the use of Account or QOS credentials is recommended.
To submit a job as a secondary group, refer to your local resource manager's job submission options. For TORQUE users, see the group_list=g_list option of the qsub -W command.
The account credential is also referred to as the project. This credential is generally associated with a group of users along the lines of a particular project for accounting and billing purposes. User-to-account mapping may be obtained from a resource manager or allocation manager, or you can configure it directly within Moab. Access to an account can be controlled via the ALIST and ADEF credential attributes specified via the Identity Manager or the moab.cfg file.
The MANAGERS attribute (applicable only to the account and class credentials) allows an administrator to assign a user the ability to manage jobs inside the credential, as if the user were the job owner.
ACCOUNTCFG[general] MANAGERS=ops
ACCOUNTCFG[special] MANAGERS=stevep
If a user is able to access more than one account, the desired account can be specified at job submission time using the resource-manager specific attribute. For example, with TORQUE this is accomplished using the -A argument to the qsub command.
Job-to-account mapping can be enforced using the ALIST attribute and the ENFORCEACCOUNTACCESS parameter.
USERCFG[john] ALIST=proj1,proj3
USERCFG[steve] ALIST=proj2,proj3,proj4
USERCFG[brad] ALIST=proj1
USERCFG[DEFAULT] ALIST=proj2
ENFORCEACCOUNTACCESS TRUE
...
The concept of the class credential is derived from the resource manager class or queue object. Classes differ from other credentials in that they more directly impact job attributes. In standard HPC usage, a user submits a job to a class and this class imposes a number of factors on the job. The attributes of a class may be specified within the resource manager or directly within Moab. Class attributes include the following:
|When using SLURM, Moab classes have a one-to-one relationship with SLURM partitions of the same name.|
|For all classes configured in Moab, a resource manager queue with the same name should be created.|
Classes can be assigned to a default job template that can apply values to job attributes not explicitly specified by the submitter. Additionally, you can specify shortcut attributes from the table that follows:
|DEFAULT.DISK||Required Disk (in MB)|
|DEFAULT.EXT||Job RM Extension|
|DEFAULT.FEATURES||Required Node Features/Properties|
|DEFAULT.GRES||Required Consumable Generic Resources|
|DEFAULT.MEM||Required Memory/RAM (in MB)|
|DEFAULT.NODE||Required Node Count|
|DEFAULT.NODESET||Node Set Specification|
|DEFAULT.PROC||Required Processor Count|
|DEFAULT.TPN||Tasks Per Node|
|Defaults set in a class/queue of the resource manager will override the default values of the corresponding class/queue specified in Moab.|
|RESOURCELIMITPOLICY must be configured in order for the CLASSCFG limits to take effect.|
CLASSCFG[batch] DEFAULT.DISK=200MB DEFAULT.FEATURES=prod DEFAULT.WCLIMIT=1:00:00
CLASSCFG[debug] DEFAULT.FEATURES=debug DEFAULT.WCLIMIT=00:05:00
Classes can be assigned a minimum and a maximum job template that constrains resource requests. Jobs submitted to a particular queue must meet the resource request constraints of these templates.
|MAX.CPUTIME||Max Allowed Utilized CPU Time|
|MAX.NODE||Max Allowed Node Count|
|MAX.PROC||Max Allowed Processor Count|
|MAX.PS||Max Requested Processor-Seconds|
|MIN.NODE||Min Allowed Node Count|
|MIN.PROC||Min Allowed Processor Count|
|MIN.PS||Min Requested Processor-Seconds|
|MIN.TPN||Min Tasks Per Node|
|MIN.WCLIMIT||Min Requested Wallclock Limit|
|MAX.WCLIMIT||Max Requested Wallclock Limit|
|The parameters listed in the preceding table are for classes only, and they function on a per-job basis. The MAX.* and MIN.* parameters are different from the MAXJOB, MAXNODE, and MAXMEM parameters described earlier in Credential Usage Limits.|
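As an illustration of these per-job class constraints, the sketch below uses only attributes from the preceding table. The class name and values are hypothetical.

```
# jobs in class 'batch' must request between 2 and 64 processors
# and no more than 24 hours of wallclock time
CLASSCFG[batch] MIN.PROC=2 MAX.PROC=64 MAX.WCLIMIT=24:00:00
```

A job that requests a single processor, or 48 hours of wallclock time, would fall outside these templates and would not be accepted into the class.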
Classes may be associated with a particular set of compute resources. Consequently, jobs submitted to a given class may only use listed resources. This may be handled at the resource manager level or via the CLASSCFG HOSTLIST attribute.
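A minimal sketch of the scheduler-level approach, assuming a comma-delimited host list. The class and node names are placeholders.

```
# restrict class 'fast' to three nodes (node names are placeholders)
CLASSCFG[fast] HOSTLIST=node01,node02,node03
```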
Classes may be configured at either the resource manager or scheduler level to only allow select users and groups to access them. Jobs that do not meet these criteria are rejected. If specifying class membership/access at the resource manager level, see the respective resource manager documentation. Moab automatically detects and enforces these constraints. If specifying class membership/access at the scheduler level, use the REQUIREDUSERLIST or EXCLUDEUSERLIST attributes of the CLASSCFG parameter.
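Scheduler-level class access might be expressed as in the following sketch. The class and user names are hypothetical.

```
# only john and steve may submit to class 'special'
CLASSCFG[special] REQUIREDUSERLIST=john,steve
# everyone except user guest may submit to class 'batch'
CLASSCFG[batch] EXCLUDEUSERLIST=guest
```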
|Under most resource managers, jobs must always be a member of one and only one class.|
Users listed via the MANAGERS parameter are granted full control over all jobs submitted to or running within the specified class.
# allow john and steve to cancel and modify all jobs submitted to the class/queue special
CLASSCFG[special] MANAGERS=john,steve
In particular, a class manager can perform the following actions on jobs within a class/queue:
The JOBPROLOG class attribute performs a function similar to the resource manager level job prolog feature; however, there are some key differences:
The JOBPROLOG class attribute allows a site to specify a unique per-class action to take before a job is allowed to start. This can be used for environmental provisioning, pre-execution resource checking, security management, and other functions. Sample uses may include enabling a VLAN, mounting a global file system, installing a new application or virtual node image, creating dynamic storage partitions, or activating job specific software services.
|A prolog is considered to have failed if it returns a negative number. If a prolog fails, the associated job will not start.|
|If a prolog executes successfully, the associated epilog is guaranteed to start, even if the job fails for any reason. This allows the epilog to undo any changes made to the system by the prolog.|
Job Prolog Examples
# explicitly specify prolog arguments for special prolog
CLASSCFG[special] JOBPROLOG='$TOOLSDIR/specialprolog.pl $JOBID $HOSTLIST'
# use default prolog arguments for batch prolog
CLASSCFG[batch] JOBPROLOG=$TOOLSDIR/batchprolog.pl
The Moab epilog is nearly identical to the prolog in functionality except that it runs after the job completes within the resource manager but before the scheduler releases the allocated resources for use by subsequent jobs. It is commonly used for job clean-up, file transfers, signaling peer services, and undoing other forms of resource customization.
|An epilog is considered to have failed if it returns a negative number. If an epilog fails, the associated job will be annotated and a message will be sent to administrators.|
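A class epilog might be configured along the same lines as the prolog examples. This sketch assumes the class attribute is named JOBEPILOG and accepts the same syntax as JOBPROLOG. The script name is hypothetical.

```
# run a cleanup script after jobs in class 'special' complete
# (assumes JOBEPILOG takes the same form as JOBPROLOG; script name is hypothetical)
CLASSCFG[special] JOBEPILOG='$TOOLSDIR/specialepilog.pl $JOBID $HOSTLIST'
```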
Job triggers can be directly associated with jobs submitted into a class using the JOBTRIGGER attribute. Job triggers are described using the standard trigger description language specified in the Trigger overview section. In the example that follows, users submitting jobs to the class debug will be notified with a descriptive message anytime their job is preempted.
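The example referred to above is not reproduced in this copy. The sketch below shows what such a trigger might look like. It assumes the atype, etype, and action trigger attributes and the $OWNER variable behave as described in the trigger documentation, so treat the exact syntax as unverified.

```
# mail the job owner whenever a job in class 'debug' is preempted
# (trigger attribute names and $OWNER are assumptions; check the trigger documentation)
CLASSCFG[debug] JOBTRIGGER=atype=mail,etype=preempt,action="$OWNER:your job has been preempted"
```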
This policy allows specification of the action to take on a per-class basis when a failure occurs on a node allocated to an actively running job. See the Node Availability Overview for more information.
You can disable allocation management for jobs in specific classes by setting the DISABLEAM class attribute to TRUE. For all jobs outside of the specified classes, allocations will continue to be enforced.
# do not enforce allocations on low priority and debug jobs
CLASSCFG[lowprio] DISABLEAM=TRUE
CLASSCFG[debug] DISABLEAM=TRUE
In many cases, end-users do not want to be concerned with specifying a job class/queue. This is often handled by defining a default class. Whenever a user does not explicitly submit a job to a particular class, a default class, if specified, is used. In resource managers such as TORQUE, this can be done at the resource manager level and its impact is transparent to the scheduler. The default class can also be enabled within the scheduler on a per resource manager or per user basis. To set a resource manager default class within Moab, use the DEFAULTCLASS attribute of the RMCFG parameter. For per user defaults, use the CDEF attribute of the USERCFG parameter.
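Both scheduler-level defaults can be sketched as follows. The resource manager name, class names, and user name are placeholders.

```
# jobs arriving through resource manager 'base' without a class go to 'batch'
# (the resource manager name is a placeholder)
RMCFG[base] DEFAULTCLASS=batch
# user john's jobs default to class 'debug'
USERCFG[john] CDEF=debug
```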
If a single default class is not adequate, Moab provides more flexible options with the REMAPCLASS parameter. If this parameter is set and a job is submitted to the remap class, Moab attempts to determine the final class to which a job belongs based on the resources requested. If a remap class is specified, Moab compares the job's requested nodes, processors, memory, and node features with the class's corresponding minimum and maximum resource limits. Classes are searched in the order in which they are defined; when the first match is found, Moab assigns the job to that class. In the example that follows, a job requesting 4 processors and the node feature fast is assigned to the class quick.
# jobs submitted to 'batch' should be remapped
REMAPCLASS batch
# stevens only queue
CLASSCFG[stevens] REQ.FEATURES=stevens REQUIREDUSERLIST=stevens,stevens2
# special queue for I/O nodes
CLASSCFG[io] MAX.PROC=8 REQ.FEATURES=io
# general access queues
CLASSCFG[quick] MIN.PROC=2 MAX.PROC=8 REQ.FEATURES=fast|short
CLASSCFG[medium] MIN.PROC=2 MAX.PROC=8
CLASSCFG[DEFAULT] MAX.PROC=64
...
The following parameters can be used to remap jobs to different classes:
If the parameter REMAPCLASSLIST is set, then only the listed classes are searched and they are searched in the order specified by this parameter. If none of the listed classes are valid for a particular job, that job retains its original class.
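A sketch of restricting the search, assuming REMAPCLASSLIST takes a comma-delimited list of class names. The class names are hypothetical.

```
# jobs submitted to 'batch' should be remapped
REMAPCLASS batch
# consider only these classes, in this order (assumed comma-delimited format)
REMAPCLASSLIST quick,medium
```

With this configuration, a job that matches neither quick nor medium keeps its original class, batch.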
|The remap class only works with resource managers that allow dynamic modification of a job's assigned class/queue. Also note that OpenPBS and TORQUE 1.x support dynamic job queue modification, but this change is not persistent and will be lost if pbs_server is restarted.|
|If default credentials are specified on a remap class, a job submitted to that class will inherit those credentials. If the destination class has different default credentials, the new defaults override the original settings. If the destination class does not have default credentials, the job maintains the defaults inherited from the remap class.|
The following table enumerates the different parameters for CLASSCFG:
|Description:||One or more comma-delimited generic job attributes.|
|Description:||Default amount of requested disk space.|
|Description:||Default job RM extension.|
|Format:||Comma-delimited list of features.|
|Description:||Default list of requested node features (a.k.a. node properties). This only applies to compute resource reqs.|
|Description:||Default list of per task required consumable generic resources.|
|Format:||<INTEGER> (in MB)|
|Description:||Default amount of requested memory.|
|Description:||Default required node count.|
|Description:||Default node set.|
|Description:||Default number of requested processors.|
|Description:||Default number of tasks per node.|
|Description:||Default wallclock limit.|
|Format:||Comma- or pipe-delimited list of node features.|
|Description:||Set of excluded (disallowed) features. If delimited by commas, reject job if all features are requested; if delimited by the pipe symbol (|), reject job if at least one feature is requested.|
|Format:||Comma-delimited list of job flags.|
|Description:||Set of excluded (disallowed) job flags. Reject job if any listed flags are set.|
|Format:||Comma-delimited list of users.|
|Description:||List of users not permitted access to class.|
|Format:||one of SINGLETASK, SINGLEJOB, SINGLEUSER, or SHARED|
|Description:||Node access policy associated with queue. If set, this value overrides any per job settings specified by the user at the job level. (See Node Access Policy overview for more information.)|
|Description:||See fairshare policies specification.|
|Description:||See fairshare policies specification.|
|Format:||Host expression, or comma-delimited list of hosts or host ranges.|
|Description:||List of hosts associated with a class. If specified, Moab constrains the availability of a class to only nodes listed in the class host list.|
|Description:||Scheduler level job epilog to be run after job is completed by resource manager. (See special class attributes.)|
|Format:||Comma-delimited list of job flags.|
|Description:||See the flag overview for a description of legal flag values.|
|Description:||Scheduler level job prolog to be run before job is started by resource manager. (See special class attributes.)|
|Description:||Scheduler level job trigger to be associated with jobs submitted to this class. (See special class attributes.)|
|Description:||Users allowed to control, cancel, preempt, and modify jobs within class/queue. (See special class attributes.)|
|Description:||Maximum number of jobs allowed in the class.|
|Description:||Maximum number of processors requested per node.|
|Description:||Maximum allowed utilized CPU time.|
|Description:||Maximum number of requested nodes per job. (Also used when REMAPCLASS is set to correctly route the job.)|
|Description:||Maximum number of requested processors per job. (Also used when REMAPCLASS is set to correctly route the job.)|
|Description:||Maximum requested processor-seconds.|
|Description:||Maximum allowed wallclock limit per job. (Also used when REMAPCLASS is set to correctly route the job.)|
|Description:||Minimum number of requested nodes per job. (Also used when REMAPCLASS is set to correctly route the job.)|
|Description:||Minimum number of requested processors per job. (Also used when REMAPCLASS is set to correctly route the job.)|
|Description:||Minimum requested processor-seconds.|
|Description:||Minimum required tasks per node per job. (Also used when REMAPCLASS is set to correctly route the job.)|
|Description:||Minimum required wallclock limit per job. (Also used when REMAPCLASS is set to correctly route the job.)|
|Format:||one of SINGLETASK, SINGLEJOB, SINGLEUSER, or SHARED|
|Description:||Default node access policy associated with queue. This value will be overridden by any per job settings specified by the user at the job level. (See Node Access Policy overview.)|
|Description:||Partition name where jobs associated with this class must run.|
|Description:||Priority associated with the class. (See Priority overview.)|
|Description:||Default QoS for jobs submitted to this class.|
|Description:||List of accessible QoS's for jobs submitted to this class.|
CLASSCFG[batch] QDEF=base QLIST=base,fast,special,bigio
|Format:||Comma- or pipe-delimited list of node features.|
|Description:||Set of required features. If delimited by commas, all features are required; if delimited by the pipe symbol (|), at least one feature is required.|
|Format:||REQ.FLAGS can be used with only the INTERACTIVE flag.|
|Description:||Sets the INTERACTIVE flag on jobs in this class.|
|Format:||Comma-delimited list of accounts.|
|Description:||List of accounts allowed to access and use a class (analogous to *LIST for other credentials).|
|Format:||Comma-delimited list of users.|
|Description:||List of users allowed to access and use a class (analogous to *LIST for other credentials).|
|Format:||Comma-delimited list of QoS's|
|Description:||List of QoS's allowed to access and use a class (analogous to *LIST for other credentials).|
|Description:||List of resource managers that can (or cannot) view or access the class. By default, all resource managers can view and access all queues/classes. If this attribute is specified, only listed resource managers can see the associated queue. If an exclamation point character (!) is specified in the value, then access is granted to all resource managers that are not listed. This feature is most commonly used in grid environments.|
|Description:||Value of system priority applied to every job submitted to this class.|
|Description:||Tolerated amount of time beyond the specified wallclock limit.|
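Among the CLASSCFG parameters above, feature lists change meaning depending on the delimiter. The sketch below contrasts the two forms for REQ.FEATURES. The class and feature names are hypothetical.

```
# comma-delimited: a job must request both features 'io' and 'bigmem'
CLASSCFG[io] REQ.FEATURES=io,bigmem
# pipe-delimited: a job must request at least one of 'fast' or 'short'
CLASSCFG[quick] REQ.FEATURES=fast|short
```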
Queue complexes allow an organization to build a hierarchy of queues and apply certain limits and rules to collections of these queues. Moab supports this functionality in two ways. The first way, queue mapping, is very simple but limited in functionality. The second method provides very rich functionality but requires more extensive configuration using the Moab hierarchical fairshare facility.
Queue mapping allows collections of queues to be mapped to a parent credential object against which various limits and policies can be applied, as in the following example.
QOSCFG[general] MAXIJOB[USER]=14 PRIORITY=20
QOSCFG[prio] MAXIJOB[USER]=8 PRIORITY=2000
# group short, med, and long jobs into 'general' QOS
CLASSCFG[short] QDEF=general FSTARGET=30
CLASSCFG[med] QDEF=general FSTARGET=40
CLASSCFG[long] QDEF=general FSTARGET=30 MAXPROC=200
# group interactive and debug jobs into 'prio' QOS
CLASSCFG[inter] QDEF=prio
CLASSCFG[debug] QDEF=prio
CLASSCFG[premier] PRIORITY=10000
The concept of a quality of service (QoS) credential is unique to Moab and is not derived from any underlying concept or peer service. In most cases, the QoS credential is used to allow a site to set up a selection of service levels for end-users to choose from on a long-term or job-by-job basis. QoS's differ from other credentials in that they are centered around special access where this access may allow use of additional services, additional resources, or improved responsiveness. Unique to this credential, organizations may also choose to apply different charge rates to the varying levels of service available within each QoS. As QoS is an internal credential, all QoS configuration occurs within Moab.
QoS access and QoS defaults can be mapped to users, groups, accounts, and classes, allowing limited service offerings for key users. As mentioned, these services focus on increasing access to special scheduling capabilities and additional resources and on improving job responsiveness. At a high level, unique QoS attributes can be broken down into the following:
All credentials allow specification of job limits. In such cases, jobs are constrained by the most limiting of all applicable policies. With QoS override limits, however, jobs are limited by the override, regardless of other limits specified.
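The contrast between ordinary and override limits can be sketched as follows. This assumes override limits are written with an O prefix on the limit name (for example, OMAXJOB), which is not stated in the text above. The credential names and values are hypothetical.

```
# ordinary limits: the most constraining applicable limit wins
USERCFG[steve] MAXJOB=2
GROUPCFG[staff] MAXJOB=5
# QoS override (assumed O-prefixed attribute): jobs in 'hiprio' are limited
# to 3 active jobs regardless of the user and group limits above
QOSCFG[hiprio] OMAXJOB=3
```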
Service targets cause the scheduler to take certain job-related actions as various responsiveness targets are met. Targets can be set for either job queue time or job expansion factor and cause priority adjustments, reservation enforcement, or preemption activation. In strictly service-centric organizations, Moab can be configured to trigger various events and notifications when the cluster fails to meet responsiveness targets.
QoS's can provide access to special capabilities. These capabilities include preemption, job deadline support, backfill, next to run priority, guaranteed resource reservation, resource provisioning, dedicated resource access, and many others. See the complete list in the QoS Facility Overview section.
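As one illustration of granting a capability through a QoS, the sketch below marks one QoS as able to preempt and another as preemptible. It assumes the QFLAGS attribute and the PREEMPTOR and PREEMPTEE flag names, none of which appear in the text above. Other scheduler parameters may also be needed before preemption takes effect.

```
# jobs in QoS 'high' may preempt; jobs in QoS 'low' may be preempted
# (QFLAGS and both flag names are assumptions; see the QoS Facility Overview)
QOSCFG[high] QFLAGS=PREEMPTOR
QOSCFG[low] QFLAGS=PREEMPTEE
```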
Associated with the QoS's many privileges is the ability to assign end-users costs for the use of these services. This charging can be done on a per-QoS basis and may be specified for both dedicated and use-based resource consumption. The Per QoS Charging section covers more details on QoS level costing configuration while the Charging and Allocation Management section provides more details regarding general single cluster and multi-cluster charging capabilities.
QoS access control can be enabled on a per QoS basis using the MEMBERULIST attribute or specified on a per-requestor basis using the QDEF and QLIST attributes of the USERCFG, GROUPCFG, ACCOUNTCFG, and CLASSCFG parameters. See Managing QoS Access for more detail.
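The two approaches can be sketched side by side. The QoS, user, and group names are hypothetical.

```
# per-QoS control: only john and steve may use QoS 'special'
QOSCFG[special] MEMBERULIST=john,steve
# per-requestor control: group staff may request 'standard' or 'special',
# defaulting to 'standard'
GROUPCFG[staff] QLIST=standard,special QDEF=standard
```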