Partitions are a logical construct that divides available resources. Any single resource (compute node) may belong to only one partition. Often, natural hardware or resource manager bounds delimit partitions, as in the case of disjoint networks and diverse processor configurations within a cluster. For example, a cluster may consist of 256 nodes connected through four 64-port switches. This cluster may deliver excellent interprocess communication speeds for parallel job tasks located within the same switch but much poorer performance for tasks that span switches. To handle this, the site may choose to create four partitions, allowing jobs to run within any of the four partitions but not span them.
While partitions do have value, it is important to note that within Moab, the standing reservation facility provides significantly improved flexibility and should be used in the vast majority of politically motivated cases where partitions may be required under other resource management systems. Standing reservations provide time flexibility, improved access control features, and more extensive resource specification options. In addition, another Moab facility called Node Sets allows intelligent aggregation of resources to improve per-job node allocation decisions. In cases where system partitioning is considered for such reasons, node sets may provide a better solution.
Still, one key advantage of partitions over standing reservations and node sets is the ability to specify partition-specific policies, limits, priorities, and scheduling algorithms, although this feature is rarely required. An example of this need may be a cluster consisting of 48 nodes owned by the Astronomy Department and 16 nodes owned by the Mathematics Department. Each department may be willing to share resources but wants to specify how its own partition will be used. As mentioned, many of Moab's scheduling policies may be specified on a per-partition basis, allowing each department to control the scheduling goals within its partition.
The partition associated with each node should be specified as indicated in the Node Location section. With this done, partition access lists may be specified on a per job or per QoS basis to constrain which resources a job may have access to. (See the QoS Overview for more information.) By default, QoSes and jobs allow global partition access. Note that by default, a job may only use resources within a single partition.
If no partition is specified, Moab creates one partition per resource manager into which all resources corresponding to that resource manager are placed. (This partition is given the same name as the resource manager.)
A partition may not span multiple resource managers. In addition to these resource manager partitions, a pseudo-partition named "[ALL]" is created that contains the aggregate resources of all partitions.
While the resource manager partitions are real partitions containing resources not explicitly assigned to other partitions, the "[ALL]" partition is only a convenience object and is not a real partition; thus it cannot be requested by jobs or included in configuration ACLs.
Node to partition mappings can be established directly using the NODECFG parameter or indirectly using the FEATUREPARTITIONHEADER parameter. If using direct mapping, this is accomplished as shown in the example that follows.
    NODECFG[node001] PARTITION=astronomy
    NODECFG[node002] PARTITION=astronomy
    ...
    NODECFG[node049] PARTITION=math
    ...
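Writing one NODECFG line per node by hand is error-prone on larger clusters, so a site might generate the lines with a small script. The following Python sketch is illustrative only: the function name `nodecfg_lines` is hypothetical, and it assumes nodes are numbered consecutively from `node001` with the 48 astronomy nodes first and the 16 math nodes after them, as the department example above suggests.

```python
def nodecfg_lines(partition_sizes, prefix="node", width=3):
    """Return one NODECFG line per node, numbering nodes consecutively from 1.

    partition_sizes is a list of (partition_name, node_count) pairs.
    The naming scheme (prefix plus zero-padded index) is an assumption.
    """
    lines = []
    index = 1
    for partition, count in partition_sizes:
        for _ in range(count):
            lines.append(f"NODECFG[{prefix}{index:0{width}d}] PARTITION={partition}")
            index += 1
    return lines


# 48 astronomy nodes followed by 16 math nodes (hypothetical layout).
lines = nodecfg_lines([("astronomy", 48), ("math", 16)])
print("\n".join(lines))
```

The printed lines can be pasted into the Moab configuration file; adjust the prefix, padding width, and partition sizes to match the actual node names at your site.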
By default, Moab creates two partitions, "DEFAULT" and "[ALL]". These are used internally, and consume spots in the 31-partition maximum defined in the MMAX_PAR parameter. If more partitions are needed, you can adjust the maximum partition count. See Adjusting Default Limits for information on increasing the maximum number of partitions.
Partition access can be constrained by credential ACLs and by limits based on job resource requirements.
Determining who can use which partition is specified using the *CFG parameters (USERCFG, GROUPCFG, ACCOUNTCFG, QOSCFG, CLASSCFG, and SYSCFG). These parameters allow you to select a partition access list on a credential or system wide basis using the PLIST attribute. By default, the access associated with any given job is the logical OR of all partition access lists assigned to the job's credentials.
For example, assume a site with two partitions, general and test. The site management would like everybody to use the general partition by default. However, one user, Steve, needs to perform the majority of his work on the test partition. Two special groups, staff and management, will also need access to the test partition from time to time but will perform most of their work in the general partition. The following example configuration enables the needed user and group access and defaults for this site:
    SYSCFG[base]     PLIST=general:test
    USERCFG[DEFAULT] PLIST=general
    USERCFG[steve]   PLIST=general:test
    GROUPCFG[staff]  PLIST=general:test
    GROUPCFG[mgmt]   PLIST=general:test
While using a logical OR approach allows sites to add access to certain jobs, some sites prefer to work the other way around. In these cases, access is granted by default and certain credentials are then restricted from accessing various partitions. To use this model, a system partition list must be specified as in the following example:
    SYSCFG[base]    PLIST=general,test&
    USERCFG[demo]   PLIST=test&
    GROUPCFG[staff] PLIST=general&
In the preceding example, note the ampersand (&). This character, which can be located anywhere in the PLIST line, indicates that the specified partition list should be logically ANDed with other partition access lists. In this case, the configuration limits jobs from user demo to running in partition test and jobs from group staff to running in partition general. All other jobs are allowed to run in either partition.
When using AND-based partition access lists, the base system access list must be specified with SYSCFG.
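The difference between OR-based and AND-based lists can be modeled with set operations. The following Python sketch is a simplified model, not Moab's implementation: the function names are hypothetical, and it assumes that lists without an ampersand are unioned, that every list carrying an ampersand is then intersected with the result, and that both `:` and `,` separate partition names.

```python
def parse_plist(value):
    """Split a PLIST value into (set of partition names, is_and_list)."""
    is_and = "&" in value  # the ampersand may appear anywhere in the value
    names = value.replace("&", "").replace(",", ":").split(":")
    return {name for name in names if name}, is_and


def effective_access(plists):
    """Combine the PLIST values of all credentials that apply to one job.

    Assumed model: union the OR lists, then intersect with each AND list.
    If only AND lists are present, the result is their intersection.
    """
    or_sets, and_sets = [], []
    for value in plists:
        parts, is_and = parse_plist(value)
        (and_sets if is_and else or_sets).append(parts)

    if or_sets:
        allowed = set().union(*or_sets)
    elif and_sets:
        allowed = set(and_sets[0])
    else:
        allowed = set()

    for parts in and_sets:
        allowed &= parts
    return allowed


# Job from user demo: system list ANDed with the user's list.
print(effective_access(["general,test&", "test&"]))
# Job from group staff: system list ANDed with the group's list.
print(effective_access(["general,test&", "general&"]))
# Job with no restricted credential: only the system list applies.
print(effective_access(["general,test&"]))
```

Under this model a job from user demo is left with only test, a job from group staff with only general, and any other job with both partitions, which matches the behavior described for the AND-based example.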
Access to partitions can be constrained based on the resources requested on a per job basis with limits on both minimum and maximum resources requested. All limits are specified using PARCFG. See Usage Limits for more information on the available limits.
    PARCFG[amd]  MAX.PROC=16
    PARCFG[pIII] MAX.WCLIMIT=12:00:00 MIN.PROC=4
    PARCFG[aix]  MIN.NODE=12
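To see how such limits filter partitions for a given job, the PARCFG example above can be modeled as a table of bounds. The following Python sketch is a hypothetical illustration, not Moab's scheduling code: the `LIMITS` dictionary, the job dictionary keys, and the function names are assumptions, and it treats every `MIN.` and `MAX.` bound as inclusive.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Mirrors the PARCFG example; keys are "<MIN|MAX>.<RESOURCE>".
LIMITS = {
    "amd": {"MAX.PROC": 16},
    "pIII": {"MAX.WCLIMIT": to_seconds("12:00:00"), "MIN.PROC": 4},
    "aix": {"MIN.NODE": 12},
}


def eligible_partitions(job, limits=LIMITS):
    """Return the partitions whose limits the job's request satisfies.

    job maps resource names (PROC, NODE, WCLIMIT in seconds) to requested
    amounts. Bounds are assumed to be inclusive.
    """
    result = []
    for name, rules in limits.items():
        ok = True
        for key, bound in rules.items():
            kind, resource = key.split(".")
            value = job[resource]
            if kind == "MAX" and value > bound:
                ok = False
            if kind == "MIN" and value < bound:
                ok = False
        if ok:
            result.append(name)
    return result


# An 8-processor, 2-node, 1-hour job is too small for the aix partition.
small_job = {"PROC": 8, "NODE": 2, "WCLIMIT": 3600}
print(eligible_partitions(small_job))

# A 32-processor, 16-node job exceeds the amd processor limit.
large_job = {"PROC": 32, "NODE": 16, "WCLIMIT": 3600}
print(eligible_partitions(large_job))
```

In this model the small job may run in amd or pIII, while the large job may run in pIII or aix. Actual eligibility also depends on the partition access lists described earlier.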
Users may request to use any partition they have access to on a per job basis. This is accomplished using the resource manager extensions since most native batch systems do not support the partition concept. For example, on a TORQUE system, a job submitted by a member of the group staff could request that the job run in the test partition by adding the line -l partition=test to the qsub command line. See the resource manager extension overview for more information on configuring and using resource manager extensions.
The following settings can be specified on a per-partition basis using the PARCFG parameter:
Setting | Description |
---|---|
DEFAULTNODEFEATURES | Specifies a default feature on a group of nodes within a partition; it applies only to nodes in that partition. |
JOBNODEMATCHPOLICY | Specifies the JOBNODEMATCHPOLICY to be applied to jobs that run in the specified partition. |
NODEACCESSPOLICY | Specifies the NODEACCESSPOLICY to be applied to jobs that run in the specified partition. |
NODEALLOCATIONPOLICY | Specifies the NODEALLOCATIONPOLICY to be applied to jobs that run in the specified partition. |
USETTC | Specifies whether TTC specified at submission should be used and displayed by the scheduler. |
VMCREATEDURATION | Specifies the maximum amount of time VM creation can take before Moab considers it a failure (in [HH[:MM[:SS]]]). If no value is set, there is no maximum limit. |
VMDELETEDURATION | Specifies the maximum amount of time VM deletion can take before Moab considers it a failure (in [HH[:MM[:SS]]]). If no value is set, there is no maximum limit. |
VMMIGRATEDURATION | Specifies the maximum amount of time VM migration can take before Moab considers it a failure (in [HH[:MM[:SS]]]). If no value is set, there is no maximum limit. |
A brief caution: Use of partitions has been quite limited in recent years as other, more effective approaches are selected for site scheduling policies. Consequently, some aspects of partitions have received only minor testing. Still, note that partitions are fully supported and any problem found will be rectified.