12.2 Node Attributes

12.2.1 Configurable Node Attributes

Nodes can possess a large number of attributes describing their configuration, which are specified using the NODECFG parameter. The majority of these attributes, such as operating system or configured network interfaces, can only be specified by the direct resource manager interface. However, the number and detail of node attributes vary widely from resource manager to resource manager, and sites often want to make scheduling decisions based on attributes that the resource manager does not supply directly. The configurable node attributes are listed below, followed by a description of each:
ACCESS
ALLOCATIONLIMITS
ARCH
CHARGERATE
COMMENT
ENABLEPROFILING
FEATURES
FLAGS
GENERICTHRESHOLDS
GRES
LOGLEVEL
MAXIOIN
MAXJOB
MAXJOBPERUSER
MAXPE
MAXPEPERJOB
MAXPROC
NETWORK
NODEINDEX
NODETYPE
OS
OSLIST
OVERCOMMIT
PARTITION
POOL
POWERPOLICY
PREEMPTMAXCPULOAD
PREEMPTMINMEMAVAIL
PREEMPTPOLICY
PRIORITY
PRIORITYF

PROCSPEED
PROVRM
RACK
RADISK
RCDISK
RCMEM
RCPROC
RCSWAP
SIZE
SLOT
SPEED
TRIGGER
VARIABLE
UTILIZATIONTHRESHOLD


Attribute    Description

ACCESS
Specifies the node access policy, which can be one of SHARED, SHAREDONLY, SINGLEJOB, SINGLETASK, or SINGLEUSER. See Node Access Policies for more details.

NODECFG[node013] ACCESS=singlejob

ALLOCATIONLIMITS
Specifies the high-water limit for over-allocation of processors or memory on a hypervisor. This setting is used to protect hypervisors from having too many VMs placed on them, regardless of the utilization level of those VMs. Possible attributes include DISK, MEM, PROC, and SWAP. Usage is <attr>:<integer>.

NODECFG[node012] ALLOCATIONLIMITS=PROC:2,MEM:4

ARCH
Specifies the node's processor architecture.

NODECFG[node013] ARCH=opteron

CHARGERATE
Allows a site to assign specific charging rates to the usage of particular resources. The CHARGERATE value may be specified as a floating point value and is integrated into a job's total charge (as documented in the Charging and Allocation Management section).

NODECFG[DEFAULT] CHARGERATE=1.0
NODECFG[node003] CHARGERATE=1.5
NODECFG[node022] CHARGERATE=2.5

COMMENT
Allows an organization to annotate a node via the configuration file to indicate special information regarding this node to both users and administrators. The COMMENT value may be specified as a quote-delimited string, as shown in the example that follows. Comment information is visible using checknode, mdiag, Moab Cluster Manager, and Moab Access Portal.

NODECFG[node013] COMMENT="Login Node"

ENABLEPROFILING
Allows an organization to track node state over time. This information is available using showstats -n.

NODECFG[DEFAULT] ENABLEPROFILING=TRUE

FEATURES
Not all resource managers allow specification of opaque node features (also known as node properties). For these systems, the NODECFG parameter can be used to directly assign a list of node features to individual nodes. To set/overwrite a node's features, use FEATURES=<X>; to append node features, use FEATURES+=<X>.

NODECFG[node013] FEATURES+=gpfs,fastio

Note: The total number of supported node features is limited as described in the Adjusting Default Limits section.

Note: If supported by the resource manager, the resource manager specific manner of requesting node features/properties within a job may be used. (Within TORQUE, use qsub -l nodes=<NODECOUNT>:<NODEFEATURE>.) However, if the resource manager does not support this, or if its support is limited, the Moab feature resource manager extension may be used.
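
The difference between FEATURES=<X> (set/overwrite) and FEATURES+=<X> (append) can be illustrated with a small sketch. This is not Moab code: the function name apply_features is hypothetical, and skipping features the node already has when appending is an assumption made for the illustration.

```python
def apply_features(current, spec):
    """Apply one FEATURES specification to a node's feature list.

    `spec` is the text that follows the attribute name in NODECFG, for
    example "=gpfs,fastio" (set/overwrite) or "+=gpfs,fastio" (append).
    Hypothetical helper, not part of Moab.
    """
    if spec.startswith("+="):
        requested = spec[2:].split(",")
        # Assumption: appending skips features the node already has.
        return current + [f for f in requested if f not in current]
    if spec.startswith("="):
        # Overwrite: the previous feature list is discarded.
        return spec[1:].split(",")
    raise ValueError(f"unrecognized FEATURES specification: {spec!r}")


# A node that already carries the feature "bigmem":
appended = apply_features(["bigmem"], "+=gpfs,fastio")
overwritten = apply_features(["bigmem"], "=gpfs,fastio")
```

With the append form the existing bigmem feature is kept; with the overwrite form only gpfs and fastio remain.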

FLAGS
Specifies various attributes of the NODECFG parameter.

The NoVMMigrations flag excludes this hypervisor from VM auto-migrations. This means that VMs cannot automatically migrate to or from this hypervisor while this flag is set.

NODECFG[node1] FLAGS=NoVMMigrations

To allow VMs to resume migrating, remove this flag and restart Moab or use a resource manager to unset it.

GENERICTHRESHOLDS
Specifies the high-water threshold for generic metric values on a server. When a generic metric value goes above its configured threshold, Moab will attempt to migrate VMs off of the hypervisor until the metric falls below the threshold.

NODECFG[node013] GENERICTHRESHOLDS=GMETRIC:<NAME1>:<VALUE1>,GMETRIC:<NAME2>:<VALUE2>

GRES
Many resource managers do not allow specification of consumable generic node resources. For these systems, the NODECFG parameter can be used to directly assign a list of consumable generic attributes to individual nodes or to the special pseudo-node global, which provides shared cluster (floating) consumable resources. To set/overwrite a node's generic resources, use GRES=<NAME>[:<COUNT>]. (See Managing Consumable Generic Resources.)

NODECFG[node013] GRES=quickcalc:20

LOGLEVEL
Node-specific log level, allowing targeted log facility verbosity.

MAXIOIN
Maximum input allowed on the node before it is marked busy.

MAXJOB
See Node Policies for details.

MAXJOBPERUSER
See Node Policies for details.

MAXPE
See Node Policies for details.

MAXPEPERJOB
Maximum allowed Processor Equivalent (PE) per job on this node. A job will not be allowed to run on this node if its PE exceeds this number.

NODECFG[node024] MAXPEPERJOB=10000
...

MAXPROC
Maximum dedicated processors allowed on this node. No jobs are scheduled on this node when this number is reached. See Node Policies for more information.

NODECFG[node024] MAXPROC=8
...

NETWORK
The ability to specify which networks are available to a given node is limited to only a few resource managers. Using the NETWORK attribute, administrators can establish this node-to-network connection directly through the scheduler. The NODECFG parameter allows the networks to be specified as a comma-delimited list.

NODECFG[node024] NETWORK=GigE
...

NODEINDEX
The node's index. See Node Location for details.

NODETYPE
The NODETYPE attribute is most commonly used in conjunction with an allocation management system such as Gold. In these cases, each node is assigned a node type and, within the allocation management system, each node type is assigned a charge rate. For example, a site administrator may want to charge users more for using large memory nodes and may assign a node type of BIGMEM to these nodes. The allocation management system would then charge a premium rate for jobs using BIGMEM nodes. (See the Allocation Manager Overview for more information.)

Node types are specified as simple strings. If no node type is explicitly set, the node will possess the default node type of [DEFAULT]. Node type information can be specified directly using NODECFG or through use of the FEATURENODETYPEHEADER parameter.

NODECFG[node024] NODETYPE=BIGMEM

OS
This attribute specifies the node's operating system.

NODECFG[node013] OS=suse10

Note: Because the operating system reported by TORQUE overwrites the operating system configured in Moab, change the operating system with opsys instead of OS if you are using TORQUE.

OSLIST
This attribute specifies the list of operating systems the node can run.

NODECFG[compute002] OSLIST=linux,windows

PARTITION
See Node Location for details.

POOL
Specifies the associated node pool.

POWERPOLICY
The POWERPOLICY can be set to OnDemand or STATIC. It defaults to STATIC if not set. If set to STATIC, Moab will never automatically change the power status of a node. If set to OnDemand, Moab will turn the machine off and on based on workload and global settings. See Green Computing for further details.

PREEMPTMAXCPULOAD
If the node CPU load exceeds the specified value, any batch jobs running on the node are preempted using the preemption policy specified with the node's PREEMPTPOLICY attribute. If this attribute is not specified, the global default policy specified with the PREEMPTPOLICY parameter is used. See Sharing Server Resources for further details.

NODECFG[node024] PRIORITY=-150 COMMENT="NFS Server Node"
NODECFG[node024] PREEMPTPOLICY=CANCEL PREEMPTMAXCPULOAD=1.2
...

PREEMPTMINMEMAVAIL
If the available node memory drops below the specified value, any batch jobs running on the node are preempted using the preemption policy specified with the node's PREEMPTPOLICY attribute. If this attribute is not specified, the global default policy specified with the PREEMPTPOLICY parameter is used. See Sharing Server Resources for further details.

NODECFG[node024] PRIORITY=-150 COMMENT="NFS Server Node"
NODECFG[node024] PREEMPTPOLICY=CANCEL PREEMPTMINMEMAVAIL=1.2
...

PREEMPTPOLICY
If any node preemption threshold is triggered (such as PREEMPTMAXCPULOAD or PREEMPTMINMEMAVAIL), any batch jobs running on the node are preempted using this preemption policy, if specified. If not specified, the global default preemption policy specified with the PREEMPTPOLICY parameter is used. See Sharing Server Resources for further details.

NODECFG[node024] PRIORITY=-150 COMMENT="NFS Server Node"
NODECFG[node024] PREEMPTPOLICY=CANCEL PREEMPTMAXCPULOAD=1.2
...
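
The interaction between the two thresholds and the node-level and global PREEMPTPOLICY settings described above can be summarized in a short sketch. It only restates the documented decision rule and is not Moab's implementation; the function name preempt_action and its argument names are hypothetical, and REQUEUE is used purely as an illustrative global policy value.

```python
def preempt_action(cpu_load, mem_avail,
                   max_cpu_load=None, min_mem_avail=None,
                   node_policy=None, global_policy=None):
    """Return the preemption policy to apply, or None if nothing is triggered.

    max_cpu_load  -> the node's PREEMPTMAXCPULOAD (None if not configured)
    min_mem_avail -> the node's PREEMPTMINMEMAVAIL (None if not configured)
    node_policy   -> the node's PREEMPTPOLICY attribute (None if not configured)
    global_policy -> the global PREEMPTPOLICY parameter
    Hypothetical helper that mirrors the rule described in the text.
    """
    load_exceeded = max_cpu_load is not None and cpu_load > max_cpu_load
    memory_low = min_mem_avail is not None and mem_avail < min_mem_avail
    if not (load_exceeded or memory_low):
        return None
    # The node-level policy wins; otherwise fall back to the global default.
    return node_policy if node_policy is not None else global_policy


# node024 from the example: PREEMPTPOLICY=CANCEL PREEMPTMAXCPULOAD=1.2
action = preempt_action(cpu_load=1.5, mem_avail=4096,
                        max_cpu_load=1.2, node_policy="CANCEL",
                        global_policy="REQUEUE")
```

In this example the CPU load of 1.5 exceeds the configured 1.2, so the node-level CANCEL policy is selected rather than the global default.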

PRIORITY
The PRIORITY attribute specifies the fixed node priority relative to other nodes. It is only used if NODEALLOCATIONPOLICY is set to PRIORITY. The default node priority is 0. A default cluster-wide node priority may be set by configuring the PRIORITY attribute of the DEFAULT node. See Priority Node Allocation for more details.

NODEALLOCATIONPOLICY  PRIORITY
NODECFG[node024] PRIORITY=120
...

PRIORITYF
The PRIORITYF attribute specifies the function to use when calculating a node's allocation priority specific to a particular job. It is only used if NODEALLOCATIONPOLICY is set to PRIORITY. The default node priority function sets a node's priority exactly equal to the configured node priority. The priority function allows a site to indicate that environmental considerations such as node load, reservation affinity, and ownership should also be taken into account. It uses the following format:

<COEFFICIENT> * <ATTRIBUTE> [ + <COEFFICIENT> * <ATTRIBUTE> ]...

<ATTRIBUTE> is an attribute from the table found in the Priority Node Allocation section.

A default cluster-wide node priority function may be set by configuring the PRIORITYF attribute of the DEFAULT node. See Priority Node Allocation for more details.

NODEALLOCATIONPOLICY  PRIORITY
NODECFG[node024] PRIORITYF='SPEED + .01 * AMEM - 10 * JOBCOUNT'
...
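
To make the priority function format concrete, the following sketch evaluates the example expression for one node. It is a simplified illustration, not Moab's parser: the names TERM and node_priority are hypothetical, only the <COEFFICIENT> * <ATTRIBUTE> form with + and - between terms is handled, and the attribute values are invented for the example.

```python
import re

# One term: optional sign, optional "<COEFFICIENT> *", then an attribute name.
TERM = re.compile(r"\s*([+-])?\s*(?:([0-9]*\.?[0-9]+)\s*\*\s*)?([A-Z]+)")


def node_priority(priorityf, attrs):
    """Evaluate a PRIORITYF-style expression against a node's attribute values.

    Hypothetical helper; `attrs` maps attribute names to numeric values.
    """
    expr = priorityf.strip()
    total = 0.0
    pos = 0
    while pos < len(expr):
        match = TERM.match(expr, pos)
        if match is None:
            raise ValueError(f"cannot parse priority function at offset {pos}")
        sign, coeff, name = match.groups()
        # A missing coefficient means the attribute is weighted by 1.
        value = (float(coeff) if coeff else 1.0) * attrs[name]
        total += -value if sign == "-" else value
        pos = match.end()
    return total


# Invented attribute values for a single node.
node024 = {"SPEED": 1.0, "AMEM": 2048, "JOBCOUNT": 3}
priority = node_priority("SPEED + .01 * AMEM - 10 * JOBCOUNT", node024)
```

With these values, available memory raises the node's priority while each running job lowers it, so a lightly loaded node with free memory is preferred.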

PROCSPEED
Knowing a node's processor speed can help the scheduler improve intra-job efficiencies by allocating nodes of similar speeds together. This helps reduce losses due to poor internal job load balancing. Moab's Node Set scheduling policies allow a site to control allocation behavior based on processor speed.

Processor speed information is specified in MHz and can be indicated directly using NODECFG or through use of the FEATUREPROCSPEEDHEADER parameter.

PROVRM
Provisioning resource managers can be specified on a per-node basis. This allows flexibility in mixed environments. If the node does not have a provisioning resource manager, the default provisioning resource manager will be used. The default is always the first one listed in moab.cfg.

RMCFG[prov] TYPE=NATIVE RESOURCETYPE=PROV
RMCFG[prov] PROVDURATION=10:00
RMCFG[prov] NODEMODIFYURL=exec://$HOME/tools/os.switch.pl
...
NODECFG[node024] PROVRM=prov

RACK
The rack associated with the node's physical location. Valid values range from 1 to 400. See Node Location for details.

RADISK
Jobs can request a certain amount of disk space through the RM Extension String's DDISK parameter. When done this way, Moab can track the amount of disk space available for other jobs. Use the RADISK attribute to set the total amount of disk space available.

RCDISK
Jobs can request a certain amount of disk space (in MB) through the RM Extension String's DDISK parameter. When done this way, Moab can track the amount of disk space available for other jobs. The RCDISK attribute constrains the amount of disk reported by a resource manager, while the RADISK attribute specifies the amount of disk available to jobs. If the resource manager does not report available disk, the RADISK attribute should be used.

RCMEM
Jobs can request a certain amount of real memory (RAM) in MB through the RM Extension String's DMEM parameter. When done this way, Moab can track the amount of memory available for other jobs. The RCMEM attribute constrains the amount of RAM reported by a resource manager, while the RAMEM attribute specifies the amount of RAM available to jobs. If the resource manager does not report available memory, the RAMEM attribute should be used.

Please note that memory reported by the resource manager will override the configured value unless a trailing caret (^) is used.

NODECFG[node024] RCMEM=2048
...

If the resource manager does not report any memory, then Moab will assign node024 2048 MB of memory.

NODECFG[node024] RCMEM=2048^
...

Moab will assign 2048 MB of memory to node024 regardless of what the resource manager reports.
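
The two RCMEM cases above (with and without the trailing caret) follow a simple rule that can be sketched as below. The function name effective_memory is hypothetical and the code only illustrates the documented behavior; it is not how Moab is implemented.

```python
def effective_memory(rcmem, reported_mb):
    """Return the memory (in MB) a node ends up with.

    rcmem       -> the configured RCMEM value as written, e.g. "2048" or "2048^"
    reported_mb -> memory reported by the resource manager, or None if it
                   reports nothing
    Hypothetical helper that mirrors the behavior described in the text.
    """
    forced = rcmem.endswith("^")
    configured = int(rcmem.rstrip("^"))
    if forced or reported_mb is None:
        # The caret forces the configured value; without it, the configured
        # value is only used when the resource manager reports nothing.
        return configured
    return reported_mb


fallback = effective_memory("2048", None)    # resource manager reports nothing
reported = effective_memory("2048", 4096)    # resource manager value overrides
forced = effective_memory("2048^", 4096)     # caret forces the configured value
```

The same rule applies to RCSWAP, which the text describes as working similarly to RCMEM.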

RCPROC
The RCPROC attribute specifies the number of processors available on a compute node.

NODECFG[node024] RCPROC=8
...

RCSWAP
Jobs can request a certain amount of swap space in MB.

Note: RCSWAP works similarly to RCMEM. Setting RCSWAP on a node will set the swap but can be overridden by swap reported by the resource manager. If the trailing caret (^) is used, Moab will ignore the swap reported by the resource manager and use the configured amount.

NODECFG[node024] RCSWAP=2048
...

If the resource manager does not report any swap, Moab will assign node024 2048 MB of swap.

NODECFG[node024] RCSWAP=2048^
...

Moab will assign 2048 MB of swap to node024 regardless of what the resource manager reports.

SIZE
The number of slots or size units consumed by the node. This value is used in graphically representing the cluster using showstate or Moab Cluster Manager. See Node Location for details. For display purposes, legal size values include 1, 2, 3, 4, 6, 8, 12, and 16.

NODECFG[node024] SIZE=2
...

SLOT
The first slot in the rack associated with the node's physical location. Valid values range from 1 to MMAX_RACKSIZE (default=64). See Node Location for details.

SPEED
A node's speed is very similar to its processor speed but is specified as a relative value. In general use, the speed of a base node is determined and assigned a speed of 1.0. A node that is 50% faster would be assigned a value of 1.5, while a slower node may receive a value that is proportionally less than 1.0. Node speeds do not have to be directly proportional to processor speeds and may take into account factors such as memory size or networking interface. Generally, node speed information is used to determine proper wallclock limit and CPU time scaling adjustments.

Node speed information is specified as a unitless floating point ratio and can be specified through the resource manager or with the NODECFG parameter.

Note: The SPEED specification must be in the range of 0.01 to 100.0.

TRIGGER
See Object Triggers for details.

VARIABLE
Variables associated with the given node, which can be used in job scheduling. See -l PREF.

NODECFG[node024] VARIABLE=var1
...

UTILIZATIONTHRESHOLD
Specifies the high-water threshold for utilization of resources (that is, processor and memory) on a server. This setting is used to protect hypervisors from becoming too highly utilized and thus negatively impacting the performance of VMs running on the hypervisor. Possible attributes include PROC and MEM.

NODECFG[node024] UTILIZATIONTHRESHOLD=PROC=2,MEM=2

12.2.2 Node Features/Node Properties

A node feature (or node property) is an opaque string label that is associated with a compute node. Each compute node may have any number of node features assigned to it, and jobs may request allocation of nodes that have specific features assigned. Node features are labels and their association with a compute node is not conditional, meaning they cannot be consumed or exhausted.

Node features may be assigned by the resource manager and imported by Moab, or they may be specified within Moab directly. As a convenience, certain node attributes can be specified via node features using the parameters listed in the following table:

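PARAMETER                 DESCRIPTION
FEATURENODETYPEHEADER     Set Node Type
FEATUREPARTITIONHEADER    Set Partition
FEATUREPROCSPEEDHEADER    Set Processor Speed
FEATURERACKHEADER         Set Rack
FEATURESLOTHEADER         Set Slot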

Example

FEATUREPARTITIONHEADER  par
FEATUREPROCSPEEDHEADER  cpu
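
As a rough illustration of what the example configuration above does, the sketch below derives node attributes from feature strings. It assumes that a feature matches a header when it starts with the header text and that the remainder of the feature is the attribute value; that matching rule, the function name attributes_from_features, and the sample feature names are all assumptions made for the illustration, not a description of Moab internals.

```python
# Header-to-attribute mapping taken from the example configuration:
#   FEATUREPARTITIONHEADER  par
#   FEATUREPROCSPEEDHEADER  cpu
HEADERS = {"par": "PARTITION", "cpu": "PROCSPEED"}


def attributes_from_features(features, headers):
    """Derive node attributes from node features that carry a known header.

    Assumption: a feature such as "cpu950" is read as header "cpu" plus
    value "950". Features without a known header are ignored here.
    """
    attrs = {}
    for feature in features:
        for header, attribute in headers.items():
            if feature.startswith(header) and len(feature) > len(header):
                attrs[attribute] = feature[len(header):]
                break
    return attrs


# Hypothetical feature list as a resource manager might report it.
derived = attributes_from_features(["par2", "cpu950", "gpfs"], HEADERS)
```

Under these assumptions, the node would be placed in partition 2 with a processor speed of 950, while gpfs remains an ordinary feature.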

Copyright © 2012 Adaptive Computing Enterprises, Inc.®