21.2 Overcommit Factor and Threshold

The two main configuration settings that govern how migrations work are the Overcommit Factor and the Overcommit Threshold. Both can be applied to the processors and memory of virtual machines (VMs).

The Overcommit Factor and Threshold can be defined as a global default or on a per-node basis.

NODECFG[DEFAULT]  OVERCOMMIT=PROC:2.0,MEM:2.0  # This is the default global policy
NODECFG[node42]   OVERCOMMIT=PROC:3.0,MEM:3.0  # This is a node-specific policy for node42
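
To illustrate how a global default and a node-specific policy might combine, the following minimal Python sketch parses the OVERCOMMIT values shown above and resolves the policy for a given node. It assumes that a node-specific entry overrides the DEFAULT entry resource by resource; the names parse_overcommit, effective_overcommit, and the in-memory NODECFG dictionary are hypothetical and are not part of Moab.

```python
from typing import Dict

def parse_overcommit(value: str) -> Dict[str, float]:
    """Parse an OVERCOMMIT value such as 'PROC:2.0,MEM:2.0' into {'PROC': 2.0, 'MEM': 2.0}."""
    factors: Dict[str, float] = {}
    for pair in value.split(","):
        resource, factor = pair.split(":")
        factors[resource.strip().upper()] = float(factor)
    return factors

# Hypothetical in-memory view of the two NODECFG lines above.
NODECFG = {
    "DEFAULT": parse_overcommit("PROC:2.0,MEM:2.0"),
    "node42": parse_overcommit("PROC:3.0,MEM:3.0"),
}

def effective_overcommit(node: str) -> Dict[str, float]:
    """Start from the DEFAULT policy, then apply any node-specific overrides (assumed per resource)."""
    policy = dict(NODECFG["DEFAULT"])
    policy.update(NODECFG.get(node, {}))
    return policy

print(effective_overcommit("node42"))  # {'PROC': 3.0, 'MEM': 3.0}
print(effective_overcommit("node07"))  # {'PROC': 2.0, 'MEM': 2.0}
```

A node with no NODECFG entry of its own (node07 here) simply inherits the global default.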

The Overcommit Factor defines the upper bound, or maximum number, of VCPUs that can be allocated on any given hypervisor (HV). For example, if a hypervisor has 12 processors or cores (Moab sees them as 12 processors) and the Overcommit Factor for procs is 2.0, Moab will not allow more than 24 VCPUs to be allocated on this hypervisor under any condition. Remember that a VM can have one or more VCPUs. In this example, the HV could support only 8 VMs if each had 3 VCPUs, or 4 VMs if each had 6 VCPUs, and so forth.
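
The arithmetic in this example can be written out as a short Python sketch. The function names are hypothetical, and rounding down when processors × factor is not a whole number is an assumption made here for illustration.

```python
import math

def max_vcpus(configured_procs: int, overcommit_factor: float) -> int:
    """Upper bound on VCPUs a hypervisor may host: configured processors x Overcommit Factor.
    Rounding down a fractional result is an assumption of this sketch."""
    return math.floor(configured_procs * overcommit_factor)

def max_vms(configured_procs: int, overcommit_factor: float, vcpus_per_vm: int) -> int:
    """Number of identically sized VMs that fit under the VCPU ceiling."""
    return max_vcpus(configured_procs, overcommit_factor) // vcpus_per_vm

print(max_vcpus(12, 2.0))   # 24
print(max_vms(12, 2.0, 3))  # 8
print(max_vms(12, 2.0, 6))  # 4
```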

The Overcommit Threshold defines how many VMs a node can host when those VMs are reporting load. While the Overcommit Factor sets the maximum under any condition, the Overcommit Threshold controls how many VMs a hypervisor can practically support given its load.

The Overcommit Threshold is a number between 0 and 1, interpreted as a fraction of the number of configured processors. It is not applied to the overcommitted processor count. For example, with a CPU Overcommit Threshold of 0.7 and a hypervisor with 12 configured processors, the HV can support a CPU load of up to 8.4 (12 × 0.7) before Moab tries to migrate VMs off of it. Moab uses the CPULOAD reported for the hypervisor to determine whether the threshold is exceeded.
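
A minimal Python sketch of the threshold check described above, assuming that migration is considered only when the reported CPULOAD is strictly greater than configured processors × threshold. The function names are hypothetical; note that the limit is computed from the configured processor count, not from the overcommitted VCPU count.

```python
def load_limit(configured_procs: int, threshold: float) -> float:
    """Maximum tolerated CPU load: threshold applied to configured (not overcommitted) processors."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("Overcommit Threshold must be between 0 and 1")
    return configured_procs * threshold

def needs_migration(cpuload: float, configured_procs: int, threshold: float) -> bool:
    """True when the hypervisor's reported CPULOAD exceeds the threshold (strict '>' assumed)."""
    return cpuload > load_limit(configured_procs, threshold)

print(round(load_limit(12, 0.7), 2))  # 8.4
print(needs_migration(8.0, 12, 0.7))  # False
print(needs_migration(9.1, 12, 0.7))  # True
```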

An example of a threshold-based trigger driven by a generic metric (GMETRIC) is as follows:

Example of Overcommit Migration event

# GMETRIC threshold-based triggers
# WLM metric threshold to check file system utilization
GEVENTCFG[disk_free] ACTION=fail SEVERITY=4
NODECFG[DEFAULT] TRIGGER=atype=exec,etype=threshold,failoffset=1:00,threshold=gmetric[disk_free]>90,action="/opt/moab/tools/filesize_fault.py $OID $METRICTYPE"

In the above example, a GEVENT of type disk_free is created. This is one of the predefined GEVENTs provided by Moab. Its ACTION is set to fail when the event is triggered, and it is given an arbitrary SEVERITY of 4. Next, the NODECFG line specifies which nodes carry the trigger; because it uses NODECFG[DEFAULT], the trigger applies to all nodes. If the disk_free metric reported for a node exceeds 90, Moab executes the following action: /opt/moab/tools/filesize_fault.py $OID $METRICTYPE
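
The following Python sketch shows the general shape of such a threshold trigger: evaluate gmetric[disk_free]>90 against a node's reported metrics and, if it fires, build the action command line. It is an illustration only, not Moab's implementation. It handles only the "gmetric[NAME]>VALUE" form used above, does not model failoffset, ACTION, or SEVERITY, and assumes that $OID expands to the node name and $METRICTYPE to the metric name. The function names and the sample value 95.0 are made up.

```python
import re
from string import Template
from typing import Dict

# Hypothetical pattern: only the gmetric[NAME]>VALUE form used above is handled.
THRESHOLD_RE = re.compile(r"^gmetric\[(\w+)\]>(-?\d+(?:\.\d+)?)$")

def threshold_exceeded(expr: str, metrics: Dict[str, float]) -> bool:
    """Evaluate a threshold expression against a node's reported GMETRIC values."""
    match = THRESHOLD_RE.match(expr)
    if match is None:
        raise ValueError(f"unsupported threshold expression: {expr}")
    name, limit = match.group(1), float(match.group(2))
    # Assumption: a metric the node has not reported never exceeds the threshold.
    return name in metrics and metrics[name] > limit

def build_action(action: str, oid: str, metric_type: str) -> str:
    """Substitute $OID and $METRICTYPE (assumed meanings: node name, metric name)."""
    return Template(action).substitute(OID=oid, METRICTYPE=metric_type)

ACTION = "/opt/moab/tools/filesize_fault.py $OID $METRICTYPE"
metrics = {"disk_free": 95.0}  # made-up sample value for a node named node42
if threshold_exceeded("gmetric[disk_free]>90", metrics):
    print(build_action(ACTION, "node42", "disk_free"))
    # /opt/moab/tools/filesize_fault.py node42 disk_free
```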

Examples of supported GMETRICs (this is not an exhaustive list):

Metric          Description
bytes_out       Number of network bytes out per second
cpu_num         Number of CPUs
cpu_speed       Processor speed (MHz)
disk_free       Total free disk space (GB)
disk_total      Total available disk space
load_one        One-minute load average
machine_type    CPU architecture
mem_free        Amount of available memory (KB)
mem_total       Total amount of memory
os_name         Operating system name
os_release      Operating system release
pkts_in         Packets in per second (packets/sec); NYI (not yet implemented)
pkts_out        Packets out per second; NYI (not yet implemented)
swap_free       Amount of available swap memory (KB)
swap_total      Total amount of swap memory

Note that the Overcommit Factor and Threshold also apply when Moab selects a destination for a VM. If a VM needs to be migrated off a loaded hypervisor Y, but moving it to hypervisor X would cause X to exceed its load threshold or its Overcommit Factor, Moab cannot move the VM to X and must find another location. Also, the example above dealt only with CPU or processor counts, but the Overcommit Factor and Threshold also apply to a wide array of other system resources. You can also create your own metrics with Ganglia.
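
The destination check described above can be sketched in Python as a first-fit search that rejects any hypervisor whose Overcommit Factor or Threshold would be violated by the incoming VM. This is a hypothetical illustration: the Hypervisor fields, the first-fit order, and the assumption that a VM's load simply adds to the destination's CPULOAD are choices made for this sketch, not Moab's documented placement algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hypervisor:
    name: str
    configured_procs: int
    overcommit_factor: float  # PROC Overcommit Factor
    threshold: float          # PROC Overcommit Threshold (0..1)
    allocated_vcpus: int
    cpuload: float            # reported CPULOAD

def can_accept(hv: Hypervisor, vm_vcpus: int, vm_load: float) -> bool:
    """Both limits must still hold after the VM is added (additive load is an assumption)."""
    fits_factor = hv.allocated_vcpus + vm_vcpus <= hv.configured_procs * hv.overcommit_factor
    fits_threshold = hv.cpuload + vm_load <= hv.configured_procs * hv.threshold
    return fits_factor and fits_threshold

def pick_destination(candidates: List[Hypervisor], source: str,
                     vm_vcpus: int, vm_load: float) -> Optional[Hypervisor]:
    """Return the first hypervisor other than the source that can take the VM, or None."""
    for hv in candidates:
        if hv.name != source and can_accept(hv, vm_vcpus, vm_load):
            return hv
    return None

hvs = [
    Hypervisor("Y", 12, 2.0, 0.7, allocated_vcpus=20, cpuload=9.5),  # overloaded source
    Hypervisor("X", 12, 2.0, 0.7, allocated_vcpus=22, cpuload=3.0),  # 22 + 3 > 24 VCPUs
    Hypervisor("Z", 12, 2.0, 0.7, allocated_vcpus=10, cpuload=4.0),
]
dest = pick_destination(hvs, "Y", vm_vcpus=3, vm_load=2.5)
print(dest.name if dest else "no valid destination")  # Z
```

X is skipped because accepting a 3-VCPU VM would push it past its 24-VCPU ceiling, so the VM lands on Z.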