If you've enabled green computing and are having trouble, here are some tips that can help you determine the cause of the issues you encounter. These tips are specifically for the Adaptive Computing supplied IPMI scripts, but can be generalized for whatever power management solution you use. Simply substitute your power management system, power query script (as specified by CLUSTERQUERYURL), and power action script (as specified by NODEPOWERURL) where appropriate.
Use the ipmitool command to verify you have access to the IPMI interface of your nodes. Try getting the current power state of a node. The syntax is ipmitool -I lan -H <host> -U <IPMI username> -P <IPMI password> chassis power status.
$ ipmitool -I lan -H qt06 -U ADMIN -P ADMIN chassis power status
Chassis Power is off
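To run the same check across several nodes, the command can be wrapped in a small script. The following is a minimal sketch, not part of the supplied IPMI scripts; the function names are hypothetical, and it assumes ipmitool prints a line of the form "Chassis Power is on" or "Chassis Power is off" as in the example above.

```python
import subprocess

def build_power_status_cmd(host, user, password):
    """Build the ipmitool argument list shown in the step above."""
    return ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-P", password,
            "chassis", "power", "status"]

def parse_power_status(output):
    """Return the lowercased text after 'Chassis Power is ', or None if absent."""
    prefix = "Chassis Power is "
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):].lower()
    return None

def query_power(host, user, password, timeout=30):
    """Run ipmitool for one host; return the power state, or None on any failure."""
    try:
        result = subprocess.run(build_power_status_cmd(host, user, password),
                                capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # ipmitool is not installed, or the BMC did not answer in time.
        return None
    if result.returncode != 0:
        return None
    return parse_power_status(result.stdout)
```

Calling query_power("qt06", "ADMIN", "ADMIN") in a loop over your node names gives a quick view of which IPMI interfaces are reachable; a None result points to a connectivity or credential problem rather than a Moab problem.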
Verify the power query (CLUSTERQUERYURL) script is working
Execute the ipmi.mon.py script (should be found in /<MOABHOMEDIR>/tools/ipmi) to start the monitor.
$ cd /opt/moab/tools/ipmi
$ ./ipmi.mon.py
Execute the script again. The following is an example of the expected output:
$ ./ipmi.mon.py
qt09 GMETRIC[System_Temp]=27 GMETRIC[CPU_Temp]=25 POWER=on State=Unknown
qt08 GMETRIC[System_Temp]=31 GMETRIC[CPU_Temp]=25 POWER=on State=Unknown
qt07 GMETRIC[System_Temp]=30 GMETRIC[CPU_Temp]=29 POWER=on State=Unknown
qt06 GMETRIC[System_Temp]=Disabled GMETRIC[CPU_Temp]=Disabled POWER=off State=Unknown
If the POWER attribute is not present, the script is not working correctly.
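On a large cluster, a missing POWER attribute is easy to overlook. The following sketch lists the nodes whose output line lacks it. It assumes one node per line with whitespace-separated KEY=VALUE fields, as in the example output; the function names and the qt05 line in the sample are hypothetical and added only for illustration.

```python
def parse_monitor_line(line):
    """Split one 'node KEY=VALUE ...' line into (node, {KEY: VALUE})."""
    fields = line.split()
    if not fields:
        return None
    node, attrs = fields[0], {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if sep:  # skip tokens that are not KEY=VALUE
            attrs[key] = value
    return node, attrs

def nodes_missing_power(output):
    """Return the node names whose line carries no POWER attribute."""
    missing = []
    for line in output.splitlines():
        parsed = parse_monitor_line(line)
        if parsed is not None and "POWER" not in parsed[1]:
            missing.append(parsed[0])
    return missing

# The first two lines come from the example output; qt05 is a made-up failing node.
SAMPLE = """\
qt09 GMETRIC[System_Temp]=27 GMETRIC[CPU_Temp]=25 POWER=on State=Unknown
qt06 GMETRIC[System_Temp]=Disabled GMETRIC[CPU_Temp]=Disabled POWER=off State=Unknown
qt05 GMETRIC[System_Temp]=30 State=Unknown
"""

print(nodes_missing_power(SAMPLE))  # ['qt05']
```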
Verify the power action (NODEPOWERURL) script is working
Execute the ipmi.power.py script (should be found in /<MOABHOMEDIR>/tools/ipmi) to see if you can force a node to power on or off. The syntax is ipmi.power.py <node>,<node>,<node>... [off|on]
$ /opt/moab/tools/ipmi/ipmi.power.py qt06 off
This example is trying to power off a node named qt06.
Verify the machine's power state was changed to what you attempted in the previous step. You can do this remotely via two methods:
Verify the scripts are configured correctly
Run the mdiag -R command to verify your IPMI resource manager configuration.
$ mdiag -R -v
RM[ipmi]  State: Active  Type: NATIVE  ResourceType: PROV
  Timeout:            30000.00 ms
  Cluster Query URL:  exec://$TOOLSDIR/ipmi/ipmi.mon.py
  Node Power URL:     exec://$TOOLSDIR/ipmi/ipmi.power.py
  Objects Reported:   Nodes=3 (0 procs)  Jobs=0
  Nodes Reported:     3 (N/A)
  Partition:          SHARED
  Event Management:   (event interface disabled)
  RM Performance:     AvgTime=0.05s  MaxTime=0.06s  (176 samples)
  RM Languages:       NATIVE
  RM Sub-Languages:   NATIVE
Run the mdiag -G command to verify that power information is being reported correctly.
$ mdiag -G
NodeID  State  Power  Watts  PWatts
qt09    Idle   On     0.00   0.00
qt08    Idle   On     0.00   0.00
qt07    Idle   Off    0.00   0.00
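To compare what Moab reports against what you expect, the mdiag -G table can be read into a dictionary. This is a minimal sketch that assumes the layout shown in the example (a header row starting with NodeID, followed by whitespace-separated columns); the function names are hypothetical, and real output may contain additional lines, which this sketch skips.

```python
def parse_mdiag_g(output):
    """Map each node name to a {column: value} dict taken from the table."""
    rows, header = {}, None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # Ignore everything (prompt, banners) until the header row.
            if fields[0] == "NodeID":
                header = fields
            continue
        if len(fields) != len(header):
            continue  # not a table row
        row = dict(zip(header, fields))
        rows[row["NodeID"]] = row
    return rows

def nodes_with_power(output, state):
    """Return the sorted node names whose Power column matches state."""
    return sorted(node for node, row in parse_mdiag_g(output).items()
                  if row.get("Power", "").lower() == state.lower())

SAMPLE = """\
NodeID  State  Power  Watts  PWatts
qt09    Idle   On     0.00   0.00
qt08    Idle   On     0.00   0.00
qt07    Idle   Off    0.00   0.00
"""

print(nodes_with_power(SAMPLE, "off"))  # ['qt07']
```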
Verify the scripts are running
Once green is configured and Moab is running, Moab should start the power query script automatically. Use the ps command to verify the script is running.
$ ps -ef | grep <CLUSTERQUERYURL script name>
If the power query script does not appear in the output (ignore the line for the grep command itself), your settings in moab.cfg are not taking effect.
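The same check can be scripted so that the grep process itself is never counted as a match. The sketch below assumes the usual eight-column Linux ps -ef layout (UID PID PPID C STIME TTY TIME CMD); the function name is hypothetical, and the sample text, including the Moab daemon path, is illustrative rather than captured from a real system.

```python
def find_script_processes(ps_output, script_name):
    """Return the `ps -ef` lines whose command mentions script_name, excluding grep."""
    matches = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the full command stays in one piece.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        cmd = fields[7]
        if script_name in cmd and not cmd.startswith("grep "):
            matches.append(line)
    return matches

# Illustrative sample only; PIDs, users, and the moab path are made up.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      4121     1  0 09:14 ?        00:00:02 /opt/moab/sbin/moab
root      4130  4121  0 09:14 ?        00:00:00 python /opt/moab/tools/ipmi/ipmi.mon.py
admin     4212  4100  0 09:20 pts/0    00:00:00 grep ipmi.mon.py
"""

print(len(find_script_processes(SAMPLE, "ipmi.mon.py")))  # 1
```

On a live system, the input text could come from subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout.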
Verify Moab can power nodes on or off
Use the mnodectl command to turn a node on or off. The syntax is mnodectl -m power=[off|on] <node>.
$ mnodectl -m power=off qt06
Moab should turn off the node named qt06.
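A power change is not instantaneous, so a single check immediately after the command may still show the old state. The following sketch polls until the expected state appears. The helper name is hypothetical, and query stands for any caller-supplied function that returns the current state as a string, for example one that runs the ipmitool chassis power status command from the first step.

```python
import time

def wait_for_power(query, expected, attempts=10, delay=5.0, sleep=time.sleep):
    """Call query() up to `attempts` times; return True once it equals expected."""
    for attempt in range(attempts):
        if query() == expected:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # pause between checks, but not after the last one
    return False

# Stand-in query that reports "on" twice and then "off", for demonstration.
states = iter(["on", "on", "off"])
print(wait_for_power(lambda: next(states), "off", delay=0))  # True
```

If the function returns False after a reasonable number of attempts, check the NODEPOWERURL script and the IPMI credentials before suspecting Moab's scheduling.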