16.5 Troubleshooting green computing

Verify your IPMI access

  1. Use the ipmitool command to verify you have access to the IPMI interface of your nodes. Try getting the current power state of a node. The syntax is ipmitool -I lan -H <host> -U <IPMI username> -P <IPMI password> chassis power status.

    $ ipmitool -I lan -H qt06 -U ADMIN -P ADMIN chassis power status
    
    Chassis Power is off
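
To repeat this check across several nodes, the same ipmitool call can be wrapped in a small script. The following is a minimal sketch, not part of the Moab tools: the function names `parse_chassis_power` and `chassis_power_status` are hypothetical, and the parser assumes the output has the "Chassis Power is <state>" form shown above.

```python
import subprocess


def parse_chassis_power(output):
    """Return 'on' or 'off' from a 'Chassis Power is <state>' line, else None."""
    prefix = "chassis power is "
    for line in output.splitlines():
        text = line.strip().lower()
        if text.startswith(prefix):
            state = text[len(prefix):].strip()
            if state in ("on", "off"):
                return state
    return None


def chassis_power_status(host, user, password, timeout=30):
    """Run the ipmitool command from the step above and return the parsed state.

    Returns None if ipmitool exits with an error or prints something unexpected.
    """
    cmd = ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-P", password,
           "chassis", "power", "status"]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        return None
    return parse_chassis_power(result.stdout)
```

For example, `chassis_power_status("qt06", "ADMIN", "ADMIN")` would return "off" for the node in the transcript above. A return value of None suggests that IPMI access is not working for that host.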

Verify the power query (CLUSTERQUERYURL) script is working

  1. Execute the ipmi.mon.py script (should be found in /<MOABHOMEDIR>/tools/ipmi) to start the monitor.

    $ cd /opt/moab/tools/ipmi
    $ ./ipmi.mon.py
  2. Execute the script again. The following is an example of the expected output:

    $ ./ipmi.mon.py
    
    qt09  GMETRIC[System_Temp]=27 GMETRIC[CPU_Temp]=25 POWER=on State=Unknown
    qt08  GMETRIC[System_Temp]=31 GMETRIC[CPU_Temp]=25 POWER=on State=Unknown
    qt07  GMETRIC[System_Temp]=30 GMETRIC[CPU_Temp]=29 POWER=on State=Unknown
    qt06  GMETRIC[System_Temp]=Disabled GMETRIC[CPU_Temp]=Disabled POWER=off State=Unknown

    If the POWER attribute is not present, the script is not working correctly.
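
The POWER check can also be automated by parsing the monitor output. This is a rough sketch that assumes each line has the layout shown in the example (a node name followed by whitespace-separated attribute=value pairs). The function names `parse_monitor_output` and `nodes_missing_power` are hypothetical.

```python
def parse_monitor_output(output):
    """Map node name -> {attribute: value} for each non-empty output line."""
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name, attrs = fields[0], {}
        for field in fields[1:]:
            # Split on the first '=' only, so GMETRIC[CPU_Temp]=25 keeps its key intact.
            key, sep, value = field.partition("=")
            if sep:
                attrs[key] = value
        nodes[name] = attrs
    return nodes


def nodes_missing_power(output):
    """Return the names of nodes whose line has no POWER attribute."""
    return [name for name, attrs in parse_monitor_output(output).items()
            if "POWER" not in attrs]
```

An empty list from `nodes_missing_power` means every reported node carries a POWER attribute. Any names it returns are the nodes to investigate.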

Verify the power action (NODEPOWERURL) script is working

  1. Execute the ipmi.power.py script (should be found in /<MOABHOMEDIR>/tools/ipmi) to see if you can force a node to power on or off. The syntax is ipmi.power.py <node>,<node>,<node>... [off|on]

    $ /opt/moab/tools/ipmi/ipmi.power.py qt06 off

    This example attempts to power off the node named qt06.

  2. Verify that the machine's power state changed to the state you requested in the previous step. You can do this remotely in either of two ways:

    1. If the cluster query script is working, you can use that to verify the current power state of the node.
    2. If you have IPMI access, you can use the ipmitool command to verify the current power state of the node.
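
When testing several nodes, it can help to build the power script's command line programmatically so that the comma-separated node list and the on/off argument are always well formed. The sketch below follows the syntax given in step 1. The function name `build_power_command` is hypothetical, and the default script path assumes the /opt/moab installation used in the examples.

```python
def build_power_command(nodes, action,
                        script="/opt/moab/tools/ipmi/ipmi.power.py"):
    """Return the argv list for: ipmi.power.py <node>,<node>,... [off|on]."""
    if action not in ("on", "off"):
        raise ValueError("action must be 'on' or 'off'")
    if not nodes:
        raise ValueError("at least one node is required")
    # The script takes all nodes as one comma-separated argument.
    return [script, ",".join(nodes), action]
```

The returned list can be passed directly to `subprocess.run`, for example `subprocess.run(build_power_command(["qt06"], "off"))`.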

Verify the scripts are configured correctly

  1. Run the mdiag -R command to verify your IPMI resource manager configuration.

    $ mdiag -R -v

    RM[ipmi]      State: Active  Type: NATIVE  ResourceType: PROV
    Timeout:            30000.00 ms
    Cluster Query URL:  exec://$TOOLSDIR/ipmi/ipmi.mon.py
    Node Power URL:     exec://$TOOLSDIR/ipmi/ipmi.power.py
    Objects Reported:   Nodes=3 (0 procs)  Jobs=0
    Nodes Reported:     3 (N/A)
    Partition:          SHARED
    Event Management:   (event interface disabled)
    RM Performance:     AvgTime=0.05s  MaxTime=0.06s  (176 samples)
    RM Languages:       NATIVE
    RM Sub-Languages:   NATIVE
  2. Run the mdiag -G command to verify that power information is being reported correctly.

    $ mdiag -G
    
    NodeID      State      Power   Watts  PWatts
     qt09       Idle       On      0.00   0.00
     qt08       Idle       On      0.00   0.00
     qt07       Idle       Off     0.00   0.00
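
To compare what Moab reports against what IPMI reports, the table can be reduced to a node-to-power mapping. This is a minimal sketch that assumes the whitespace-separated column layout shown above, with a header row starting with NodeID and containing a Power column. The function name `parse_mdiag_power` is hypothetical.

```python
def parse_mdiag_power(output):
    """Map node name -> value of the Power column in 'mdiag -G' output."""
    power = {}
    power_col = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NodeID":
            # Locate the Power column from the header rather than hard-coding it.
            power_col = fields.index("Power")
            continue
        if power_col is not None and len(fields) > power_col:
            power[fields[0]] = fields[power_col]
    return power
```

Rows that appear before the header row, or that are too short to have a Power column, are ignored.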

Verify the scripts are running

  1. Once green computing is configured and Moab is running, Moab should start the power query script automatically. Use the ps command to verify the script is running.

    $ ps -ef | grep <CLUSTERQUERYURL script name>

    If this command does not show the power query script running, your settings in moab.cfg are not working.
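
Note that a `ps -ef | grep` pipeline usually lists the grep process itself, which can look like a match. The sketch below filters that line out. It assumes the common Linux `ps -ef` layout, in which the command is the eighth column. The function name `find_script_processes` is hypothetical.

```python
def find_script_processes(ps_output, script_name):
    """Return 'ps -ef' lines that mention script_name, excluding grep itself."""
    matches = []
    for line in ps_output.splitlines():
        if script_name not in line:
            continue
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        parts = line.split(None, 7)
        command = parts[7] if len(parts) == 8 else line
        words = command.split()
        if words and words[0].endswith("grep"):
            continue
        matches.append(line)
    return matches
```

An empty result for the monitor script name (for example, "ipmi.mon.py") corresponds to the failure case described above.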

Verify Moab can power nodes on or off

  1. Use the mnodectl command to turn a node on or off. The syntax is mnodectl -m power=[off|on] <node>.

    $ mnodectl -m power=off qt06

    Moab should turn off the node named qt06.

    1. Moab generates a system job named poweron-<num> or poweroff-<num>, which is visible in showq. The system job calls the ipmi.power.py (NODEPOWERURL) script to execute the command.
    2. Moab waits until the cluster query reports the correct data. In this case, the ipmi.mon.py (CLUSTERQUERYURL) script reports that the power attribute has changed.
    3. Moab does not change the power status based on the power script return code. Rather, Moab completes the system power job when it detects the power attribute has changed as indicated by the cluster query script.
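
The behavior described in these steps can be imitated when testing by hand: ignore the power script's return code and poll a power query until it reports the requested state. The following sketch only illustrates that idea and is not Moab's implementation. The name `wait_for_power_state` and its parameters are hypothetical, and `query_power` stands for any callable that returns the node's current power state, such as a wrapper around the cluster query script or ipmitool.

```python
import time


def wait_for_power_state(query_power, node, expected, timeout=300.0,
                         interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll query_power(node) until it returns expected or the timeout elapses.

    Returns True once the expected state is observed, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        # Success is decided by the observed state, not by any script exit code.
        if query_power(node) == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so that the loop can be exercised without real delays. In normal use the defaults are sufficient.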
