Moab can dynamically provision compute machines to requested operating systems and power off compute machines when not in use. Moab can intelligently control xCAT and use its advanced system configuration mechanisms to adapt systems to current workload requirements. Moab communicates with xCAT using the Moab Service Manager (MSM). MSM is a translation utility that resides between Moab and xCAT and acts as aggregator and interpreter. The Moab Workload Manager will query MSM, which in turn queries xCAT, about system resources, configurations, images, and metrics. After learning about these resources from MSM, Moab then makes intelligent decisions about the best way to maximize system utilization.
In this model Moab gathers system information from two resource managers. The first is TORQUE, which handles the workload on the system; the second is MSM, which relays information gathered by xCAT. By leveraging these software packages, Moab intelligently adapts clusters to deliver on-site goals.
This document assumes that xCAT has been installed and configured. It describes how to get MSM and xCAT communicating, offers troubleshooting guidance for basic integration, explains how to get Moab communicating with MSM, and covers the final steps in verifying the complete software stack.
Observe the following xCAT configuration requirements before installing MSM:
You must have a valid Moab license file (moab.lic) with provisioning and green enabled. For information on acquiring an evaluation license, please contact info@adaptivecomputing.com.
Verify that the required Perl modules are installed; each of the following commands should complete without error:

perl -e 'use Storable 2.18'
perl -MXML::Simple -e 'exit'
perl -MProc::Daemon -e 'exit'
perl -MDBD::SQLite -e 'exit'
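If any of these checks fail, the missing modules can usually be installed from your distribution's package repository or from CPAN. A minimal sketch, assuming the cpan client is installed and configured:

# install any Perl modules reported missing by the checks above
> cpan Storable XML::Simple Proc::Daemon DBD::SQLite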
Copy the x_msm table schema to the xCAT schema directory:
> cp $MSMHOMEDIR/contrib/xcat/MSM.pm $XCATROOT/lib/perl/xCAT_schema
Restart xcatd and check that the x_msm table is correctly created:
> service xcatd restart
> tabdump x_msm
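On a fresh installation the table is empty, so tabdump typically prints only the header row defined by the schema, similar to the following (illustrative):

#flavorname,arch,profile,os,nodeset,features,vmoslist,hvtype,hvgroupname,vmgroupname,comments,disable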
Prepare xCAT images and ensure they provision correctly (see xCAT documentation)
Populate the x_msm table with your image definitions:
> tabedit x_msm
#flavorname,arch,profile,os,nodeset,features,vmoslist,hvtype,hvgroupname,vmgroupname,comments,disable
"compute","x86_64","compute","centos5.3","netboot","torque",,,,,,
"science","x86","compute","scientific_linux","netboot","torque",,,,,,
Ensure all xCAT group names in the x_msm table exist in the xCAT nodegroup table:
> tabedit nodegroup
Edit as necessary to match the following example:
#groupname,grouptype,members,wherevals,comments,disable
"compute",,,,,
"esxi4",,,,,
"esxhv",,,,,
"esxvmmgt",,,,,
After making any necessary edits, run the following command:
> nodels compute,esxi4,esxhv,esxvmmgt
# should complete without error; it is OK if it returns nothing
Edit $MSMHOMEDIR/msm.cfg and configure the xCAT plug-in. Below is a generic example for use with TORQUE without virtualization. See the section on configuration parameters for a complete list of parameters and descriptions.
# MSM configuration options
RMCFG[msm]   PORT=24603
RMCFG[msm]   POLLINTERVAL=45
RMCFG[msm]   LOGFILE=/opt/moab/log/msm.log
RMCFG[msm]   LOGLEVEL=8
RMCFG[msm]   DEFAULTNODEAPP=xcat

# xCAT plugin specific options
APPCFG[xcat] DESCRIPTION="xCAT plugin"
APPCFG[xcat] MODULE=Moab::MSM::App::xCAT
APPCFG[xcat] LOGLEVEL=3
APPCFG[xcat] POLLINTERVAL=45
APPCFG[xcat] TIMEOUT=3600
APPCFG[xcat] _USEOPIDS=0
APPCFG[xcat] _NODERANGE=moab,esxcompute
APPCFG[xcat] _USESTATES=boot,netboot,install
APPCFG[xcat] _LIMITCLUSTERQUERY=1
APPCFG[xcat] _RPOWERTIMEOUT=120
APPCFG[xcat] _DONODESTAT=1
APPCFG[xcat] _REPORTNETADDR=1
APPCFG[xcat] _CQXCATSESSIONS=4
Set up environment to manually call MSM commands:
# substitute appropriate value(s) for path(s)
export MSMHOMEDIR=/opt/moab/tools/msm
export MSMLIBDIR=/opt/moab/tools/msm
export PATH=$PATH:$MSMLIBDIR/contrib:$MSMLIBDIR/bin
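These variables only apply to the current shell session. To make them persistent for the account that runs MSM, they could be appended to that account's shell profile; a sketch, assuming a bash login and the paths used above:

# persist the MSM environment for future shell sessions
cat >> ~/.bashrc <<'EOF'
export MSMHOMEDIR=/opt/moab/tools/msm
export MSMLIBDIR=/opt/moab/tools/msm
export PATH=$PATH:$MSMLIBDIR/contrib:$MSMLIBDIR/bin
EOF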
Verify that MSM starts without errors:
> msmd
Verify that the expected nodes are listed, without errors, using the value of _NODERANGE from msm.cfg.
> nodels <_NODERANGE>
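For instance, with the example msm.cfg shown earlier (where _NODERANGE=moab,esxcompute), the command would be:

> nodels moab,esxcompute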
Verify that the expected nodes are listed in the cluster query output from MSM:
> cluster.query.pl
Provision all nodes through MSM for the first time (pick an image name from x_msm):
> for i in `nodels <_NODERANGE>`; do node.modify.pl $i --set os=<image_name>; done
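Using the example values from the earlier steps (node range moab,esxcompute and the compute image defined in x_msm), the command would look like this:

> for i in `nodels moab,esxcompute`; do node.modify.pl $i --set os=compute; done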
Verify the nodes correctly provision and that the correct OS is reported (which may take some time after the provisioning requests are made):
> cluster.query.pl
When using MSM + xCAT to deploy images with TORQUE, there are some special configuration considerations. Most of these also apply to other workload resource managers.
Note that while the MSM xCAT plugin contains support for manipulating TORQUE directly, this is not an ideal solution. If you are using a version of xCAT that supports prescripts, it is more appropriate to write prescripts that manipulate TORQUE based on the state of the xCAT tables. This approach is also applicable to other workload resource managers, while the xCAT plugin only deals with TORQUE.
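As a rough illustration of the prescript approach, a script along these lines could offline nodes in TORQUE before they are reprovisioned. This is only a sketch: it assumes the prescript receives the affected node list in a NODES environment variable and that the pbsnodes client is available on the management node; check your xCAT version's prescript documentation before relying on it.

#!/bin/sh
# Hypothetical xCAT prescript sketch (not part of the MSM distribution).
# Assumes $NODES holds a comma-separated list of the nodes being reprovisioned.
for n in $(echo "$NODES" | tr ',' ' '); do
    pbsnodes -o "$n"    # mark the node offline in TORQUE so no new jobs start on it
done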
Several use cases and configuration choices are discussed in what follows.
Each image should be configured to report its image name through TORQUE. In the TORQUE pbs_mom configuration file (typically mom_priv/config), the "opsys" value should mirror the name of the image. See Node Manager (MOM) Configuration in the TORQUE Administrator's Guide for more information.
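For example, for the compute image defined in the x_msm table earlier, the MOM configuration for that image could contain a line such as the following (image name taken from the example above):

# mom_priv/config on nodes provisioned with the "compute" image
opsys compute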
Moab is the intelligence engine that coordinates the capabilities of xCAT and TORQUE to dynamically provision compute nodes to the requested operating system. Moab also schedules workload on the system and powers off idle nodes. Download and install Moab.
Moab stores its configuration in the moab.cfg file (/opt/moab/moab.cfg). A sample configuration file, set up and optimized for adaptive computing, follows:
# Example moab.cfg

SCHEDCFG[Moab]          SERVER=gpc-sched:42559
ADMINCFG[1]             USERS=root,egan
LOGLEVEL                7

# How often (in seconds) to refresh information from TORQUE and MSM
RMPOLLINTERVAL          60
RESERVATIONDEPTH        10
DEFERTIME               0

###################################################################
# Location of msm directory                                       #
# www.adaptivecomputing.com/moabdocs/a.fparameters.php#toolsdir   #
###################################################################
TOOLSDIR                /opt/moab/tools

###############################################################################
# TORQUE and MSM configuration                                                #
# http://www.adaptivecomputing.com/resources/docs/mwm/a.fparameters.php#rmcfg #
###############################################################################
RMCFG[torque]           TYPE=PBS
RMCFG[msm]              TYPE=NATIVE:msm FLAGS=autosync,NOCREATERESOURCE RESOURCETYPE=PROV
RMCFG[msm]              TIMEOUT=60
RMCFG[msm]              PROVDURATION=10:00

AGGREGATENODEACTIONS    TRUE

###############################################################################
# ON DEMAND PROVISIONING SETUP                                                #
# www.adaptivecomputing.com/moabdocs/3.5credoverview.php#qos                  #
# www.adaptivecomputing.com/moabdocs/5.2nodeallocation.php#PRIORITY           #
# www.adaptivecomputing.com/moabdocs/a.fparameters.php#jobprioaccrualpolicy   #
###############################################################################
QOSCFG[od]              QFLAGS=PROVISION
USERCFG[DEFAULT]        QLIST=od
NODEALLOCATIONPOLICY    PRIORITY
NODECFG[DEFAULT]        PRIORITYF=1000*OS+1000*POWER
NODEAVAILABILITYPOLICY  DEDICATED
CLASSCFG[DEFAULT]       DEFAULT.OS=scinetcompute

###############################################################
# GREEN POLICIES                                              #
# www.adaptivecomputing.com/moabdocs/18.0greencomputing.php   #
###############################################################
NODECFG[DEFAULT]        POWERPOLICY=ONDEMAND
PARCFG[ALL]             NODEPOWEROFFDURATION=20:00
NODEIDLEPOWERTHRESHOLD  600

# END Example moab.cfg
When Moab starts, it immediately communicates with its configured resource managers. In this case, Moab communicates with TORQUE to get compute node and job queue information. It then communicates with MSM to determine the state of the nodes according to xCAT. It aggregates this information and processes the jobs discovered from TORQUE.
When a job is submitted, Moab determines whether nodes need to be provisioned to a particular operating system to satisfy the requirements of the job. If any nodes need to be provisioned, Moab performs this action by creating a provisioning system job (a job that is internal to Moab). This system job communicates with xCAT to provision the nodes and remains active while the nodes are provisioning. Once the system job has provisioned the nodes, it informs the user's job that the nodes are ready, at which time the user's job starts running on the newly provisioned nodes.
When a node has been idle for a specified amount of time (see NODEIDLEPOWERTHRESHOLD), Moab creates a power-off system job. This job communicates with xCAT to power off the nodes and remains active in the job queue until the nodes have powered off. The system job then informs Moab that the nodes are powered off but are still available to run jobs, and then exits.
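This behavior is driven by the green settings in the example moab.cfg above, in particular:

NODECFG[DEFAULT]        POWERPOLICY=ONDEMAND
PARCFG[ALL]             NODEPOWEROFFDURATION=20:00
NODEIDLEPOWERTHRESHOLD  600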
To verify correct communication between Moab and MSM, run the mdiag -R -v msm command.
$ mdiag -R -v msm
diagnosing resource managers

RM[msm]      State: Active  Type: NATIVE:MSM  ResourceType: PROV
  Timeout:            30000.00 ms
  Cluster Query URL:  $HOME/tools/msm/contrib/cluster.query.xcat.pl
  Workload Query URL: exec://$TOOLSDIR/msm/contrib/workload.query.pl
  Job Start URL:      exec://$TOOLSDIR/msm/contrib/job.start.pl
  Job Cancel URL:     exec://$TOOLSDIR/msm/contrib/job.modify.pl
  Job Migrate URL:    exec://$TOOLSDIR/msm/contrib/job.migrate.pl
  Job Submit URL:     exec://$TOOLSDIR/msm/contrib/job.submit.pl
  Node Modify URL:    exec://$TOOLSDIR/msm/contrib/node.modify.pl
  Node Power URL:     exec://$TOOLSDIR/msm/contrib/node.power.pl
  RM Start URL:       exec://$TOOLSDIR/msm/bin/msmd
  RM Stop URL:        exec://$TOOLSDIR/msm/bin/msmctl?-k
  System Modify URL:  exec://$TOOLSDIR/msm/contrib/node.modify.pl
  Environment:        MSMHOMEDIR=/home/wightman/test/scinet/tools//msm;MSMLIBDIR=/home/wightman/test/scinet/tools//msm
  Objects Reported:   Nodes=10 (0 procs)  Jobs=0
  Flags:              autosync
  Partition:          SHARED
  Event Management:   (event interface disabled)
  RM Performance:     AvgTime=0.10s  MaxTime=0.25s  (38 samples)
  RM Languages:       NATIVE
  RM Sub-Languages:   -
To verify nodes are configured to provision, use the checknode -v <nodeid> command:
$ checknode n01
node n01

State:      Idle  (in current state for 00:00:00)
Configured Resources: PROCS: 4  MEM: 1024G  SWAP: 4096M  DISK: 1024G
Utilized   Resources: ---
Dedicated  Resources: ---
Generic Metrics:   watts=25.00,temp=40.00
Power Policy:      Green (global policy)  Selected Power State: Off
Power State:       Off
Power:             Off
MTBF(longterm):    INFINITY   MTBF(24h): INFINITY
Opsys:      compute   Arch:      ---
  OS Option: compute
  OS Option: computea
  OS Option: gpfscompute
  OS Option: gpfscomputea
Speed:      1.00      CPULoad:   0.000
Flags:      rmdetected
RM[msm]:    TYPE=NATIVE:MSM  ATTRO=POWER
EffNodeAccessPolicy: SINGLEJOB

Total Time: 00:02:30  Up: 00:02:19 (92.67%)  Active: 00:00:11 (7.33%)
To verify nodes are configured for green power management, run the mdiag -G command. Each node will show its power state.
$ mdiag -G
NOTE: power management enabled for all nodes
Partition ALL: power management enabled
  Partition NodeList:
Partition local: power management enabled
  Partition NodeList:
node n01 is in state Idle, power state On (green powerpolicy enabled)
node n02 is in state Idle, power state On (green powerpolicy enabled)
node n03 is in state Idle, power state On (green powerpolicy enabled)
node n04 is in state Idle, power state On (green powerpolicy enabled)
node n05 is in state Idle, power state On (green powerpolicy enabled)
node n06 is in state Idle, power state On (green powerpolicy enabled)
node n07 is in state Idle, power state On (green powerpolicy enabled)
node n08 is in state Idle, power state On (green powerpolicy enabled)
node n09 is in state Idle, power state On (green powerpolicy enabled)
node n10 is in state Idle, power state On (green powerpolicy enabled)
Partition SHARED: power management enabled
To submit a job that dynamically provisions compute nodes, run the msub -l os=<image> command.
$ msub -l os=computea job.sh
yuby.3

$ showq

active jobs------------------------
JOBID         USERNAME      STATE PROCS  REMAINING            STARTTIME

provision-4       root    Running     8   00:01:00  Fri Jun 19 09:12:56

1 active job       8 of 40 processors in use by local jobs (20.00%)
                   2 of 10 nodes active      (20.00%)

eligible jobs----------------------
JOBID         USERNAME      STATE PROCS    WCLIMIT            QUEUETIME

yuby.3        wightman       Idle     8   00:10:00  Fri Jun 19 09:12:55

1 eligible job

blocked jobs-----------------------
JOBID         USERNAME      STATE PROCS    WCLIMIT            QUEUETIME

0 blocked jobs

Total jobs:  2
Notice that Moab created a provisioning system job named provision-4 to provision the nodes. When provision-4 detects that the nodes are correctly provisioned to the requested OS, the submitted job yuby.3 runs:
$ showq

active jobs------------------------
JOBID         USERNAME      STATE PROCS  REMAINING            STARTTIME

yuby.3        wightman    Running     8   00:08:49  Fri Jun 19 09:13:29

1 active job       8 of 40 processors in use by local jobs (20.00%)
                   2 of 10 nodes active      (20.00%)

eligible jobs----------------------
JOBID         USERNAME      STATE PROCS    WCLIMIT            QUEUETIME

0 eligible jobs

blocked jobs-----------------------
JOBID         USERNAME      STATE PROCS    WCLIMIT            QUEUETIME

0 blocked jobs

Total job: 1
The checkjob command shows information about the provisioning job as well as the submitted job. If any errors occur, run the checkjob -v <jobid> command to diagnose failures.
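For the example above, the provisioning system job and the user job could be examined like this (job IDs taken from the showq output):

$ checkjob -v provision-4
$ checkjob -v yuby.3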
Plug-in parameters that begin with an underscore character are specific to the xCAT plug-in; the others are common to all plug-ins and may be set either in RMCFG[msm] (applying to all plug-ins) or per plug-in in APPCFG[<plugin_name>].