17.4 Hierarchical Grid Management

17.4.1 Configuring a Peer Server (Source)

Peer relationships are enabled by creating and configuring a resource manager interface using the RMCFG parameter. This interface defines how a given Moab will load resource and workload information and enforce its scheduling decisions. In non-peer cases, the RMCFG parameter points to a resource manager such as TORQUE, LSF, or SGE. However, if the TYPE attribute is set to Moab, the RMCFG parameter can be used to configure and manage a peer relationship.

17.4.1.1 Simple Hierarchical Grid

The first step in creating a new peer relationship is to configure an interface to a destination Moab server. In the following example, cluster C1 is configured to see and use resources from two other clusters, C2 and C3.

SCHEDCFG[C1] MODE=NORMAL SERVER=head.C1.xyz.com:41111 
RMCFG[C2]    TYPE=moab   SERVER=head.C2.xyz.com:40559 
RMCFG[C3]    TYPE=moab   SERVER=head.C3.xyz.com:40559
...  

In this example, C1 provides a global view of the underlying clusters. From C1, jobs can be viewed and modified. C2 and C3 act as separate scheduling entities that can receive jobs from C1. C1 migrates jobs to C2 and C3 based on available resources and the policies of C1. Jobs migrated to C2 and C3 are scheduled according to the policies on C2 and C3.

In this case, one RMCFG parameter is all that is required to configure each peer relationship, provided that standard secret-key-based authentication is used and a shared default secret key exists between the source and destination Moabs. However, if peer relationships with multiple clusters are to be established and a per-peer secret key is to be used (highly recommended), then a CLIENTCFG parameter must be specified for the authentication mechanism. Because the secret key must be kept secure, it must be specified in the moab-private.cfg file. For the current example, a per-peer secret key could be set up by creating the following moab-private.cfg file on the C1 cluster:

CLIENTCFG[RM:C2] KEY=fastclu3t3r  
CLIENTCFG[RM:C3] KEY=14436aaa 
Note: The key specified can be any alphanumeric value and can be locally generated or made up. The only critical aspect is that the keys specified on each end of the peer relationship match.
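
Because the key can be locally generated, one option is to produce it with a short script rather than make it up by hand. The following is a minimal sketch, assuming Python 3 is available on the administrative host. The helper names make_key and clientcfg_line and the 32-character default length are illustrative choices, not part of Moab.

```python
import secrets
import string

# Restrict keys to alphanumeric characters, as described in the note above.
ALPHABET = string.ascii_letters + string.digits


def make_key(length=32):
    """Return a random alphanumeric key (the length is an arbitrary choice)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def clientcfg_line(peer, key):
    """Format a CLIENTCFG entry for a peer, as it would appear in moab-private.cfg."""
    return f"CLIENTCFG[RM:{peer}] KEY={key}"


if __name__ == "__main__":
    # One distinct key per destination peer of C1.
    for peer in ("C2", "C3"):
        print(clientcfg_line(peer, make_key()))
```

Each generated key would then have to be copied to the matching entry on the other end of that peer relationship, since the two sides must agree.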

See the Grid Security section for detailed information on designing, configuring, and troubleshooting peer security.

Continuing with the example, the initial source side configuration is now complete. On the destination clusters, C2 and C3, the first step is to configure authentication. If a shared default secret key exists between all three clusters, then configuration is complete and the clusters are ready to communicate. If per-peer secret keys are used (recommended), then it will be necessary to create matching moab-private.cfg files on each of the destination clusters. With this example, the following files would be required on C2 and C3 respectively:

# moab-private.cfg on C2
CLIENTCFG[RM:C1] KEY=fastclu3t3r AUTH=admin1

# moab-private.cfg on C3
CLIENTCFG[RM:C1] KEY=14436aaa AUTH=admin1
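
Since a key mismatch between the two ends prevents the peers from authenticating, it can help to compare the files before starting the servers. The following is a rough sketch of such a check, not a Moab utility. The function names peer_keys and keys_match are hypothetical, and the parser assumes only the simple one-line CLIENTCFG[RM:<peer>] form shown in this section, with lines beginning with # treated as comments.

```python
import re

# Matches lines such as: CLIENTCFG[RM:C1] KEY=fastclu3t3r AUTH=admin1
CLIENTCFG_RE = re.compile(r"^\s*CLIENTCFG\[RM:([^\]]+)\]\s+(.*)$")


def peer_keys(cfg_text):
    """Map each peer name to the KEY value found in moab-private.cfg text."""
    keys = {}
    for line in cfg_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = CLIENTCFG_RE.match(line)
        if not match:
            continue
        peer, attrs = match.groups()
        for attr in attrs.split():
            name, _, value = attr.partition("=")
            if name.upper() == "KEY":
                keys[peer] = value
    return keys


def keys_match(src_name, src_cfg, dst_name, dst_cfg):
    """True if the source's key for the destination equals the destination's key for the source."""
    src_key = peer_keys(src_cfg).get(dst_name)
    dst_key = peer_keys(dst_cfg).get(src_name)
    return src_key is not None and src_key == dst_key


if __name__ == "__main__":
    c1 = "CLIENTCFG[RM:C2] KEY=fastclu3t3r\nCLIENTCFG[RM:C3] KEY=14436aaa\n"
    c2 = "CLIENTCFG[RM:C1] KEY=fastclu3t3r AUTH=admin1\n"
    c3 = "CLIENTCFG[RM:C1] KEY=14436aaa AUTH=admin1\n"
    print("C1 <-> C2 keys match:", keys_match("C1", c1, "C2", c2))
    print("C1 <-> C3 keys match:", keys_match("C1", c1, "C3", c3))
```

In practice the file contents would be read from each cluster's moab-private.cfg over a secure channel; they are inlined here only to keep the sketch self-contained.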

Once peer security is established, a final optional step would be to configure scheduling behavior on the destination clusters. By default, each destination cluster accepts jobs from each trusted peer. However, it will also be fully autonomous, accepting and scheduling locally submitted jobs and enforcing its own local policies and optimizations. If this is the desired behavior, then configuration is complete.

In the current example, with no destination side scheduling configuration, jobs submitted to cluster C1 can run locally, on cluster C2, or on cluster C3. However, the established configuration does not necessarily enforce a strict master-slave relationship because each destination cluster (C2 and C3) has complete autonomy over how, when, and where it schedules both local and remote jobs. Each cluster can potentially receive jobs that are locally submitted and can also receive jobs from other source Moab servers. See Slave Mode for more information on setting up a master-slave grid.

Further, each destination cluster will accept any and all jobs migrated to it from a trusted peer without limitations on who can run, when and where they can run, or how many resources they can use. If this behavior is either too restrictive or not restrictive enough, then destination side configuration will be required.

Copyright © 2012 Adaptive Computing Enterprises, Inc.®