Overview
The Maui Scheduler is a policy engine that gives sites control over when, where, and how resources such as processors, memory, and disk are allocated to jobs. Beyond this control, it provides mechanisms that help optimize the use of these resources, monitor system performance, diagnose problems, and manage the system in general.
1.0 Philosophy and Goals of the Maui Scheduler
2.0 Installation and Initial Configuration
2.1 Building and Installing Maui
2.2 Initial Configuration
2.3 Initial Testing
3.0 Scheduler Basics
3.1 Layout of Scheduler Components
3.2 Scheduling Environment and Objects
3.3 Scheduling Iterations and Job Flow
3.4 Configuring the Scheduler
4.0 Scheduler Commands
4.1 Client Overview
4.2 Monitoring System Status
4.3 Managing Jobs
4.4 Managing Reservations
4.5 Configuring Policies
4.6 End User Commands
4.7 Miscellaneous Commands
5.0 Prioritizing Jobs and Allocating Resources
5.1 Job Priority
5.2 Node Allocation
5.3 Node Access
5.4 Node Availability
5.5 Task Distribution
6.0 Managing Fairness - Throttling Policies, Fairshare, and Allocation Management
6.1 Fairness Overview
6.2 Throttling Policies
6.3 Fairshare
6.4 Allocation Management
7.0 Controlling Resource Access - Reservations, Partitions, and QoS Facilities
7.1 Advance Reservations
7.2 Partitions
7.3 QoS Facilities
8.0 Optimizing Scheduling Behavior - Backfill, Node Sets, and Preemption
8.1 Optimization Overview
8.2 Backfill
8.3 Node Sets
8.4 Preemption
9.0 Evaluating System Performance - Statistics, Profiling, Testing, and Simulation
9.1 Scheduler Performance Evaluation Overview
9.2 Accounting - Job and System Statistics
9.3 Profiling Current and Historical Usage
9.4 Testing New Versions and Configurations
9.5 Answering 'What If?' Questions with the Simulator
10.0 Managing Shared Resources - SMP Issues and Policies
10.1 Consumable Resource Handling
10.2 Load Balancing Features
10.3 Resource Usage Tracking
10.4 Resource Usage Limits
11.0 General Job Administration
11.1 Deferred Jobs and Job Holds
11.2 Job Priority Management
11.3 Suspend/Resume Handling
11.4 Checkpoint/Restart
11.5 Job Dependencies
11.6 Setting Job Defaults and Per Job Limits
11.7 General Job Policies
11.8 Using a Local Queue
12.0 General Node Administration
12.1 Node Location (Partitions, Frames, Queues, etc.)
12.2 Node Attributes (Node Features, Speed, etc.)
12.3 Node Specific Policies (MaxJobPerNode, etc.)
12.4 Configuring Node-Locked Consumable Generic Resources (tape drives, node-locked licenses, etc.)
13.0 Resource Managers and Interfaces
13.1 Resource Manager Overview
13.2 Resource Manager Configuration
13.3 Resource Manager Extensions
13.4 Adding Resource Manager Interfaces
14.0 Troubleshooting and System Maintenance
14.1 Internal Diagnostics
14.2 Logging Facilities
14.3 Using the Message Buffer
14.4 Handling Events with the Notification Routine
14.5 Issues with Client Commands
14.6 Tracking System Failures
14.7 Problems with Individual Jobs
15.0 Improving User Effectiveness
15.1 User Feedback Loops
15.2 User Level Statistics
15.3 Enhancing Wallclock Limit Estimates
15.4 Providing Resource Availability Information
15.5 Job Start Time Estimates
15.6 Collecting Performance Information on Individual Jobs
16.0 Simulations
16.1 Simulation Overview
16.2 Resource Traces
16.3 Workload Traces
16.4 Simulation Specific Configuration
17.0 Miscellaneous
17.1 User Feedback
17.2 Grid Scheduling
17.3 Enabling High Availability Features
17.4 Using the Application Scheduling Library
Appendices
Appendix A: Case Studies
Appendix B: Extension Interface
Appendix C: Adding New Algorithms
Appendix D: Structure Limits
Appendix E: Security Configuration
Appendix F: Parameters Overview
Appendix G: Commands Overview
Appendix H: Interfacing to Maui
Appendix I: Considerations for Large Clusters
Appendix J: Differences Guide
Appendix K: Maui-Moab Comparison