8.4.6 Testing and troubleshooting preemption

Setting up a working preemption policy involves several steps. Most preemption problems show up as Moab apparently not allowing PREEMPTORs to preempt PREEMPTEEs as expected. To diagnose this, work through the following checklist:

Are PREEMPTOR jobs marked with the PREEMPTOR flag? (Verify with checkjob <JOBID> | grep Flags.)
Are PREEMPTEE jobs marked with the PREEMPTEE flag? (Verify with checkjob <JOBID> | grep Flags.)
Is the start priority of the PREEMPTOR higher than the priority of the PREEMPTEE? (Verify with checkjob <JOBID> | grep Priority.)
Do the resources allocated to the PREEMPTEE match those requested by the PREEMPTOR?
Is the PREEMPTOR within the limit of 32 PREEMPTEEs?
Are any policies preventing preemption from occurring? (Verify with checkjob -v -n <NODEID> <JOBID>.)
Is the PREEMPTPOLICY parameter properly set? (See Choosing a PREEMPTPOLICY type.)
Is the PREEMPTEE properly marked as restartable, suspendable, or checkpointable? (Verify with checkjob <JOBID> | grep Flags.)
Is GUARANTEEDPREEMPTION set to TRUE?
Is JOBNODEMATCHPOLICY set to a value other than EXACTNODE? (It should NOT be set to EXACTNODE.)
Is NODEACCESSPOLICY set to a value other than SINGLEUSER? (It should NOT be set to SINGLEUSER; SHARED is recommended.)
Is BACKFILLPOLICY set to FIRSTFIT?
Is the resource manager properly responding to preemption requests? (Use mdiag -R.)
If there is a resource manager level race condition, is Moab properly holding target resources? (Verify with mdiag -S and set RESERVATIONRETRYTIME if needed.)
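
The configuration-related items in the checklist above can be checked mechanically. The following is a minimal sketch, not part of Moab: it assumes the configuration is available as text in simple "PARAMETER VALUE" lines with "#" comments, and it treats the values named in the checklist (GUARANTEEDPREEMPTION TRUE, BACKFILLPOLICY FIRSTFIT, JOBNODEMATCHPOLICY not EXACTNODE, NODEACCESSPOLICY not SINGLEUSER) as the expected ones. The function names parse_cfg and preemption_warnings are hypothetical.

```python
def parse_cfg(text):
    """Return {PARAMETER: VALUE} from 'PARAMETER VALUE' lines.

    Assumption: one parameter per line, '#' starts a comment.
    Names and values are upper-cased for comparison.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split(None, 1)          # split on first whitespace run
        if len(parts) == 2:
            settings[parts[0].upper()] = parts[1].strip().upper()
    return settings


def preemption_warnings(text):
    """List checklist items that the given configuration text fails."""
    cfg = parse_cfg(text)
    warnings = []
    if "PREEMPTPOLICY" not in cfg:
        warnings.append("PREEMPTPOLICY is not set explicitly")
    if cfg.get("GUARANTEEDPREEMPTION") != "TRUE":
        warnings.append("GUARANTEEDPREEMPTION is not set to TRUE")
    if cfg.get("JOBNODEMATCHPOLICY") == "EXACTNODE":
        warnings.append("JOBNODEMATCHPOLICY should not be EXACTNODE")
    if cfg.get("NODEACCESSPOLICY") == "SINGLEUSER":
        warnings.append("NODEACCESSPOLICY should not be SINGLEUSER")
    if cfg.get("BACKFILLPOLICY") != "FIRSTFIT":
        warnings.append("BACKFILLPOLICY is not set to FIRSTFIT")
    return warnings


# Illustrative configuration text (made up for this example).
sample = """
PREEMPTPOLICY         REQUEUE
GUARANTEEDPREEMPTION  TRUE      # from the checklist
JOBNODEMATCHPOLICY    EXACTNODE
NODEACCESSPOLICY      SHARED
BACKFILLPOLICY        FIRSTFIT
"""

for warning in preemption_warnings(sample):
    print(warning)
```

A script like this only covers the static settings; the job-level items (flags, priority, allocated resources) and the resource manager checks still need checkjob and mdiag as described in the checklist.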

Copyright © 2012 Adaptive Computing Enterprises, Inc.®