5.689 Changing Node State

A common task is to prevent jobs from running on a particular node by marking it offline with pbsnodes -o nodename. Once a node has been marked offline, the scheduler no longer considers it available for new jobs. When the node is returned to service, clear the offline state with pbsnodes -c nodename.
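As a minimal sketch, the two commands above can be wrapped in small shell helpers for a maintenance script. The function names take_offline and return_to_service and the node name node01 are hypothetical, chosen only for illustration; the pbsnodes options are the ones named in the text.

```shell
# take_offline NODE: mark NODE offline so the scheduler stops
# considering it for new jobs.
take_offline() {
    pbsnodes -o "$1"
}

# return_to_service NODE: clear the offline state once NODE is ready again.
return_to_service() {
    pbsnodes -c "$1"
}

# Example usage (node01 is a placeholder node name):
#   take_offline node01
#   ... perform maintenance on node01 ...
#   return_to_service node01
```

Keeping the two operations in paired helpers makes it harder to forget the pbsnodes -c step at the end of a maintenance window.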

Also useful is pbsnodes -l, which lists all nodes in a state that warrants attention, such as down, unknown, or offline. This gives a quick view of nodes that might be having a problem. (See pbsnodes for details.)
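The listing can also be filtered in a script. The sketch below assumes that pbsnodes -l prints one node per line, with the node name in the first column and a comma-separated state string in the second; verify that layout against the output on your system before relying on it. The function name nodes_in_state is hypothetical.

```shell
# nodes_in_state STATE: print the names of nodes whose state, as reported
# by `pbsnodes -l`, includes STATE.
# Assumes output lines of the form "<nodename> <state[,state...]>".
nodes_in_state() {
    pbsnodes -l | awk -v want="$1" '
        {
            n = split($2, states, ",")
            for (i = 1; i <= n; i++) {
                if (states[i] == want) {
                    print $1
                    break
                }
            }
        }'
}

# Example usage:
#   nodes_in_state offline
```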

5.689.1 Node Recovery

When a mom falls behind in processing requests, pbs_server has a failsafe that gives the node time to recover and work through its request backlog. After three failures without two consecutive successes in servicing requests, pbs_server marks that mom as offline for five minutes so that the mom has extra time to process the backlog before it resumes normal activity. If the mom responds successfully to two consecutive network requests before the timeout expires, it comes back earlier.
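The counting rule described above can be illustrated with a small simulation. This is only a sketch of one reading of the rule (two consecutive successes reset the failure count, and the third counted failure trips the failsafe); it is not pbs_server's actual implementation, and the function name backlog_state is hypothetical.

```shell
# backlog_state OUTCOME...: replay a sequence of request outcomes
# (S = success, F = failure) and print "offline" if the failsafe would
# trip, or "ok" if it would not.
backlog_state() {
    failures=0   # failures since the last pair of consecutive successes
    streak=0     # length of the current run of consecutive successes
    for outcome in "$@"; do
        case $outcome in
            F)
                failures=$((failures + 1))
                streak=0
                ;;
            S)
                streak=$((streak + 1))
                # Two successes in a row clear the failure count.
                if [ "$streak" -ge 2 ]; then
                    failures=0
                fi
                ;;
        esac
        if [ "$failures" -ge 3 ]; then
            echo offline
            return 0
        fi
    done
    echo ok
}

# Example usage:
#   backlog_state F S F F     # a single success does not reset the count
#   backlog_state F F S S F   # two consecutive successes reset the count
```

Under this reading, isolated successes between failures do not protect the mom; only a pair of back-to-back successes does, which matches the early-return condition in the text.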

© 2016 Adaptive Computing