momctl

(PBS MOM Control)

Synopsis

momctl -c { <JOBID> | all }
momctl -C
momctl -d { <INTEGER> | <JOBID> }
momctl -f <FILE>
momctl -h <HOST>[,<HOST>]...
momctl -p <PORT_NUMBER>
momctl -q <ATTRIBUTE>
momctl -r { <FILE> | LOCAL:<FILE> }
momctl -s

Overview

The momctl command allows remote shutdown, reconfiguration, diagnostics, and querying of the pbs_mom daemon.

Format

-c — Clear
Format { <JOBID> | all }
Default ---
Description Makes the MOM unaware of the job's existence. It does not clean up any processes associated with the job.
Example

momctl -h node1 -c 15406

-C — Cycle
Format ---
Default ---
Description Cycle pbs_mom(s)
Example

momctl -h node1 -C

Cycle pbs_mom on node1.

-d — Diagnose
Format { <INTEGER> | <JOBID> }
Default 0
Description

Diagnose MOM(s)

(For more details, see Diagnose detail below.)

Example

momctl -h node1 -d 2

Print level 2 and lower diagnose information for the MOM on node1.

-f — Host File
Format <FILE>
Default ---
Description A file containing only hostnames, delimited by commas or whitespace (space, tab, or newline)
Example

momctl -f hosts.txt -d 0

Print diagnose information for the MOMs running on the hosts specified in hosts.txt.
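
To check which hosts a host file names before passing it to -f, the file can be split on the delimiters described above. The following is a minimal sketch in POSIX shell, not part of TORQUE: list_hosts is a hypothetical helper, and the sample file and its host names are made up for illustration.

```shell
# list_hosts is a hypothetical helper (not part of TORQUE): it prints one
# host name per line from a comma- or whitespace-delimited host file.
list_hosts() {
    # Turn commas, spaces, and tabs into newlines, squeeze runs of
    # newlines into one, then drop any remaining empty line.
    tr -s ', \t' '\n\n\n' < "$1" | sed '/^$/d'
}

# Build a sample host file that mixes all of the allowed delimiters.
hostfile=$(mktemp)
printf 'node1,node2 node3\tnode4\nnode5\n' > "$hostfile"

# Print the five host names, one per line.
list_hosts "$hostfile"

rm -f "$hostfile"
```

The helper only reads the file; it does not contact any MOM.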

-h — Host List
Format <HOST>[,<HOST>]...
Default localhost
Description A comma-separated list of hosts
Example

momctl -h node1,node2,node3 -d 0

Print diagnose information for the MOMs running on node1, node2, and node3.
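
When host names are held in a script, the comma-separated argument for -h can be assembled rather than typed by hand. The following is a minimal sketch in POSIX shell: join_hosts is a hypothetical helper, not part of TORQUE, and the momctl command is only printed, not run.

```shell
# join_hosts is a hypothetical helper (not part of TORQUE): it joins its
# arguments with commas, the form the -h option expects.
join_hosts() {
    # In a subshell, set IFS so that "$*" joins the arguments with commas.
    (IFS=,; printf '%s\n' "$*")
}

hosts=$(join_hosts node1 node2 node3)

# Dry run: print the command instead of contacting any MOM.
echo "momctl -h $hosts -d 0"
```

Running the join in a subshell keeps the changed IFS from leaking into the rest of the script.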

-p — Port
Format <PORT_NUMBER>
Default TORQUE's default port number
Description The port number for the specified MOM(s)
Example

momctl -p 5455 -h node1 -d 0

Request diagnose information over port 5455 on node1.

-q — Query
Format <ATTRIBUTE>
Default ---
Description Query <ATTRIBUTE> on specified MOM, where <ATTRIBUTE> is a property listed by pbsnodes -a (see Query attributes for a list of attributes)
Example

momctl -q physmem

Print the amount of physmem on localhost.

-r — Reconfigure
Format { <FILE> | LOCAL:<FILE> }
Default ---
Description Reconfigure MOM(s) with remote or local config file, <FILE>. This works only if $remote_reconfig is set to true when the MOM is started.
Example

momctl -r /home/user1/new.config -h node1

Reconfigure MOM on node1 with /home/user1/new.config on node1.
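
Because -r depends on $remote_reconfig, it can help to inspect a MOM config file before attempting a reconfigure. The following is a minimal sketch in POSIX shell. It assumes the setting appears on its own line in the form "$remote_reconfig true"; remote_reconfig_enabled is a hypothetical helper, not part of TORQUE, and the sample file is made up for illustration.

```shell
# remote_reconfig_enabled is a hypothetical helper (not part of TORQUE).
# It succeeds if the given MOM config file contains a line of the assumed
# form "$remote_reconfig true".
remote_reconfig_enabled() {
    grep -Eq '^[[:space:]]*\$remote_reconfig[[:space:]]+true[[:space:]]*$' "$1"
}

# Sample config file for the demonstration; its contents are made up.
cfg=$(mktemp)
printf '%s\n' '$remote_reconfig true' > "$cfg"

if remote_reconfig_enabled "$cfg"; then
    echo "remote reconfiguration enabled in config file"
else
    echo "remote reconfiguration not enabled in config file"
fi

rm -f "$cfg"
```

The check reads only the file on disk. Since the setting must be in effect when the MOM is started, a file edited after startup does not reflect the state of the running daemon.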

-s — Shutdown
Format ---
Default ---
Description Shut down pbs_mom
Example

momctl -s

Terminate the pbs_mom process on localhost.

Query attributes

Attribute Description
arch node hardware architecture
availmem available RAM
loadave 1-minute load average
ncpus number of CPUs available on the system
netload total number of bytes transferred over all network interfaces
nsessions number of sessions active
nusers number of users active
physmem configured RAM
sessions list of active sessions
totmem configured RAM plus configured swap
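
To query several of these attributes on one host, the -q invocations can be generated in a loop. The following is a minimal sketch in POSIX shell: query_commands is a hypothetical helper, not part of TORQUE, and it only prints the commands (a dry run) rather than executing them.

```shell
# query_commands is a hypothetical helper (not part of TORQUE): it prints
# one momctl query command per attribute for the given host.
query_commands() {
    host=$1
    shift
    for attr in "$@"; do
        printf 'momctl -h %s -q %s\n' "$host" "$attr"
    done
}

# Dry run: list the commands for three attributes from the table above.
query_commands node1 physmem availmem ncpus
```

Printing the commands first makes it easy to review them before running them on a live cluster.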

Diagnose detail

Level Description
0

Display the following information:

  • Local hostname
  • Expected server hostname
  • Execution version
  • MOM home directory
  • MOM config file version (if specified)
  • Duration MOM has been executing
  • Duration since last request from pbs_server daemon
  • Duration since last request to pbs_server daemon
  • RM failure messages (if any)
  • Log verbosity level
  • Local job list
1

All information from level 0 plus the following:

  • Interval between updates sent to server
  • Number of initialization messages sent to pbs_server daemon
  • Number of initialization messages received from pbs_server daemon
  • Prolog/epilog alarm time
  • List of trusted clients
2

All information from level 1 plus the following:

  • PID
  • Event alarm status
3

All information from level 2 plus the following:

  • syslog enabled

Example A-1: MOM diagnostics

momctl -d 1

Host: nsrc/nsrc.fllcl.com    Server: 10.10.10.113    Version: torque_1.1.0p4
HomeDirectory:          /usr/spool/PBS/mom_priv
ConfigVersion:          147
MOM active:             7390 seconds
Last Msg From Server:   7389 seconds (CLUSTER_ADDRS)
Server Update Interval: 20 seconds
Init Msgs Received:     0 hellos/1 cluster-addrs
Init Msgs Sent:         1 hellos
LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
Prolog Alarm Time:      300 seconds
Trusted Client List:    12.14.213.113,127.0.0.1
JobList:                NONE

diagnostics complete
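
Most lines of the diagnostic output follow a "label: value" layout, so a single field can be picked out with standard text tools. The following is a minimal sketch in POSIX shell: diag_field is a hypothetical helper, not part of TORQUE, and the sample text is copied from Example A-1 rather than taken from a live MOM. The first output line (Host, Server, Version) carries several fields and is not handled by this sketch.

```shell
# diag_field is a hypothetical helper (not part of TORQUE). It reads
# diagnostic output on stdin and prints the value that follows "<label>:".
# The label is used inside a sed pattern, so it must not contain "/" or
# other regular-expression metacharacters.
diag_field() {
    sed -n "s/^$1:[[:space:]]*//p"
}

# Lines copied from Example A-1, standing in for live momctl output.
sample='MOM active:             7390 seconds
LOGLEVEL:               0 (use SIGUSR1/SIGUSR2 to adjust)
JobList:                NONE'

printf '%s\n' "$sample" | diag_field 'MOM active'   # prints "7390 seconds"
printf '%s\n' "$sample" | diag_field 'JobList'      # prints "NONE"
```

Against a live MOM, the same helper could be fed from a pipe such as momctl -d 1 | diag_field 'JobList', assuming the output layout matches the example.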

Example A-2: System shutdown

momctl -s -f /opt/clusterhostfile

shutdown request successful on node001
shutdown request successful on node002
shutdown request successful on node003
shutdown request successful on node004
shutdown request successful on node005
shutdown request successful on node006
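
After a cluster-wide shutdown, it is useful to confirm that every expected host reported success. The following is a minimal sketch in POSIX shell. It assumes the success line has exactly the form shown in Example A-2; the format of any failure message is not covered here. unconfirmed_hosts is a hypothetical helper, not part of TORQUE.

```shell
# unconfirmed_hosts is a hypothetical helper (not part of TORQUE). It reads
# shutdown output on stdin and prints each expected host (given as
# arguments) that has no matching "shutdown request successful" line.
unconfirmed_hosts() {
    # Collect the host names from the success lines.
    confirmed=$(sed -n 's/^shutdown request successful on //p')
    for host in "$@"; do
        # -x matches whole lines only; -F treats the host name literally.
        printf '%s\n' "$confirmed" | grep -qxF -- "$host" || printf '%s\n' "$host"
    done
}

# Sample output in the format of Example A-2, with node003 missing.
output='shutdown request successful on node001
shutdown request successful on node002'

printf '%s\n' "$output" | unconfirmed_hosts node001 node002 node003   # prints "node003"
```

An empty result means every listed host confirmed the shutdown request.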

© 2014 Adaptive Computing