W.1.1 Query Resources Data Format
NAME | FORMAT | DEFAULT | DESCRIPTION |
---|---|---|---|
ADISK | <INTEGER> | 0 | Available local disk on node (in MB) |
AFS | <fs id="X" size="X" io="Y" rcount="X" wcount="X" ocount="X"></fs>[...] | 0 | Available filesystem state |
AMEMORY | <INTEGER> | 0 | Available/free RAM on node (in MB) |
APROC | <INTEGER> | 1 | Available processors on node |
ARCH | <STRING> | --- | Compute architecture of node |
ARES | one or more comma-delimited <NAME>:<VALUE> pairs (e.g., MATLAB:6,COMPILER:100) | --- | Arbitrary consumable resources currently available on the node |
ASWAP | <INTEGER> | 0 | Available swap on node (in MB) |
CCLASS | one or more bracket-enclosed <NAME>:<COUNT> pairs (e.g., [batch:5][sge:3]) | --- | Run classes supported by node. Typically, one class is 'consumed' per task. Thus, an 8-processor node may have 8 instances of each class it supports present, e.g., [batch:8][interactive:8] |
CDISK | <INTEGER> | 0 | Configured local disk on node (in MB) |
CFS | <STRING> | 0 | Configured filesystem state |
CMEMORY | <INTEGER> | 0 | Configured RAM on node (in MB) |
CONTAINERNODE | <STRING> | --- | The physical machine that is hosting the virtual machine. Only valid on VMs. |
CPROC | <INTEGER> | 1 | Configured processors on node |
CPULOAD | <DOUBLE> | 0.0 | One minute BSD load average |
CPUSPEED | <INTEGER> | --- | The node's processor speed in MHz |
CRES | one or more comma-delimited <NAME>:<VALUE> pairs (e.g., MATLAB:6,COMPILER:100) | --- | Arbitrary consumable resources supported and tracked on the node, e.g., software licenses or tape drives |
CSWAP | <INTEGER> | 0 | Configured swap on node (in MB) |
FEATURE | one or more colon-delimited <STRING>s (e.g., WIDE:HSM) | --- | Generic attributes, often describing hardware or software features, associated with the node |
GEVENT | GEVENT[<EVENTNAME>]=<STRING> | --- | Generic event occurrence and context data |
GMETRIC | GMETRIC[<METRICNAME>]=<DOUBLE> | --- | Current value of generic metric, e.g., 'GMETRIC[temp]=103.5'. |
IDLETIME | <INTEGER> | --- | Number of seconds since last detected keyboard or mouse activity (often used with desktop harvesting) |
MAXTASK | <INTEGER> | <CPROC> | Maximum number of tasks allowed on the node at any given time |
NETADDR | <STRING> | --- | The IP address of the machine |
NODEINDEX | <INTEGER> | --- | The node's index |
OS | <STRING> | --- | Operating system running on node |
OSLIST | One or more comma-delimited <STRING>s, quoted if the string contains spaces (e.g., "SAS7 AS3 Core Baseline Build v0.1.0","RedHat AS3-U5Development Build v0.2"). | --- | Operating systems accepted by node |
OTHER | <ATTR>=<VALUE>[,<ATTR>=<VALUE>]... | --- | Opaque node attributes assigned to node |
PARTITION | <STRING> | DEFAULT | Partition to which node belongs |
POWER | <BOOLEAN> | --- | Whether the machine is on or off |
PRIORITY | <INTEGER> | --- | Node allocation priority |
RACK | <INTEGER> | 0 | Rack location of the node |
SLOT | <INTEGER> | 0 | Slot location of the node |
STATE* | one of the following: Idle, Running, Busy, Unknown, Drained, Draining, or Down | Down | State of the node |
UPDATETIME* | <EPOCHTIME> | 0 | Time node information was last updated |
VARATTR | <ATTR1>=<VAL1>[=<displayName1>][+<ATTR2>=<VAL2>[=<displayName2>]]... | --- | Plus-delimited (+) list of <ATTR>=<VAL>[=<displayName>] pairs that jobs can request. You can replace any of the equals signs with colons if desired. Specifying a display name allows you to choose a name that will be displayed in the Mongo database instead of the unique ID (the <VAL>). If you give two different attributes the same value and one of them also has a display name specified, both attributes will appear with the same display name. |
VARIABLE | <ATTR>=<VAL> | --- | Generic variables to be associated with node |
VMOSLIST | <STRING> | --- | Comma-delimited list (,) of supported virtual machine operating systems for this node |
XRES | one or more comma-delimited <NAME>:<VALUE> pairs (e.g., MATLAB:6,COMPILER:100) | --- | Amount of external usage of a particular generic resource |
* indicates required field
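Several node attributes in the table above carry small structured values rather than plain scalars. The following is a minimal Python sketch of how a consumer might decode three of them: the comma-delimited <NAME>:<VALUE> pairs (ARES, CRES, XRES), the bracket-enclosed <NAME>:<COUNT> pairs (CCLASS), and the colon-delimited FEATURE list. The function names are hypothetical, and the sketch assumes that every <VALUE> and <COUNT> is a non-negative integer, which the table does not state explicitly.

```python
import re


def parse_pairs(value):
    """Decode 'MATLAB:6,COMPILER:100' into {'MATLAB': 6, 'COMPILER': 100}.

    Assumption: each value is an integer count.
    """
    result = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        name, _, count = item.partition(":")
        result[name] = int(count)
    return result


def parse_classes(value):
    """Decode '[batch:5][sge:3]' into {'batch': 5, 'sge': 3}."""
    pairs = re.findall(r"\[([^:\]]+):(\d+)\]", value)
    return {name: int(count) for name, count in pairs}


def parse_features(value):
    """Decode 'WIDE:HSM' into ['WIDE', 'HSM']."""
    return [feature for feature in value.split(":") if feature]
```

For example, `parse_classes("[batch:8][interactive:8]")` yields one entry per class, which matches the 8-processor node described in the CCLASS row.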
Node states have the following definitions:
State | Description |
---|---|
Busy | Node is running some jobs and will not accept additional jobs |
Down | Resource Manager problems have been detected. Node is incapable of running jobs. |
Draining | Node is responding but will not accept new jobs |
Idle | Node is ready to run jobs but currently is not running any. |
Running | Node is running some jobs and will accept additional jobs |
Unknown | Node is capable of running jobs but the scheduler will need to determine if the node state is actually Idle, Running, or Busy. |
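The state definitions above can be read as a small lookup from state name to whether the node will take additional jobs. The sketch below is one hypothetical way to encode that reading in Python; the names are illustrative only. Unknown maps to `None` because the table leaves that decision to the scheduler, and Drained also maps to `None` because it is listed as a valid STATE value but has no definition in the table.

```python
# True: accepts new jobs. False: does not. None: not decided by the table.
ACCEPTS_NEW_JOBS = {
    "Idle": True,       # ready to run jobs, none running
    "Running": True,    # running some jobs, will accept more
    "Busy": False,      # running some jobs, will not accept more
    "Draining": False,  # responding, but will not accept new jobs
    "Down": False,      # incapable of running jobs
    "Unknown": None,    # scheduler must resolve to Idle, Running, or Busy
    "Drained": None,    # assumption: no definition is given in the table
}


def accepts_new_jobs(state):
    """Return True, False, or None for a node STATE value."""
    if state not in ACCEPTS_NEW_JOBS:
        raise ValueError(f"unrecognized node state: {state!r}")
    return ACCEPTS_NEW_JOBS[state]
```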
W.1.2 Query Workload Data Format
NAME | FORMAT | DEFAULT | DESCRIPTION |
---|---|---|---|
ACCOUNT | <STRING> | --- | AccountID associated with job |
ARGS | <STRING> | --- | job command-line arguments |
COMMENT | <STRING> | 0 | job resource manager extension arguments including qos, dependencies, reservation constraints, etc |
COMPLETETIME* | <EPOCHTIME> | 0 | time job completed execution |
DDISK | <INTEGER> | 0 | quantity of local disk space (in MB) which must be dedicated to each task of the job |
DGRES | name:value[,name:value] | --- | Dedicated generic resources per task. |
DPROCS | <INTEGER> | 1 | number of processors dedicated per task |
DSWAP | <INTEGER> | 0 | quantity of virtual memory (swap, in MB) which must be dedicated to each task of the job |
ENDDATE | <EPOCHTIME> | [ANY] | time by which job must complete |
ENV | <STRING> | --- | job environment variables |
ERROR | <STRING> | --- | file to contain STDERR |
EVENT | <EVENT> | --- | event or exception experienced by job |
EXEC | <STRING> | --- | job executable command |
EXITCODE | <INTEGER> | --- | job exit code |
FLAGS | <STRING> | --- | job flags |
GEOMETRY | <STRING> | --- | String describing task geometry required by job |
GNAME* | <STRING> | --- | GroupID under which job will run |
HOSTLIST | comma- or colon-delimited list of hostnames. Suffix the hostlist with a caret (^) to mean superset or with an asterisk (*) to mean subset; otherwise, the hostlist is interpreted as an exact set | [ANY] | List of required hosts on which job must run (see TASKLIST). A subset means the specified hostlist is used first to select hosts for the job. If the job requires more hosts than are in the hostlist, they will be obtained from elsewhere if possible. If the job does not require all of the hosts in the hostlist, it will use only the ones it needs. A superset means the hostlist is the only source of hosts that should be considered for running the job. If the job cannot find the necessary resources on the hosts in this list, it should not run. No other hosts should be considered in allocating the job. |
INPUT | <STRING> | --- | file containing STDIN |
IWD | <STRING> | --- | job's initial working directory |
NAME | <STRING> | --- | User specified name of job |
NODES | <INTEGER> | 1 | Number of nodes required by job (See Node Definition for more info) |
OUTPUT | <STRING> | --- | file to contain STDOUT |
PARTITIONMASK | one or more colon-delimited <STRING>s | [ANY] | list of partitions in which job can run |
PREF | colon-delimited list of <STRING>s | --- | List of preferred node features or variables. (See PREF for more information.) |
PRIORITY | <INTEGER> | --- | system priority (absolute or relative - use '+' and '-' to specify relative) |
QOS | <INTEGER> | 0 | quality of service requested |
QUEUETIME* | <EPOCHTIME> | 0 | time job was submitted to resource manager |
RARCH | <STRING> | --- | architecture required by job |
RCLASS | list of bracket-enclosed <STRING>:<INTEGER> pairs | --- | list of <CLASSNAME>:<COUNT> pairs indicating type and number of class instances required per task (e.g., [batch:1] or [batch:2][tape:1]) |
RDISK | <INTEGER> | 0 | local disk space (in MB) required to be configured on nodes allocated to the job |
RDISKCMP | one of '>=', '>', '==', '<', or '<=' | >= | local disk comparison (e.g., node must have > 2048 MB local disk) |
REJCODE | <INTEGER> | 0 | reason job was rejected |
REJCOUNT | <INTEGER> | 0 | number of times job was rejected |
REJMESSAGE | <STRING> | --- | text description of reason job was rejected |
REQRSV | <STRING> | --- | Name of reservation in which job must run |
RESACCESS | <STRING> | --- | List of reservations in which job can run |
RFEATURES | colon-delimited list of <STRING>s | --- | List of features required on nodes |
RMEM | <INTEGER> | 0 | real memory (RAM, in MB) required to be configured on nodes allocated to the job |
RMEMCMP | one of '>=', '>', '==', '<', or '<=' | >= | real memory comparison (e.g., node must have >= 512 MB RAM) |
ROPSYS | <STRING> | --- | operating system required by job |
RSOFTWARE | <RESTYPE>[{+|:}<COUNT>] [@<TIMEFRAME>] | --- | software required by job |
RSWAP | <INTEGER> | 0 | virtual memory (swap, in MB) required to be configured on nodes allocated to the job |
RSWAPCMP | one of '>=', '>', '==', '<', or '<=' | >= | virtual memory comparison (e.g., node must have == 4096 MB virtual memory) |
SID | <STRING> | --- | system id (global job system owner) |
STARTDATE | <EPOCHTIME> | 0 | earliest time job should be allowed to start |
STARTTIME* | <EPOCHTIME> | 0 | time job was started by the resource manager |
STATE* | one of Idle, Running, Hold, Suspended, Completed, or Removed | Idle | State of job |
SUSPENDTIME | <INTEGER> | 0 | Number of seconds job has been suspended |
TASKLIST | one or more comma-delimited <STRING>s | --- | list of allocated tasks, or in other words, comma-delimited list of node IDs associated with each active task of job (e.g., cl01, cl02, cl01, cl02, cl03). The tasklist is initially selected by the scheduler at the time the StartJob command is issued. The resource manager is then responsible for starting the job on these nodes and maintaining this task distribution information throughout the life of the job. (see HOSTLIST) |
TASKS* | <INTEGER> | 1 | Number of tasks required by job (See Task Definition for more info) |
TASKPERNODE | <INTEGER> | 0 | exact number of tasks required per node |
UNAME* | <STRING> | --- | UserID under which job will run |
UPDATETIME* | <EPOCHTIME> | 0 | Time job was last updated |
WCLIMIT* | [[HH:]MM:]SS | 864000 | walltime required by job |
* indicates required field
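Three of the workload formats above benefit from a worked example: the WCLIMIT duration ([[HH:]MM:]SS), the HOSTLIST suffix that selects exact, subset, or superset interpretation, and the comparison operators used by RDISKCMP, RMEMCMP, and RSWAPCMP. The Python sketch below shows one possible decoding of each. All function names are hypothetical, and the sketch assumes the comparison is applied as "configured value on the node, operator, required value", which is how the examples in the table read.

```python
import operator
import re


def wclimit_seconds(value):
    """Convert a [[HH:]MM:]SS string such as '1:00:00' to seconds."""
    parts = [int(part) for part in value.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected [[HH:]MM:]SS, got {value!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part  # each field to the left is worth 60x
    return seconds


def parse_hostlist(value):
    """Return (hosts, mode); mode is 'exact', 'subset', 'superset', or 'any'."""
    if value == "[ANY]":
        return [], "any"  # the documented default: no host constraint
    mode = "exact"
    if value.endswith("^"):
        mode, value = "superset", value[:-1]
    elif value.endswith("*"):
        mode, value = "subset", value[:-1]
    hosts = [host for host in re.split(r"[,:]", value) if host]
    return hosts, mode


COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
}


def node_satisfies(configured_mb, cmp_op, required_mb):
    """Apply an R*CMP operator, e.g. node_satisfies(4096, '>=', 512)."""
    return COMPARATORS[cmp_op](configured_mb, required_mb)
```

With these helpers, the default WCLIMIT of 864000 passes through unchanged, since a value with no colons is treated as a plain count of seconds.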
Job states have the following definitions:
State | Definition |
---|---|
Completed | Job has completed |
Hold | Job is in the queue but is not allowed to run |
Idle | Job is ready to run |
Removed | Job has been canceled or otherwise terminated externally |
Running | Job is currently executing |
Suspended | Job has started but execution has temporarily been suspended |
Completed and canceled jobs should be maintained by the resource manager for a brief time, perhaps 1 to 5 minutes, before being purged. This provides the scheduler time to obtain all final job state information for scheduler statistics.
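The retention rule above could be sketched as follows. This is an illustration rather than a description of any particular resource manager: the function name is hypothetical, the 300-second window is an assumed value taken from the upper end of the suggested 1 to 5 minutes, and the sketch assumes that COMPLETETIME is also set when a job is canceled (state Removed). Jobs whose COMPLETETIME is still the default of 0 are kept.

```python
RETENTION_SECONDS = 300  # assumption: upper end of the 1 to 5 minute window


def split_purgeable(jobs, now, retention=RETENTION_SECONDS):
    """Split job records into (keep, purge) lists.

    Each job is a dict with at least STATE and COMPLETETIME (epoch seconds).
    """
    keep, purge = [], []
    for job in jobs:
        finished = job["STATE"] in ("Completed", "Removed")
        completed_at = job.get("COMPLETETIME", 0)
        if finished and completed_at > 0 and now - completed_at >= retention:
            purge.append(job)
        else:
            keep.append(job)  # still active, or finished too recently
    return keep, purge
```

Keeping finished jobs in the `keep` list until the window expires gives the scheduler at least one chance to read their final state.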