4.320 Monitoring Jobs

Torque allows users and administrators to monitor submitted jobs with the qstat command.

If the command is run by a non-administrative user, it will output just that user's jobs. For example:

> qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
4807             scatter          user01           12:56:34 R batch
...
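
For scripted monitoring, the listing above can be split into fields. The following is a minimal sketch in Python, assuming the six-column layout shown in the example and that no field contains whitespace. The function name parse_qstat_table and the dictionary keys are hypothetical names chosen for illustration; they are not part of Torque.

```python
# Sample text copied from the qstat example above.
SAMPLE = """\
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
4807             scatter          user01           12:56:34 R batch
"""

# Hypothetical key names for the six columns of the default listing.
KEYS = ("job_id", "name", "user", "time_use", "state", "queue")


def parse_qstat_table(text):
    """Return one dict per job row found in default `qstat` output."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # The header row splits into eight words and blank or elided
        # lines into fewer than six, so only six-field lines are kept.
        if len(fields) != len(KEYS):
            continue
        # The dashed separator also has six fields; skip it explicitly.
        if all(set(field) == {"-"} for field in fields):
            continue
        jobs.append(dict(zip(KEYS, fields)))
    return jobs


if __name__ == "__main__":
    for job in parse_qstat_table(SAMPLE):
        print(job["job_id"], job["state"], job["queue"])
```

In practice the text would come from running qstat, for example through Python's subprocess module, rather than from a hard-coded string.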

4.320.1 Monitoring NUMA Job Task Placement

NUMA-aware job task placement is available with Torque Resource Manager 6.0 and later.

When using NUMA, job resources are tracked per task. To support this, qstat -f produces a new category of information that begins with the "req_information" keyword. Following each "req_information" keyword is another keyword giving information about how the job was allocated. See 4.233 -L NUMA Resource Request for available allocation keywords.

When the job has completed, the output also includes the per-task resident memory used and the per-task CPU time used. The following is sample qstat -f output for a completed job.

In the sample, req_information.task_usage.0.task.0.cpu_list gives the cores to which the job is bound for the cpuset, and mem_list gives the corresponding memory binding. The keywords memory_used and cput_used report the per-task resident memory used and CPU time used, respectively.

Job Id: 832.pv-knielson-dt
Job_Name = bigmem.sh
Job_Owner = knielson@pv-knielson-dt
resources_used.cput = 00:00:00
resources_used.energy_used = 0
resources_used.mem = 3628kb
resources_used.vmem = 31688kb
resources_used.walltime = 00:00:00
job_state = C
queue = second
server = pv-knielson-dt
Checkpoint = u
ctime = Tue Jul 28 23:23:15 2015
Error_Path = pv-knielson-dt:/home/knielson/jobs/bigmem.sh.e832
exec_host = pv-knielson-dt/0-3
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Tue Jul 28 23:23:18 2015
Output_Path = pv-knielson-dt:/home/knielson/jobs/bigmem.sh.o832
Priority = 0
qtime = Tue Jul 28 23:23:15 2015
Rerunable = True
Resource_List.walltime = 00:05:00
session_id = 2708
substate = 59
Variable_List = PBS_O_QUEUE=routeme,PBS_O_HOME=/home/knielson,
PBS_O_LOGNAME=knielson,
PBS_O_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/b
in:/usr/games:/usr/local/games,PBS_O_SHELL=/bin/bash,PBS_O_LANG=en_US,
PBS_O_WORKDIR=/home/knielson/jobs,PBS_O_HOST=pv-knielson-dt,
PBS_O_SERVER=pv-knielson-dt
euser = knielson
egroup = company
hashname = 832.pv-knielson-dt
queue_rank = 391
queue_type = E
etime = Tue Jul 28 23:23:15 2015
exit_status = 0
submit_args = -L tasks=2:lprocs=2 ../scripts/bigmem.sh
start_time = Tue Jul 28 23:23:18 2015
start_count = 1
fault_tolerant = False
comp_time = Tue Jul 28 23:23:18 2015
job_radix = 0
total_runtime = 0.093262
submit_host = pv-knielson-dt
req_information.task_count.0 = 2
req_information.lprocs.0 = 2
req_information.thread_usage_policy.0 = allowthreads
req_information.hostlist.0 = pv-knielson-dt:ppn=4
req_information.task_usage.0.task.0.cpu_list = 2,6
req_information.task_usage.0.task.0.mem_list = 0
req_information.task_usage.0.task.0.memory_used = 258048
req_information.task_usage.0.task.0.cput_used = 18
req_information.task_usage.0.task.0.cores = 0
req_information.task_usage.0.task.0.threads = 0
req_information.task_usage.0.task.0.host =
req_information.task_usage.0.task.1.cpu_list = 3,7
req_information.task_usage.0.task.1.mem_list = 0
req_information.task_usage.0.task.1.memory_used = 258048
req_information.task_usage.0.task.1.cput_used = 18
req_information.task_usage.0.task.1.cores = 0
req_information.task_usage.0.task.1.threads = 2
req_information.task_usage.0.task.1.host = pv-knielson-dt
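
The per-task keywords in the sample above follow a regular naming pattern, so they can be collected into one record per task. The following is a minimal sketch in Python, assuming the key layout req_information.task_usage.<req>.task.<task>.<keyword> seen in the sample. The function name parse_task_usage is a hypothetical name for illustration, and values are kept as raw strings because the sample does not state their units.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   req_information.task_usage.0.task.1.cpu_list = 3,7
# The value may be empty, as in the "host =" line of the sample.
TASK_LINE = re.compile(
    r"^\s*req_information\.task_usage\.(\d+)\.task\.(\d+)\.(\w+)\s*=\s*(.*)$"
)

# A subset of the qstat -f sample above.
SAMPLE = """\
req_information.task_usage.0.task.0.cpu_list = 2,6
req_information.task_usage.0.task.0.mem_list = 0
req_information.task_usage.0.task.0.memory_used = 258048
req_information.task_usage.0.task.0.cput_used = 18
req_information.task_usage.0.task.0.host =
req_information.task_usage.0.task.1.cpu_list = 3,7
req_information.task_usage.0.task.1.mem_list = 0
req_information.task_usage.0.task.1.memory_used = 258048
req_information.task_usage.0.task.1.cput_used = 18
"""


def parse_task_usage(text):
    """Return {(req_index, task_index): {keyword: value}} from qstat -f text."""
    tasks = defaultdict(dict)
    for line in text.splitlines():
        match = TASK_LINE.match(line)
        if match:
            req, task, keyword, value = match.groups()
            tasks[(int(req), int(task))][keyword] = value.strip()
    return dict(tasks)


if __name__ == "__main__":
    for (req, task), usage in sorted(parse_task_usage(SAMPLE).items()):
        print(req, task, usage.get("cpu_list"), usage.get("memory_used"))
```

Lines that do not match the pattern, such as Job_Name or resources_used entries, are ignored, so the full qstat -f output can be passed in unchanged.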


© 2017 Adaptive Computing