
2.4 Track Job Progress

This topic provides information on viewing job progress and output.


2.4.1 Introduction to How Nitro Tracks the Job

Nitro prints some job information to stdout, such as which workers attached, how many tasks have run, and whether any tasks failed.

If your Nitro job is submitted through a scheduler, you may not see any of this until the job has completed and the resource manager has copied the job output to your job's submission directory.

However, Nitro provides a tool called nitrostat to display status information while the job is running. nitrostat is located in the nitro/bin directory where Nitro was installed.

Nitro creates two files, a job log and a task log, that you can use to stay up to date on the progress of your job.

Both files are written to the job directory that you specify with the --job-dir command line option when submitting your job or, if you do not specify one, to the default job directory $HOME/nitro/<jobid>.
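If you script around a job, it can help to compute where these files land. The following Python sketch builds the expected task log path from a job ID and an optional job directory. It assumes the layout described in this topic (the --job-dir directory if given, else $HOME/nitro/<jobid>, with the task log named nitro_<JobID>.tasklog.txt); the function name task_log_path is hypothetical and is not part of Nitro.

```python
import os
from pathlib import Path
from typing import Optional


def task_log_path(job_id: str, job_dir: Optional[str] = None) -> Path:
    """Return the expected location of a job's task log file.

    Hypothetical helper. Assumes files go to the --job-dir directory when
    one was given, otherwise to $HOME/nitro/<jobid>, and that the task log
    is named nitro_<JobID>.tasklog.txt.
    """
    if job_dir is None:
        # Default job directory: $HOME/nitro/<jobid>
        base = Path(os.environ["HOME"]) / "nitro" / job_id
    else:
        base = Path(job_dir)
    return base / f"nitro_{job_id}.tasklog.txt"
```

For the example job used in this topic, task_log_path("23576", "/home/jdoe/jobs/23576") gives /home/jdoe/jobs/23576/nitro_23576.tasklog.txt.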

2.4.2 Job Log

To see job status using nitrostat, you need the job ID. This is either the job ID reported to you when you submitted the job to the scheduler, or the ID you set manually with the --job-id command line option in the NITRO_OPTIONS environment variable or with the NITROJOBID environment variable.
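A job script that sets the ID manually may later need to recover it to pass to nitrostat. The Python sketch below checks NITRO_OPTIONS for a --job-id option and then falls back to NITROJOBID. The helper name resolve_job_id, the checking order, and the acceptance of both "--job-id VALUE" and "--job-id=VALUE" spellings are assumptions for illustration, not documented Nitro behavior. If it returns None, use the ID your scheduler reported.

```python
import os
import shlex
from typing import Mapping, Optional


def resolve_job_id(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Find a manually set Nitro job ID in the environment, or return None.

    Hypothetical helper: NITRO_OPTIONS is assumed to hold shell-style words,
    and it is checked before NITROJOBID (an assumed precedence).
    """
    args = shlex.split(env.get("NITRO_OPTIONS", ""))
    for i, arg in enumerate(args):
        if arg.startswith("--job-id="):
            return arg.split("=", 1)[1]
        if arg == "--job-id" and i + 1 < len(args):
            return args[i + 1]
    # Fall back to NITROJOBID; None if neither variable provides an ID.
    return env.get("NITROJOBID")
```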

2.4.2.A Nitro Job Progress Report

The Nitro job progress report shows the current contents of a job log file.

For example, suppose your resource manager ran a job as job "23576". Running "/opt/nitro/bin/nitrostat 23576" shows the job's progress:

Nitro Job Progress Report
 
Start Time  : 2016-02-10 09:10:11-0600
Current Time: 2016-02-10 09:10:42-0600
Elapsed Time: 31 seconds (00:00:31)
 
Job Id      : 23576
Coordinator : node01
  Load Pct  : 5.6%
Task Log    : /home/jdoe/jobs/23576/nitro_23576.tasklog.txt
Task File   : /home/jdoe/jobs/survey03.tasks
  File Size : 123366
  Est Tasks : 3016
  Processed : 75%
 
Tasks
------------
Pending     : 500
In Progress : 500
Completed   : 1250
  Success   : 1250
  Failure   : 0
  InsufRes  : 0
  Timeout   : 0
  Invalid   : 0
  Tasks/sec : 40.3
Total Tasks : 2250
 
Workers
-------
Host   Pid    Thrds Status  Assigned Running Completed  Success  Failure  InsufRes Timeout Tasks/sec AsgmtDur
node02 6851     12  running     1250     250      1000     1000        0         0       0      36.0      8.0
node03 14988     4  running      500     250       250      250        0         0       0       9.3     27.0
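Because each summary line in the report has the form "Key : value", nitrostat output is easy to scrape from a monitoring script. This Python sketch (parse_report_fields is a hypothetical name) collects those lines into a dictionary and skips the Workers table rows, which contain no colon-separated key. Keys that appear more than once, such as Threads in the final report, keep their last value.

```python
import re
from typing import Dict

# Matches "Key : value" lines such as "Pending     : 500" or "  Load Pct  : 5.6%".
# The lazy key stops at the first colon, so values may contain colons (timestamps).
_FIELD = re.compile(r"^\s*([A-Za-z][A-Za-z /]*?)\s*:\s*(.+?)\s*$")


def parse_report_fields(report: str) -> Dict[str, str]:
    """Collect "Key : value" lines from nitrostat output into a dict."""
    fields: Dict[str, str] = {}
    for line in report.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields
```

With the sample report above, int(fields["Pending"]) + int(fields["In Progress"]) + int(fields["Completed"]) is 500 + 500 + 1250, which equals Total Tasks (2250).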

The following describes each field in the report.

2.4.2.B Job Completed Report

Once the job has completed, the report shows "(final)" at the end of its first line, and Current Time is replaced by Finish Time (after Start Time). The following example is based on the previous example for job "23576".

Nitro Job Progress Report (final)
 
Start Time  : 2016-02-10 09:10:11-0600
Finish Time : 2016-02-10 09:11:36-0600
Elapsed Time: 85 seconds (00:01:25)
 
Job Id      : 23576
Task Log    : /home/jdoe/jobs/23576/nitro_23576.tasklog.txt
Task File   : /home/jdoe/jobs/survey03.tasks
 
Tasks
------------
Pending     : 0
Running     : 0
Completed   : 3000
  Success   : 3000
  Failure   : 0
  InsufRes  : 0
  Timeout   : 0
  Invalid   : 0
  Tasks/sec : 35.3
Total Tasks : 3000
 
Coordinator
-----------
Host    : node01
Threads : 8
 
Worker Resources
----------------
Workers : 2
Threads : 16
 
Workers
-------
Host   Pid    Thrds Status Assigned Running Completed  Success  Failure  InsufRes Timeout Tasks/sec AsgmtDur
node02 6851     12  closed     2250       0      2250     2250        0         0       0      29.2      8.3
node03 14988     4  closed      750       0       750      750        0         0       0       8.8     35.7
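Two details of the final report lend themselves to automation: the "(final)" marker on the first line and the Start Time/Finish Time pair. The Python sketch below (all function names are hypothetical) detects the marker, computes the elapsed seconds from the two timestamps, and sums the Completed column of the Workers table. It assumes timestamps keep the YYYY-MM-DD HH:MM:SS±HHMM form shown and that worker rows follow the column order in the header above.

```python
from datetime import datetime
from typing import List


def is_final(report: str) -> bool:
    """True once the report's first line ends with "(final)"."""
    lines = report.strip().splitlines()
    return bool(lines) and lines[0].rstrip().endswith("(final)")


def elapsed_seconds(start: str, finish: str) -> int:
    """Whole seconds between two report timestamps, e.g. "2016-02-10 09:10:11-0600"."""
    fmt = "%Y-%m-%d %H:%M:%S%z"  # %z parses the -0600 UTC offset
    delta = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def worker_completed_total(worker_rows: List[str]) -> int:
    """Sum the Completed column (7th whitespace-separated field) of Workers rows."""
    return sum(int(row.split()[6]) for row in worker_rows)
```

For job 23576, the elapsed time is 85 seconds, and 3000 completed tasks divided by 85 seconds is about 35.3, which agrees with the Tasks/sec value. The two worker rows sum to 2250 + 750 = 3000, matching the Completed count.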

2.4.3 Task Log

The task log file lists all completed tasks, along with statistics about each task's duration and memory consumption. This file is named nitro_<JobID>.tasklog.txt and is located in the same directory as the job log file.

The task log file is tab-delimited, so you can easily import it into a spreadsheet or database, or process it using another program. You can also view the task log using the nitrostat utility.

JobID  TaskID  Line Name        Status  ExitCode Hostname          StartTime                     Duration UserCPU SystemCPU VirtualMem PhysicalMem Labels                 Output
foo    1       1    task001     Success 0        localhost:10004   2015-06-18_15:26:52.954-0600     1.005   0.000     0.000    7364608      630784 foo,foobar,foobaz,xyz
foo    2       2    task002     Success 0        localhost:10004   2015-06-18_15:26:52.954-0600     1.007   0.000     0.000   87834368      630784 foo,foobar,xyz
foo    3       3    task003     Success 0        localhost:10004   2015-06-18_15:26:52.954-0600     1.005   0.000     0.000   71728640      901120 foo,xyz
foo    4       4    task004     Success 0        localhost:10004   2015-06-18_15:26:52.955-0600     1.005   0.000     0.000   38837504      630784 foo,foobar,foobaz,abc
foo    5       5    task005     Success 0        localhost:10004   2015-06-18_15:26:53.960-0600     1.004   0.000     0.000  405946368      630784 foo,foobar,abc
foo    6       6    task006     Success 0        localhost:10004   2015-06-18_15:26:53.961-0600     1.005   0.000     0.000  405946368      946176 foo,abc
foo    7       7    task007     Success 0        localhost:10004   2015-06-18_15:26:53.961-0600     1.003   0.000     0.000  405946368      630784
foo    8       8    task008     Success 0        localhost:10004   2015-06-18_15:26:53.966-0600     1.003   0.000     0.000  405946368      700416
foo    9       9    task009     Success 0        localhost:10004   2015-06-18_15:26:54.965-0600     1.005   0.000     0.000  405946368      630784
foo    10      10   task010     Success 0        localhost:10004   2015-06-18_15:26:54.965-0600     1.003   0.000     0.000  405946368      630784
foo    11      11   task011     Success 0        localhost:10004   2015-06-18_15:26:55.973-0600     1.005   0.000     0.000    7364608      630784
foo    12      12               Success 0        localhost:10004   2015-06-18_15:26:55.973-0600     1.004   0.000     0.000  405946368      626688
foo    13      14   fail        Failure 1        localhost:10004   2015-06-18_15:26:55.973-0600     0.005   0.000     0.000       8192        4096
foo    14      16   stderr      Success 0        localhost:10004   2015-06-18_15:26:55.974-0600     0.005   0.000     0.000  405946368      536576
foo    15      18   stderr_fail Failure 1        localhost:10004   2015-06-18_15:26:55.979-0600     0.005   0.000     0.000  405946368     1228800                         ERROR MESSAGE
foo    16      20   overtime    Timeout -9       localhost:10004   2015-06-18_15:26:55.980-0600     2.006   0.000     0.000  405946368      970752                         maxtime exceeded, process was killed
foo    17      21               Success 0        localhost:10004   2015-06-18_15:26:55.985-0600     1.002   0.000     0.000  405946368      626688
foo    19      23               Success 0        localhost:10004   2015-06-18_15:26:56.979-0600     1.007   0.000     0.000  405946368      970752
foo    20      24               Success 0        localhost:10004   2015-06-18_15:26:56.988-0600     1.003   0.000     0.000  405946368      724992
foo    21      25               Success 0        localhost:10004   2015-06-18_15:26:57.986-0600     1.005   0.000     0.000  405946368      724992
foo    22      26               Success 0        localhost:10004   2015-06-18_15:26:57.988-0600     1.005   0.000     0.000  405946368      970752
foo    23      27               Success 0        localhost:10004   2015-06-18_15:26:57.988-0600     1.005   0.000     0.000  405946368      630784
foo    24      28               Success 0        localhost:10004   2015-06-18_15:26:57.995-0600     1.005   0.000     0.000  405946368      630784
foo    25      29               Success 0        localhost:10004   2015-06-18_15:26:58.993-0600     1.005   0.000     0.000  405946368      974848
foo    26      30               Success 0        localhost:10004   2015-06-18_15:26:58.994-0600     1.004   0.000     0.000  405946368      626688
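Since the task log is tab-delimited with a header row, Python's standard csv module can read it directly. The sketch below (read_task_log, status_counts, label_counts, and problem_tasks are hypothetical names) loads the log, counts tasks by Status and by label, and lists tasks that did not succeed together with their Output text. It assumes the header names shown above, that Labels is a comma-separated list, and that fields are never quoted.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Tuple


def read_task_log(text: str) -> List[Dict[str, str]]:
    """Parse tab-delimited task log text (header row first) into one dict per task.

    QUOTE_NONE keeps any quote characters in Output literally (assumes no quoting).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def status_counts(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Count tasks by Status (Success, Failure, Timeout, ...)."""
    return dict(Counter(row["Status"] for row in rows))


def label_counts(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Count tasks per label; Labels may be empty for a task."""
    counts: Counter = Counter()
    for row in rows:
        labels = (row.get("Labels") or "").split(",")
        counts.update(label for label in labels if label)
    return dict(counts)


def problem_tasks(rows: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (TaskID, Status, Output) for every task whose Status is not Success."""
    return [
        (row["TaskID"], row["Status"], (row.get("Output") or "").strip())
        for row in rows
        if row["Status"] != "Success"
    ]
```

To process a real log, pass in the file contents, for example read_task_log(open("nitro_23576.tasklog.txt").read()).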

The task log contains the following fields.


© 2016 Adaptive Computing