This topic provides information on viewing job progress and output.
2.4.1 Introduction on How Nitro Tracks the Job
Nitro prints job information to stdout, such as which workers attached, how many tasks have run, and whether any tasks failed.
If your Nitro job is submitted through a scheduler, you may not see any of this until the job has completed and the resource manager has copied the job output to your job's submission directory.
However, Nitro provides a tool called nitrostat to display status information while the job is running. nitrostat is located in the nitro/bin directory where Nitro was installed.
Nitro creates two files, the job log file and the task log file, that you can use to stay up to date on the progress of your job.
Both files are written to the job directory that you provide using the ‑‑job‑dir command line option when submitting your job, or to the default job directory $HOME/nitro/<jobid>.
To see job status using nitrostat, you will need the job ID. This is either the ID reported to you when you submitted the job to the scheduler, or the ID that you set manually via the ‑‑job‑id command line option in the NITRO_OPTIONS environment variable or via the NITROJOBID environment variable.
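As a rough illustration of where these files end up, the following Python sketch builds the default job directory and the task log path for a given job ID. The helper names default_job_dir and task_log_path are hypothetical and not part of Nitro; the task log file name follows the nitro_<JobID>.tasklog.txt pattern described later in this topic, and the home directory is passed in explicitly as an assumption.

```python
from pathlib import PurePosixPath


def default_job_dir(job_id, home):
    """Default job directory described above: $HOME/nitro/<jobid>."""
    return PurePosixPath(home) / "nitro" / str(job_id)


def task_log_path(job_dir, job_id):
    """Task log inside a job directory: nitro_<JobID>.tasklog.txt."""
    return PurePosixPath(job_dir) / f"nitro_{job_id}.tasklog.txt"


# Hypothetical user and job ID, matching the examples in this topic.
job_dir = default_job_dir(23576, home="/home/jdoe")
print(job_dir)                        # /home/jdoe/nitro/23576
print(task_log_path(job_dir, 23576))  # /home/jdoe/nitro/23576/nitro_23576.tasklog.txt
```

If you submitted the job with ‑‑job‑dir, pass that directory to task_log_path instead of the default.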
2.4.2.A Nitro Job Progress Report
The Nitro job progress report lets you see the current contents of the job log file.
For example, if your resource manager ran your job as job "23576", running "/opt/nitro/bin/nitrostat 23576" shows you the job's progress.
Nitro Job Progress Report
Start Time : 2016-02-10 09:10:11-0600
Current Time: 2016-02-10 09:10:42-0600
Elapsed Time: 31 seconds (00:00:31)
Job Id : 23576
Coordinator : node01
Load Pct : 5.6%
Task Log : /home/jdoe/jobs/23576/nitro_23576.tasklog.txt
Task File : /home/jdoe/jobs/survey03.tasks
File Size : 123366
Est Tasks : 3016
Processed : 75%
Tasks
------------
Pending : 500
In Progress : 500
Completed : 1250
Success : 1250
Failure : 0
InsufRes : 0
Timeout : 0
Invalid : 0
Tasks/sec : 40.3
Total Tasks : 2250
Workers
-------
Host Pid Thrds Status Assigned Running Completed Success Failure InsufRes Timeout Tasks/sec AsgmtDur
node02 6851 12 running 1250 250 1000 1000 0 0 0 36.0 8.0
node03 14988 4 running 500 250 250 250 0 0 0 9.3 27.0
The following describes the report fields.
Load Pct – Percentage of coordinator load capacity.
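If you want to use the report values in a script, one possible approach is to capture the nitrostat output as text and collect its "Name : value" lines. The sketch below is only an illustration that assumes the layout shown in the example above; parse_report is a hypothetical helper, not a Nitro API. It skips section titles, rule lines, and the worker table rows, and keeps the first value when a name appears more than once.

```python
def parse_report(text):
    """Collect 'Name : value' lines from a progress report into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # section titles, rule lines, and worker rows have no colon
        fields.setdefault(name.strip(), value.strip())
    return fields


# A few lines copied from the example report above.
SAMPLE = """Nitro Job Progress Report
Start Time : 2016-02-10 09:10:11-0600
Current Time: 2016-02-10 09:10:42-0600
Job Id : 23576
Tasks
------------
Pending : 500
node02 6851 12 running 1250 250 1000 1000 0 0 0 36.0 8.0
"""

report = parse_report(SAMPLE)
print(report["Job Id"], report["Pending"])  # 23576 500
```

Splitting on the first colon keeps timestamps such as 09:10:42 intact, because the field name always comes before the first colon on the line.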
2.4.2.B Job Completed Report
Once the job has completed, the first line of the job report ends with "(final)" and the Current Time field is replaced with Finish Time, which appears directly after Start Time. The following example continues the previous example for job "23576".
Nitro Job Progress Report (final)
Start Time : 2016-02-10 09:10:11-0600
Finish Time : 2016-02-10 09:11:36-0600
Elapsed Time: 85 seconds (00:01:25)
Job Id : 23576
Task Log : /home/jdoe/jobs/23576/nitro_23576.tasklog.txt
Task File : /home/jdoe/jobs/survey03.tasks
Tasks
------------
Pending : 0
Running : 0
Completed : 3000
Success : 3000
Failure : 0
InsufRes : 0
Timeout : 0
Invalid : 0
Tasks/sec : 35.3
Total Tasks : 3000
Coordinator
-----------
Host : node01
Threads : 8
Worker Resources
----------------
Workers : 2
Threads : 16
Workers
-------
Host Pid Thrds Status Assigned Running Completed Success Failure InsufRes Timeout Tasks/sec AsgmtDur
node02 6851 12 closed 2250 0 2250 2250 0 0 0 29.2 8.3
node03 14988 4 closed 750 0 750 750 0 0 0 8.8 35.7
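The job-level Tasks/sec values in the two example reports appear consistent with the number of completed tasks divided by the elapsed time in seconds. This is an observation from the sample numbers, not a documented formula, and the per-worker Tasks/sec column is evidently computed differently. A small Python sketch of the check, using a hypothetical helper name:

```python
def tasks_per_second(completed, elapsed_seconds):
    """Completed tasks divided by elapsed seconds (assumed job-level rate)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed / elapsed_seconds


# Final report: 3000 tasks completed in 85 seconds.
print(round(tasks_per_second(3000, 85), 1))  # 35.3
# In-progress report: 1250 tasks completed in 31 seconds.
print(round(tasks_per_second(1250, 31), 1))  # 40.3
```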
The task log file lists all tasks that have been completed, along with statistics about each task's duration and memory consumption. This file is named nitro_<JobID>.tasklog.txt and is located in the same directory as the job log file.
The task log file is tab-delimited, so you can easily import it into a spreadsheet or database, or process it using another program. You can also view the task log using the nitrostat utility.
JobID TaskID Line Name Status ExitCode Hostname StartTime Duration UserCPU SystemCPU VirtualMem PhysicalMem Labels Output
foo 1 1 task001 Success 0 localhost:10004 2015-06-18_15:26:52.954-0600 1.005 0.000 0.000 7364608 630784 foo,foobar,foobaz,xyz
foo 2 2 task002 Success 0 localhost:10004 2015-06-18_15:26:52.954-0600 1.007 0.000 0.000 87834368 630784 foo,foobar,xyz
foo 3 3 task003 Success 0 localhost:10004 2015-06-18_15:26:52.954-0600 1.005 0.000 0.000 71728640 901120 foo,xyz
foo 4 4 task004 Success 0 localhost:10004 2015-06-18_15:26:52.955-0600 1.005 0.000 0.000 38837504 630784 foo,foobar,foobaz,abc
foo 5 5 task005 Success 0 localhost:10004 2015-06-18_15:26:53.960-0600 1.004 0.000 0.000 405946368 630784 foo,foobar,abc
foo 6 6 task006 Success 0 localhost:10004 2015-06-18_15:26:53.961-0600 1.005 0.000 0.000 405946368 946176 foo,abc
foo 7 7 task007 Success 0 localhost:10004 2015-06-18_15:26:53.961-0600 1.003 0.000 0.000 405946368 630784
foo 8 8 task008 Success 0 localhost:10004 2015-06-18_15:26:53.966-0600 1.003 0.000 0.000 405946368 700416
foo 9 9 task009 Success 0 localhost:10004 2015-06-18_15:26:54.965-0600 1.005 0.000 0.000 405946368 630784
foo 10 10 task010 Success 0 localhost:10004 2015-06-18_15:26:54.965-0600 1.003 0.000 0.000 405946368 630784
foo 11 11 task011 Success 0 localhost:10004 2015-06-18_15:26:55.973-0600 1.005 0.000 0.000 7364608 630784
foo 12 12 Success 0 localhost:10004 2015-06-18_15:26:55.973-0600 1.004 0.000 0.000 405946368 626688
foo 13 14 fail Failure 1 localhost:10004 2015-06-18_15:26:55.973-0600 0.005 0.000 0.000 8192 4096
foo 14 16 stderr Success 0 localhost:10004 2015-06-18_15:26:55.974-0600 0.005 0.000 0.000 405946368 536576
foo 15 18 stderr_fail Failure 1 localhost:10004 2015-06-18_15:26:55.979-0600 0.005 0.000 0.000 405946368 1228800 ERROR MESSAGE
foo 16 20 overtime Timeout -9 localhost:10004 2015-06-18_15:26:55.980-0600 2.006 0.000 0.000 405946368 970752 maxtime exceeded, process was killed
foo 17 21 Success 0 localhost:10004 2015-06-18_15:26:55.985-0600 1.002 0.000 0.000 405946368 626688
foo 19 23 Success 0 localhost:10004 2015-06-18_15:26:56.979-0600 1.007 0.000 0.000 405946368 970752
foo 20 24 Success 0 localhost:10004 2015-06-18_15:26:56.988-0600 1.003 0.000 0.000 405946368 724992
foo 21 25 Success 0 localhost:10004 2015-06-18_15:26:57.986-0600 1.005 0.000 0.000 405946368 724992
foo 22 26 Success 0 localhost:10004 2015-06-18_15:26:57.988-0600 1.005 0.000 0.000 405946368 970752
foo 23 27 Success 0 localhost:10004 2015-06-18_15:26:57.988-0600 1.005 0.000 0.000 405946368 630784
foo 24 28 Success 0 localhost:10004 2015-06-18_15:26:57.995-0600 1.005 0.000 0.000 405946368 630784
foo 25 29 Success 0 localhost:10004 2015-06-18_15:26:58.993-0600 1.005 0.000 0.000 405946368 974848
foo 26 30 Success 0 localhost:10004 2015-06-18_15:26:58.994-0600 1.004 0.000 0.000 405946368 626688
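Because the task log is tab-delimited, it can be processed with standard tools. The following Python sketch reads a task log with the standard csv module and counts tasks by status. It assumes the first line is the header row shown above; read_task_log and count_by_status are hypothetical helper names, and the in-memory sample stands in for a real file opened from your job directory.

```python
import csv
import io
from collections import Counter


def read_task_log(lines):
    """Return one dict per task from a tab-delimited task log."""
    # QUOTE_NONE keeps quote characters in the Output column as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def count_by_status(rows):
    """Count tasks per value of the Status column."""
    return Counter(row["Status"] for row in rows)


# Shortened sample with only the first six columns, for illustration.
sample = io.StringIO(
    "JobID\tTaskID\tLine\tName\tStatus\tExitCode\n"
    "foo\t1\t1\ttask001\tSuccess\t0\n"
    "foo\t13\t14\tfail\tFailure\t1\n"
    "foo\t16\t20\tovertime\tTimeout\t-9\n"
)
rows = read_task_log(sample)
print(count_by_status(rows)["Failure"])  # 1
```

To read a real log, pass an open file object instead, for example one opened with open(path, newline="").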
The task log contains the following fields.
The operating system may allocate shared memory and may charge a proportion of this shared memory to random tasks.