7.3 Streaming Job Output

The job output page lets a user view a job's standard output and standard error files while the job is running and after it completes. The page is reached through the get-output button on the job management page.

7.3.1 Configuring Streaming Job Output

Streaming job output must be configured for the get-output button to work. The <job-management> element has three optional child elements that configure the job output feature: <spool-dir>, <stdout-format>, and <stderr-format>.

By default, streaming job output is disabled. Configuring the <spool-dir> element enables it. Viewpoint assumes that the job output files are in the specified directory both while the job is running and after it completes. This requires configuring both the file system that hosts the spool directory and the resource manager that runs the jobs. Specifically, the resource manager must place job output files in the specified directory and keep them there after the job completes. Viewpoint and the resource manager both require direct access to this directory. If Viewpoint and any of the MOM daemons run on different machines, which is the typical scenario, the spool directory must be a shared network directory that Viewpoint and all MOMs can access.

The <stdout-format> and <stderr-format> elements support string interpolation in the same way as the <title> element for job management. Their default values match the file names TORQUE uses when $spool_as_final_name is set and a directory is specified via the -o and -e flags for qsub: $name.o$id for stdout and $name.e$id for stderr.

An example configuration for streaming output is given below:

<config>
  ...
  <job-management>
    <spool-dir>/var/spool/torque/spool</spool-dir>
    <stdout-format>$name.o$id</stdout-format>
    <stderr-format>$name.e$id</stderr-format>
    ...
  </job-management>
</config>
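
To illustrate how the format strings and the spool directory combine into file paths, the following Ruby sketch expands $name and $id in the default formats. It is not Viewpoint's implementation: the expand_format and output_paths helpers, the job name "sim", and the job id "1234" are all hypothetical, and only the two variables named above are assumed to exist.

```ruby
# Hypothetical helper: replaces each $variable in a format string with the
# matching value, leaving unknown variables untouched.
def expand_format(format, values)
  format.gsub(/\$(\w+)/) do
    key = Regexp.last_match(1)
    values.fetch(key) { '$' + key }
  end
end

# Hypothetical helper: builds the stdout and stderr paths for one job using
# the default formats described in this section.
def output_paths(spool_dir, job,
                 stdout_format = '$name.o$id', stderr_format = '$name.e$id')
  {
    stdout: File.join(spool_dir, expand_format(stdout_format, job)),
    stderr: File.join(spool_dir, expand_format(stderr_format, job))
  }
end

# Illustrative job values, not taken from a real system.
job = { 'name' => 'sim', 'id' => '1234' }
paths = output_paths('/var/spool/torque/spool', job)
puts paths[:stdout]
puts paths[:stderr]
```

If the file names produced by the resource manager do not match the paths computed this way, the <stdout-format> and <stderr-format> values are the first thing to check.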

7.3.2 Configuring TORQUE and Moab for Streaming Job Output

TORQUE 2.4 and later is the only resource manager for which streaming output is supported. All pbs_mom daemons in TORQUE must be configured with the $spool_as_final_name parameter set to true. This causes the MOM daemons to leave the job output files in place rather than move them to another location after the jobs finish running.

In addition, submit filters must be used to force all job output to go to the directory configured in Viewpoint. If users submit jobs with both msub and qsub, a filter must be supplied for each command. An example Moab submit filter for the msub command that forces all job output to go to /var/spool/torque/spool is given below. This script relies on Ruby, RubyGems, the libxml-ruby gem, and the native libxml2 libraries that libxml-ruby needs.

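#!/usr/bin/env ruby
# Moab msub submit filter: reads the job XML on stdin, adds #PBS -o/-e
# directives pointing at the spool directory, and writes the XML to stdout.
require 'rubygems'
require 'libxml'
include LibXML::XML

SpoolDir = '/var/spool/torque/spool/'
# The submit string is expected to begin with a literal \START marker; '#' and
# newline are encoded as \23 and \0a, so "\23!...\0a" is the shebang line.
ScriptStartRe = %r{^\s*\\START}
ScriptDirectiveRe = %r{#{ScriptStartRe}\\23!.+?\\0a}

job = Parser.io($stdin).parse
e = job.find_first('/job/SubmitString')
raise "No /job/SubmitString element in job XML" if e.nil?

# Encode space, newline, and '#' as \20, \0a, and \23. The block form of gsub
# keeps Ruby from treating \2 and \0 in the replacement as backreferences.
output_directives = "#PBS -o #{SpoolDir}\n#PBS -e #{SpoolDir}\n".gsub(/[ \n#]/) do |c|
  '\\%02x' % c.ord
end

submit_string = e.content
match = ScriptDirectiveRe.match(submit_string) || ScriptStartRe.match(submit_string)
raise "Failed to recognize start of job submit string" if match.nil?

# Insert the directives right after the shebang line (or after \START when
# there is no shebang) and write the result back to the XML node.
e.content = submit_string.dup.insert(match.end(0), output_directives)
puts job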
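
The filter above encodes spaces, newlines, and # characters as \20, \0a, and \23 before inserting the directives into the submit string. The sketch below isolates that encoding step so it can be tried without Moab or libxml-ruby. The helper names escape_submit_text and unescape_submit_text are hypothetical, and the sketch handles only the three characters the filter itself handles; whether Moab encodes other characters is not covered here.

```ruby
# Hypothetical helper: encodes space, newline, and '#' as a backslash followed
# by the character's two-digit hexadecimal code.
def escape_submit_text(text)
  text.gsub(/[ \n#]/) { |c| format('\\%02x', c.ord) }
end

# Hypothetical inverse: decodes only the three sequences produced above.
def unescape_submit_text(text)
  text.gsub(/\\(20|0a|23)/) { Regexp.last_match(1).hex.chr }
end

directive = "#PBS -o /var/spool/torque/spool/\n"
encoded = escape_submit_text(directive)
puts encoded
puts unescape_submit_text(encoded) == directive
```

The block form of gsub is used deliberately: in a plain replacement string, Ruby interprets sequences such as \2 and \0 as backreferences, which would corrupt the encoded text.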

Finally, job output files must be created with file permissions such that the OS user Viewpoint runs under can read the files. Since Viewpoint is run as a non-root user on most systems, all MOM daemons must be configured to produce job output files that are world-readable, which can be done via the $job_output_file_umask parameter. Specifically, the following line causes job output files to be created with read privileges for all users:

$job_output_file_umask 022
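
As a quick check of why 022 yields world-readable files, the following sketch applies the umask to a requested mode of 0666. That requested mode is an assumption based on the usual POSIX convention for newly created regular files, not something stated in the TORQUE configuration.

```ruby
# A umask clears the permission bits it names from the requested mode.
UMASK = 0o022
REQUESTED_MODE = 0o666 # assumed default request for a new regular file

file_mode = REQUESTED_MODE & ~UMASK
world_readable = (file_mode & 0o004) != 0

puts format('%04o', file_mode) # prints 0644
puts world_readable            # prints true
```

A stricter value such as 077 would clear the group and other read bits, and a non-root Viewpoint user that does not own the files could then no longer read them.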

Viewpoint's job output feature and the "get-output" job control should function properly once four conditions are met: (1) the TORQUE MOM daemons use the same location for job output while the job is running and after it completes, (2) submit filters force all job output to a single directory, (3) the job output umask lets the Viewpoint user read the job output files, and (4) Viewpoint is configured to look in this directory for job output.
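
When the get-output control does not behave as expected, it can help to confirm from the Viewpoint host that the output files exist and are readable. The sketch below is one way to do that, assuming the default $name.o$id and $name.e$id formats. The check_job_output helper and the job values are hypothetical, and the demonstration runs against a temporary directory rather than a real spool directory.

```ruby
require 'tmpdir'

# Hypothetical helper: reports whether the default-format output files for a
# job exist in spool_dir and are readable by the user running this script.
def check_job_output(spool_dir, name, id)
  { stdout: "#{name}.o#{id}", stderr: "#{name}.e#{id}" }.map do |stream, file|
    path = File.join(spool_dir, file)
    status = if !File.exist?(path)
               :missing
             elsif File.readable?(path)
               :readable
             else
               :unreadable
             end
    [stream, status]
  end.to_h
end

# Demonstration against a temporary directory holding only a stdout file.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'sim.o1234'), "hello\n")
  p check_job_output(dir, 'sim', '1234')
end
```

To be meaningful, the check should run as the same OS user that Viewpoint runs under, with spool_dir set to the directory configured in <spool-dir>.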