ppwrun - Parallel Performance Wizard v3.2 User Manual

B.10 ppwrun

ppwrun is a program that lets you easily control PPW's runtime performance data recording options, which otherwise have to be set manually via environment variables. To use ppwrun, prefix your normal program invocation with the ppwrun command and any of the options listed below; ppwrun then sets the appropriate environment variables for you.

For example, if you would like to gather profile information and PAPI hardware counter information about your UPC program a.out, and you normally execute that program using upcrun, you might do this instead:

     $ ppwrun --output=aoutprof.par --profile \
              --papi-metrics=PAPI_TOT_CYC \
              upcrun -n 128 ./a.out

Alternatively, if you'd like to collect trace data for a sequential program a.out, you might do this:

     $ ppwrun --output=aouttrace.par --trace \
              ./a.out

The backslashes in the example commands above only break each shell command across multiple lines; they are not part of the command itself.

B.10.1 Invoking ppwrun

To invoke ppwrun, use the following syntax:

     
     ppwrun [--help]
            [--output=file]
            [--disable|--trace]
            [--trace-handling=MODE]
            [--disable-throttling]
            [--throttling-count=count]
            [--throttling-duration=duration]
            [--selective-file=file]
            [--comm-stats|--line-comm-stats]
            [--bash|--csh]
            upcrun...|a.out...

B.10.2 ppwrun Command Options

ppwrun accepts the following options:

--output=file.par

Output performance data file to file.par.

--trace

Collect trace data for your application. Note that using this option with long-running programs or fine-grained instrumentation may result in very large trace data files.

--trace-buffer=N

Set the trace buffer size to N bytes. Most users shouldn't need to change the default buffer size, but set this to a larger size if you have a particularly slow I/O system on each compute node. In some instances, setting this option to a large value may result in a significant decrease of overhead when collecting trace data.

--comm-stats

Enable collection of communication statistics at runtime. This lets you use the data transfer visualization of ppw(1), but it uses a lot of memory at runtime, on the order of the number of threads/ranks squared. Not recommended for runs of size 256 or greater unless your application can spare a lot of extra memory.
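
To get a feel for how quickly this grows, the following sketch counts the thread/rank pairs for a few run sizes. It only illustrates the quadratic growth described above; the amount of memory PPW uses per pair is not stated here, so no byte figures are given. The helper name pair_count is hypothetical and not part of PPW.

```shell
# Illustrative only: number of thread/rank pairs for a run of size n.
# pair_count is a hypothetical helper, not part of PPW.
pair_count() {
    echo $(( $1 * $1 ))
}

for n in 16 64 256 1024; do
    echo "$n threads/ranks -> $(pair_count "$n") pairs"
done
```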

--line-comm-stats

Enable collecting detailed, per-line communication statistics. This option implies --comm-stats and uses up even more memory at runtime.

--disable

Disable all data collection. Note that any instrumentation code that has been added to the executable may still decrease your application's performance. To get an accurate baseline of your program's performance, recompile your application normally or give the --noinst option to ppwcc(1) or ppwupcc(1).

--trace-handling=MODE

Set the trace collection mode to MODE. Possible values for MODE include centralized (default), distributed, and reduced. Any of these modes will work on any cluster; this option only serves to optimize the final data collection phase.

In centralized mode, all threads process their trace data in parallel, then the master collects the trace data from each thread and writes it to a file. Suited for distributed shared-memory clusters.

In distributed mode, all threads process their trace data in parallel, then each node writes its trace data to the par file. The master node assists in synchronization between different nodes. Suited for multi-core shared-memory machines.

In reduced mode, all threads process and write their trace data sequentially. The master assists in synchronization between threads. Use this mode on clusters with slow I/O, since it keeps the amount of disk I/O to a minimum.

--disable-throttling

Disables throttling, which is enabled by default.

THROTTLING:

When throttling is enabled (that is, when --disable-throttling is not given), PPW identifies high-frequency, short-duration user-level events and stops measuring them once they cross the two throttling thresholds. An event is eligible for throttling if it is invoked more than throttling-count times (set with --throttling-count) and its execution time is less than throttling-duration (set with --throttling-duration).

--throttling-count=count

--throttling-duration=duration

These options set the thresholds for throttling. duration is specified in microseconds. The default values are count=10000 and duration=100. For more details, see THROTTLING under --disable-throttling.
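
The eligibility rule can be written down as a small shell check. This is only a sketch of the rule as described in this section, not PPW's actual implementation: the helper name is_throttle_candidate is hypothetical, and it assumes that the execution time is compared per invocation as a whole number of microseconds, which this section does not spell out.

```shell
# Hypothetical helper mirroring the throttling rule; not part of PPW.
# Usage: is_throttle_candidate CALLS DURATION_US [COUNT] [DURATION]
# Assumption: DURATION_US is the event's execution time per invocation.
is_throttle_candidate() {
    calls=$1
    duration_us=$2
    count_threshold=${3:-10000}    # default for --throttling-count
    duration_threshold=${4:-100}   # default for --throttling-duration (microseconds)
    [ "$calls" -gt "$count_threshold" ] && [ "$duration_us" -lt "$duration_threshold" ]
}

if is_throttle_candidate 50000 20; then
    echo "50000 calls at 20 us: eligible for throttling"
fi
if ! is_throttle_candidate 50000 500; then
    echo "50000 calls at 500 us: not eligible (duration too long)"
fi
```

Both conditions must hold, so an event that is called often but runs for a long time, or a short event that is called rarely, would keep being measured under this reading.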

--selective-file=file

[Currently applicable only to UPC] Provide a selective measurement file that contains a list of excluded and/or included events. The specified events override throttling. See the manual for the file format, usage, and behavior.

--bash

Instead of running anything, write out commands in bash(1)-compatible syntax to stdout that correspond to the data recording options given. Most users will not need this option unless their parallel job spawner does not propagate environment variables properly.

--csh

Similar to the --bash option, except that the commands are written in csh(1)-compatible syntax, which can be used with the csh(1) or tcsh(1) shells.

--help

Show the help screen.

ppwrun will also accept each option with a single dash instead of two, so you can type

     $ ppwrun -trace ...

instead of

     $ ppwrun --trace ...

B.10.3 ppwrun Notes

If your parallel job spawner does not propagate environment variables for you, then you may experience problems with ppwrun. The typical symptoms are that you cannot collect trace data for your applications and that any option you give to ppwrun seems to be ignored.

If this is the case, then you'll need to add the shell commands printed by the --bash or --csh options to your shell's profile file. This file is usually .bash_profile or .cshrc; consult your shell's documentation or your local sysadmin guru for more information.
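
As a rough sketch of that step for a bash-style profile, the helper below appends whatever ppwrun prints with --bash to a file of your choice. The function name ppw_persist_opts and the comment line it writes are hypothetical, and the sketch assumes that ppwrun is on your PATH when the function is called.

```shell
# Hypothetical helper: append the settings printed by "ppwrun ... --bash"
# to a profile file. Assumes ppwrun is on PATH when the function is called.
# Usage: ppw_persist_opts PROFILE_FILE [ppwrun options...]
ppw_persist_opts() {
    profile=$1
    shift
    {
        echo "# PPW recording options (added by ppw_persist_opts)"
        ppwrun "$@" --bash
    } >> "$profile"
}

# Example (commented out so that nothing is modified by accident):
# ppw_persist_opts "$HOME/.bash_profile" --trace --output=foo.par
```

Remember to remove these lines from the profile file again once you are done, since they affect every later run of an instrumented program.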

For UPC programs, PPW does not currently support noncollective UPC exits, such as an exit on one thread that causes a SIGKILL signal to be sent to other threads. As an example, consider the following UPC program:

     ...
     int main() {
       if (MYTHREAD) {
         upc_barrier;
       } else {
         exit(0);
       }
       return 0;
     }

In this program, depending on the UPC compiler and runtime system used, PPW may not write out valid performance data for all threads. A future version of PPW may add “dump” functionality where complete profile data is flushed to disk every N minutes, which will allow you to collect partial performance data from a long-running program that happens to crash a few minutes before it is completed. However, for technical reasons PPW will generally not be able to recover from situations like these, so please do try to debug any crashes in your program before analyzing it with PPW.

When you run your application, you may run into error messages like the following one:

     PPW warning: no source information available

PPW stores a snapshot of your application's source code in a file archive with the extension .ppw.sar. If you move your program's executable and do not move this file to the same directory, you will get this error message whenever you run your program. To fix this problem, keep a copy of the .ppw.sar file in the same directory as your compiled program.
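
One way to avoid losing the archive is to copy the executable and the archive together. The sketch below is a hypothetical helper, not a PPW tool; because the exact archive file name is not given here, it simply copies every file ending in .ppw.sar that sits in the same directory as the executable.

```shell
# Hypothetical helper: copy an executable together with any .ppw.sar
# archives found next to it. The exact archive name is an assumption,
# so every *.ppw.sar file in the executable's directory is copied.
# Usage: copy_with_sar EXECUTABLE DEST_DIR
copy_with_sar() {
    exe=$1
    dest=$2
    src_dir=$(dirname "$exe")
    mkdir -p "$dest" || return 1
    cp "$exe" "$dest/" || return 1
    for sar in "$src_dir"/*.ppw.sar; do
        # Skip the unexpanded pattern when no archive exists.
        [ -e "$sar" ] || continue
        cp "$sar" "$dest/" || return 1
    done
}

# Example (commented out; paths are placeholders):
# copy_with_sar ./a.out /path/to/run/directory
```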

If you'd like to test which recording options are dictated by your current environment variable settings, use the ppw-showopts command. As an example (but keep in mind output will vary from machine to machine) using csh(1)-compatible shell syntax:

     % ppwrun -trace -output=foo.par -csh
     setenv PPW_TRACEMODE 1
     setenv PPW_OUTPUT foo.par
     % setenv PPW_TRACEMODE 1
     % setenv PPW_OUTPUT foo.par
     % ppw-showopts
     Current PPW configuration options (in directory /storage/home/leko):
     
       + Disabled? 0
       + Communication stats? 0
       + Communication stats per line? 0
       + Tracing? 1
       + Trace buffer size? 16384
       + Output? foo.par
       + PAPI metrics? (none)

And the same example using bash(1)-compatible syntax:

     $ ppwrun -trace -output=foo.par -bash
     export PPW_TRACEMODE=1
     export PPW_OUTPUT=foo.par
     $ export PPW_TRACEMODE=1
     $ export PPW_OUTPUT=foo.par
     $ ppw-showopts
     Current PPW configuration options (in directory /storage/home/leko):
     
       + Disabled? 0
       + Communication stats? 0
       + Communication stats per line? 0
       + Tracing? 1
       + Trace buffer size? 16384
       + Output? foo.par
       + PAPI metrics? (none)

B.10.4 ppwrun Environment Variables

To see which environment variables are set by ppwrun, use the --csh and --bash options.
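
To check which of these variables are already set in your current shell, a quick filter over the environment can help. This sketch assumes that all of the variables share the PPW_ prefix, as PPW_TRACEMODE and PPW_OUTPUT in the examples above do; the helper name ppw_env is hypothetical.

```shell
# List environment variables whose names start with PPW_.
# Assumes all ppwrun-related variables share this prefix, as the two
# shown in this section (PPW_TRACEMODE, PPW_OUTPUT) do.
ppw_env() {
    env | grep '^PPW_'
}

ppw_env || echo "no PPW_ variables are set"
```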