Manual Reference Pages  - UPC_TRACE (1)

NAME

upc_trace - the UPC/GASNet trace summarization tool, version 2.24.2

SYNOPSIS

upc_trace [options] trace-file(s)

DESCRIPTION

UPC trace file summarization script, v2.0 (GASNet v1.29.1)
trace-file(s) may include any mix of UPC trace files and local memory reports

OPTIONS

-h -? -help
  Display this help message.
-o [filename]
  Output results to file. Default is STDOUT.
-report [r1][r2]..
  Indicate which reports to generate: PUT, GET, BARRIER, MEMORY, and/or TI_ARRAY_COPY. Default: all reports.
-sort [f1],[f2]...
  Sort output by one or more fields: TOTAL, AVG, MIN, MAX, CALLS, TYPE, or SRC. For GET/PUT/MEMORY, TOTAL, AVG, MIN, and MAX refer to size in bytes; for BARRIER, they refer to time spent in the barrier. Default: sort by SRC.
-filter [t1],[t2]..
  Filter out output by one or more types: LOCAL, GLOBAL, WAIT, WAITNOTIFY.
-p -[no]peer
  Output per-peer break down for PUT and GET.
-t -[no]thread
  Output detailed information for each thread.
-i -[no]internal
  Show internal events (such as the initial and final barriers) which do not correspond to user source code.
-f -[no]full
  Show the full source file name.
-d Enable debugging output for the parsing script.

GENERAL INFO

As of version 2.0, Berkeley UPC includes ’upc_trace’, a tool for analyzing the communication behavior of UPC programs. When run on the output of a trace-enabled Berkeley UPC program, ’upc_trace’ reports which lines of code in your UPC program generated network traffic: how many messages each line caused, what type they were (local and/or remote gets/puts), and what the maximum, minimum, average, and combined sizes of the messages were.

How to use ’upc_trace’:

o Tracing must be enabled for ’upc_trace’ to work. By default, tracing is enabled for debug compilations (i.e. if ’upcc -g’ is used), but not otherwise, because it incurs some overhead. If you also wish to trace non-debug executables, you must rebuild your UPC system and pass ’--with-multiconf=+opt_trace’ to configure.

o You must run your application with ’upcrun -trace ...’ or ’upcrun -tracefile TRACE_FILE_NAME ...’. Either of these flags causes your UPC executable to dump out tracing information while it executes. The ’-trace’ flag causes one file per UPC thread to be generated, with a name that includes the UPC thread’s number. The ’-tracefile NAME’ option lets you specify your own name for the tracing file(s). If the name contains a ’%’ character, one trace file per thread is generated, with the ’%’ replaced by the UPC thread’s number. Otherwise, all threads write to the same file.

Note that running with tracing may slow down your application considerably. The exact amount depends on your filesystem and on the ratio of communication to computation in your program. If you are only interested in a subset of the trace information, consider setting GASNET_TRACEMASK and/or GASNET_TRACELOCAL as described in the Berkeley UPC User’s Guide.

o After your application has completed, you may run ’upc_trace’ on one or more of the trace files generated by your program run.

Running ’upc_trace’ on a trace file generated by a single UPC thread shows the information only for that thread. If you pass multiple files from the same application run, the information for the various threads is coalesced, so passing in all the tracefiles generated by a run allows you to see information for the entire application.
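The steps above can be sketched as a small POSIX shell function. Treat it as an illustration under stated assumptions rather than a definitive recipe: the function name, the source file, the thread count, and the trace file prefix are all hypothetical, and the ’-o’ flag of upcc and the ’-n’ flag of upcrun are assumed to be available in your installation.

```shell
# Hypothetical helper: build with tracing, run, then summarize.
# Assumes upcc, upcrun, and upc_trace are on PATH.
trace_and_summarize() {
    src=$1         # UPC source file (hypothetical, e.g. testtrace.upc)
    nthreads=$2    # number of UPC threads to launch
    prefix=$3      # prefix for the trace files (hypothetical)
    exe=${src%.upc}

    # Debug builds (-g) have tracing enabled by default.
    upcc -g -o "$exe" "$src" || return 1

    # The '%' in the name is replaced with each UPC thread's number,
    # so this writes one trace file per thread.
    upcrun -n "$nthreads" -tracefile "$prefix-%" "./$exe" || return 1

    # Pass all per-thread files so the results are coalesced.
    upc_trace -t "$prefix"-*
}
```

A call such as ’trace_and_summarize testtrace.upc 4 mytrace’ would then be expected to leave files named mytrace-0 through mytrace-3 and print a per-thread summary covering the whole run.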

A number of ’upc_trace’ flags control what kind of information is reported and how it is sorted. See ’upc_trace --help’ for details.
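As one possible combination of the flags listed under OPTIONS, the sketch below hides LOCAL accesses, sorts by the TOTAL field, and writes the result to a file. The function name and the output file name are hypothetical, and the sketch assumes that ’-filter’ removes rows of the listed types, as the option description suggests.

```shell
# Hypothetical helper: summarize trace files with LOCAL accesses
# filtered out, sorted by TOTAL, written to a file instead of STDOUT.
remote_traffic_report() {
    out=$1     # output file name (hypothetical)
    shift      # remaining arguments are the trace files
    upc_trace -filter LOCAL -sort TOTAL -o "$out" "$@"
}
```

For example, ’remote_traffic_report report.txt mytrace-*’ would pass every matching trace file to a single upc_trace run.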

Note that upc_trace may take a while to run, especially on large tracefiles. We plan to optimize its performance in the future.

SAMPLE OUTPUT

Here is example output from upc_trace for a 4-thread, 2-node test program:

$ upc_trace -t upc_trace-*

Parsing thread info for upc_trace-testtrace-4-14739-0..
Parsing tracefile for upc_trace-testtrace-4-14739-0.. done
Parsing thread info for upc_trace-testtrace-4-14739-1..
Parsing tracefile for upc_trace-testtrace-4-14739-1.. done
Generating reports..

GET REPORT:

SOURCE         LINE  TYPE          MSG:(min    max     avg     total)   CALLS 
=============================================================================
testtrace.upc      9     GLOBAL        4 B       4 B       4 B       8 B    2
    Thread 0                           4 B       4 B       4 B       4 B    1
    Thread 2                           4 B       4 B       4 B       4 B    1
testtrace.upc      9      LOCAL        4 B       4 B       4 B       8 B    2
    Thread 1                           4 B       4 B       4 B       4 B    1
    Thread 3                           4 B       4 B       4 B       4 B    1
testtrace.upc     18     GLOBAL      100 B     100 B     100 B     200 B    2
    Thread 1                         100 B     100 B     100 B     100 B    1
    Thread 3                         100 B     100 B     100 B     100 B    1
testtrace.upc     18      LOCAL      100 B     100 B     100 B     200 B    2
    Thread 0                         100 B     100 B     100 B     100 B    1
    Thread 2                         100 B     100 B     100 B     100 B    1
testtrace.upc     20     GLOBAL      100 B     100 B     100 B     200 B    2
    Thread 0                         100 B     100 B     100 B     100 B    1
    Thread 2                         100 B     100 B     100 B     100 B    1

PUT REPORT:

SOURCE         LINE  TYPE          MSG:(min    max     avg     total)   CALLS 
=============================================================================   
testtrace.upc      7     GLOBAL        4 B       4 B       4 B       8 B    2
    Thread 1                           4 B       4 B       4 B       4 B    1
    Thread 3                           4 B       4 B       4 B       4 B    1
testtrace.upc      7      LOCAL        4 B       4 B       4 B       8 B    2
    Thread 0                           4 B       4 B       4 B       4 B    1
    Thread 2                           4 B       4 B       4 B       4 B    1
testtrace.upc     13     GLOBAL        4 B       4 B       4 B       8 B    2
    Thread 1                           4 B       4 B       4 B       4 B    1
    Thread 3                           4 B       4 B       4 B       4 B    1
testtrace.upc     13      LOCAL        4 B       4 B       4 B       8 B    2
    Thread 0                           4 B       4 B       4 B       4 B    1
    Thread 2                           4 B       4 B       4 B       4 B    1
testtrace.upc     15     GLOBAL        4 B       4 B       4 B       8 B    2
    Thread 1                           4 B       4 B       4 B       4 B    1
    Thread 3                           4 B       4 B       4 B       4 B    1
testtrace.upc     15      LOCAL        4 B       4 B       4 B       8 B    2
    Thread 0                           4 B       4 B       4 B       4 B    1
    Thread 2                           4 B       4 B       4 B       4 B    1
testtrace.upc     19     GLOBAL      100 B     100 B     100 B     200 B    2
    Thread 1                         100 B     100 B     100 B     100 B    1
    Thread 3                         100 B     100 B     100 B     100 B    1
testtrace.upc     19      LOCAL      100 B     100 B     100 B     200 B    2
    Thread 0                         100 B     100 B     100 B     100 B    1
    Thread 2                         100 B     100 B     100 B     100 B    1
testtrace.upc     20     GLOBAL      100 B     100 B     100 B     200 B    2
    Thread 1                         100 B     100 B     100 B     100 B    1
    Thread 3                         100 B     100 B     100 B     100 B    1

BARRIER REPORT:

SOURCE         LINE  TYPE          MSG:(min    max     avg     total)   CALLS 
=============================================================================   
testtrace.upc      8       WAIT   151.0 us  165.0 us  158.0 us  632.0 us    4
    Thread 0..1                   165.0 us  165.0 us  165.0 us  165.0 us    1
    Thread 2..3                   151.0 us  151.0 us  151.0 us  151.0 us    1
testtrace.upc      8 NOTIFYWAIT    43.0 us   95.0 us   69.0 us  276.0 us    4
    Thread 0..1                    95.0 us   95.0 us   95.0 us   95.0 us    1
    Thread 2..3                    43.0 us   43.0 us   43.0 us   43.0 us    1
testtrace.upc     11       WAIT   241.0 us  330.0 us  285.5 us    1.1 ms    4
    Thread 0..1                   241.0 us  241.0 us  241.0 us  241.0 us    1
    Thread 2..3                   330.0 us  330.0 us  330.0 us  330.0 us    1
testtrace.upc     11 NOTIFYWAIT    25.0 us   27.0 us   26.0 us  104.0 us    4
    Thread 0..1                    25.0 us   25.0 us   25.0 us   25.0 us    1
    Thread 2..3                    27.0 us   27.0 us   27.0 us   27.0 us    1
testtrace.upc     12       WAIT   142.0 us  164.0 us  153.0 us  612.0 us    4
    Thread 0..1                   164.0 us  164.0 us  164.0 us  164.0 us    1
    Thread 2..3                   142.0 us  142.0 us  142.0 us  142.0 us    1
testtrace.upc     12 NOTIFYWAIT    34.0 us   44.0 us   39.0 us  156.0 us    4
    Thread 0..1                    34.0 us   34.0 us   34.0 us   34.0 us    1
    Thread 2..3                    44.0 us   44.0 us   44.0 us   44.0 us    1
testtrace.upc     23       WAIT   167.0 us  368.0 us  267.5 us    1.1 ms    4
    Thread 0..1                   368.0 us  368.0 us  368.0 us  368.0 us    1
    Thread 2..3                   167.0 us  167.0 us  167.0 us  167.0 us    1
testtrace.upc     23 NOTIFYWAIT    30.0 us   56.0 us   43.0 us  172.0 us    4
    Thread 0..1                    56.0 us   56.0 us   56.0 us   56.0 us    1
    Thread 2..3                    30.0 us   30.0 us   30.0 us   30.0 us    1
testtrace.upc     29       WAIT    80.0 us  424.0 us  252.0 us    1.0 ms    4
    Thread 0..1                    80.0 us   80.0 us   80.0 us   80.0 us    1
    Thread 2..3                   424.0 us  424.0 us  424.0 us  424.0 us    1
testtrace.upc     29 NOTIFYWAIT    18.0 us   32.0 us   25.0 us  100.0 us    4
    Thread 0..1                    18.0 us   18.0 us   18.0 us   18.0 us    1
    Thread 2..3                    32.0 us   32.0 us   32.0 us   32.0 us    1

Puts and gets (accesses via pointer-to-shared) are each reported by the source line that performed the access, with a call count and message size statistics. The type (LOCAL vs. GLOBAL) indicates whether the access was performed locally using shared memory or remotely using network communication.
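As an illustration of reading a GET or PUT report, the following sketch sums the "total" column per access type. It is not part of upc_trace: the function name is hypothetical, and the field positions assume the column layout of the sample output above with sizes printed in "B". Rows that use any other unit are counted as skipped rather than converted.

```shell
# Hypothetical post-processing helper (not part of upc_trace).
# Sums the "total" column per TYPE for the summary rows of one report.
sum_bytes_by_type() {
    awk '
        $1 == "Thread" { next }     # skip per-thread detail rows
        NF == 12 && ($3 == "LOCAL" || $3 == "GLOBAL") {
            if ($11 == "B") total[$3] += $10
            else skipped++          # unknown unit: do not guess
        }
        END {
            printf "GLOBAL %d\nLOCAL %d\nskipped %d\n",
                   total["GLOBAL"], total["LOCAL"], skipped
        }
    ' "$@"
}

# Usage with a few rows copied from the sample GET report:
sum_bytes_by_type <<'EOF'
testtrace.upc      9     GLOBAL        4 B       4 B       4 B       8 B    2
    Thread 0                           4 B       4 B       4 B       4 B    1
testtrace.upc      9      LOCAL        4 B       4 B       4 B       8 B    2
testtrace.upc     18     GLOBAL      100 B     100 B     100 B     200 B    2
EOF
# prints:
# GLOBAL 208
# LOCAL 8
# skipped 0
```

Feed the function the rows of one report at a time; given the full output, it would add GET and PUT totals together.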

The barrier report lists each barrier executed during the program run, grouped by source line number, with a count and timing statistics. Each barrier operation has two corresponding entries. NOTIFYWAIT is the time interval between the upc_notify and the corresponding upc_wait operation for the barrier (very small in the case of upc_barrier). WAIT is the time spent blocking at the upc_wait operation awaiting barrier completion. High WAIT times generally indicate load imbalance, which can sometimes be reduced by separating the upc_notify and upc_wait operations to increase the NOTIFYWAIT time and thereby overlap some of the barrier time with useful computation.
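To find the barriers most worth inspecting, one option is to rank the WAIT rows by their maximum time. The sketch below is a hypothetical helper, not part of upc_trace. It assumes the column layout of the sample output above and only the units "us" and "ms" that appear there; any other unit is reported as -1 rather than guessed.

```shell
# Hypothetical post-processing helper (not part of upc_trace).
# Prints "source:line max_wait_in_us" for each WAIT row, largest first.
rank_barrier_waits() {
    awk '
        function to_us(value, unit) {
            if (unit == "us") return value
            if (unit == "ms") return value * 1000
            return -1               # unknown unit: flag it, do not guess
        }
        $1 != "Thread" && NF == 12 && $3 == "WAIT" {
            printf "%s:%s %.1f\n", $1, $2, to_us($6, $7)
        }
    ' "$@" | sort -k 2,2nr
}

# Usage with a few rows copied from the sample BARRIER report:
rank_barrier_waits <<'EOF'
testtrace.upc      8       WAIT   151.0 us  165.0 us  158.0 us  632.0 us    4
    Thread 0..1                   165.0 us  165.0 us  165.0 us  165.0 us    1
testtrace.upc      8 NOTIFYWAIT    43.0 us   95.0 us   69.0 us  276.0 us    4
testtrace.upc     29       WAIT    80.0 us  424.0 us  252.0 us    1.0 ms    4
EOF
# prints:
# testtrace.upc:29 424.0
# testtrace.upc:8 165.0
```

In the sample run, the barrier at line 29 shows the largest spread between minimum and maximum WAIT time, which makes it a natural first candidate when looking for load imbalance.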

REPORTING BUGS

We are very interested in fixing any bugs in upc_trace. For bug reporting instructions, please go to http://upc.lbl.gov.

SEE ALSO

upcc(1), upcrun(1)

The Berkeley UPC User’s Guide (available at http://upc.lbl.gov)


Berkeley UPC UPC_TRACE (1) March 2017