upc_trace - the UPC/GASNet trace summarization tool, version 2022.10.0
Synopsis
upc_trace [options] trace-file(s)
Description
UPC trace file summarization script, v2.0 (GASNet v2022.9.0)
Options
trace-file(s) may include any mix of UPC trace files and local memory reports
  -h -? -help           See this message.
  -o [filename]         Output results to file. Default is STDOUT.
  -report [r1][r2]..    Indicate which reports to generate: PUT, GET, BARRIER,
                        MEMORY, and/or TI_ARRAY_COPY. Default: all reports.
  -sort [f1],[f2]...    Sort output by one or more fields: TOTAL, AVG, MIN, MAX,
                        CALLS, TYPE, or SRC. (for GET/PUT/MEMORY, TOTAL, AVG,
                        MIN, and MAX refer to size in bytes: for BARRIERS, to
                        time spent in barrier). Default: sort by SRC
  -filter [t1],[t2]..   Filter out output by one or more types: LOCAL, GLOBAL,
                        WAIT, WAITNOTIFY.
  -p -[no]peer          Output per-peer break down for PUT and GET.
  -t -[no]thread        Output detailed information for each thread.
  -i -[no]internal      Show internal events (such as the initial and final
                        barriers) which do not correspond to user source code.
  -f -[no]full          Show the full source file name.
  -d                    Enable debugging output for the parsing script.
As of version 2.0, Berkeley UPC includes upc_trace, a tool for analyzing the communication behavior of UPC programs. When run on the output of a trace-enabled Berkeley UPC program, upc_trace reports which lines of code in your UPC program generated network traffic: how many messages each line caused, what type they were (local and/or remote gets/puts), and the minimum, maximum, average, and combined sizes of those messages.
How to use upc_trace:
o Tracing must be enabled for upc_trace to work. By default, tracing is enabled for debug compilations (i.e. if upcc -g is used), but not otherwise (as it incurs some overhead). If you also wish to trace non-debug executables, you must rebuild your UPC system and pass --with-multiconf=+opt_trace to configure.
o You must run your application with upcrun -trace ... or upcrun -tracefile TRACE_FILE_NAME .... Either flag causes your UPC executable to write tracing information while it executes. The -trace flag generates one file per UPC thread, named upc_trace-a.out..-N, where a.out is the name of your executable and N is the UPC thread's number. The -tracefile NAME option lets you specify your own name for the tracing file(s): if the name contains a % character, one trace file per thread is generated, with the % replaced by the UPC thread's number. Otherwise, all threads write to the same file.
Note that running with tracing may slow down your application considerably: the exact amount depends on your filesystem, and the ratio of communication/computation in your program. If you are only interested in a subset of trace information, consider setting GASNET_TRACEMASK and/or GASNET_TRACELOCAL as described in the Berkeley UPC Users Guide.
o After your application has completed, you may run upc_trace on one or more of the trace files generated by your program run.
Running upc_trace on a trace file generated by a single UPC thread shows the information only for that thread. If you pass multiple files from the same application run, the information for the various threads is coalesced, so passing in all the tracefiles generated by a run allows you to see information for the entire application.
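To see the whole application you need to know which files a run produced. The sketch below illustrates the -tracefile naming rule described above. expected_tracefiles is a hypothetical helper, not part of Berkeley UPC, and it assumes that only the first % in the name is substituted and that threads are numbered from 0.

```shell
# Hypothetical helper (not part of Berkeley UPC): print the trace file
# names expected for a -tracefile NAME pattern and a thread count.
expected_tracefiles() {
    pattern=$1
    nthreads=$2
    case $pattern in
        *%*)
            # One file per thread: the thread number replaces "%".
            # Assumption: only the first "%" is substituted.
            t=0
            while [ "$t" -lt "$nthreads" ]; do
                printf '%s\n' "$pattern" | sed "s/%/$t/"
                t=$((t + 1))
            done
            ;;
        *)
            # No "%": all threads write to the same file.
            printf '%s\n' "$pattern"
            ;;
    esac
}

expected_tracefiles 'mytrace-%.log' 4
# prints:
#   mytrace-0.log
#   mytrace-1.log
#   mytrace-2.log
#   mytrace-3.log
```

The output can be passed directly to upc_trace, for example upc_trace $(expected_tracefiles 'mytrace-%.log' 4), provided the file names contain no whitespace.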
There are a number of flags to upc_trace which control what kinds of information are reported and how the output is sorted. See upc_trace --help for details.
Note that upc_trace may take a while to run, especially on large tracefiles. We plan to optimize its performance in the future.
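The steps above can be sketched as a single shell function. This is an illustration, not a definitive recipe: trace_and_summarize is a hypothetical helper, the UPCC, UPCRUN and UPC_TRACE override variables are assumptions added to allow a dry run, and the glob assumes the default trace file names start with upc_trace- followed by the executable name, as in the example output. It also assumes the source file is in the current directory.

```shell
# Hypothetical helper (not part of Berkeley UPC): build with tracing,
# run with tracing enabled, then summarize every trace file of the run.
# UPCC, UPCRUN and UPC_TRACE may be set to override the tool names.
trace_and_summarize() {
    src=$1            # UPC source file, e.g. testtrace.upc
    nthreads=$2       # number of UPC threads to run
    exe=${src%.upc}   # executable name: source name without .upc

    # 1. Debug compilations (-g) have tracing enabled by default.
    "${UPCC:-upcc}" -g -o "$exe" "$src" || return 1

    # 2. -trace makes each UPC thread write its own trace file.
    "${UPCRUN:-upcrun}" -trace -n "$nthreads" "./$exe" || return 1

    # 3. Pass all trace files so the per-thread data is coalesced;
    #    -t additionally prints the per-thread breakdown.
    #    Assumption: default names begin with upc_trace-<executable>.
    "${UPC_TRACE:-upc_trace}" -t upc_trace-"$exe"*
}
```

A call such as trace_and_summarize testtrace.upc 4 would then run all three steps and stop at the first one that fails.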
Here is example output from upc_trace for a 4-thread, 2-node test program:
$ upc_trace -t upc_trace-*
Parsing thread info for upc_trace-testtrace-4-14739-0..
Parsing tracefile for upc_trace-testtrace-4-14739-0.. done
Parsing thread info for upc_trace-testtrace-4-14739-1..
Parsing tracefile for upc_trace-testtrace-4-14739-1.. done
Generating reports..
GET REPORT:
SOURCE         LINE  TYPE        MSG:(min       max       avg    total) CALLS
=============================================================================
testtrace.upc     9  GLOBAL           4 B       4 B       4 B       8 B     2
  Thread 0                            4 B       4 B       4 B       4 B     1
  Thread 2                            4 B       4 B       4 B       4 B     1
testtrace.upc     9  LOCAL            4 B       4 B       4 B       8 B     2
  Thread 1                            4 B       4 B       4 B       4 B     1
  Thread 3                            4 B       4 B       4 B       4 B     1
testtrace.upc    18  GLOBAL         100 B     100 B     100 B     200 B     2
  Thread 1                          100 B     100 B     100 B     100 B     1
  Thread 3                          100 B     100 B     100 B     100 B     1
testtrace.upc    18  LOCAL          100 B     100 B     100 B     200 B     2
  Thread 0                          100 B     100 B     100 B     100 B     1
  Thread 2                          100 B     100 B     100 B     100 B     1
testtrace.upc    20  GLOBAL         100 B     100 B     100 B     200 B     2
  Thread 0                          100 B     100 B     100 B     100 B     1
  Thread 2                          100 B     100 B     100 B     100 B     1
PUT REPORT:
SOURCE         LINE  TYPE        MSG:(min       max       avg    total) CALLS
=============================================================================
testtrace.upc     7  GLOBAL           4 B       4 B       4 B       8 B     2
  Thread 1                            4 B       4 B       4 B       4 B     1
  Thread 3                            4 B       4 B       4 B       4 B     1
testtrace.upc     7  LOCAL            4 B       4 B       4 B       8 B     2
  Thread 0                            4 B       4 B       4 B       4 B     1
  Thread 2                            4 B       4 B       4 B       4 B     1
testtrace.upc    13  GLOBAL           4 B       4 B       4 B       8 B     2
  Thread 1                            4 B       4 B       4 B       4 B     1
  Thread 3                            4 B       4 B       4 B       4 B     1
testtrace.upc    13  LOCAL            4 B       4 B       4 B       8 B     2
  Thread 0                            4 B       4 B       4 B       4 B     1
  Thread 2                            4 B       4 B       4 B       4 B     1
testtrace.upc    15  GLOBAL           4 B       4 B       4 B       8 B     2
  Thread 1                            4 B       4 B       4 B       4 B     1
  Thread 3                            4 B       4 B       4 B       4 B     1
testtrace.upc    15  LOCAL            4 B       4 B       4 B       8 B     2
  Thread 0                            4 B       4 B       4 B       4 B     1
  Thread 2                            4 B       4 B       4 B       4 B     1
testtrace.upc    19  GLOBAL         100 B     100 B     100 B     200 B     2
  Thread 1                          100 B     100 B     100 B     100 B     1
  Thread 3                          100 B     100 B     100 B     100 B     1
testtrace.upc    19  LOCAL          100 B     100 B     100 B     200 B     2
  Thread 0                          100 B     100 B     100 B     100 B     1
  Thread 2                          100 B     100 B     100 B     100 B     1
testtrace.upc    20  GLOBAL         100 B     100 B     100 B     200 B     2
  Thread 1                          100 B     100 B     100 B     100 B     1
  Thread 3                          100 B     100 B     100 B     100 B     1
BARRIER REPORT:
SOURCE         LINE  TYPE        MSG:(min       max       avg    total) CALLS
=============================================================================
testtrace.upc     8  WAIT        151.0 us  165.0 us  158.0 us  632.0 us     4
  Thread 0..1                    165.0 us  165.0 us  165.0 us  165.0 us     1
  Thread 2..3                    151.0 us  151.0 us  151.0 us  151.0 us     1
testtrace.upc     8  NOTIFYWAIT   43.0 us   95.0 us   69.0 us  276.0 us     4
  Thread 0..1                     95.0 us   95.0 us   95.0 us   95.0 us     1
  Thread 2..3                     43.0 us   43.0 us   43.0 us   43.0 us     1
testtrace.upc    11  WAIT        241.0 us  330.0 us  285.5 us    1.1 ms     4
  Thread 0..1                    241.0 us  241.0 us  241.0 us  241.0 us     1
  Thread 2..3                    330.0 us  330.0 us  330.0 us  330.0 us     1
testtrace.upc    11  NOTIFYWAIT   25.0 us   27.0 us   26.0 us  104.0 us     4
  Thread 0..1                     25.0 us   25.0 us   25.0 us   25.0 us     1
  Thread 2..3                     27.0 us   27.0 us   27.0 us   27.0 us     1
testtrace.upc    12  WAIT        142.0 us  164.0 us  153.0 us  612.0 us     4
  Thread 0..1                    164.0 us  164.0 us  164.0 us  164.0 us     1
  Thread 2..3                    142.0 us  142.0 us  142.0 us  142.0 us     1
testtrace.upc    12  NOTIFYWAIT   34.0 us   44.0 us   39.0 us  156.0 us     4
  Thread 0..1                     34.0 us   34.0 us   34.0 us   34.0 us     1
  Thread 2..3                     44.0 us   44.0 us   44.0 us   44.0 us     1
testtrace.upc    23  WAIT        167.0 us  368.0 us  267.5 us    1.1 ms     4
  Thread 0..1                    368.0 us  368.0 us  368.0 us  368.0 us     1
  Thread 2..3                    167.0 us  167.0 us  167.0 us  167.0 us     1
testtrace.upc    23  NOTIFYWAIT   30.0 us   56.0 us   43.0 us  172.0 us     4
  Thread 0..1                     56.0 us   56.0 us   56.0 us   56.0 us     1
  Thread 2..3                     30.0 us   30.0 us   30.0 us   30.0 us     1
testtrace.upc    29  WAIT         80.0 us  424.0 us  252.0 us    1.0 ms     4
  Thread 0..1                     80.0 us   80.0 us   80.0 us   80.0 us     1
  Thread 2..3                    424.0 us  424.0 us  424.0 us  424.0 us     1
testtrace.upc    29  NOTIFYWAIT   18.0 us   32.0 us   25.0 us  100.0 us     4
  Thread 0..1                     18.0 us   18.0 us   18.0 us   18.0 us     1
  Thread 2..3                     32.0 us   32.0 us   32.0 us   32.0 us     1
Puts and gets (accesses via pointer-to-shared) are each reported under the source line that performed the access, with a call count and message size statistics. The type (LOCAL or GLOBAL) indicates whether the access was performed locally through shared memory or through network communication.
The barrier report lists each barrier executed by the program run, grouped by
source line number with a count and timing statistics. Each barrier operation
has two corresponding entries - NOTIFYWAIT indicates the time interval between
the upc_notify and corresponding upc_wait operation for the barrier (will be
very small in the case of upc_barrier), and WAIT indicates the time interval
spent blocking at the upc_wait operation awaiting barrier completion. High
WAIT times generally indicate load imbalance, which could possibly be resolved
by separating the upc_notify and upc_wait operations to increase the NOTIFYWAIT
time and thereby overlap some of the barrier time with useful computation.
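As a worked check on how a summary row relates to its per-thread rows, the sketch below recomputes the WAIT entry for line 8 of the example barrier report. It assumes that a row labelled Thread 0..1 means threads 0 and 1 each reported the value shown; this reading is consistent with the totals in the example but is an interpretation, not documented behavior. summarize_us is a hypothetical helper, not part of upc_trace.

```shell
# Hypothetical helper: print "min max avg total calls" for a list of
# barrier times given in microseconds, one argument per call.
summarize_us() {
    printf '%s\n' "$@" | awk '
        NR == 1 { min = $1; max = $1 }
        {
            if ($1 < min) min = $1
            if ($1 > max) max = $1
            total += $1
        }
        END { printf "%.1f %.1f %.1f %.1f %d\n", min, max, total / NR, total, NR }'
}

# WAIT times at testtrace.upc line 8: threads 0 and 1 waited 165.0 us,
# threads 2 and 3 waited 151.0 us.
summarize_us 165.0 165.0 151.0 151.0
# prints: 151.0 165.0 158.0 632.0 4
```

The result matches the summary row (151.0 us, 165.0 us, 158.0 us, 632.0 us, 4 calls). The same calculation for line 11 gives a total of 1142.0 us, which the report displays as 1.1 ms.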
REPORTING BUGS
We are very interested in fixing any bugs in upc_trace. For bug reporting instructions, please go to https://upc.lbl.gov.
SEE ALSO
upcc(1), upcrun(1)
The Berkeley UPC Users Guide (available at https://upc.lbl.gov)
Berkeley UPC | UPC_TRACE (1) | October 2022 |