[GENERAL INFO]
As of version 2.0, Berkeley UPC includes 'upc_trace', a tool for analyzing
the communication behavior of UPC programs. When run on the output of a
trace-enabled Berkeley UPC program, 'upc_trace' reports which lines of code
in your UPC program generated network traffic: how many messages each line
caused, what type they were (local and/or remote gets/puts), and what the
minimum/maximum/average/combined sizes of the messages were.
.PP
How to use 'upc_trace':
.IP \(bu 2
Tracing must be enabled for 'upc_trace' to have any data to analyze. By
default, tracing is enabled for debug compilations (i.e. if
.B 'upcc -g'
is used), but not otherwise, as it incurs some overhead. If you wish to
trace non-debug executables as well, you must rebuild your UPC system and
pass
.B '--with-multiconf=+opt_trace'
to configure.
.IP \(bu
You must run your application with
.B 'upcrun -trace ...'
or
.B 'upcrun -tracefile TRACE_FILE_NAME ...'.
Either of these flags causes your UPC executable to dump out tracing
information while it executes. The
.B '-trace'
flag causes one file per UPC thread to be generated, with the name
'upc_trace-a.out..-N', where 'a.out' is the name of your executable and 'N'
is the UPC thread's number. The
.B '-tracefile NAME'
option lets you specify your own name for the tracing file(s): if the name
contains a '%' character, one trace file per thread is generated, with the
'%' replaced by the UPC thread's number. Otherwise, all threads will write
to the same file.
.RS
.IP
.I Note that running with tracing may slow down your application
.I considerably: the exact amount depends on your filesystem, and the ratio of
.I communication/computation in your program. If you are only interested in a
.I subset of trace information, consider setting
.B GASNET_TRACEMASK
.I and/or
.B GASNET_TRACELOCAL
.I as described in the Berkeley UPC User's Guide.
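.IP
For example, one way to cut tracing overhead is to restrict tracing to a few
event categories when launching. The category letters below are only
illustrative; the actual letters accepted by GASNET_TRACEMASK are listed in
the Berkeley UPC User's Guide:
.IP
.nf
$ GASNET_TRACEMASK=GPB upcrun -trace -n 4 ./a.out
.fi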
.RE
.IP \(bu
After your application has completed, you may run 'upc_trace' on one or more
of the trace files generated by your program run:
.RS
.IP
Running
.B 'upc_trace'
on a trace file generated by a single UPC thread shows the information for
that thread only. If you pass multiple files from the same application run,
the information for the various threads is coalesced, so passing in all the
trace files generated by a run allows you to see information for the entire
application.
.IP
There are a number of flags to
.B 'upc_trace'
which control what kinds of information are reported, and how they are
sorted. See
.B 'upc_trace --help'
for details.
.IP
Note that upc_trace may take a while to run, especially on large trace
files. We plan to optimize its performance in the future.
.RE
.RE
[SAMPLE OUTPUT]
Here is example output from upc_trace for a 4-thread, 2-node test program:
.po 0
.in 0
.ll 79
.B $ upc_trace -t upc_trace-*
.RE
.nf
.in 0
Parsing thread info for upc_trace-testtrace-4-14739-0..
.RE
.in 0
Parsing tracefile for upc_trace-testtrace-4-14739-0.. done
.RE
.in 0
Parsing thread info for upc_trace-testtrace-4-14739-1..
.RE
.in 0
Parsing tracefile for upc_trace-testtrace-4-14739-1.. done
.RE
.in 0
Generating reports..
.fi
.nf
GET REPORT:
.RE
.in 0
SOURCE           LINE  TYPE        MSG:(min      max      avg    total)  CALLS
=============================================================================
testtrace.upc       9  GLOBAL          4 B      4 B      4 B      8 B        2
  Thread 0                             4 B      4 B      4 B      4 B        1
  Thread 2                             4 B      4 B      4 B      4 B        1
testtrace.upc       9  LOCAL           4 B      4 B      4 B      8 B        2
  Thread 1                             4 B      4 B      4 B      4 B        1
  Thread 3                             4 B      4 B      4 B      4 B        1
testtrace.upc      18  GLOBAL        100 B    100 B    100 B    200 B        2
  Thread 1                           100 B    100 B    100 B    100 B        1
  Thread 3                           100 B    100 B    100 B    100 B        1
testtrace.upc      18  LOCAL         100 B    100 B    100 B    200 B        2
  Thread 0                           100 B    100 B    100 B    100 B        1
  Thread 2                           100 B    100 B    100 B    100 B        1
testtrace.upc      20  GLOBAL        100 B    100 B    100 B    200 B        2
  Thread 0                           100 B    100 B    100 B    100 B        1
  Thread 2                           100 B    100 B    100 B    100 B        1
.fi
.nf
PUT REPORT:
.RE
.in 0
SOURCE           LINE  TYPE        MSG:(min      max      avg    total)  CALLS
=============================================================================
testtrace.upc       7  GLOBAL          4 B      4 B      4 B      8 B        2
  Thread 1                             4 B      4 B      4 B      4 B        1
  Thread 3                             4 B      4 B      4 B      4 B        1
testtrace.upc       7  LOCAL           4 B      4 B      4 B      8 B        2
  Thread 0                             4 B      4 B      4 B      4 B        1
  Thread 2                             4 B      4 B      4 B      4 B        1
testtrace.upc      13  GLOBAL          4 B      4 B      4 B      8 B        2
  Thread 1                             4 B      4 B      4 B      4 B        1
  Thread 3                             4 B      4 B      4 B      4 B        1
testtrace.upc      13  LOCAL           4 B      4 B      4 B      8 B        2
  Thread 0                             4 B      4 B      4 B      4 B        1
  Thread 2                             4 B      4 B      4 B      4 B        1
testtrace.upc      15  GLOBAL          4 B      4 B      4 B      8 B        2
  Thread 1                             4 B      4 B      4 B      4 B        1
  Thread 3                             4 B      4 B      4 B      4 B        1
testtrace.upc      15  LOCAL           4 B      4 B      4 B      8 B        2
  Thread 0                             4 B      4 B      4 B      4 B        1
  Thread 2                             4 B      4 B      4 B      4 B        1
testtrace.upc      19  GLOBAL        100 B    100 B    100 B    200 B        2
  Thread 1                           100 B    100 B    100 B    100 B        1
  Thread 3                           100 B    100 B    100 B    100 B        1
testtrace.upc      19  LOCAL         100 B    100 B    100 B    200 B        2
  Thread 0                           100 B    100 B    100 B    100 B        1
  Thread 2                           100 B    100 B    100 B    100 B        1
testtrace.upc      20  GLOBAL        100 B    100 B    100 B    200 B        2
  Thread 1                           100 B    100 B    100 B    100 B        1
  Thread 3                           100 B    100 B    100 B    100 B        1
.fi
.nf
BARRIER REPORT:
.RE
.in 0
SOURCE           LINE  TYPE        MSG:(min      max       avg     total)  CALLS
=============================================================================
testtrace.upc       8  WAIT        151.0 us  165.0 us  158.0 us  632.0 us      4
  Thread 0..1                      165.0 us  165.0 us  165.0 us  165.0 us      1
  Thread 2..3                      151.0 us  151.0 us  151.0 us  151.0 us      1
testtrace.upc       8  NOTIFYWAIT   43.0 us   95.0 us   69.0 us  276.0 us      4
  Thread 0..1                       95.0 us   95.0 us   95.0 us   95.0 us      1
  Thread 2..3                       43.0 us   43.0 us   43.0 us   43.0 us      1
testtrace.upc      11  WAIT        241.0 us  330.0 us  285.5 us    1.1 ms      4
  Thread 0..1                      241.0 us  241.0 us  241.0 us  241.0 us      1
  Thread 2..3                      330.0 us  330.0 us  330.0 us  330.0 us      1
testtrace.upc      11  NOTIFYWAIT   25.0 us   27.0 us   26.0 us  104.0 us      4
  Thread 0..1                       25.0 us   25.0 us   25.0 us   25.0 us      1
  Thread 2..3                       27.0 us   27.0 us   27.0 us   27.0 us      1
testtrace.upc      12  WAIT        142.0 us  164.0 us  153.0 us  612.0 us      4
  Thread 0..1                      164.0 us  164.0 us  164.0 us  164.0 us      1
  Thread 2..3                      142.0 us  142.0 us  142.0 us  142.0 us      1
testtrace.upc      12  NOTIFYWAIT   34.0 us   44.0 us   39.0 us  156.0 us      4
  Thread 0..1                       34.0 us   34.0 us   34.0 us   34.0 us      1
  Thread 2..3                       44.0 us   44.0 us   44.0 us   44.0 us      1
testtrace.upc      23  WAIT        167.0 us  368.0 us  267.5 us    1.1 ms      4
  Thread 0..1                      368.0 us  368.0 us  368.0 us  368.0 us      1
  Thread 2..3                      167.0 us  167.0 us  167.0 us  167.0 us      1
testtrace.upc      23  NOTIFYWAIT   30.0 us   56.0 us   43.0 us  172.0 us      4
  Thread 0..1                       56.0 us   56.0 us   56.0 us   56.0 us      1
  Thread 2..3                       30.0 us   30.0 us   30.0 us   30.0 us      1
testtrace.upc      29  WAIT         80.0 us  424.0 us  252.0 us    1.0 ms      4
  Thread 0..1                       80.0 us   80.0 us   80.0 us   80.0 us      1
  Thread 2..3                      424.0 us  424.0 us  424.0 us  424.0 us      1
testtrace.upc      29  NOTIFYWAIT   18.0 us   32.0 us   25.0 us  100.0 us      4
  Thread 0..1                       18.0 us   18.0 us   18.0 us   18.0 us      1
  Thread 2..3                       32.0 us   32.0 us   32.0 us   32.0 us      1
.fi
.ll
.RE
Puts and gets (accesses via pointer-to-shared) are each reported by the
source line that performed the access, with a count and message-size
statistics. The type (LOCAL vs GLOBAL) indicates whether the access was
performed locally through shared memory or over the network. The barrier
report lists each barrier executed by the program run, grouped by source
line number, with a count and timing statistics.
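.PP
As an illustration of the LOCAL/GLOBAL distinction, a fragment such as the
following (hypothetical, not taken from the test program above) can produce
entries of both types for the same source line, depending on whether each
thread's neighbor happens to reside on the same node:
.IP
.nf
shared int data[THREADS];
/* data[MYTHREAD] has local affinity: traced as a LOCAL put */
data[MYTHREAD] = 1;
/* the neighbor's element: a GLOBAL get if the neighbor is on
   another node, a LOCAL get if it shares this node's memory */
int x = data[(MYTHREAD+1)%THREADS];
.fi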
Each barrier operation has two corresponding entries: NOTIFYWAIT gives the
time interval between the upc_notify and the corresponding upc_wait
operation for the barrier (this will be very small for upc_barrier, which
issues the two back-to-back), and WAIT gives the time spent blocking in the
upc_wait operation awaiting barrier completion. High WAIT times generally
indicate load imbalance, which can sometimes be reduced by separating the
upc_notify and upc_wait operations: performing useful computation between
them increases the NOTIFYWAIT time and overlaps some of the barrier latency
with that computation.
[REPORTING BUGS]
We are very interested in fixing any bugs in upc_trace. For bug reporting
instructions, please go to https://upc.lbl.gov.
[SEE ALSO]
upcc(1), upcrun(1)
.PP
The Berkeley UPC User's Guide (available at https://upc.lbl.gov)