Compiling UPC Programs - Parallel Performance Wizard v3.2 User Manual


3.1 Compiling UPC Programs

To analyze the performance of your UPC program, you'll first need to recompile it using a PPW compiler wrapper script: use ppwupcc in place of upc or upcc.
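As a minimal sketch of that swap, the snippet below keeps the compiler command in a single variable so that a build script can switch between a normal build and a PPW build. The file names myprog.upc and myprog and the UPCC variable are hypothetical, and the sketch assumes that ppwupcc accepts the same -o output option as your usual compiler command.

```shell
# Hypothetical file names; substitute your own source and output.
SRC=myprog.upc
OUT=myprog

# Select the compiler command in one place: the PPW wrapper by default,
# or whatever the caller sets in UPCC (for example, upcc) for a normal build.
UPCC=${UPCC:-ppwupcc}

# Assumption: the wrapper takes -o just like the compiler it replaces.
BUILD_CMD="$UPCC -o $OUT $SRC"

echo "$BUILD_CMD"    # print the command so it can be inspected
# $BUILD_CMD         # uncomment to actually run the build
```

With this arrangement, switching back to an uninstrumented build only requires setting UPCC to your regular compiler command.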

The ppwupcc wrapper script has a few important options that control how much performance data is collected, and therefore how much instrumentation overhead your program incurs. In particular, you can pass the --inst-local and --inst-functions options to ppwupcc to record more detailed performance information at the cost of higher perturbation.

We recommend compiling with the --inst-functions flag, which allows you to relate performance information back to individual functions. The --inst-local option is useful if you'd like to identify segments of code that frequently access shared data local to the node, in addition to remote shared data accesses. Local accesses will show up in visualizations under regions having a ‘local’ suffix, such as ‘upc_get_local’. Note that tracking shared-local accesses is more expensive than tracking remote accesses only, and may cause PPW to over-report the actual time taken by parts of your code that perform many local data accesses in a short amount of time. If you experience very high overhead (i.e., much longer execution times) while running your program under PPW, see Managing Overhead for tips on how to reduce that overhead.
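The following sketch shows how these two options might be combined on a command line. Only --inst-functions and --inst-local come from this manual; the file names are hypothetical, and the use of -o assumes that ppwupcc accepts the same output option as your usual compiler command. The script runs the build only if ppwupcc is found on your PATH, and otherwise just prints the command.

```shell
# Hypothetical file names; substitute your own source and output.
SRC=myprog.upc
OUT=myprog

# Recommended starting point: function-level instrumentation.
PPW_FLAGS="--inst-functions"

# Optional: also track shared-local accesses (more detail, more overhead).
# PPW_FLAGS="$PPW_FLAGS --inst-local"

# Assumption: the wrapper takes -o just like the compiler it replaces.
CMD="ppwupcc $PPW_FLAGS -o $OUT $SRC"

if command -v ppwupcc >/dev/null 2>&1; then
    $CMD                                          # run the instrumented build
else
    echo "ppwupcc not found; would run: $CMD"     # dry run only
fi
```

Starting with --inst-functions alone and enabling --inst-local only when local shared accesses are of interest keeps perturbation lower for the first profiling runs.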

For more information on the ppwupcc command, please see ppwupcc.