Instrumentation and Measurement Modules - Parallel Performance Wizard v3.2 Developer's Guide


3 Instrumentation and Measurement Modules

For this version of PPW, all instrumentation and measurement code lives in the src directory. Extensive documentation of that source code can be generated with the Doxygen tool or found in the online version of the documentation.

3.1 High-Level Overview of Measurement Module

The measurement module consists of ANSI C code that deals with the low-level details of recording performance data efficiently at runtime. The code uses an object-oriented style in which all state information is stored inside opaque handles that must be passed to every subsequent call of functions related to that class of operations. For example, the ppw_io.h code uses a ppw_file handle that is returned by ppw_open() or ppw_create() and must be passed to all related file I/O functions, such as ppw_read_bytes(). This style is prevalent throughout the rest of the ANSI C code.
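The opaque-handle style described above can be illustrated with a minimal sketch. Every name here (demo_buf, demo_buf_create, and so on) is hypothetical and chosen only for illustration; none of it is PPW's actual API, whose real signatures are documented in the Doxygen output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example; none of these names come from PPW. */

/* In a real library only this forward declaration would appear in the
 * public header, so callers cannot touch the fields directly. */
typedef struct demo_buf demo_buf;

/* The full definition would normally live in the .c file. */
struct demo_buf {
    unsigned char *data;
    size_t len;
    size_t cap;
};

/* "Constructor": returns a handle, or NULL on allocation failure. */
demo_buf *demo_buf_create(size_t cap)
{
    demo_buf *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(cap ? cap : 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->len = 0;
    b->cap = cap;
    return b;
}

/* Every later operation takes the handle as its first argument.
 * Returns 0 on success, -1 if the bytes do not fit. */
int demo_buf_write_bytes(demo_buf *b, const void *src, size_t n)
{
    assert(b != NULL);
    if (n > b->cap - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Query state through the handle rather than through its fields. */
size_t demo_buf_length(const demo_buf *b)
{
    assert(b != NULL);
    return b->len;
}

/* "Destructor": releases all state owned by the handle. */
void demo_buf_destroy(demo_buf *b)
{
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}
```

Because callers only ever hold a pointer to an incomplete type, the layout of the state can change without touching code that uses the handle.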

At the lowest level of the measurement module sit the generic I/O routines defined in ppw_io.h and ppw_io_buffered.h. These functions provide both buffered and unbuffered I/O, along with endian-aware functions for writing binary data. There are also a few special routines for easily storing strings and arrays.
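As a rough illustration of what "endian-aware" means here, the sketch below writes and reads a 32-bit value in a fixed byte order regardless of the host's endianness. The function names are hypothetical, the choice of big-endian is an assumption made for the example (the byte order PPW actually uses is not stated here), and the fixed-width types assume a C99 <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; PPW's own routines may differ. */

/* Store a 32-bit value as four bytes, most significant byte first.
 * Shifting and masking gives the same result on any host byte order. */
void demo_put_u32_be(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Rebuild the value from four bytes stored most significant first. */
uint32_t demo_get_u32_be(const unsigned char *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) |
            (uint32_t)in[3];
}
```

Working on individual bytes avoids both host-order assumptions and unaligned access, at the cost of a few extra shifts per value.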

Sitting on top of the raw I/O routines are the ppw_io_struct.h routines, which provide a way to serialize and deserialize entire structs to a portable format. These routines rely on the existence of static “offset” arrays and format strings for all data structs, which are generated automatically by a Perl script in the codegen directory. This Perl script is driven by the format-1.1.conf configuration file, which defines some constants used by PPW as well as all structs used by the measurement code. The script uses this config file to generate ppw_structs.h, which houses the aforementioned offset arrays and format strings for each struct. The offset arrays are computed at compile time with the help of the simple macro ppw_offsetof, which is a version of the standard C99 offsetof macro that should work on most systems (even those without C99 support).
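The idea of driving serialization from an offset array plus a format string can be sketched as follows. This is not the generated ppw_structs.h code: the struct, the format characters ('s', 'i', 'l'), and the function names are all assumptions made for illustration, and the standard offsetof macro stands in for ppw_offsetof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record; not one of PPW's real structs. */
typedef struct {
    uint32_t thread;
    uint64_t timestamp;
    uint16_t kind;
} demo_record;

/* One format character per field (assumed codes):
 * 's' = 16-bit, 'i' = 32-bit, 'l' = 64-bit unsigned integer. */
static const char demo_record_fmt[] = "ils";

/* Field offsets, computed at compile time. */
static const size_t demo_record_off[] = {
    offsetof(demo_record, thread),
    offsetof(demo_record, timestamp),
    offsetof(demo_record, kind)
};

static size_t demo_field_width(char code)
{
    switch (code) {
    case 's': return 2;
    case 'i': return 4;
    case 'l': return 8;
    default:  return 0;
    }
}

/* Walk the format string, fetch each field through its offset, and
 * write it big-endian with no padding. Returns bytes written. */
size_t demo_pack(unsigned char *out, const void *st,
                 const char *fmt, const size_t *off)
{
    const unsigned char *base = st;
    size_t n = 0, i, b;
    for (i = 0; fmt[i] != '\0'; i++) {
        size_t w = demo_field_width(fmt[i]);
        uint64_t v = 0;
        assert(w != 0);
        if (w == 2) { uint16_t t; memcpy(&t, base + off[i], 2); v = t; }
        else if (w == 4) { uint32_t t; memcpy(&t, base + off[i], 4); v = t; }
        else { memcpy(&v, base + off[i], 8); }
        for (b = 0; b < w; b++)
            out[n++] = (unsigned char)((v >> (8 * (w - 1 - b))) & 0xFFu);
    }
    return n;
}

/* Inverse of demo_pack. Returns bytes consumed. */
size_t demo_unpack(void *st, const unsigned char *in,
                   const char *fmt, const size_t *off)
{
    unsigned char *base = st;
    size_t n = 0, i, b;
    for (i = 0; fmt[i] != '\0'; i++) {
        size_t w = demo_field_width(fmt[i]);
        uint64_t v = 0;
        assert(w != 0);
        for (b = 0; b < w; b++)
            v = (v << 8) | in[n++];
        if (w == 2) { uint16_t t = (uint16_t)v; memcpy(base + off[i], &t, 2); }
        else if (w == 4) { uint32_t t = (uint32_t)v; memcpy(base + off[i], &t, 4); }
        else { memcpy(base + off[i], &v, 8); }
    }
    return n;
}
```

A single pair of generic pack and unpack routines can then handle every struct, since everything struct-specific lives in the two generated tables. The on-disk form is also independent of the compiler's padding and the host's byte order.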

On top of the raw structs sits ppw_profile.h, which simply groups these structs together in a particular way to form raw performance data files.

PPW's profiling logic is embedded inside a bunch of inline functions located in ppw_profiler.h, which is used by ppw_meas.h to provide PPW's basic measurement API. The ppw_meas.h interface takes care of most of the drudgery associated with starting up the measurement interface, and also uses many inline function definitions embedded in the interface file for efficiency. The ppw_meas.h interface also takes care of handling trace record buffering. See the Doxygen docs for more information on the exact algorithms used by the profiler and measurement interfaces.

User configuration is controlled by the function set shown in ppw_userconfig.h, which simply reads environment variables and sets default configuration options. The ppw_meas.h interface automatically handles getting and validating the user's configuration options.

Trace and profile merging code is entirely handled inside the functions defined in ppw_merge.h. The merge code relies on model-independent “upcalls” (defined in ppw_upcall.h) to implement the data collection and processing phase. These upcalls must be written in each new language that PPW is ported to, and include only basic operations such as a barrier and generic send and receive operations. As with user configuration, the ppw_meas.h interface provides high-level routines for initiating the merge phase.

3.2 Language-Dependent Parts of the Measurement Module

PPW's measurement API has been specifically designed to work well with GASP-enabled languages. To this end, language support for UPC and SHMEM is handled by GASP wrappers that interface a GASP-enabled language with the standard measurement API defined in ppw_meas.h.

While most UPC compilers already have (or will soon have) support for the GASP interface, we had to retrofit a GASP interface onto library-based languages such as SHMEM that do not already have a standard performance interface as robust as GASP. Special “GASP adapters” that add a GASP interface to SHMEM can be found in the gaspref subdirectory of the source installation. It is strongly suggested that adding support for additional non-GASP languages (such as MPI) be handled by creating new GASP adapters similar to the SHMEM GASP adapter already in place for Quadrics SHMEM. This greatly simplifies the process of adding support for new languages, or adding support for another implementation of a language that is already supported (such as other variants of SHMEM).

Each GASP wrapper is contained in a file named gasp_[language], and tends to be rather language-specific. Each wrapper contains implementations for all of the upcalls defined in ppw_upcall.h, in a separate header file that is recompiled against the user's code when necessary (as with UPC's static/dynamic threads environment). The overall workflow of each wrapper is similar, and generally the wrappers do the following at runtime:

Create a new measurement handle inside their version of gasp_init and do some rudimentary querying of the execution environment, such as getting the number of nodes in the run.
Handle general events in the GASP event notification by making appropriate calls to the measurement API. In particular, the ppw_meas_srcid function is called for each GASP event notification to get the generic source identifier for this particular call.
Handle calls to ppw_gasp_lookupsrc, which is called by the measurement API when the wrapper calls ppw_meas_srcid with a source ID the measurement API hasn't seen yet. It is inside the ppw_gasp_lookupsrc function where the mapping of language-specific operations to PPW's general event classes takes place. See the next section for more information on PPW's data model.
Handle special event types (exits, etc.) in the GASP event notification callback routine.
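The interplay between the source-ID call and the lookup callback in the steps above can be sketched as a "look up on first sight, then cache" pattern. The code below is only a model of that control flow: the demo_ names, the struct layouts, the event numbers, and the callback signature are all hypothetical and are not the signatures of ppw_meas_srcid or ppw_gasp_lookupsrc.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SRC 64

/* A few generic operation types, as named in the data model. */
typedef enum { DEMO_OP_FUNCTION, DEMO_OP_GET, DEMO_OP_BARRIER } demo_optype;

typedef struct {
    int lang_event;    /* language-specific event number */
    int line;          /* source line of the call site */
    demo_optype type;  /* generic type, filled in by the callback */
} demo_src;

/* Callback supplied by the wrapper: language event -> generic type. */
typedef demo_optype (*demo_lookup_fn)(int lang_event);

typedef struct {
    demo_src table[DEMO_MAX_SRC];
    size_t count;
    demo_lookup_fn lookup;
    unsigned lookups;  /* number of callback invocations so far */
} demo_meas;

void demo_meas_init(demo_meas *m, demo_lookup_fn fn)
{
    assert(m != NULL && fn != NULL);
    m->count = 0;
    m->lookup = fn;
    m->lookups = 0;
}

/* Return the generic source ID for (event, line). The callback runs
 * only the first time a pair is seen. Returns -1 if the table is full.
 * A linear search keeps the sketch short; it is not meant to be fast. */
int demo_meas_srcid(demo_meas *m, int lang_event, int line)
{
    size_t i;
    assert(m != NULL && m->lookup != NULL);
    for (i = 0; i < m->count; i++)
        if (m->table[i].lang_event == lang_event && m->table[i].line == line)
            return (int)i;
    if (m->count == DEMO_MAX_SRC)
        return -1;
    m->table[m->count].lang_event = lang_event;
    m->table[m->count].line = line;
    m->table[m->count].type = m->lookup(lang_event);
    m->lookups++;
    return (int)m->count++;
}

/* Example wrapper-side mapping; the event numbers are made up. */
demo_optype demo_lookupsrc(int lang_event)
{
    switch (lang_event) {
    case 100: return DEMO_OP_GET;
    case 200: return DEMO_OP_BARRIER;
    default:  return DEMO_OP_FUNCTION;
    }
}
```

Keeping the language-specific mapping in the callback means the measurement side only ever deals with generic IDs and operation types.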

The UPC and SHMEM language-dependent implementations also include a simple clock synchronization algorithm based on F. Cristian's paper “A Probabilistic Approach to Distributed Clock Synchronization,” modified to use one-sided communications. The global clock synchronization algorithm is essentially remote clock reading; it is very simple but effective, and is used to adjust timestamps on trace records during the merge phase.
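The core arithmetic of remote clock reading can be sketched as below. This is a generic rendering of the textbook idea, not PPW's implementation: the sample structure and function name are hypothetical, the communication itself is left out, and the error bound assumes a minimum one-way delay of zero. Each sample records when the local side issued a remote clock read, the remote clock value it got back, and when the reply arrived; the sample with the shortest round trip is trusted most.

```c
#include <assert.h>
#include <stddef.h>

/* One remote clock reading (hypothetical layout). */
typedef struct {
    double t_send;    /* local clock when the read was issued */
    double t_remote;  /* remote clock value that was returned */
    double t_recv;    /* local clock when the reply arrived */
} demo_sample;

/* Estimate (remote clock - local clock) from the sample with the
 * smallest round-trip time, assuming the remote value was taken
 * halfway through the round trip. If err_bound is not NULL it
 * receives half the round trip, the worst-case error when the
 * minimum one-way delay is taken to be zero. */
double demo_clock_offset(const demo_sample *s, size_t n, double *err_bound)
{
    size_t i, best = 0;
    double rtt;
    assert(s != NULL && n > 0);
    for (i = 1; i < n; i++)
        if (s[i].t_recv - s[i].t_send < s[best].t_recv - s[best].t_send)
            best = i;
    rtt = s[best].t_recv - s[best].t_send;
    if (err_bound != NULL)
        *err_bound = rtt / 2.0;
    return s[best].t_remote + rtt / 2.0 - s[best].t_recv;
}
```

Adding the resulting offset to a local timestamp expresses it on the remote clock's timeline, which is the kind of adjustment a merge phase could apply to trace records.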

3.3 Measurement Data Model

Recent versions of PPW use a flexible data model to record performance information. See the format-1.1.conf configuration file for a full description of all data structures used by this data model; a few of the more important concepts are discussed here.

The basic item in the PPW data model is a numeric source identifier, which marries a specific operation with a particular line of code. These operations are further broken down into generic operation types, such as “Function,” “Get,” or “Barrier.” Each operation type also has a trace record body associated with it when profiling operations of that type. For instance, the trace record body associated with “Get” operations contains information about the get operation, such as the number of bytes read and which thread the data was read from.

Instead of using a fixed list of operation types, these types are included in the data file itself. This allows new types to be added to the file format without needing to change existing data readers. Furthermore, information about trace record body sizes is also encoded in the file itself (rather than in a header or configuration file), so that when a reader encounters a trace record type it doesn't know anything about, it can still safely skip over the trace record body without knowing what kind of data is inside it.
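The benefit of storing body sizes in the file can be shown with a small reader sketch. The layout used here is invented for the example (a one-byte type ID followed by a body whose length comes from a per-type size table read from the file) and does not reflect PPW's real format, which is described in format-1.1.conf. The reader understands only one record type and skips everything else by size.

```c
#include <assert.h>
#include <stddef.h>

/* Size table as it might be read from the file (hypothetical):
 * body_size[t] is the body length in bytes for type ID t. */
typedef struct {
    const unsigned char *body_size;
    size_t ntypes;
} demo_type_table;

/* The only record type this reader understands (assumed ID). */
#define DEMO_TYPE_GET 1

/* Walk a stream of records. For each "get" record, add its first
 * body byte (standing in for "bytes read") to the total; skip every
 * other record using the size table. Returns the total, or -1 if the
 * stream is malformed. *skipped receives the number of records that
 * were skipped without being interpreted. */
long demo_sum_get_bytes(const demo_type_table *tt,
                        const unsigned char *rec, size_t len,
                        size_t *skipped)
{
    size_t pos = 0;
    long total = 0;
    assert(tt != NULL && rec != NULL && skipped != NULL);
    *skipped = 0;
    while (pos < len) {
        unsigned char type = rec[pos++];
        size_t body;
        if (type >= tt->ntypes)
            return -1;              /* type not declared in the file */
        body = tt->body_size[type];
        if (body > len - pos)
            return -1;              /* truncated record body */
        if (type == DEMO_TYPE_GET && body >= 1)
            total += rec[pos];
        else
            (*skipped)++;
        pos += body;                /* works for unknown types too */
    }
    return total;
}
```

The reader never needs a compiled-in description of the unknown type; the size table alone is enough to stay in step with the stream.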

This flexible data format simplifies the frontend data browsers significantly. By generating a few simple lookup tables when reading files, code that reads this data format can efficiently handle data for operation types and languages that were not supported when the data browser was initially written. (For a good example of this, see the “Operation Types Pie Chart” visualization of the Java frontend.) Additionally, through some simple string matching, language-dependent analyses can still be performed, although such “hard-coded” schemes for displaying or analyzing data should be avoided.

Even though the data format is self-defining, care must be taken when modifying the individual data structures that make up the file format. In particular, if new data is needed for a particular operation that isn't covered by an existing operation type or trace record body type, a new type should be added rather than extending the existing type. This avoids breaking compatibility with existing readers that rely on particular operations being structured in a certain way.

3.4 Adding Support for New Languages

When adding support for new languages to PPW, the first thing to do is to consider any extensions that have to be made to the existing data model so that operations in the new language can be sufficiently tracked. If the language uses an SPMD-style execution model, support for the new language can probably be accomplished by simply adding new operation types and trace body types where appropriate. If not, then the data model may have to be extended in other ways. Be sure to follow the guidelines set out in the last section for keeping compatibility with existing readers. If you absolutely must change the file format in a way that will break compatibility with existing readers, update the file format's version stamp so that the existing readers can at least display a helpful error message to a user.
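A reader-side version check of the kind mentioned above might look like the following sketch. The major/minor split, the rule that only a major-version mismatch is treated as incompatible, and all names are assumptions made for illustration; the actual version stamp layout should be taken from format-1.1.conf. The example relies on snprintf, which requires C99 or later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Major format version this hypothetical reader understands. */
#define DEMO_READER_MAJOR 1

/* Returns 1 if the file can be read; msg is then set to "".
 * Returns 0 otherwise and writes a user-facing explanation to msg.
 * Assumed policy: minor revisions only add new types, which readers
 * can skip, so only the major version has to match. */
int demo_check_version(int file_major, int file_minor,
                       char *msg, size_t msg_len)
{
    assert(msg != NULL && msg_len > 0);
    if (file_major == DEMO_READER_MAJOR) {
        msg[0] = '\0';
        return 1;
    }
    snprintf(msg, msg_len,
             "Data file uses format version %d.%d, but this reader only "
             "supports version %d.x; please use a matching release.",
             file_major, file_minor, DEMO_READER_MAJOR);
    return 0;
}
```

Checking the stamp before parsing anything else lets an older reader fail with a clear message instead of misinterpreting a restructured file.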

Once you are satisfied with any extensions to the data model, you'll need to provide a wrapper that interfaces the instrumentation technique for this new language with the existing measurement API. Since the API is GASP-centric, it makes sense to implement GASP support directly into the language/compiler implementation, or to write an “adapter” in a similar manner to the GASP SHMEM adapter that currently exists for Quadrics SHMEM.

So far, we have implicitly assumed that you want to measure a program and present data to the user alongside their original source code. If you are working with a programming environment in which this doesn't make much sense (as in reconfigurable computing), you'll want to figure out some other way of integrating data collected from these sources into PPW's overall display. One possible way of doing this is to link against the measurement API library and periodically query the current state of the measurement code so you can associate data from an outside entity with other parts of the application. If you are collecting trace-style data, then you'll probably also want to use whatever timers PPW uses internally so that your performance data agrees with the timestamps PPW uses in its normal trace records.