This page outlines all the changes that were needed to add GASP support to GCC UPC, Intrepid's branch of GCC that adds UPC support. It is meant to give GASP language implementors an idea of what was required to add GASP support to an existing compiler infrastructure.

1. Patch file

The patch described on this page is available for download. The diff was taken against the latest GCC UPC CVS snapshot as of 3/11/2007.

Alternatively, if you have a Subversion client, the following should check out the latest version of my GCC UPC GASP modifications:

svn co https://svn.hcs.ufl.edu/gccupc-gasp/trunk/ gccupc-gasp


2. High-level view of changes

To add GASP support to the GCC compiler, I followed the steps on the GASP implementors' webpage, starting with a “vacuous” implementation and then adding support for more events. The individual changes are described in section 3.

Overall, I wanted the whole workflow to work as I described in this email. In particular, my implementation allows a user to compile their UPC application with and without GASP instrumentation without having to flip back and forth between different installations of GCC UPC.

I handled GASP support in libupc by adding a profiled version of each function. This doubles the number of functions in the library, but it retains binary compatibility with object files produced by the regular version of GCC UPC 4.0.3.1. This approach has the advantage of not touching the existing libupc functions at all, and it lets code compiled without GASP instrumentation run at full speed. This is very useful if a user only wants to profile parts of their application.
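A minimal sketch of the “profiled twin” idea in plain C follows. The names example_put(), example_putg() and tool_notify() are hypothetical; the real libupc routines operate on UPC shared pointers and report through gasp_event_notify().

```c
#include <stdio.h>

/* Hypothetical stand-in for the tool callback (gasp_event_notify() in
   real code, which also takes a context, event tag and event type). */
static int notify_count = 0;

static void tool_notify(const char *what, const char *file, int line) {
  notify_count++;
  printf("%s at %s:%d\n", what, file, line);
}

/* Stand-in for shared memory. */
static int shared_store[16];

/* Regular entry point: left untouched, so object files compiled
   without GASP keep calling it and pay no profiling cost. */
void example_put(int index, int value) {
  shared_store[index] = value;
}

/* Profiled twin: same operation plus two trailing source arguments.
   It reports start/end events around a call to the regular version. */
void example_putg(int index, int value, const char *file, int line) {
  tool_notify("put start", file, line);
  example_put(index, value);
  tool_notify("put end", file, line);
}
```

Because example_put() is unchanged, existing object files that call it never touch the profiling path.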

It's important to point out that nothing out of the ordinary happens unless a user or wrapper script passes one of the -finstrument-gasp flags to the GCC frontend. In fact, with my GASP changes, GCC UPC produces exactly the same assembly code (apart from a bug fix I mention below) as the original GCC UPC version 4.0.3.1 found on the Intrepid website. I think this is a useful property: it means that, with some forethought, GASP support can be added to an existing compiler implementation without fear of bugs in that GASP implementation affecting normal users.

3. Detailed descriptions of the changes

When implementing GASP support in GCC UPC, I did encounter a few interesting things which I document here. These detailed notes serve two purposes: 1) as documentation for Intrepid to detail exactly what was changed and why, and 2) for other GASP implementors who might find this information useful while adding GASP support into their existing compiler.

Most of my code changes are guarded with #ifndef DISABLE_GASP/#endif blocks, except in cases where such a guard isn't needed.

3.1. GASP header files

This step was rather easy, as all I had to do was modify the canned GASP header files on the GASP website and add them to UPC_H in libupc/smp/Make-defs. Because the header files need to be installed alongside upc.h and are referenced by the source for libupc.a, I placed them in libupc/include and included symlinks to them in libupc/smp.

3.2. GASP init and exit function calls

This in itself was easy to do, as I just scanned libupc/smp/upc_main.c and added function calls at appropriate places. I did need to study the code a little bit to make sure I got all the entry and exit points for both collective and noncollective exits. See my notes at the top of upc_main.c for more information.
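The placement of the init and exit calls can be sketched in plain C. Everything here is hypothetical: tool_init() and tool_event() stand in for gasp_init() and gasp_event_notify(), the event tags stand in for the GASP UPC exit events, and user_main() stands in for the user's program, which the runtime calls from its own main().

```c
/* Hypothetical stand-ins for the GASP calls and exit event tags; the
   real prototypes and tags come from gasp.h and gasp_upc.h. */
enum { EVT_COLLECTIVE_EXIT = 1, EVT_NONCOLLECTIVE_EXIT = 2 };

static int tool_initialized = 0;
static int last_event = 0;
static int last_status = -1;

static void tool_init(int *argc, char ***argv) {
  (void)argc;
  (void)argv;
  tool_initialized = 1;
}

static void tool_event(int tag, int status) {
  last_event = tag;
  last_status = status;
}

/* Hypothetical user program called by the runtime. */
static int user_main(int argc, char **argv) {
  (void)argc;
  (void)argv;
  return 3;
}

/* Runtime entry: initialize the tool before any user code runs and
   report a collective exit when user_main() returns normally. A
   noncollective path such as upc_global_exit() would presumably report
   EVT_NONCOLLECTIVE_EXIT before terminating instead. */
int runtime_main(int argc, char **argv) {
  int status;
  tool_init(&argc, &argv);
  status = user_main(argc, argv);
  tool_event(EVT_COLLECTIVE_EXIT, status);
  return status;
}
```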

One slightly odd point I handled was that calls to exit(3) are rewritten by a macro in gcc-upc-lib.h into __upc_exit. Using a macro trick I describe later, I rewrite these calls to __upc_exitg, which fires off GASP collective exit events. I'm not able to stuff source line information in as extra arguments, because the macro also rewrites the declaration of exit() in <stdlib.h>, and the extra arguments turn that declaration into a compilation error. A short example:

[leko@mu ~]$ cat test.c
#define exit(n) myexit(n, __FILE__, __LINE__)

#include <stdio.h>
#include <stdlib.h>

void myexit(int n, const char *file, int line) {
  printf("Exit at %s:%d\n", file, line);
}

int main() {
  exit(0);
  return 0;
}
[leko@mu ~]$ gcc test.c
In file included from test.c:4:
/usr/include/stdlib.h:640: error: syntax error before string constant

GCC does come with some functionality for fixing up system header files automatically (see fixincludes in the source directory), but using it to rewrite every declaration of exit() to take two extra parameters seems extremely hackish and is likely to break easily.
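For contrast, here is a minimal sketch (myexit(), run_example() and the exit_* variables are illustrative) in which the same macro is defined after the system headers. It compiles, but only because <stdlib.h> has already been processed; GCC UPC cannot control where users include <stdlib.h> relative to upc.h, which is why this ordering dependency is too fragile to rely on.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical replacement that records where exit() was called
   instead of terminating, so the example can be observed. */
static const char *exit_file = NULL;
static int exit_line = 0;
static int exit_code = -1;

static void myexit(int n, const char *file, int line) {
  exit_code = n;
  exit_file = file;
  exit_line = line;
  printf("Exit(%d) at %s:%d\n", n, file, line);
}

/* Defined only AFTER <stdlib.h>, so the exit() prototype in the system
   header is left alone. If <stdlib.h> were first included after this
   point, its prototype would be rewritten and the build would fail as
   in the transcript above. */
#define exit(n) myexit(n, __FILE__, __LINE__)

static void run_example(void) {
  exit(2); /* expands to myexit(2, __FILE__, __LINE__) */
}
```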

Note
Because exit() and upc_global_exit() are only rewritten when GASP instrumentation is enabled, calls to these functions made from code not compiled with a GASP flag will not result in a GASP exit event being passed to a tool.

3.3. GASP instrumentation flags

To add the -finstrument-gasp and -finstrument-gasp-functions flags, I added them to gcc/c.opt for the C and UPC modes, with a short description of each. I also added extern declarations for them in gcc/c-common.h, initializers and handlers in gcc/upc/upc-lang.c, and an error message in gcc/c-opts.c for the case where the flags are passed in but GASP_DISABLED is defined in this build of the frontend.

Since -finstrument-gasp-functions implies -finstrument-gasp, I made sure to set the global flags as such inside gcc/upc/upc-act.c.
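The implication between the two flags amounts to a small post-processing step after option parsing. A sketch, assuming hypothetical names (flag_instrument_gasp, flag_instrument_gasp_functions, handle_gasp_option(), finish_gasp_options()); in GCC itself the handler is dispatched on option codes generated from c.opt rather than by comparing strings.

```c
#include <string.h>

/* Hypothetical globals mirroring the frontend's GASP option flags. */
int flag_instrument_gasp = 0;
int flag_instrument_gasp_functions = 0;

/* Records one command-line flag; returns 1 if it was recognized. */
int handle_gasp_option(const char *opt) {
  if (strcmp(opt, "-finstrument-gasp") == 0) {
    flag_instrument_gasp = 1;
    return 1;
  }
  if (strcmp(opt, "-finstrument-gasp-functions") == 0) {
    flag_instrument_gasp_functions = 1;
    return 1;
  }
  return 0;
}

/* Runs after all options are parsed: function instrumentation
   implies general GASP instrumentation. */
void finish_gasp_options(void) {
  if (flag_instrument_gasp_functions)
    flag_instrument_gasp = 1;
}
```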

3.4. PUPC macro definition

This was easy: I just added a call to cpp_define in gcc/upc/upc-act.c, and added another for the __UPC_PUPC_INST__ macro when one of the GASP flags is set.

3.5. Pragma pupc

To add support for #pragma pupc, I registered a new pragma using the same technique used for #pragma upc (calling c_register_pragma()), and exported a get_upc_pupc_mode() function for querying the current setting of #pragma pupc, which defaults to on if a GASP flag was passed to the frontend.
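A standalone model of the pragma state follows, under stated assumptions: get_upc_pupc_mode() and handle_pragma_pupc() mirror the names mentioned above, but the string-based argument, the init_pupc_mode() helper, and the rule that the pragma has no effect without a GASP flag are assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical model of the #pragma pupc state. In GCC the handler is
   registered with c_register_pragma() and reads its argument from the
   lexer; here the argument arrives as a plain string instead. */
static int gasp_enabled = 0; /* was a GASP flag given? */
static int pupc_mode = 0;    /* current #pragma pupc setting */

/* Called once after option processing: pupc defaults to "on" whenever
   a GASP instrumentation flag was passed. */
void init_pupc_mode(int gasp_flag_given) {
  gasp_enabled = (gasp_flag_given != 0);
  pupc_mode = gasp_enabled;
}

/* Handles "#pragma pupc on" and "#pragma pupc off"; returns 0 for
   anything else (a real handler would warn here). */
int handle_pragma_pupc(const char *arg) {
  if (strcmp(arg, "on") == 0) {
    pupc_mode = 1;
    return 1;
  }
  if (strcmp(arg, "off") == 0) {
    pupc_mode = 0;
    return 1;
  }
  return 0;
}

/* Queried by the instrumentation code: instrument only if GASP is
   enabled and the current pragma setting is on (an assumption). */
int get_upc_pupc_mode(void) {
  return gasp_enabled && pupc_mode;
}
```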

Note
Since my #pragma pupc support is implemented the same way as #pragma upc, only one setting can apply to a basic block of code. This is the same limitation that #pragma upc currently has in GCC UPC, and it is somewhat annoying. I know that the GOMP implementation needs pragmas at finer granularity, so I checked what they did… it turns out the GOMP authors had to extend the C pragma interface by adding deferred pragmas, which are passed through libcpp untouched (i.e., not handled at all by the preprocessor) so that the parser can deal with them. This functionality was added after GCC v4.0.3, so I wasn't able to use it, but if/when GCC UPC is updated to a newer version of GCC, we should probably change both the pupc and upc pragmas to use the new deferred pragma support.

While testing my pragma-handling code, I found a bug which I describe in a later section.

3.6. Gimplify hooks for instrumented libupc calls

This next bit was the trickiest part to get going. As outlined above and in my email, I needed to change the UPC gimplification process in gcc/upc/upc-gimplify.c to insert calls to either the standard libupc functions or the profiled libupc functions with extra arguments denoting source information.

To get everything rolling correctly, I modified gcc/optabs.h and gcc/optabs.c and added profiled versions of each UPC operation to the operator tables. Since the optabs can only have a one-character suffix, I used the convention of making the suffix denote the number of arguments used in the operation, and added a “g” character to the prefix for each profiled function. As an example, a regular shared put operation to an integer eventually invokes the __putsi2() function, while a profiled put invokes __putgsi4() instead.
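To make the naming convention concrete, the helper below reproduces it. As far as I understand GCC's optabs.c, libcall names are built from "__", the operation name, the lowercase machine-mode name, and one suffix character, so the profiled names simply use "putg" or "getg" as the operation. libupc_name() is a hypothetical helper, not a GCC function.

```c
#include <stdio.h>

/* Hypothetical helper reproducing the libupc naming convention:
   "__" + operation + ("g" if profiled) + machine mode + operand count.
   The count includes the result for gets (so __getqi2 takes a single
   parameter), and profiled calls add two operands: file and line. */
void libupc_name(char *buf, size_t len, const char *op, const char *mode,
                 int noperands, int profiled) {
  if (profiled)
    snprintf(buf, len, "__%sg%s%d", op, mode, noperands + 2);
  else
    snprintf(buf, len, "__%s%s%d", op, mode, noperands);
}
```

For example, the regular shared put to an int maps to "__putsi2" and its profiled flavor to "__putgsi4", matching the names above.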

In GCC, the input_filename and input_line macros give the current source filename and line, and I used those as arguments to my add_gasp_srcargs() function, which appends the source arguments to the lib_args argument list. Getting the filename passed in correctly was a bit tricky, but I eventually found out how to do it by looking at the gcov support code in gcc/coverage.c.

While implementing this code, I found a bug which I describe in a later section.

3.7. Macro tricks for regular UPC library calls

Since the standard UPC library calls such as upc_memget() and upc_memcpy() aren't translated or handled by the frontend, we need some macro tricks to get source information passed to the profiled versions. To do this, I use a simple macro trick that looks like this for a function upc_XXX():

#ifdef __UPC_PUPC_INST__
#define upc_XXX(arg1, arg2)     upc_XXXg(arg1, arg2, __FILE__, __LINE__)
#endif

Since __UPC_PUPC_INST__ is only defined when the user requests GASP instrumentation, this has the effect of “translating” all regular UPC library calls into profiled library calls with source information tacked at the end of the argument list that is automatically filled in by the preprocessor.

This technique works out just fine in practice, even if it is a little hackish. Unlike the problems associated with messing with functions defined in system headers (see the myexit() example above), it is unlikely that doing this magical rewrite via macros will cause compilation problems. However, since the macros are expanded in the preprocessing stage and not during the compilation stage, any #pragma pupc statements will not affect the “translation” of these library function calls into their profiled counterparts.
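A runnable version of the macro trick follows, using hypothetical example_memcpy()/example_memcpyg() functions and an EXAMPLE_INST macro in place of __UPC_PUPC_INST__. The regular function has to be declared or defined before the function-like macro appears; otherwise the preprocessor would rewrite its own definition, just as in the exit() case.

```c
#include <string.h>

/* Hypothetical profiled flavor: records the call site, then does the work. */
static const char *last_file = NULL;
static int last_line = 0;

void *example_memcpyg(void *dst, const void *src, size_t n,
                      const char *file, int line) {
  last_file = file;
  last_line = line;
  return memcpy(dst, src, n);
}

/* Regular flavor, defined before the macro so its own definition
   is not rewritten by the preprocessor. */
void *example_memcpy(void *dst, const void *src, size_t n) {
  return memcpy(dst, src, n);
}

/* EXAMPLE_INST stands in for __UPC_PUPC_INST__. */
#define EXAMPLE_INST 1
#ifdef EXAMPLE_INST
#define example_memcpy(d, s, n) example_memcpyg(d, s, n, __FILE__, __LINE__)
#endif

int copy_and_report_line(char *dst, const char *src) {
  example_memcpy(dst, src, strlen(src) + 1); /* becomes example_memcpyg */
  return last_line;
}
```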

One other possibility for getting #pragma pupc statements to affect the regular UPC library calls would be to pass an extra parameter to the profiled variants of the library calls. This extra parameter would determine whether the GASP callback should happen, and its value would be filled in by a macro that is redefined whenever a #pragma pupc directive is processed. Here is a crude example:

#define upc_memcpy(p1, p2, n)    upc_memcpyg(p1, p2, n, __FILE__, __LINE__, pupcval)

void userfunc() {
  /* this next line would #define pupcval to 0 */
  #pragma pupc off

  /* this would be translated to upc_memcpyg with pupcval = 0 */
  upc_memcpy(p1, p2, n);

  /* this next line would #define pupcval to 1 */
  #pragma pupc on
}

This is similar to what Berkeley UPC does. Unfortunately, I wasn't able to get this to work. When I tried calling cpp_define() and cpp_undef() within the handle_pragma_pupc() handler, the GCC frontend would segfault with an internal error. This is probably happening because of the ordering between the pragma handler callbacks and macro expansion: by the time the pragma handler runs, it is no longer able to change macro definitions. In other words, the #pragma pupc handler can't do the equivalent of a #define.

It is unfortunate that I wasn't able to find an easy way to get the #pragma pupc directives to affect the regular UPC library function calls. This functionality could be achieved if all library function calls were handled in the UPC gimplification process; however, that would require significant code changes to the UPC frontend.

3.8. Function and upc_forall instrumentation

To instrument upc_forall loops, I changed the rule for the upc_forall_statement grammar construct in gcc/c-parse.in. I added a call to a upc_forall_gasp() function (defined in gcc/upc/upc-act.c) which returns a GASP function call tree if the frontend is in GASP instrumentation mode. If GASP support isn't compiled into the frontend or GASP instrumentation has been disabled, the upc_forall_gasp() returns NULL_TREE and nothing special is done.

As a side note, this would be a good place to add the upc_forall optimizations that are present in Berkeley UPC…
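A sketch of the shape of an instrumented upc_forall as seen from a single thread. EVT_FORALL_START, EVT_FORALL_END, tool_event() and instrumented_forall() are hypothetical stand-ins for the GASP upc_forall events and gasp_event_notify(); the affinity test mimics an integer affinity expression.

```c
/* Hypothetical event tags standing in for the GASP upc_forall events. */
enum { EVT_FORALL_START = 1, EVT_FORALL_END = 2 };

static int events[8];
static int nevents = 0;

/* Stand-in for gasp_event_notify(): just records the event tag. */
static void tool_event(int evt) {
  if (nevents < 8)
    events[nevents++] = evt;
}

/* Roughly what "upc_forall (i = 0; i < n; i++; i) { ... }" looks like on
   one thread once the start/end calls are inserted: iteration i runs on
   thread i % threads. */
int instrumented_forall(int n, int mythread, int threads) {
  int i, ran = 0;
  tool_event(EVT_FORALL_START);
  for (i = 0; i < n; i++) {
    if (i % threads != mythread)
      continue;
    ran++; /* the loop body would go here */
  }
  tool_event(EVT_FORALL_END);
  return ran;
}
```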

To instrument functions, I changed the gimplify_function_tree() function in gcc/gimplify.c and copied the same technique used by the -finstrument-functions flag. This was more straightforward than I originally thought, but there were some issues I had to solve in somewhat odd ways.

The function instrumentation technique basically wraps the function body in a big try/finally block, with one profile call placed as the first statement of the function body and the other as the only statement in the finally block. This produces the desired effect of calling a specific function on entry to and exit from a function, without having to analyze the tree for every possible exit point.
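C itself has no try/finally, but the cleanup variable attribute supported by GCC and Clang gives a comparable source-level guarantee, and it is a convenient way to see why this approach catches every exit path. This is an analogy only, not the code the gimplifier produces; the hook and counter names are hypothetical.

```c
/* Hypothetical entry/exit counters standing in for the GASP function
   events. */
static int entries = 0;
static int exits = 0;

/* Cleanup handler: GCC passes a pointer to the guarded variable. */
static void on_exit_hook(int *unused) {
  (void)unused;
  exits++;
}

int classify(int x) {
  entries++; /* plays the role of the first statement in the body */
  /* plays the role of the finally block: runs on every return below */
  int guard __attribute__((cleanup(on_exit_hook))) = 0;
  (void)guard;
  if (x < 0)
    return -1;
  if (x == 0)
    return 0;
  return 1;
}
```

Neither mechanism runs the exit hook when the program leaves a function through longjmp() or exit(), so both cover normal returns rather than every possible way of terminating.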

The first oddity encountered with adding GASP function call instrumentation here is that it is kind of gross to handle it in the top-level gimplifier rather than in the UPC-specific gimplify handler. However, I'm pretty sure the gimplify process doesn't allow a language to modify a function definition directly: it gimplifies the function body and handles the function definition without consulting any language hooks, so I was forced to modify the root gimplify_function_tree() function directly.

Because gcc/gimplify.c doesn't bring in the tree manipulation function prototypes, I declare the ones I need as extern at the top of the file. In particular, I needed to use build_function_call() instead of build_function_call_expr() for reasons outlined below in the bug section.

The second oddity encountered was that I wanted to call a function that might not be defined in the current scope yet. Each of the libupc functions has its prototype defined after the user does #include <upc.h>, which makes the calls to lookup_name() in gcc/upc/upc-gimplify.c work just fine. However, using lookup_name() in the function instrumentation hook breaks when doing something like this:

void myfunc() { return; }

#include <upc.h>

int main() { myfunc(); return 0; }

If our profiled function hook's prototype is defined in upc.h, then compiling the bit of code above the #include <upc.h> fails miserably with a “runtime function not found” error because using lookup_name() relies on the function's prototype already being seen.

To get around this, I defined a new builtin inside gcc/tree.c and gcc/builtins.def and used the implicit_built_in_decls table instead of the lookup_name() function.

3.9. Instrumented libupc functions

There's really not much to say about the instrumented versions of the libupc functions, aside from the fact that they use some simple macros I define in libupc/smp/gasp_utils.h. For a list of the names of all the profiled flavors of each function, see libupc/include/upc.h and libupc/include/gcc-upc-lib.h. I tried my best to follow the “g” suffix convention for all profiled function names, but as mentioned above I had to change it slightly to work with the single-character suffix for the operations defined by the optabs.

One other note to make here is that the pupc functions (like pupc_event_start()) are “translated” directly into gasp_event_notify() calls, as suggested by the GASP implementor's page. Since the “translation” is done using a simple library call with vararg macros, we make use of GCC's variadic preprocessing capabilities to smuggle source information into the pupc functions. This could only cause problems if someone used a third-party preprocessor without support for variadic macros (which have been standard since C99) instead of GCC's built-in preprocessor, an unlikely scenario considering that the C language specification treats preprocessing as an integral part of compilation.
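A sketch of the variadic-macro translation, with hypothetical example_event_start()/example_event_startg() standing in for pupc_event_start() and notify_va() standing in for gasp_event_notifyVA(). Placing the event tag inside __VA_ARGS__ keeps the argument list from ever being empty, so the macro is plain C99 with no GNU ## extension.

```c
#include <stdarg.h>

/* Hypothetical stand-ins: real code would call gasp_event_notifyVA()
   with the GASP context, GASP_START, file, line and column. */
static unsigned int last_tag = 0;
static const char *last_file = 0;
static int last_line = 0;
static int last_arg = 0;

static void notify_va(unsigned int tag, const char *file, int line,
                      va_list ap) {
  last_tag = tag;
  last_file = file;
  last_line = line;
  last_arg = va_arg(ap, int); /* this example event carries one int */
}

void example_event_startg(const char *file, int line, unsigned int tag, ...) {
  va_list ap;
  va_start(ap, tag);
  notify_va(tag, file, line, ap);
  va_end(ap);
}

/* The tag travels inside __VA_ARGS__, so the list is never empty. */
#define example_event_start(...) \
  example_event_startg(__FILE__, __LINE__, __VA_ARGS__)
```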

3.10. Weak symbols for GASP tool functions

As mentioned above, a goal of this implementation was to allow users to use the same GCC UPC installation to compile regular and profiled versions of their applications.

Since libupc is statically linked with a user's UPC code, any functions it references must be defined somewhere. And since libupc.a contains the profiled functions, which refer to gasp_event_notify(), users would get linker errors when linking code without a tool unless a “dummy” version of gasp_event_notify() were available, even if none of their code was compiled with a GASP flag. A simple way to fix this is to provide a dummy implementation of gasp_event_notify(); however, this creates a problem when a tool wants to link in its own definition of gasp_event_notify().

To get around this problem, I give “dummy” definitions of all GASP toolside functions and define those dummy definitions as weak. Most platforms I know of do support weak functions, and for platforms/linkers that don't, the worst that will happen is that users will never be able to use a performance tool because they will never have the chance to override the dummy GASP tool functions, which is acceptable. An alternative workaround would be to automatically add a library with these dummy definitions to the linker command if the user links their code without using a GASP flag. I'm not familiar with the process the frontend uses to invoke the linker, so I wasn't able to make that change.
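A minimal sketch of the weak-dummy approach using GCC's weak function attribute. The declarations are simplified stand-ins modeled on gasp.h (an assumption; the real header is authoritative), and only gasp_init() and gasp_event_notify() are shown.

```c
#include <stddef.h>

/* Simplified declarations modeled on gasp.h (assumption: check the real
   header for the exact types and enumerators). */
typedef struct gasp_context_S *gasp_context_t;
typedef enum { GASP_MODEL_UPC } gasp_model_t;
typedef enum { GASP_START, GASP_END, GASP_ATOMIC } gasp_evttype_t;

/* Weak dummies: the linker uses these only when no other object file
   (for example, a performance tool) supplies a strong definition. */
__attribute__((weak))
gasp_context_t gasp_init(gasp_model_t srcmodel, int *argc, char ***argv) {
  (void)srcmodel;
  (void)argc;
  (void)argv;
  return NULL; /* no tool: nothing to initialize */
}

__attribute__((weak))
void gasp_event_notify(gasp_context_t context, unsigned int evttag,
                       gasp_evttype_t evttype, const char *filename,
                       int linenum, int colnum, ...) {
  (void)context;
  (void)evttag;
  (void)evttype;
  (void)filename;
  (void)linenum;
  (void)colnum; /* no tool: drop the event */
}
```

A tool that defines gasp_event_notify() normally in its own object file provides a strong definition, which the linker prefers over the weak dummy.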

Another small issue is that since main() is defined inside libupc.a (or libupc_pt.a) and the same libupc.a is used regardless of GASP being on or off, the GASP init and exit callbacks occur even when a user doesn't compile their code with a GASP flag. Since these callbacks end up invoking the dummy functions and the calls do not occur in performance-critical parts of the UPC runtime library, the extra function calls are harmless and my view is that trying to get rid of them is probably not worth the effort.

4. Bugs encountered during implementation

Here I document some small bugs I ran into while implementing GASP support in GCC UPC.

4.1. Out-of-date c-parse.y

The c-parse.y and c-parse.c files distributed with GCC UPC are out of date and cause problems if they have newer timestamps than c-parse.in. This shows up as a parsing problem when compiling with xupc or xgcc during the bootstrap phase (I forget which), and the errors reported are not obvious by any stretch of the imagination.

Steps to recreate: Type these commands after expanding the GCC UPC source tarball:

touch gcc/c-parse.y gcc/c-parse.c

Then build GCC UPC as normal. You will get fatal compiler errors halfway through the build.

Suggested fix: Delete c-parse.c and c-parse.y files from the tarball, or commit versions of them to CVS that are up-to-date.

A fix for this bug is now included in the GCC UPC CVS repository.

4.2. Strange assembly generation for synchronization calls

In gcc/upc/upc-gimplify.c, most of the libupc calls are generated using the build_function_call() function, except for the upc_gimplify_sync_stmt() function which used build_function_call_expr() instead. This generates some strange-looking assembly code on 64-bit machines. My Subversion commit entry explains this:

r16 | leko | 2006-11-24 02:09:30 -0500 (Fri, 24 Nov 2006) | 46 lines
Changed paths:
   M /trunk/gcc/upc/upc-gimplify.c

Fix code generation for upc_barrier, upc_notify, and upc_wait.  These
calls to the libupc were being generated via the
build_function_call_expr, which was generating some questionable
assembly that oddly seemed to work OK with the regular sync calls.
However, the code generated didn't work with the profiled calls.
Changing this to build_function_call() fixed the problem.

Here's a snippet of code that was generated by the call_expr() flavor:

        movq    .LC0(%rip), %rsi
        movzbq  .LC0+8(%rip), %rdx
        movzbq  .LC0+9(%rip), %rax
        salq    $8, %rax
        orq     %rax, %rdx
        movzbq  .LC0+10(%rip), %rax
        salq    $16, %rax
        orq     %rax, %rdx
        movzbq  .LC0+11(%rip), %rax
        salq    $24, %rax
        orq     %rax, %rdx
        movzbq  .LC0+12(%rip), %rax
        salq    $32, %rax
        orq     %rax, %rdx
        movl    $5, %ecx
        movl    $1, %edi
        call    __upc_barrierg

And here's the same that was generated by the fix:

        movl    $5, %edx
        movl    $.LC0, %esi
        movl    $1, %edi
        call    __upc_barrierg

Compare that with GCC's translation of this C fragment:

  my_barrierg(444, __FILE__, __LINE__);

which compiles to:

        movl    $9, %edx
        movl    $.LC1, %esi
        movl    $444, %edi
        call    my_barrierg

So it was probably just the wrong function being used.

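Note that the build_function_call() version needs only four instructions for the call to __upc_barrierg, as opposed to seventeen for the one generated by build_function_call_expr(). I'm not 100% sure of the differences between the two flavors, but the quadword instructions generated by build_function_call_expr() look pretty wrong to me, and they caused segfaults when the passed-in const char * string pointer was dereferenced. Reading the first listing more closely, the movq and movzbq instructions load the bytes of the filename string at .LC0 (packed into %rsi and %rdx) rather than its address, and the line number ends up in %ecx instead of %edx, so the string literal appears to have been passed as an array by value. That would be consistent with build_function_call() applying the usual argument conversions, such as array-to-pointer decay, while build_function_call_expr() does not.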

Steps to recreate: Problem does not surface unless upc_gimplify_sync_stmt() passes in a const char *. You can recreate this by changing the build_function_call() to build_function_call_expr() after applying the patch above.

Suggested fix: Use build_function_call() in upc_gimplify_sync_stmt().

A fix for this bug is now included in the GCC UPC CVS repository.

4.3. Werror used when compiling in libupc

By default, libupc is compiled with -Wall -Werror. However, when a user adds extra warnings to the CFLAGS environment variable (which I do for testing our performance tool), GCC UPC fails to compile. For the record, here's the set of flags I use in my .cshrc that trigger the problem:

setenv CFLAGS "-g -Wall -Wno-unused \
    -Wdeclaration-after-statement -Wpointer-arith \
    -Wnested-externs -Wwrite-strings"

I'm not sure that hardcoding in -Werror is a good idea, especially if the user wants to compile GCC UPC with something other than GCC.

Steps to recreate: Set the CFLAGS environment variable as shown above and try to compile GCC UPC (specifically, libupc).

Suggested fix: Don't hardcode -Werror into WFLAGS. The patch above DOES NOT contain the fix.

4.4. Makefile in upc_tests/test assumes . is in PATH

The embedded scripts inside upc_test/test/Makefile assume that . (i.e., the current working directory) is included in the user's PATH environment variable, and fail to run when it is not.

Steps to recreate: make sure . is not in your PATH environment variable and type ./run_tests from within the upc_test/test directory after compiling GCC UPC.

Suggested fix: Add ./ to the appropriate lines of the Makefile and run_tests scripts in the upc_test/test directory. The patch above DOES NOT contain the fix.

4.5. pragma upc not allowed after opening brace

Previous versions of GCC UPC (at least versions 3.2.3.5 to 3.3.2.9) allowed #pragma upc to be used right after an opening brace nested inside a function definition, as in:

#include <upc.h>

shared int x;

int main() {
  {
    #pragma upc strict
    x = 2;
  }
  return 0;
}

However, with any of the 4.x-based versions of GCC UPC, I get this warning:

test.upc: In function 'main':
test.upc:7: warning: #pragma upc not allowed in this context

Comparing with the c-parse.y from previous versions of GCC UPC, there used to be calls to permit_pragma_upc() and deny_pragma_upc() at several nonterminal symbols, in addition to calls to push_upc_consistency_mode() and pop_upc_consistency_mode(). These calls are missing in the c-parse.y distributed with the last GCC UPC 4.x release, and are also missing from the version in the CVS repository. Most likely they were accidentally dropped during the conversion to the 4.x codebase.

Steps to recreate: Try compiling the above file with recent versions of GCC UPC.

Suggested fix: Re-insert the calls mentioned above in the UPC grammar file. I'm not familiar enough with the C grammar to know which rules need which calls, but I'm guessing the grammar hasn't changed dramatically since the 3.x codebase, so most of the original calls can be transferred over without many issues.

5. Interaction with Berkeley UPC

The above changes should integrate very well with GCC UPC+UPCR mode. The new profiled flavors of the libupc functions should be added to the upcr_gccupc.h and upcr_gccupc.c files in Berkeley UPC, and the profiled versions of those functions should use the UPCR_SET_SRCPOS macro appropriately.

One small issue is that in the current GCC UPC+UPCR mode (I believe), when UPCR is compiled with GASP support it normally relies on rewriting #pragma pupc into a macro that sets a thread-local variable controlling whether the GASP callback occurs: upcri_pevt_pragmahandoff(). This means that if UPCR_SET_SRCPOS is used as suggested above, a profiled get followed by a nonprofiled get will result in two GASP callbacks with the same source line information, even though only the first one should produce a callback. I think this can be fixed by manually setting UPCRI_PEVT_PRAGMA to 1 for the duration of the profiled function calls, and keeping it set to 0 for the unprofiled versions. Something like:

int
__getqi2 (upcr_shared_ptr_t src)
{
   UPCR_BEGIN_FUNCTION();
   char result;
   UPCRI_PEVT_PRAGMA = 0;
   upcr_get_shared(&result, src, (ptrdiff_t) 0, sizeof(result));
   return (int) result;
}

int
__getgqi4 (upcr_shared_ptr_t src, const char *file, int line)
{
   UPCR_BEGIN_FUNCTION();
   char result;
   UPCR_SET_SRCPOS(file, line);
   UPCRI_PEVT_PRAGMA = 1;
   upcr_get_shared(&result, src, (ptrdiff_t) 0, sizeof(result));
   UPCRI_PEVT_PRAGMA = 0;
   return (int) result;
}

Since the #pragma pupc and GASP instrumentation flags are handled by the GCC UPC frontend, we only need to ensure that the profiled versions of each libupc function incur a GASP callback.
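A single-threaded model of the proposed fix, to check the claim that only the profiled call should produce a callback. pevt_pragma stands in for UPCRI_PEVT_PRAGMA, runtime_get() for upcr_get_shared(), and the example_* functions for __getqi2()/__getgqi4(); all of these names are hypothetical, and the real flag is thread-local.

```c
/* Hypothetical stand-ins for the UPCR pragma flag and callback count. */
static int pevt_pragma = 0;
static int callbacks = 0;

static char shared_cell = 'x'; /* stand-in for the shared object */

/* Stand-in for the runtime's get: fires a callback only if the flag is set. */
static char runtime_get(void) {
  if (pevt_pragma)
    callbacks++; /* the GASP get callback would fire here */
  return shared_cell;
}

/* Unprofiled flavor: keeps the flag off. */
int example_getqi2(void) {
  pevt_pragma = 0;
  return (int) runtime_get();
}

/* Profiled flavor: turns the flag on only around the access. */
int example_getgqi4(const char *file, int line) {
  int result;
  (void)file; /* real code: UPCR_SET_SRCPOS(file, line) */
  (void)line;
  pevt_pragma = 1;
  result = (int) runtime_get();
  pevt_pragma = 0;
  return result;
}
```

Calling the profiled flavor and then the unprofiled one yields exactly one callback, which is the behavior the fix is meant to guarantee.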

I think there might also be a way to sneak in support for the --inst-local flag somehow by using some clever macros and making the tool callback decision for --inst vs --inst-local inside upcr_shaccess.h (which I think is how it is done with UPCR+GASP right now).

On the plus side, the current set of changes retains binary compatibility with Berkeley UPC 2.4.0. I was able to use my GASP-enabled GCC UPC as a frontend for Berkeley UPC 2.4.0 without making any changes to the Berkeley UPC code. Compiling with GASP enabled (-Wc,-finstrument-gasp) works just fine but predictably fails during linking because the current Berkeley UPC version does not contain definitions for the profiled versions of the libupc functions.

Note
Someone more familiar with UPCR should check the above for accuracy :-) I'm not sure if the UPCR_BEGIN_FUNCTION is really needed or not.