Berkeley UPC - Unified Parallel C (A joint project of LBNL and UC Berkeley)
Berkeley UPC User's Guide version 2.1.0
This version of Berkeley UPC includes
#include <upc_relaxed.h>
#include <stdio.h>
int main() {
    printf("Hello from thread %i/%i\n", MYTHREAD, THREADS);
    upc_barrier;
    return 0;
}
This program prints a message once from each thread (in some arbitrary
interleaving), executes a barrier (optional), and exits.
For more involved examples of UPC code, see the UPC Language Tutorials from the UPC Language Community website and the 'upc-examples' directory of the Berkeley UPC runtime distribution. The official UPC language specification is a useful reference, and contains a description of the standard library. Finally, the UPC Collectives Specification describes the collective operations available in UPC.
upcc -o light particle.upc wave.c -lgrottymath
Note that 'wave.c' can contain either UPC code or regular C code, and the
'grottymath' library that is linked into the application can be a regular C
library: Berkeley UPC is fully interoperable with regular C source, object, and
library files (note: if you compile with the -pthreads flag,
any C libraries you use must be thread-safe). Berkeley UPC 2.0 also adds
support for linking C++/FORTRAN/MPI objects into a UPC executable: see Mixing C/C++/MPI/FORTRAN with UPC.
upcc recognizes most commonly used C compiler flags (-D, -I, etc.). It also uses a number of its own flags for the choice of network API your program will run over, for compiling your UPC code for a static number of threads, and other UPC-specific options. See the upcc man page for details.
| Name | Description |
| lapi | LAPI API for IBM SP networks |
| gm | GM API for Myrinet networks |
| elan | elan API for Quadrics networks |
| vapi | API for Mellanox-based Infiniband networks |
| sci | SISCI API for Dolphin-based SCI networks (EXPERIMENTAL: currently requires the Linux BigPhysMem kernel patch in order to get more than 1MB of shared heap space) |
| shmem | SHMEM API for SGI Altix systems and the Cray X1. Other systems providing a SHMEM API may also work, but have not been tested. |
| udp | UDP: works on any system with a standard TCP/IP stack, but is typically slower than using one of the native network types. Generally the fastest option for systems with only Ethernet hardware (notably faster than MPI-over-TCP). |
| mpi | MPI: works on any system with MPI installed, but is typically slower than using one of the other network types. |
| smp | "Symmetric multiprocessor (SMP)" mode: uses no network. Currently runs with only a single process, so you must use -pthreads to run with multiple UPC threads. |
Note that you can only compile for a given network type if your Berkeley UPC runtime was configured to support it at build/installation time. To see which APIs are supported in your installation, and to see which is used by default, use 'upcc --version'.
An executable compiled for a fixed number of UPC threads will fail at startup if you try to run it with a different number of threads. However, fixing the number of threads allows optimization on certain operations (such as shared pointer arithmetic), especially when the number of threads is a power of 2.
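To illustrate why a fixed, power-of-2 thread count helps, here is a minimal sketch in plain C of the kind of index arithmetic involved for an array distributed cyclically across threads (block size 1). The function names are hypothetical, and this is not the Berkeley UPC runtime's actual implementation. It only shows that the division and modulo needed in the general case reduce to a shift and a mask when the thread count is a known power of 2.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* General case: the thread count is only known at run time,
 * so finding the owning thread needs an integer division. */
size_t owner_general(size_t index, size_t threads) {
    return index % threads;
}

size_t local_offset_general(size_t index, size_t threads) {
    return index / threads;
}

/* Power-of-2 case: threads == (1 << log2_threads), so the same
 * results come from a mask and a shift. */
size_t owner_pow2(size_t index, unsigned log2_threads) {
    return index & (((size_t)1 << log2_threads) - 1);
}

size_t local_offset_pow2(size_t index, unsigned log2_threads) {
    return index >> log2_threads;
}
```

With 4 threads, element 13 belongs to thread 1 at local offset 3 under both variants.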
| Name | Value | Description | Standard |
| __UPC__ | 1 | Defined by any UPC implementation | UPC language |
| __UPC_VERSION__ | Monotonically increasing positive integer constant | UPC specification supported: value is YYYYMM date of that version's ratification (ex: '200310L') | UPC language |
| __UPC_STATIC_THREADS__ | 1 if static threads: else undefined | Set to 1 if the '-T' flag was passed to upcc | UPC language |
| __UPC_DYNAMIC_THREADS__ | 1 if dynamic threads: else undefined | Set to 1 unless the '-T' flag was passed to upcc | UPC language |
| __BERKELEY_UPC__ | Monotonically increasing positive integer constant | The major version number of the Berkeley UPC release. Example: '1' for release '1.0.3'. | Berkeley UPC only |
| __BERKELEY_UPC_MINOR__ | An integer constant | The minor version number of the Berkeley UPC release. Example: '0' for release '1.0.3'. | Berkeley UPC only |
| __BERKELEY_UPC_PATCHLEVEL__ | An integer constant | The patch version number of the Berkeley UPC release. Example: '3' for release '1.0.3'. | Berkeley UPC only |
| __BERKELEY_UPC_<NETWORK>_CONDUIT__ | 1, or undefined | Identifies the network API used. Example: if 'upcc -network=mpi' is used, '__BERKELEY_UPC_MPI_CONDUIT__' will be defined, with the value of 1 | Berkeley UPC only |
| __BERKELEY_UPC_PTHREADS__ | 1, or undefined | Defined to 1 if and only if the '-pthreads' flag is used | Berkeley UPC only |
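The macros in this table can be tested with ordinary preprocessor conditionals. The following is a small sketch with hypothetical helper names (thread_mode, berkeley_upc_release, uses_pthreads) that report how a file was compiled. It uses only the macro names listed above, and it also compiles with a regular C compiler, in which case none of the macros are defined.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: "static", "dynamic", or "not UPC". */
const char *thread_mode(void) {
#if defined(__UPC_STATIC_THREADS__)
    return "static";
#elif defined(__UPC_DYNAMIC_THREADS__)
    return "dynamic";
#else
    return "not UPC";
#endif
}

/* Hypothetical helper: writes "major.minor.patch" into buf, or
 * "none" when not compiled by Berkeley UPC. Returns snprintf's result. */
int berkeley_upc_release(char *buf, size_t len) {
#if defined(__BERKELEY_UPC__)
    return snprintf(buf, len, "%d.%d.%d",
                    (int)__BERKELEY_UPC__,
                    (int)__BERKELEY_UPC_MINOR__,
                    (int)__BERKELEY_UPC_PATCHLEVEL__);
#else
    return snprintf(buf, len, "none");
#endif
}

/* Hypothetical helper: 1 if compiled with -pthreads, else 0. */
int uses_pthreads(void) {
#ifdef __BERKELEY_UPC_PTHREADS__
    return 1;
#else
    return 0;
#endif
}
```

A program might print these values once at startup so that bug reports record the build configuration.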
A remote translator can be used over either HTTP or SSH. To use HTTP, the 'upcc.cgi' CGI script (located in the 'contrib' directory of the runtime distribution) must be installed and configured with a web server on the remote host. Simply set the 'translator' parameter in your '$HOME/.upccrc' file (or the global 'upcc.conf') to the URL for the CGI script. To use SSH, you must be able to log in to the remote host using SSH, and the 'translator' parameter must be set to 'remote_host:/path/to/translator'. You will want to use key-based authentication and 'ssh-agent' to avoid entering your password each time you compile. See our SSH Agent Tutorial.
Berkeley UPC executables should be run the same way as any other parallel program on your system that uses the same underlying network API. So, for instance, a program compiled with '--network=mpi' is run on many systems via 'mpirun -np <number of processes> a.out'. Other systems may use other invocations, such as 'prun' or 'poe', especially when APIs other than MPI are used. Consult your system's documentation for details.
upcrun -n 4 parboil
This example runs the UPC executable 'parboil' with 4 UPC threads.
An additional benefit of using upcrun is that it provides consistent support for propagating environment variables to all threads of your UPC program. If you use upcrun, any environment variable beginning with either 'UPC_' or 'GASNET_' is guaranteed to be propagated to all threads. (Support for propagating all environment variables is planned). If you do not use upcrun, environment propagation will only work to the extent that the parallel job launcher you use provides it normally.
You can see how upcrun thinks your job should be run without actually running it by passing the '-t' flag to it. Also, 'upcrun -i <executable>' will provide information about a Berkeley UPC executable, such as the network API that it was built against, and the number of fixed threads (if any) that it was compiled for.
See 'upcrun --help' or the upcrun man page for more information.
The default amount of shared memory to reserve per UPC thread on a system is chosen at configure time (see the INSTALL document in the runtime distribution for details), but you can override that value for a particular application either at compile time, or at startup. Generally this is only needed if you observe that your application is running out of either shared or regular C memory.
To embed a different default amount of shared memory into your application, simply pass '-shared-heap=144MB' for instance (to get 144 megabytes per UPC thread). You can also use 'GB' for gigabyte amounts (if neither 'MB' nor 'GB' is used, megabytes are assumed). To override the embedded default amount of shared memory at application startup, set the UPC_SHARED_HEAP_SIZE environment variable to whatever value you want ('2GB', etc.).
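The size syntax described above can be sketched as a small parser. This is an illustration only, not the code upcc or the runtime actually uses. The helper name heap_size_in_mb is hypothetical, it assumes that one gigabyte means 1024 megabytes, and it accepts only the upper-case suffixes shown in the text.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts "144MB", "2GB" or "64" into a
 * number of megabytes. Returns -1 for anything it cannot parse. */
long heap_size_in_mb(const char *text) {
    char *end;
    long value;

    if (text == NULL || !isdigit((unsigned char)text[0]))
        return -1;

    value = strtol(text, &end, 10);
    if (value <= 0)
        return -1;

    /* No suffix means megabytes, as does an explicit "MB". */
    if (*end == '\0' || strcmp(end, "MB") == 0)
        return value;

    /* Assumption: 1 GB == 1024 MB. Reject values that would overflow. */
    if (strcmp(end, "GB") == 0)
        return (value > LONG_MAX / 1024) ? -1 : value * 1024;

    return -1;
}
```

Under these assumptions, "144MB" and "144" both yield 144, while "2GB" yields 2048.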
Note: The Berkeley UPC runtime currently defaults to a maximum of 2 gigabytes of shared memory per process (i.e. if you are using pthreads, this limit is shared by the pthreads within each process: otherwise the limit is per UPC thread). If your system can support more than this, you may configure the runtime to use a different maximum with 'configure --with-shared-mmap-max=16GB' (for 16 gigabytes per process, etc.). The need to explicitly configure the runtime for large shared memory support will be removed in a future release.
While it is tempting to simply grab an extremely large shared memory segment, be aware that this is not always a good idea, or even possible. Since the shared address space range cannot be used for regular malloc allocations, creating too large a shared space can cause the amount of regular heap memory available to your application to become small (causing malloc to eventually return NULL when you request more memory). Also, the shared memory space is reserved via an mmap() call, and while this does not generally cause any physical memory pages to be allocated, certain operating systems (for instance, Linux) will not allow more memory to be reserved by applications than the OS can guarantee is available, and so allocating a shared region larger than the physical memory (plus swap space) may fail.
The default amount of shared memory per UPC thread can be changed system-wide by modifying the 'shared_heap' parameter in the installation's upcc.conf file. You can override the system-wide default for your own applications by setting shared_heap in your $HOME/.upccrc file.
The upcc.conf file also provides a 'heap_offset' parameter (and upcc provides a '-heap-offset' flag) that affects where the address region for shared memory is located in your program. However, at present it is not useful on any of our supported systems, and so we do not recommend its use.
The '-pthreads' flag must be passed consistently at all stages of compilation and linking. Also, when pthreads are used, upcc needs to delay much of the compilation of your code until link time, so if you split code generation into separate compilation and linking steps (i.e., 'upcc -c foo.upc', followed by 'upcc foo.o bar.o'), you need to pass any macro and/or include path directives (ex: '-DFOO=bar -I/usr/local/include') to upcc in both the compilation and link commands.
Any C libraries that your code links against must be thread-safe in order to be used with -pthreads. If one or more of your libraries is not thread-safe, you must compile without pthreads, and run separate processes on the same machine to exploit an SMP system. Currently, such processes will not use any shared memory optimizations, and will communicate with each other via the network API. While this is generally still much faster than communicating with UPC threads on other nodes, it is still not as fast as using shared memory. Support for shared memory between non-pthreaded Berkeley UPC processes will be provided in the near future.
When you link an application with '-pthreads', a subdirectory named <executable_name>_pthread-link will be created in the current directory. This directory exists in order to speed up further linking commands of the same program. If you link the same application again with the same object file names, and none of the global static unshared variables in your program have changed name or size, recompilation of all the files in your application can be avoided, which can make a significant difference in build time for programs with many source files. You may delete the temporary directory at any time without any side effects (other than possibly longer link times).
Unless otherwise specified, pthreaded UPC applications use a default number of pthreads per process (run 'upcc --version' to see the default for your system). This number is set in the upcc.conf configuration file, and can be changed there (or in your '$HOME/.upccrc' file). It can also be overridden in several ways. Compiling with 'upcc -pthreads=<NUMBER>' changes the default number of pthreads per UPC process for an executable to NUMBER. If the 'UPC_PTHREADS_PER_PROC' environment variable is set to a nonzero integer when you run a UPC program, it will override any default value. Finally, upcrun is smart about pthreads in two ways. First, if you run a pthreaded parallel job with 'upcrun -n <NUMBER> ...', the number of processes actually launched will be NUMBER divided by the number of pthreads per process, so that exactly NUMBER UPC threads are used. Second, if you use the smp network option (which generates a non-parallel executable that will run only a single process), upcrun will automatically set the number of pthreads to NUMBER.
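The relationship between UPC threads, pthreads per process, and launched processes is simple arithmetic, sketched below. The helper name processes_to_launch is hypothetical, and the text above does not say what upcrun does when NUMBER is not a multiple of the pthread count, so this sketch simply reports that case as an error.

```c
/* Hypothetical helper: how many processes are needed so that exactly
 * `upc_threads` UPC threads run with `pthreads_per_proc` pthreads in
 * each process. Returns -1 for invalid input or (as an assumption of
 * this sketch) when the thread count does not divide evenly. */
int processes_to_launch(int upc_threads, int pthreads_per_proc) {
    if (upc_threads <= 0 || pthreads_per_proc <= 0)
        return -1;
    if (upc_threads % pthreads_per_proc != 0)
        return -1;
    return upc_threads / pthreads_per_proc;
}
```

For example, 8 UPC threads with 2 pthreads per process correspond to 4 processes, and with the smp network, where the pthread count equals NUMBER, the result is always a single process.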
How to use 'upc_trace':
Note that running with tracing may slow down your application considerably: the exact amount depends on your filesystem, and the ratio of communication/computation in your program.
There is currently no support in Berkeley UPC for debugging programs at the UPC source level. However, we are currently working with Etnus to provide support for Berkeley UPC within the TotalView debugger.
In the meantime, Berkeley UPC does come with several mechanisms for attaching a regular C debugger to one or more of your UPC application's threads at various points during execution. This can be very useful if you wish to submit a helpful bug report to us. See our Debugging Berkeley UPC programs page for more information.
int bupc_dump_shared(shared const void *ptr, char *buf, int maxlen);
Any pointer to a shared type may be passed to this function. The 'maxlen'
parameter gives the length of the buffer pointed to by 'buf', and this
length must be at least BUPC_DUMP_MIN_LENGTH, or else -1 is returned
and errno is set to EINVAL. On success, the function returns 0.
The buffer will contain either "<NULL>" if the pointer to shared == NULL, or a
string of the form
"<address=0x1234 (addrfield=0x1234), thread=4, phase=1>"
The 'address' field provides the virtual address for the pointer, while the
'addrfield' contains the actual contents of the shared pointer's address bits.
On some configurations these values may be the same (if the full address of the
pointer can be fit into the address bits), while on others they may be quite
different (if the address bits store an offset from a base initial address
that may differ from thread to thread).
Both bupc_dump_shared() and BUPC_DUMP_MIN_LENGTH are visible when any of the standard UPC headers (upc.h, upc_relaxed.h, or upc_strict.h) are #included.
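When the dumped string is written to a log, it can be convenient to pull the individual fields back out. The following plain C sketch parses the format described above. The struct and function names are hypothetical, and in a real UPC program the input would come from bupc_dump_shared() called with a buffer of at least BUPC_DUMP_MIN_LENGTH bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a dumped shared pointer. */
struct shared_ptr_info {
    int is_null;              /* 1 if the dump was "<NULL>" */
    unsigned long address;    /* virtual address */
    unsigned long addrfield;  /* raw address bits of the shared pointer */
    int thread;
    int phase;
};

/* Hypothetical helper: returns 1 if `text` matches one of the two
 * documented forms, 0 otherwise. */
int parse_shared_dump(const char *text, struct shared_ptr_info *out) {
    memset(out, 0, sizeof *out);
    if (strcmp(text, "<NULL>") == 0) {
        out->is_null = 1;
        return 1;
    }
    /* %lx accepts the leading "0x" of each hexadecimal field. */
    return sscanf(text,
                  "<address=%lx (addrfield=%lx), thread=%d, phase=%d>",
                  &out->address, &out->addrfield,
                  &out->thread, &out->phase) == 4;
}
```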
You will normally not need to call this function, as the runtime will automagically perform checks for incoming network requests whenever your UPC code causes network activity to be performed, and this usually occurs fairly frequently in a UPC application. However, if you are writing your own 'spin lock' style synchronization, you may need to use this function to avoid deadlock. Here is an example:
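shared strict int flag[THREADS];
...
if (MYTHREAD % 2) {
    /* odd threads: wait until the even neighbor sets our flag */
    while (flag[MYTHREAD] == 0)
        bupc_poll();
} else {
    ... some calculation ...
    /* even threads: signal the odd neighbor, if one exists */
    if (MYTHREAD + 1 < THREADS)
        flag[MYTHREAD + 1] = 1;
}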
Here the 'even' UPC threads are performing some calculation, then informing the
'odd' threads that the result is ready by setting a per-thread flag. If the
'bupc_poll()' were omitted, the 'odd' threads might (on certain
platforms/networks) consume all of the CPU forever in the 'while' test,
never checking for the incoming network message that would set flag[MYTHREAD].
If a program contains computationally intensive sections in which no remote accesses are performed for a long time, it is also possible that performance may be improved by intermittently calling bupc_poll, particularly if other threads are likely to be performing remote accesses (or memory allocation requests) during this time.
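One way to poll intermittently is to call bupc_poll() every fixed number of iterations of the compute loop. The sketch below is an illustration with hypothetical names (POLL_NETWORK, sum_with_polling), and it assumes that bupc_poll() is made visible by the standard UPC headers. When compiled as plain C, the poll becomes a no-op so the loop structure can still be tested.

```c
#ifdef __BERKELEY_UPC__
#include <upc_relaxed.h>
#define POLL_NETWORK() bupc_poll()
#else
#define POLL_NETWORK() ((void)0)   /* plain C build: nothing to poll */
#endif

/* Hypothetical compute loop: sums `count` local values and polls the
 * network after every `poll_interval` elements. The number of polls
 * performed is stored in *polls. */
long sum_with_polling(const int *data, long count,
                      long poll_interval, long *polls) {
    long total = 0;
    long i;

    *polls = 0;
    for (i = 0; i < count; i++) {
        total += data[i];
        if (poll_interval > 0 && (i + 1) % poll_interval == 0) {
            POLL_NETWORK();
            (*polls)++;
        }
    }
    return total;
}
```

The right interval depends on the application: polling too often adds overhead to the loop, while polling too rarely leaves other threads waiting longer for their requests to be serviced.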
Berkeley UPC guarantees that 'getenv()' allows retrieval of environment variable values that were present when the job was launched. At present this function is only guaranteed to retrieve these values for all threads if the environment variable's name begins with 'UPC_' or 'GASNET_'. On some platforms all environment variables seen by the job launcher may be propagated, but it is not portable to rely on this.
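A program that wants to stay within this guarantee can route its lookups through a small wrapper. The sketch below uses hypothetical helper names, and its policy of ignoring variables without a guaranteed prefix is a design choice made for this example, not something the runtime enforces.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 1 if `name` starts with a prefix whose
 * propagation to all threads is guaranteed, else 0. */
int has_propagated_prefix(const char *name) {
    return strncmp(name, "UPC_", 4) == 0 ||
           strncmp(name, "GASNET_", 7) == 0;
}

/* Hypothetical wrapper: returns the variable's value only when its
 * name has a guaranteed prefix and it is set; otherwise `fallback`. */
const char *portable_getenv(const char *name, const char *fallback) {
    const char *value;

    if (!has_propagated_prefix(name))
        return fallback;
    value = getenv(name);
    return value != NULL ? value : fallback;
}
```

Application-specific settings can then be given names such as 'UPC_MYAPP_VERBOSE' (a made-up example) so that every thread sees the same value.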
The 'setenv()' and 'unsetenv()' functions are not guaranteed to work in a Berkeley UPC runtime environment, and should be avoided.
If 'expr' has a static type which is identical to 'type', does nothing. Otherwise, prints a non-fatal warning containing the line number and a description of the two differing types.
#define NDEBUG
#include <assert.h>
will not work as expected if the NDEBUG definition modifies the behavior of
assert.h (which, in this example, it does: this NDEBUG/assert.h case is the most
common case where users run into this issue with our compiler).
There is a simple workaround: if you need to define a macro that affects the behavior of #included files, define it on the command line to upcc:
upcc -DNDEBUG myprogram.upc
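To confirm that a command-line definition took effect, a source file can test the macro directly. The helper below is a hypothetical sketch: it returns 0 when NDEBUG was defined for the compilation (for instance with '-DNDEBUG') and 1 otherwise.

```c
#include <assert.h>

/* Hypothetical helper: reports whether assert() is active in this
 * translation unit, based on whether NDEBUG is defined. */
int assertions_enabled(void) {
#ifdef NDEBUG
    return 0;
#else
    return 1;
#endif
}
```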
Thank you for using Berkeley UPC!