Berkeley UPC - Unified Parallel C
(A joint project of LBNL and UC Berkeley)
NEW April 17, 2020 -- Berkeley UPC version 2020.4.0 released!
The UPC Language
Unified Parallel C (UPC) is an extension of
the C programming language designed for high performance computing on large-scale
parallel machines. The language provides
a uniform programming model for both shared and distributed memory hardware. The
programmer is presented with a single shared, partitioned address
space, where variables may be directly read and written by any processor,
but each variable is physically associated with a single processor. UPC
uses a Single Program Multiple Data (SPMD) model of computation in which
the amount of parallelism is fixed at program startup time, typically with
a single thread of execution per processor.
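In order to express parallelism, UPC extends ISO C 99 with the following
constructs: an explicitly parallel execution model, a shared address space,
synchronization primitives and a memory consistency model, explicit
communication primitives, and memory management primitives.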
The UPC language evolved from experiences with three earlier
languages that proposed parallel extensions to ISO C 99: AC, Split-C, and Parallel
C Preprocessor (PCP). UPC is not a superset of these three languages,
but rather an attempt to distill the best characteristics of each. UPC combines
the programmability advantages of the shared memory programming paradigm
and the control over data layout and performance of the message passing
programming paradigm.
Our work at UC Berkeley/LBNL
The Berkeley UPC compiler suite is currently maintained primarily by the
Pagoda Project at Lawrence Berkeley National Laboratory.
The goal of the Berkeley UPC team is to develop a portable, high-performance
implementation of UPC for large-scale multiprocessors, PC clusters, and
clusters of shared memory multiprocessors. To that end, we are developing
an open-source UPC compiler suite.
The major components of our project are:
Lightweight Runtime and Networking Layers: On distributed memory hardware,
references to remote shared variables usually translate into calls to a
communication library. Because of the shared
memory abstraction that it offers, UPC encourages a programming style where
remote data is accessed at a fine granularity (i.e., an access is often
the size of a primitive C type such as int, float, or double).
To obtain good performance from an implementation,
it is therefore important that the overhead of accessing the underlying
communication hardware is minimized and that the implementation exploits the
most efficient hardware mechanisms available.
Our group has thus developed a lightweight communication
and run-time layer for global address space programming languages.
In an effort to make our code useful to other projects, we have separated the
UPC-specific parts of our runtime layer from the networking logic. If you are
implementing your own global address space language (or otherwise need a
low-level, portable networking library), you should look at our GASNet library,
which currently runs over a wide variety of high-performance networks (as well
as over any MPI 1.1 implementation).
Additionally, several external projects
have adopted GASNet for their PGAS networking requirements.
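The importance of low per-access overhead can be illustrated with a simple cost model. The sketch below is not part of Berkeley UPC or GASNet: the type `net_model`, the function names, and the model itself (a fixed per-message overhead plus a per-byte transfer cost) are assumptions made for illustration only.

```c
#include <stddef.h>

/* Illustrative cost model (an assumption, not a GASNet API):
 * each message pays a fixed overhead plus a per-byte transfer cost. */
typedef struct {
    double overhead_us;  /* fixed per-message overhead, in microseconds */
    double per_byte_us;  /* transfer cost per byte, in microseconds */
} net_model;

/* Time to move n elements of elem_size bytes as n separate messages,
 * as happens when each remote element is accessed individually. */
double fine_grained_cost(net_model m, size_t n, size_t elem_size)
{
    return (double)n * (m.overhead_us + (double)elem_size * m.per_byte_us);
}

/* Time to move the same data as a single bulk message. */
double bulk_cost(net_model m, size_t n, size_t elem_size)
{
    return m.overhead_us + (double)n * (double)elem_size * m.per_byte_us;
}
```

With a hypothetical overhead of 2 microseconds and 0.25 microseconds per byte, moving four 8-byte values costs 16 microseconds as separate messages but 10 microseconds as one bulk transfer under this model. The gap is entirely per-message overhead, which is why a runtime for fine-grained access has to keep that overhead small.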
Compilation techniques for explicitly parallel languages: The group
is working on developing communication optimizations to mask the latency of
network communication, aggregate communication into more efficient bulk
operations, and cache data locally.
UPC allows programmers to specify memory accesses with
"relaxed" consistency semantics, which can be exploited by the compiler to hide
communication latency by overlapping communication with computation and/or
other communication.
We are implementing optimizations for the common special cases in UPC where a
programmer uses either the default, cyclic block layout for distributed arrays,
or a shared array with 'indefinite' blocksize (i.e., existing entirely on one
processor). We are also examining optimizations based on avoiding the overhead
of shared pointer manipulation when accesses are known to be local.
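The array layouts mentioned above come down to simple index arithmetic. The following plain C sketch (not UPC code, and not taken from the Berkeley UPC runtime) models which thread an array element has affinity to for a given block size; the function names are hypothetical. A block size of 1 corresponds to the default cyclic layout.

```c
#include <stddef.h>

/* Thread with affinity to element i of a shared array laid out with the
 * given block size across nthreads threads. block_size == 1 models the
 * default cyclic layout. */
size_t owner_thread(size_t i, size_t block_size, size_t nthreads)
{
    return (i / block_size) % nthreads;
}

/* Position of element i within its block. */
size_t block_phase(size_t i, size_t block_size)
{
    return i % block_size;
}

/* Index of element i within the owning thread's local portion. */
size_t local_index(size_t i, size_t block_size, size_t nthreads)
{
    size_t block = i / block_size;         /* global block number */
    size_t local_block = block / nthreads; /* block number on the owner */
    return local_block * block_size + i % block_size;
}
```

For example, with a block size of 2 and 4 threads, element 9 belongs to thread 0 at local index 3, since thread 0 holds elements 0, 1, 8, and 9. An indefinite block size can be modeled here as a block at least as large as the array, so that every element maps to the same thread. When the owner can be determined to be the current thread at compile time, an access can in principle be turned into an ordinary local memory reference, which is the kind of saving the shared pointer optimizations target.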
Application benchmarks: The group is working on benchmarks and applications
to demonstrate the features of the UPC language and compilers, especially
targeting problems with irregular computation and communication patterns.
This effort will also allow us to determine the potential for optimizations
in UPC programs. In general,
applications with fine-grained data sharing benefit from the lightweight
communication that underlies UPC implementations, and the shared address
space model is especially appropriate when the communication is asynchronous.
Active Testing: UPC programs can have classes of bugs not possible in
a programming model such as MPI. In order to help find and correct data races,
deadlocks, and other programming errors, we are working on
active testing tools for UPC programs.
Dynamic Tasking: The UPC Task Library
is a simple and effective way of adding task parallelism to SPMD programs.
It provides a high-level API that abstracts concurrent task management details
and a dynamic load balancing mechanism.
Some of the research findings from these areas of work can be found on our publications page.
UPC Compiler Infrastructure
There are multiple compiler infrastructures available for use with the
Berkeley UPC runtime and compiler driver. The LLVM-based (Clang-UPC) and
GCC-based (GUPC) compilers are developed by INTREPID Technology Inc. An Open64-based
(BUPC) translator is developed at LBNL. Multiple options
encompassing these technologies are currently supported; for details, please see the download page.
All compilers have been tested using the same procedure and our
recommendations in terms of robustness and performance are below.
- Clang-UPC source-to-source UPC-to-C (CUPC2C) translator
We consider this to be the most robust option.
Though results will vary with your choice of backend C compiler,
this option is likely to result in the best performance.
- Clang-UPC source-to-binary (CUPC) compiler
We believe this to be as robust as CUPC2C.
The performance is determined by LLVM (no option to use other backend compilers).
- Open64-based source-to-source UPC-to-C (BUPC) translator
This is the default compiler, hosted as a netcompile service by LBNL.
Performance is determined by the backend C compiler.
- GNU UPC source-to-binary (GUPC) compiler
We consider this option to be generally robust.
The performance is determined by GCC (no option to use other backend compilers).
All of the compiler options above use the Berkeley UPC runtime and compiler
driver, which is available on the download page.
- Other UPC implementations: (incomplete list)
- HP UPC - for all HP-branded platforms, including Tru64, HPUX and Linux systems
- Cray UPC - for Cray X1, XT, XE, XK, XC and future Cray platforms
- SGI UPC - for Altix UV systems
- GNU UPC - for Linux, MacOSX, SGI IRIX, Cray T3E
- IBM UPC - for IBM Blue Gene and AIX SMPs
- Clang UPC - for Linux, MacOSX and others
- Michigan Tech MuPC - MPI-based reference implementation for Linux and Tru64
General contact info
Berkeley UPC is developed and maintained by the
Pagoda Project at Lawrence Berkeley National Laboratory,
and is funded by the DOE Office of Science
and the Department of Defense.
This page last modified on Tuesday, 04-Aug-2020 11:32:00 PDT