From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Mar 10 2010 - 16:49:47 PST
Lingyuan,
The only efforts I am aware of to layer threads on top of UPC have
been analogous to MPI_THREAD_FUNNELED, in which one is permitted to
have multiple threads running but only one representative thread is
permitted to call (even implicitly) any code in the UPC runtime
library. If anybody on this list has had success with any other
way of mixing threads with UPC, I'd be interested to learn about it.
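
The funneled model described above can be sketched in plain C with
pthreads. This is only an illustration under stated assumptions:
comm_put() is a hypothetical stand-in for a UPC runtime call (for
example upc_memput) and is implemented as a memcpy so that the sketch
builds without UPC. The names run_funneled, worker and worker_arg are
likewise invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NUM_WORKERS 4
#define SLICE_LEN   8

/* Hypothetical stand-in for a UPC runtime call such as upc_memput();
 * a plain memcpy here so the sketch compiles without UPC. */
static void comm_put(int *dst, const int *src, size_t n) {
    memcpy(dst, src, n * sizeof *src);
}

typedef struct {
    int  id;      /* worker index */
    int *slice;   /* private buffer this worker fills */
} worker_arg;

/* Workers only compute; they never call the communication layer. */
static void *worker(void *p) {
    worker_arg *a = p;
    for (int i = 0; i < SLICE_LEN; i++)
        a->slice[i] = a->id * 100 + i;
    return NULL;
}

/* The calling thread is the single representative: it spawns the
 * workers, waits for them, and then issues every communication call
 * itself. 'remote' must hold NUM_WORKERS * SLICE_LEN ints. */
void run_funneled(int *remote) {
    pthread_t  tid[NUM_WORKERS];
    worker_arg arg[NUM_WORKERS];
    int        local[NUM_WORKERS][SLICE_LEN];

    for (int w = 0; w < NUM_WORKERS; w++) {
        arg[w].id = w;
        arg[w].slice = local[w];
        int rc = pthread_create(&tid[w], NULL, worker, &arg[w]);
        assert(rc == 0);
        (void)rc;
    }
    for (int w = 0; w < NUM_WORKERS; w++) {
        int rc = pthread_join(tid[w], NULL);
        assert(rc == 0);
        (void)rc;
    }
    /* Only this thread touches the (stand-in) runtime. */
    for (int w = 0; w < NUM_WORKERS; w++)
        comm_put(remote + w * SLICE_LEN, local[w], SLICE_LEN);
}
```

The cost of this arrangement is that communication is serialized
through one thread, which is exactly the packing/bulk step the
original question hoped to avoid.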
The Berkeley UPC runtime and GASNet communication runtime libraries
are built in both thread-safe ("par") and non-thread-safe ("seq")
configurations. If one passes -pthreads to the upcc command, then the
thread-safe version is linked; otherwise the non-thread-safe version
is used. My initial guess is that your segmentation fault is related to
use of the non-thread-safe libraries. That can be resolved by passing
-pthreads=1 to upcc, which ensures that the thread-safe libraries
are linked while still running one UPC thread per process. However, that
is probably not enough... read on.
I cannot be certain that a hybrid UPC+pthreads program as you
describe will work. The reason I am uncertain is that thread-safe
versions of both the UPC and GASNet runtime libraries make use of
thread-specific data. For instance if you try to reference the UPC
built-in "MYTHREAD" from a pthread that you have spawned it will almost
certainly fail because in the thread-safe library it is implemented via
thread-specific data that has only been allocated/initialized for the
thread(s) that the UPC runtime has spawned. So, it appears you would
require a runtime configuration that provides for thread-safe
invocation of UPC built-ins but without assigning each thread an
individual UPC-level identity. Unfortunately, we have not implemented
such a configuration. I don't have any immediate estimate of what
effort would be required, but for our support of the TotalView debugger
we did implement a mode in which there is a single UPC thread per
process plus a non-UPC thread to ensure remote accesses could progress
even when the debugger had frozen the UPC thread. That work means that
within the UPC runtime there is already a separation of thread-safety
into two distinct parts:
    UPCRI_SUPPORT_PTHREADS - the UPC runtime is thread-safe (to at least
        the extent needed for the TotalView support) and calls the
        thread-safe GASNet
    UPCRI_UPC_PTHREADS - UPC threads are implemented as multiple pthreads
        per process.
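
The thread-specific data problem mentioned above can be reproduced with
plain POSIX calls. The sketch below is not the UPC runtime's code; the
functions runtime_register_thread and runtime_my_id are hypothetical
stand-ins for "the runtime sets up an identity for threads it spawns"
and "look up MYTHREAD". It shows that a pthread which never went
through the registration step sees a NULL value for the key.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t  id_key;
static pthread_once_t id_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
    int rc = pthread_key_create(&id_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Hypothetical runtime hook: only threads that pass through here
 * get an identity stored in thread-specific data. */
void runtime_register_thread(int *id) {
    pthread_once(&id_once, make_key);
    int rc = pthread_setspecific(id_key, id);
    assert(rc == 0);
    (void)rc;
}

/* Hypothetical MYTHREAD-style lookup: returns the identity of the
 * calling thread, or NULL if the thread was never registered. */
int *runtime_my_id(void) {
    pthread_once(&id_once, make_key);
    return pthread_getspecific(id_key);
}

/* Body of a user-spawned pthread that the "runtime" knows nothing about. */
static void *probe(void *out) {
    *(int **)out = runtime_my_id();
    return NULL;
}

/* Returns 1 if the registered (calling) thread sees its identity while
 * a freshly spawned, unregistered thread sees none. */
int user_thread_sees_no_identity(void) {
    static int my_id = 0;
    int *seen = &my_id;          /* overwritten by probe() */
    pthread_t t;

    runtime_register_thread(&my_id);
    int rc = pthread_create(&t, NULL, probe, &seen);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return seen == NULL && runtime_my_id() == &my_id;
}
```

A runtime that dereferences such a pointer without checking it would
crash in the unregistered thread, which is consistent with the kind of
failure described above, though whether that is the actual cause of the
reported segmentation fault is not established here.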
I suspect, but cannot verify, that linking in the libraries intended for
TotalView support will get you most of the way to what you appear to
need. For instance, I believe (with 90%+ certainty) that MYTHREAD will
not utilize thread-specific data in the "tv" version of libupcr. If
you are lucky the "tv" libraries might be sufficient for what you want.
If I (and my peers here in Berkeley) were not so busy right now with
several deadlines rapidly approaching, I'd be interested in helping you
at least conduct the experiment to see if the "tv" libraries work for
you. Having support for the sort of hybrid programming you describe
would be interesting to us. So, I hope you can keep us up-to-date on
any progress you make.
I am not certain what you are referring to when you say the IB driver
only allows one send/receive buffer per process. We support up to
GASNET_NETWORKDEPTH_PP IB-level operations in flight to a given peer
(default 64), or GASNET_NETWORKDEPTH_TOTAL operations outstanding total
(default is computed from HCA resource limits), and will stall waiting
for at least one outstanding operation to complete when either limit is
reached.
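
The two limits described above amount to a simple admission check. The
sketch below is an illustration only and is not taken from GASNet: the
type flow_ctl and the functions flow_init, flow_try_inject and
flow_complete are hypothetical, and the depths are passed in as plain
integers rather than read from GASNET_NETWORKDEPTH_PP and
GASNET_NETWORKDEPTH_TOTAL.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEERS 16

/* Hypothetical bookkeeping for in-flight operations. */
typedef struct {
    int depth_pp;                 /* limit per peer */
    int depth_total;              /* limit across all peers */
    int inflight_pp[MAX_PEERS];   /* operations in flight to each peer */
    int inflight_total;           /* operations in flight overall */
} flow_ctl;

void flow_init(flow_ctl *fc, int depth_pp, int depth_total) {
    fc->depth_pp = depth_pp;
    fc->depth_total = depth_total;
    fc->inflight_total = 0;
    for (size_t p = 0; p < MAX_PEERS; p++)
        fc->inflight_pp[p] = 0;
}

/* Returns 1 and records the operation if both limits allow it.
 * Returns 0 if either limit is reached; a real sender would then
 * stall until at least one outstanding operation completes. */
int flow_try_inject(flow_ctl *fc, int peer) {
    assert(peer >= 0 && peer < MAX_PEERS);
    if (fc->inflight_pp[peer] >= fc->depth_pp)
        return 0;
    if (fc->inflight_total >= fc->depth_total)
        return 0;
    fc->inflight_pp[peer]++;
    fc->inflight_total++;
    return 1;
}

/* Called when an operation to 'peer' has completed. */
void flow_complete(flow_ctl *fc, int peer) {
    assert(peer >= 0 && peer < MAX_PEERS);
    assert(fc->inflight_pp[peer] > 0);
    fc->inflight_pp[peer]--;
    fc->inflight_total--;
}
```

With a per-peer depth of 2 and a total depth of 3, a third injection to
the same peer is refused, as is a fourth injection overall, until one
earlier operation completes.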
-Paul
Lingyuan Wang wrote:
> Greetings,
>
> I am concerned about the thread safety of the UPC memory copy library
> functions: is it safe to call those routines in parallel from
> multiple threads?
>
> I use a multi-threading layer of Pthread on top of each UPC thread, at
> some points of the program I need to do all-to-all communications
> among all UPC threads. I am looking to call UPC memory copy functions
> directly from my pool of Pthreads in parallel, since the data is
> sliced locally (and it would be less efficient to pack the data for a
> bulk synchronized communication). However, I get a segmentation fault
> when running more than one thread per thread pool, and the code works
> fine for the single-thread-per-pool cases.
>
> I am using the IBV conduit of Berkeley UPC 2.10, with PSHM enabled. I
> am aware of the fact that the InfiniBand driver allows one send/receive
> buffer per process. My further questions regarding thread safety
> are: is it a network driver/conduit-specific issue, or a general
> GASNet/BUPC runtime restriction? As I cannot find any thread safety
> definition in the UPC spec, would it be possible to support it
> potentially? And how does the UPC native Pthread conduit handle it?
> Thanks in advance.
>
> --
> Regards
>
--
Paul H. Hargrove PHHargrove_at_lbl_dot_gov
Future Technologies Group Tel: +1-510-495-2352
HPC Research Department Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory