From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Mar 10 2010 - 16:49:47 PST
Lingyuan,

The only efforts I am aware of to layer threads on top of UPC have been analogous to MPI_THREAD_FUNNELED, in which one is permitted to have multiple threads running but only one representative thread is permitted to call (even implicitly) any code in the UPC runtime library. If anybody on this list has had success with any other way of mixing threads with UPC, I'd be interested to learn about it.

The Berkeley UPC runtime and GASNet communication runtime libraries are built in both thread-safe ("par") and non-thread-safe ("seq") configurations. If one passes -pthreads to the upcc command, then the thread-safe version is linked; otherwise the non-thread-safe version is used.

My initial guess is that your segmentation fault is related to use of the non-thread-safe libraries. That can be resolved by passing -pthreads=1 to upcc. This will ensure that the thread-safe libraries are linked, but will run with one UPC thread per process. However, that is probably not enough... read on.

I cannot be certain that a hybrid UPC+pthreads program as you describe will work. The reason I am uncertain is that the thread-safe versions of both the UPC and GASNet runtime libraries make use of thread-specific data. For instance, if you try to reference the UPC built-in "MYTHREAD" from a pthread that you have spawned, it will almost certainly fail, because in the thread-safe library it is implemented via thread-specific data that has only been allocated/initialized for the thread(s) that the UPC runtime has spawned.

So, it appears you would require a runtime configuration that provides for the thread-safe invocation of UPC built-ins but without assigning each thread an individual UPC-level identity. Unfortunately, we have not implemented such a configuration.

I don't have any immediate estimate of what effort would be required, but for our support of the TotalView debugger we did implement a mode in which there is a single UPC thread per process plus a non-UPC thread to ensure remote accesses could progress even when the debugger had frozen the UPC thread. That work means that within the UPC runtime there is already a separation of thread-safety into two distinct parts:

  UPCRI_SUPPORT_PTHREADS - the UPC runtime is thread-safe (to at least the extent needed for the TotalView support) and calls thread-safe GASNet
  UPCRI_UPC_PTHREADS - UPC threads are implemented as multiple pthreads per process.

I suspect, but cannot verify, that linking in the libraries intended for TotalView support will get you most of the way to what you appear to need. For instance, I believe (with 90%+ certainty) that MYTHREAD will not utilize thread-specific data in the "tv" version of libupcr. If you are lucky, the "tv" libraries might be sufficient for what you want.

If I (and my peers here in Berkeley) were not so busy right now with several deadlines rapidly approaching, I'd be interested in helping you at least conduct the experiment to see if the "tv" libraries work for you. Having support for the sort of hybrid programming you describe would be interesting to us. So, I hope you can keep us up-to-date on any progress you make.

I am not certain what you are referring to when you say the IB driver only allows one snd/rcv buffer per process. We support up to GASNET_NETWORKDEPTH_PP IB-level operations in flight to a given peer (default 64), or GASNET_NETWORKDEPTH_TOTAL operations outstanding in total (default is computed from HCA resource limits), and will stall waiting for at least one outstanding operation to complete when either limit is reached.

-Paul

Lingyuan Wang wrote:
> Greetings,
>
> I am concerned about the thread safety of the UPC memory copy library
> functions: is it safe to call those routines in parallel from
> multiple threads?
>
> I use a multi-threading layer of Pthreads on top of each UPC thread. At
> some points in the program I need to do all-to-all communication
> among all UPC threads. I would like to call the UPC memory copy
> functions directly from my pool of Pthreads in parallel, since the
> data is sliced locally (and it would be less efficient to pack the
> data for a bulk synchronized communication). However, I get a
> segmentation fault when I run more than one thread per thread pool,
> while the code works fine with a single thread per pool.
>
> I am using the IBV conduit of Berkeley UPC 2.10, with PSHM enabled. I
> am aware of the fact that the InfiniBand driver allows one send/receive
> buffer per process. My further questions regarding thread safety
> are: is it a network driver/conduit-specific issue, or a general
> GASNet/BUPC runtime restriction? As I cannot find any thread-safety
> definition in the UPC spec, would it be possible to support it
> potentially? And how does the UPC native Pthread conduit handle it?
>
> Thanks in advance.
>
> --
> Regards
>

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory