From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Feb 12 2010 - 10:24:44 PST
Dorian Krause wrote:
>
>> There are two functions for allocating a UPC lock (static definitions
>> are prohibited):
>> upc_all_lock_alloc() is called collectively and all UPC threads
>> receive the same pointer.
>> upc_global_lock_alloc() is called by a single thread.
>>
>> In Berkeley UPC we do allocate the locks from the UPC shared heap:
>> In the case of the collective upc_all_lock_alloc() all such
>> allocations ARE from thread 0.
>> In the case of the non-collective upc_global_lock_alloc() the
>> allocation is local to the calling thread.
>
> Nikita, Paul,
>
> thanks a lot. Changing calls from upc_all_lock_alloc() to
> upc_global_lock_alloc() indeed makes a difference (not in run-time,
> though ...).
>
> Still I see that upc_lock_attempt and upc_lock require active
> participation by the thread "hosting" a lock. To show what I mean, I
> attached a test program which lets a master thread sleep while a slave
> tries to acquire a lock. On my system (Opteron, ibv) I see the output:
>
> pvfs2-compute-2-13% upcrun -n 2 ./test_upc
> UPCR: UPC thread 0 of 2 on pvfs2-compute-2-13.local (process 0 of 2, pid=8829)
> UPCR: UPC thread 1 of 2 on pvfs2-compute-2-13.local (process 1 of 2, pid=8830)
> [0]: Done sleeping.
> [1]: Got the lock; took me 7.462001e-02
> [0]: Done sleeping.
> [1]: Got the lock; took me 1.491880e-01
>
> I don't quite understand these results. Can't the lock simply be
> implemented using e.g. bupc_atomic* functions without this "two-sided"
> behavior?
>
> Dorian

Dorian,

I am sorry I hadn't answered the second part of your question - the part
about progress. You are correct that the Berkeley UPC runtime library
relies on the process with affinity to the lock making entries into the
library in order to make progress.

As for implementing locks via atomics, it is not that simple. First,
there is the simple matter that the locks were implemented years earlier
than the atomics. Second, the likely implementation of locks via atomics
would require polling across the network: a upc_lock() call that found
the lock held by another thread would keep issuing atomic operations over
and over until it acquired the lock, creating a potential "storm" of
network traffic (there are possible solutions to that, but they are
complex). Finally, on networks without RDMA (such as UDP and MPI) the
atomics use the same Active Message mechanism as the locks and thus
suffer from the same progress problems; an atomics-based implementation
would potentially require MORE attentiveness from the remote threads than
the current implementation does.

I agree that it would be nice if we could do better than we do now.

-Paul

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory
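
The attached test_upc program is not preserved in the archive. Below is a
minimal sketch of the kind of test Dorian describes, assuming the lock is
allocated collectively (so, per Paul's note above, its storage in Berkeley
UPC comes from thread 0) and that thread 0 simply sleeps without entering
the UPC runtime while thread 1 times its upc_lock() call. The two-second
sleep and the gettimeofday()-based timing are illustrative choices, not
the contents of the original attachment.

#include <upc.h>        /* upc_lock_t, upc_all_lock_alloc, upc_lock, ... */
#include <stdio.h>
#include <unistd.h>     /* sleep() */
#include <sys/time.h>   /* gettimeofday() */

int main(void)
{
    /* Collective allocation: every thread receives the same pointer;
     * in Berkeley UPC the lock's storage has affinity to thread 0. */
    upc_lock_t *lock = upc_all_lock_alloc();
    upc_barrier;

    if (MYTHREAD == 0) {
        /* Thread 0 sleeps without entering the UPC runtime, so it is not
         * servicing the Active Messages behind remote lock requests. */
        sleep(2);
        printf("[%d]: Done sleeping.\n", MYTHREAD);
    } else if (MYTHREAD == 1) {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        upc_lock(lock);      /* may stall until thread 0 re-enters the runtime */
        gettimeofday(&t1, NULL);
        upc_unlock(lock);
        printf("[%d]: Got the lock; took me %e\n", MYTHREAD,
               (t1.tv_sec - t0.tv_sec) + 1e-6 * (t1.tv_usec - t0.tv_usec));
    }

    upc_barrier;
    if (MYTHREAD == 0)
        upc_lock_free(lock);
    return 0;
}

Compiled with upcc and launched with "upcrun -n 2" as in the transcript
above, the time reported by thread 1 reflects how long it waits for the
thread with affinity to the lock to re-enter the runtime, which is the
progress behavior Paul describes.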