Re: Atomic set for double

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Feb 12 2010 - 10:24:44 PST

  • Next message: Nikita Andreev: "Compilation error for instrumented code"
    Dorian Krause wrote:
    >
    >> There are two functions for allocating a UPC lock (static definitions
    >> are prohibited):
    >>    upc_all_lock_alloc() is called collectively and all UPC threads
    >> receive the same pointer
    >>    upc_global_lock_alloc() is called by a single thread.
    >>
    >> In Berkeley UPC we do allocate the locks from the UPC shared heap:
    >>     In the case of the collective upc_all_lock_alloc() all such
    >> allocations ARE from thread 0.
    >>     In the case of the non-collective upc_global_lock_alloc() the
    >> allocation is local to the calling thread.
    >>    
    >
    > Nikita,
    > Paul,
    >
    > thanks a lot. Changing calls from upc_all_lock_alloc() to 
    > upc_global_lock_alloc() indeed makes a difference (not in run-time, 
    > though ...).
    >
    > Still, I see that upc_lock_attempt and upc_lock require active 
    > participation by the thread "hosting" a lock. To show what I mean, I 
    > attached a test program in which a master thread sleeps while a slave 
    > tries to acquire a lock. On my system (Opteron, ibv) I see the output:
    >
    > pvfs2-compute-2-13% upcrun -n 2 ./test_upc
    > UPCR: UPC thread 0 of 2 on pvfs2-compute-2-13.local (process 0 of 2, 
    > pid=8829)
    > UPCR: UPC thread 1 of 2 on pvfs2-compute-2-13.local (process 1 of 2, 
    > pid=8830)
    >  [0]: Done sleeping.
    >  [1]: Got the lock; took me 7.462001e-02
    >  [0]: Done sleeping.
    >  [1]: Got the lock; took me 1.491880e-01
    >
    > I don't quite understand these results. Can't the lock simply be 
    > implemented using e.g. bupc_atomic* functions without this "two-sided" 
    > behavior?
    >
    > Dorian
    >
    >
    
    Dorian,
    
      I am sorry I hadn't answered the second part of your question - the 
    part about progress.
    
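      You are correct that the Berkeley UPC runtime library relies on the 
    process with affinity to the lock making calls into the library in order 
    to make progress.  While that process is outside the library (asleep, as 
    in your test), a lock request from another thread waits unserviced.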
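
    To make the progress dependence concrete, here is a toy single-process
    model in plain C.  It is only a sketch under assumptions: the names
    home_lock, post_request and home_poll are hypothetical and are not part
    of the Berkeley UPC runtime, and the real implementation is certainly
    more involved.  The point it illustrates is that a request can arrive at
    the lock's home and still sit ungranted until the home enters the
    library.

```c
#include <assert.h>

#define MAX_PENDING 8

/* Hypothetical toy model: lock state lives with one "home" thread. */
typedef struct {
    int holder;                /* -1 when the lock is free */
    int pending[MAX_PENDING];  /* requests that arrived but are not yet handled */
    int npending;
} home_lock;

void home_init(home_lock *h) {
    h->holder = -1;
    h->npending = 0;
}

/* Requester side: deliver a request message. Nothing is granted here. */
int post_request(home_lock *h, int requester) {
    assert(requester >= 0);
    if (h->npending == MAX_PENDING)
        return 0;                       /* queue full, request refused */
    h->pending[h->npending++] = requester;
    return 1;
}

/* Home side: stands in for the home thread entering the runtime.
 * Grants the lock to the oldest waiting requester if the lock is free.
 * Returns the id that received the lock, or -1 if nothing was granted. */
int home_poll(home_lock *h) {
    if (h->holder != -1 || h->npending == 0)
        return -1;
    h->holder = h->pending[0];
    for (int i = 1; i < h->npending; i++)
        h->pending[i - 1] = h->pending[i];
    h->npending--;
    return h->holder;
}
```

    In this model, after post_request(&h, 1) the holder field is still -1.
    It becomes 1 only once home_poll(&h) runs, so the requester's wait is
    bounded below by how long the home stays away from the library.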
     
      As for implementing locks via atomics, it is not that simple.  First, 
    there is the simple matter that the locks were implemented years earlier 
    than the atomics.  Second, the likely implementation of locks via 
    atomics would require polling across the network.  In other words, a 
    upc_lock() call that found the lock held by another thread would keep 
    issuing atomic operations over and over until it acquired the lock, 
    creating a potential "storm" of network traffic (there are possible 
    solutions to that, but they are complex).  Finally, on networks without 
    RDMA (such as UDP and MPI), atomics use the same ActiveMessage mechanism 
    as locks and thus suffer from the same progress problems.  An 
    atomics-based implementation would therefore potentially require MORE 
    attentiveness from the remote threads than the current implementation.  
    I agree that it would be nice if we could do better than we do now.
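
    As a rough illustration of the polling cost, the following C11 sketch
    counts how many atomic operations a waiter issues before it gets the
    lock.  It is a shared-memory stand-in, not the bupc_atomic* interface
    and not anything in Berkeley UPC: toy_lock, try_acquire and
    wait_for_lock are hypothetical names, time is a simulated tick counter,
    and each compare-and-swap stands in for one remote atomic.  Exponential
    backoff is shown as one simple mitigation, which is an assumption on my
    part about what a "solution" might look like.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toy lock: each try_acquire models one remote atomic. */
typedef struct {
    atomic_int owner;   /* -1 when free, otherwise the holder's id */
    long remote_ops;    /* number of simulated network atomics issued */
} toy_lock;

void lock_init(toy_lock *l, int holder) {
    atomic_init(&l->owner, holder);
    l->remote_ops = 0;
}

int try_acquire(toy_lock *l, int me) {
    int expected = -1;
    l->remote_ops++;    /* every attempt costs one network operation */
    return atomic_compare_exchange_strong(&l->owner, &expected, me);
}

void release(toy_lock *l) {
    atomic_store(&l->owner, -1);
}

/* The holder keeps the lock until simulated tick hold_ticks.
 * The waiter polls every tick, or with doubling delays (capped at 64
 * ticks) when backoff is nonzero. Returns the tick of acquisition. */
long wait_for_lock(toy_lock *l, int me, int holder,
                   long hold_ticks, int backoff) {
    long tick = 0, delay = 1;
    assert(hold_ticks >= 0);
    for (;;) {
        if (tick >= hold_ticks && atomic_load(&l->owner) == holder)
            release(l);                 /* simulated holder lets go */
        if (try_acquire(l, me))
            return tick;
        tick += delay;
        if (backoff && delay < 64)
            delay *= 2;
    }
}
```

    With the lock held for 1000 ticks, plain polling issues 1001 operations
    and acquires at tick 1000.  With backoff it issues 22 operations but
    acquires only at tick 1023, so the traffic saved is paid for in latency.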
    
    -Paul
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
