Re: UPC Runtime System Questions

From: Dan Bonachea (bonachea_at_cs_dot_berkeley_dot_edu)
Date: Tue Feb 15 2005 - 17:16:32 PST

    At 01:20 PM 2/15/2005, Hung-Hsun Su wrote:
    >I have been trying to understand the intermediate code generated by a dummy 
    >upc program (see attached) that I've put together that covers all the 
    >construct as in UPC spec 1.1 and I have some questions I would like to ask:
    
    Many of the questions you're asking are answered in the UPC Runtime spec and 
    two aux notes documents, which are here:
    
    http://upc.lbl.gov/docs/system/
    
    I'll try to answer some of the specific implementation questions which are not 
    covered in the documented interfaces above, however any implementation details 
    below those interfaces are considered subject to change at any time without 
    notice (and some may even depend on platform and/or various configure options, 
    etc). The translator never makes assumptions about anything below those 
    interfaces.
    
    >1. For the declaration of shared variables and pointers, I see that sometimes 
    >upcr_pshared_ptr_t is used while other times upcr_shared_ptr_t is used. What 
    >is the rule in determining which to use? More generally, what are the rules 
    >to translate shared variables and pointers?
    
    see docs above.
    
    >2. For lock declaration, What is the purpose of UPCR_TLD_DEFINE_TENTATIVE?
    
    see docs above.
    
    >3. For MYTHREAD translations:
    >
    >   3a. Where can I find some more information on the variable parg? what is 
    > this use for?
    
    Don't know what parg you're talking about...
    
    >   3b. What is the purpose of the multiple indirection? i.e. why go through 
    > the following translation 
    > [MYTHREAD-->upcr_mythread()-->upcri_mypthreadinfo()->mythread-->_upcr_pthreadinfo->mythread 
    > = pargs->mythread] rather than directly using the later variables? This also 
    > applies to some other translation as sometimes I would see one thing 
    > translated to another and then to yet another one, why is this necessary?
    
    Most of the translations above are macros and therefore have no runtime cost. 
    Some real indirection is necessary to perform thread identification in pthread 
    configurations, where we merge all the pthread information for performance 
    reasons (to amortize getspecific calls) - hence the 
    _upcr_pthreadinfo->mythread . There is no "MYTHREAD" variable after 
    translation in pthreads mode (nor can there be). In non-pthread mode, there is 
    such a variable and some of the macros above expand differently to use it. 
    Most of the macros are there to preserve the runtime interface documented 
    above, and/or to provide different expansions based on the build 
    configuration.
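
As a rough illustration of the layering described above, here is a minimal sketch. All `demo_` names and the `DEMO_USE_PTHREADS` switch are hypothetical stand-ins, not the actual UPCR source: one macro name is used by translated code, and the build configuration decides whether it expands to a thread-specific lookup or to a plain variable.

```c
/* Hypothetical stand-in for the merged per-pthread info struct; the field
   names are illustrative, not the actual UPCR layout. */
typedef struct {
    int mythread;    /* UPC thread id of this pthread */
    int mypthreads;  /* number of pthreads in this process */
} demo_pthreadinfo_t;

#ifdef DEMO_USE_PTHREADS
  /* pthread build: one thread-specific lookup fetches the whole struct, so
     all per-thread values share the cost of a single getspecific call.
     demo_key must be created with pthread_key_create() and filled in with
     pthread_setspecific() at startup (omitted in this sketch). */
  #include <pthread.h>
  static pthread_key_t demo_key;
  #define demo_mypthreadinfo() \
      ((demo_pthreadinfo_t *)pthread_getspecific(demo_key))
  #define demo_mythread() (demo_mypthreadinfo()->mythread)
#else
  /* non-pthread build: a plain per-process variable is enough */
  static int demo_mythread_var = 0;
  #define demo_mythread() (demo_mythread_var)
#endif

/* Translated code uses the same name in either configuration. */
static int demo_current_thread(void) {
    return demo_mythread();
}
```

Built without `DEMO_USE_PTHREADS`, the macro collapses to a direct variable read, so the extra names cost nothing at runtime.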
    
    
    >4. For UPC_MAX_BLOCK_SIZE,  upc_localsizeof, etc. Are these always compile 
    >time constant? If so, what are things that will always be compile time 
    >constant? runtime constants?
    
    The UPC spec mandates these to be compile-time constants - see the UPC spec 
    for a complete list.  UPC_MAX_BLOCK_SIZE is implemented using a preprocessor 
    define (as required by the UPC spec); the upc_*of() operators are expanded by 
    the translator using type information.
    
    
    >5. For upcr_pshared_to_local(),  why would this only work if the shared 
    >variable is owned by the calling thread? And similar to 3b, why is it 
    >necessary to use an intermediate variable such as Mcvtptr_bupc_2?
    
    The UPC spec mandates that casting a ptr-to-shared to a ptr-to-local has 
    undefined behavior if the ptr-to-shared's target does not have local 
    affinity. UPCR flags this as a 
    runtime error in debug mode. The translator generates various intermediate 
    temporary variables in the process of code generation. It simplifies 
    code-generation significantly (makes it much easier to get it correct), 
    although sometimes to a human reader they look extraneous. Good C optimizers 
    will fold away useless temporaries anyhow.
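
A minimal sketch of both points, assuming a hypothetical struct-style pointer representation (the `demo_` names are invented for illustration and are not the UPCR API): the affinity check is an assertion that disappears when `NDEBUG` is defined, and the cast result goes through a temporary the way generated code tends to.

```c
#include <assert.h>

/* Hypothetical pointer-to-shared: owning thread plus local address. */
typedef struct {
    int   thread;
    void *addr;
} demo_pshared_ptr_t;

static int demo_mythread = 0;   /* stand-in for the calling thread's id */

/* Returns 1 if the target has affinity to the calling thread. */
static int demo_has_local_affinity(demo_pshared_ptr_t p) {
    return p.thread == demo_mythread;
}

static void *demo_pshared_to_local(demo_pshared_ptr_t p) {
    /* Debug-only check: compiled out when NDEBUG is defined. */
    assert(demo_has_local_affinity(p) && "pointer-to-shared target is not local");
    return p.addr;
}

/* Roughly the shape of generated code: the cast lands in a temporary
   first, which an optimizing C compiler can fold away. */
static int demo_read_local(demo_pshared_ptr_t p) {
    int *cvt_tmp = (int *)demo_pshared_to_local(p);
    int value = *cvt_tmp;
    return value;
}
```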
    
    >6. In _upcr_notify, will upcri_mypthreads() = #threads? When does this get 
    >assigned?
    
      upcri_mypthreads() is the number of pthreads in the local process.
    
    >7. Where is the actual definition of upcr_barrier()? I've looked through all 
    >the .c and .h file at the upcr level and I've not been able to find this
    
    It's a macro in upcr_postinclude/upcr_proxy.h
    
    
    >8. For upc_for_all, What is the use for upc_forall_control? What is the 
    >general mechanism behind the implementation of upc_for_all?
    
    upc_forall_control is used to provide the difference in behavior based on 
    dynamic scope required by the UPC spec (ie decides which forall is the 
    "controlling" upc_forall loop).
    The translation mechanism is best seen by looking at examples, although we're 
    currently doing optimizer work to improve it - see:
    http://upc.lbl.gov/publications/wychen-master-report.pdf
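
The "controlling loop" idea can be modelled in plain C. This is only a sketch under assumptions, not the translator's actual output: `demo_forall_control` is a hypothetical flag, and the affinity expressions are the integers `i` and `j`. The outermost upc_forall in dynamic scope applies its affinity test; a nested one sees the flag already set and behaves like an ordinary for loop.

```c
static int demo_mythread = 1;        /* pretend we are thread 1 of 4 */
static int demo_threads = 4;
static int demo_forall_control = 0;  /* nonzero inside a controlling forall */

/* Models: upc_forall(i = 0; i < n; i++; i)
             upc_forall(j = 0; j < n; j++; j) body;
   and returns how many times this thread runs the body. */
static int demo_nested_forall(int n) {
    int executed = 0;
    int outer_controls = !demo_forall_control;   /* are we the controlling loop? */
    if (outer_controls) demo_forall_control = 1;
    for (int i = 0; i < n; i++) {
        if (outer_controls && i % demo_threads != demo_mythread) continue;
        int inner_controls = !demo_forall_control;  /* false when nested */
        if (inner_controls) demo_forall_control = 1;
        for (int j = 0; j < n; j++) {
            if (inner_controls && j % demo_threads != demo_mythread) continue;
            executed++;
        }
        if (inner_controls) demo_forall_control = 0;
    }
    if (outer_controls) demo_forall_control = 0;
    return executed;
}
```

With n = 8, thread 1 owns outer iterations i = 1 and i = 5, and for each of them runs all 8 inner iterations because the inner loop is not controlling.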
    
    
    >9. For UPCR_GLOBAL_ALLOC, What is the purpose of UPCRI_ALLOCCALLER?
    
    The memory allocation functions pass along a small token which is used to 
    improve the information provided by the error message for out-of-memory 
    errors.
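
One common way to build such a token is to capture the call site in a macro. The sketch below assumes the token is a file/line pair, which is a guess made for illustration; `DEMO_ALLOC` and `demo_alloc` are hypothetical names, not the UPCR macros.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: the macro records where the allocation was requested. */
#define DEMO_ALLOC(nbytes) demo_alloc((nbytes), __FILE__, __LINE__)

static void *demo_alloc(size_t nbytes, const char *file, int line) {
    void *p = malloc(nbytes);
    if (p == NULL && nbytes != 0) {
        /* The caller token makes the failure message point at user code. */
        fprintf(stderr, "out of memory allocating %lu bytes at %s:%d\n",
                (unsigned long)nbytes, file, line);
        abort();
    }
    return p;
}
```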
    
    >10. For UPCR_ALL_ALLOC and UPCR_ALL_LOCK_ALLOC, I see that there is an 
    >all-to-all communication going on, why is this necessary?
    
    Both require that all threads return the same value: thread 0 performs the 
    allocation and broadcasts the result. See docs above for a full explanation 
    of our shared memory allocation strategy.
    
    
    >11. Where is UPCR_ALLOC defined?
    
    upcr_postinclude/upcr_proxy.h
    
    
    >12. For upc_threadof() --> UPCR_THREADOF_PSHARED, What is UPCRI_PACKED_SPTR 
    >and how is it used?
    
    We have several different implementations for the pointer-to-shared 
    representation. UPCRI_PACKED_SPTR packs them into 64-bit ints. We also have a 
    struct-based version, and a third new one called "symmetric" that does more 
    sophisticated platform-specific tricks (and which therefore is only available 
    on certain configurations).
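
To make the packed idea concrete, here is a sketch of squeezing phase, thread and address into one 64-bit integer. The field widths and the `demo_` names are assumptions chosen for illustration; the real layout is configuration-dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative field widths only (they sum to 64). */
#define DEMO_PHASE_BITS  10
#define DEMO_THREAD_BITS 10
#define DEMO_ADDR_BITS   44

typedef uint64_t demo_packed_sptr_t;

static demo_packed_sptr_t demo_pack(uint64_t addr, unsigned thread, unsigned phase) {
    assert(addr   < (UINT64_C(1) << DEMO_ADDR_BITS));
    assert(thread < (1u << DEMO_THREAD_BITS));
    assert(phase  < (1u << DEMO_PHASE_BITS));
    return ((uint64_t)phase  << (DEMO_ADDR_BITS + DEMO_THREAD_BITS))
         | ((uint64_t)thread << DEMO_ADDR_BITS)
         | addr;
}

/* Extracting the thread is a shift and a mask, with no memory access. */
static unsigned demo_threadof(demo_packed_sptr_t p) {
    return (unsigned)((p >> DEMO_ADDR_BITS) & ((1u << DEMO_THREAD_BITS) - 1));
}

static unsigned demo_phaseof(demo_packed_sptr_t p) {
    return (unsigned)(p >> (DEMO_ADDR_BITS + DEMO_THREAD_BITS));
}

static uint64_t demo_addrof(demo_packed_sptr_t p) {
    return p & ((UINT64_C(1) << DEMO_ADDR_BITS) - 1);
}
```

A packed value fits in a register and can be passed by value cheaply, which is the usual motivation for this kind of representation over a struct.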
    
    >13. I couldn't find the translation for upc_resetphase(), is this supported 
    >and how?
    
    It's a no-op for pshared, and uses UPCR_SHARED_RESETPHASE for shared.
    
    
    >14. Where is upcaffinitysize defined?
    
    Your test misspells it - the correct spelling is upc_affinitysize().
    
    
    >15. For _upcr_free()
    >
    >     15a. I see that there are checks in upcri_do_local_free() that is really 
    > not necessary as far as I can tell since this function can only be called if 
    > the checking is true. Is there another purpose for the multiple check other 
    > than to provide extra protection?
    
    If you mean the upcri_assert() calls, that's true of all assertions - they are 
    sanity checks which should never fail (but if they do, it means the system has 
    a bug which needs to be fixed). Note that all assertions and most safety checks are 
    automatically disabled when not in debug mode, so they never hurt production 
    performance.
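
The usual way to get assertions that vanish in production builds is a macro that expands to nothing unless a debug flag is set. A minimal sketch, with `demo_assert`, `DEMO_DEBUG` and `demo_do_local_free` as hypothetical names rather than the real upcri_assert() machinery:

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef DEMO_DEBUG
  /* Debug build: report the failed expression and stop. */
  #define demo_assert(expr) \
      do { \
          if (!(expr)) { \
              fprintf(stderr, "assertion failed: %s (%s:%d)\n", \
                      #expr, __FILE__, __LINE__); \
              abort(); \
          } \
      } while (0)
#else
  /* Production build: the expression is never evaluated, so no cost. */
  #define demo_assert(expr) ((void)0)
#endif

static int demo_free_calls = 0;

static void demo_do_local_free(void *p, int is_local) {
    demo_assert(is_local);   /* the caller already checked; sanity check only */
    free(p);
    demo_free_calls++;
}
```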
    
    
    >     15b. Where is upcra_sharedlocal_free() defined?
    >
    >     15c. Where is upcra_sharedglobal_free() defined?
    
    see umalloc directory
    
    
    >16. Is UPCR_LOCK_ATTEMPT() equivalent to upcr_lock()? If so, why? If not, can 
    >you explain how they are implemented differently as they seem the same to me.
    
    See UPC spec - UPCR_LOCK_ATTEMPT is non-blocking, UPCR_LOCK blocks until 
    the lock is acquired.
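
The difference can be shown with a toy single-process lock built on C11 atomics. This is a sketch of the semantics only, under the assumption of a simple spin lock; the `demo_` names are hypothetical and this is not how UPCR implements distributed locks.

```c
#include <stdatomic.h>

typedef struct { atomic_flag held; } demo_lock_t;
#define DEMO_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Non-blocking: returns 1 if the lock was acquired, 0 if it is already
   held. Either way it returns immediately. */
static int demo_lock_attempt(demo_lock_t *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Blocking: does not return until the lock has been acquired. */
static void demo_lock(demo_lock_t *l) {
    while (!demo_lock_attempt(l)) {
        /* spin until the holder releases the lock */
    }
}

static void demo_unlock(demo_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller of the attempt form has to check the return value and decide what to do on failure; a caller of the blocking form can assume it holds the lock on return.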
    
    >17. For UPCR_LOCK_FREE, why is an AM short send out even if the lock owner is 
    >local?
    
    Simplicity. If the AM is loopback, GASNet expands it to a synchronous call 
    anyhow - none of our conduits should use the network hardware in that case.
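
A toy model of why the uniform code path is cheap. The names below are hypothetical and this is not the GASNet API: the sender always issues the same "send" call, and the messaging layer short-circuits the loopback case into a direct handler call.

```c
typedef void (*demo_handler_t)(int arg);

static int demo_mynode = 0;
static int demo_network_sends = 0;   /* counts messages that hit the network */
static int demo_freed = 0;           /* counts handler executions */

/* Stand-in for the handler that frees a lock on its owner. */
static void demo_free_handler(int arg) {
    (void)arg;
    demo_freed++;
}

static void demo_am_short(int dest, demo_handler_t h, int arg) {
    if (dest == demo_mynode) {
        h(arg);                 /* loopback: run the handler synchronously */
    } else {
        (void)h;
        demo_network_sends++;   /* placeholder for a real network injection */
    }
}
```

The caller never needs a separate "is the owner local?" branch; that decision lives in one place inside the messaging layer.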
    
    
    >18. What's the purpose of UPCRI_PASS_GAS() in all the memput/memget 
    >operations?
    
    Passes GASNet the pthread identification information it needs, as an 
    optimization to prevent a duplicate lookup inside GASNet. You can blame Jason 
    for the offensive name.. :)
    
    >19. Is there optimization going on at the upc runtime level that aggregates 
    >communication? If so, what is the logic for this?
    
    Not currently - all such optimizations are in the translator, although once 
    the UPCR caching layer is complete, that will also perform some aggregation.
    
    Dan
    
    
    >
    >
    >Thanks in advance!
    >
    >
    >
    >Hung-Hsun
    >
    >-----------------------------------------------------------------------------------------------------------
    >Sincerely,
    >
    >Hung-Hsun Su
    >
    >Ph.D. Student, UPC Group Leader, Research Assistant, Teaching Assistant
    >High-performance Computing and Simulation (HCS) Research Laboratory
    >Dept. of Electrical and Computer Engineering , University of Florida,
    >Gainesville, FL 32611-6200
    >Email: su_at_hcs_dot_ufl_dot_edu, 
    >hunghsun_at_ufl_dot_edu
    >------------------------------------------------------------------------------------------------------------
    
