From: Dan Bonachea (bonachea_at_cs_dot_berkeley_dot_edu)
Date: Tue Feb 15 2005 - 17:16:32 PST
At 01:20 PM 2/15/2005, Hung-Hsun Su wrote:
>I have been trying to understand the intermediate code generated by a dummy
>upc program (see attached) that I've put together that covers all the
>construct as in UPC spec 1.1 and I have some questions I would like to ask:

Many of the questions you're asking are answered in the UPC Runtime spec and
two aux notes documents, which are here:

  http://upc.lbl.gov/docs/system/

I'll try to answer some of the specific implementation questions which are not
covered in the documented interfaces above, however any implementation details
below those interfaces are considered subject to change at any time without
notice (and some may even depend on platform and/or various configure options,
etc). The translator never makes assumptions about anything below those
interfaces.

>1. For the declaration of shared variables and pointers, I see that sometimes
>upcr_pshared_ptr_t is used while other times upcr_shared_ptr_t is used. What
>is the rule in determining which to use? More generally, what are the rules
>to translate shared variables and pointers?

see docs above.

>2. For lock declaration, What is the purpose of UPCR_TLD_DEFINE_TENTATIVE?

see docs above.

>3. For MYTHREAD translations:
>
>   3a. Where can I find some more information on the variable parg? what is
>   this use for?

Don't know what parg you're talking about...

>   3b. What is the purpose of the multiple indirection? i.e. why go through
>   the following translation
>   [MYTHREAD-->upcr_mythread()-->upcri_mypthreadinfo()->mythread-->_upcr_pthreadinfo->mythread
>   = pargs->mythread] rather than directly using the later variables? This also
>   applies to some other translation as sometimes I would see one thing
>   translated to another and then to yet another one, why is this necessary?

Most of the translations above are macros and therefore have no runtime cost.
Some real indirection is necessary to perform thread identification in pthread
configurations, where we merge all the pthread information for performance
reasons (to amortize getspecific calls) - hence the
_upcr_pthreadinfo->mythread. There is no "MYTHREAD" variable after translation
in pthreads mode (nor can there be). In non-pthread mode, there is such a
variable and some of the macros above expand differently to use it. Most of the
macros are there to preserve the runtime interface documented above, and/or to
provide different expansions based on the build configuration.

>4. For UPC_MAX_BLOCK_SIZE, upc_localsizeof, etc. Are these always compile
>time constant? If so, what are things that will always be compile time
>constant? runtime constants?

UPC spec mandates these to be compile-time constants - see the UPC spec for a
complete list. UPC_MAX_BLOCK_SIZE is implemented using a preprocessor define
(as required by UPC spec). The upc_*of() operators are expanded by the
translator using type information.

>5. For upcr_pshared_to_local(), why would this only work if shared
>variable is own by calling thread? And similar to 3b, why is it necessary to
>use the intermediate variable such as Mcvtptr_bupc_2?

UPC spec mandates that casting a ptr-to-shared to a ptr-to-local has undefined
behavior if the pointer-to-shared target does not have local affinity. UPCR
flags this as a runtime error in debug mode.

The translator generates various intermediate temporary variables in the
process of code generation. It simplifies code-generation significantly (makes
it much easier to get it correct), although sometimes to a human reader they
look extraneous. Good C optimizers will fold away useless temporaries anyhow.

>6. In _upcr_notify, will upcri_mypthreads() = #threads? When does this get
>assigned?

upcri_mypthreads() is the number of pthreads in the local process.

>7. Where is the actual definition of upcr_barrier()?
>I've looked through all the .c and .h file at the upcr level and I've not
>been able to find this

It's a macro in upcr_postinclude/upcr_proxy.h

>8. For upc_for_all, What is the use for upc_forall_control? What is the
>general mechanism behind the implementation of upc_for_all?

upc_forall_control is used to provide the difference in behavior based on
dynamic scope required by the UPC spec (i.e. it decides which forall is the
"controlling" upc_forall loop). The translation mechanism is best seen by
looking at examples, although we're currently doing optimizer work to improve
it - see:

  http://upc.lbl.gov/publications/wychen-master-report.pdf

>9. For UPCR_GLOBAL_ALLOC, What is the purpose of UPCRI_ALLOCCALLER?

The memory allocation functions pass along a small token which is used to
improve the information provided by the error message for out-of-memory errors.

>10. For UPCR_ALL_ALLOC and UPCR_ALL_LOCK_ALLOC, I see that there is an
>all-to-all communication going on, why is this necessary?

Both require that all threads return the same value, which is broadcast from
thread zero, which performs the allocation. See docs above for full explanation
of our shared memory allocation strategy.

>11. Where is UPCR_ALLOC defined?

upcr_postinclude/upcr_proxy.h

>12. For upc_threadof() --> UPCR_THREADOF_PSHARED, What is UPCRI_PACKED_SPTR
>and how is it used?

We have several different implementations for the pointer-to-shared
representation. UPCRI_PACKED_SPTR packs them into 64-bit ints. We also have a
struct-based version, and a third new one called "symmetric" that does more
sophisticated platform-specific tricks (and which therefore is only available
on certain configurations).

>13. I couldn't find the translation for upc_resetphase(), is this supported
>and how?

It's a no-op for pshared, and uses UPCR_SHARED_RESETPHASE for shared.

>14. Where is upcaffinitysize defined?

Your test misspells it - the correct spelling is upc_affinitysize().

>15. For _upcr_free()
>
>   15a.
>   I see that there are checks in upcri_do_local_free() that is really
>   not necessary as far as I can tell since this function can only be called if
>   the checking is true. Is there another purpose for the multiple check other
>   than to provide extra protection?

If you mean the upcri_assert() calls, that's true of all assertions - they are
sanity checks which should never fail (but if they do, it means the system has
a bug which needs to be fixed). Note that all assertions and most safety checks
are automatically disabled when not in debug mode, so they never hurt
production performance.

>   15b. Where is upcra_sharedlocal_free() defined?
>
>   15c. Where is upcra_sharedglobal_free() defined?

see umalloc directory

>16. Is UPCR_LOCK_ATTEMPT() equivalent to upcr_lock()? If so, why? If not, can
>you explain how they are implemented differently as they seem the same to me.

See UPC spec - UPCR_LOCK_ATTEMPT is non-blocking, UPCR_LOCK blocks until the
lock is acquired.

>17. For UPCR_LOCK_FREE, why is an AM short send out even if the lock owner is
>local?

Simplicity. If the AM is loopback, GASNet expands it to a synchronous call
anyhow - none of our conduits should use the network hardware in that case.

>18. What's the purpose of UPCRI_PASS_GAS() in all the memput/memget
>operations?

Passes GASNet the pthread identification information it needs, as an
optimization to prevent a duplicate lookup inside GASNet. You can blame Jason
for the offensive name.. :)

>19. Is there optimization going on at the upc runtime level that aggregates
>communication? If so, what is the logic for this?

Not currently - all such optimizations are in the translator, although once the
UPCR caching layer is complete, that will also perform some aggregation.

Dan

>
>
>Thanks in advance!
>
>
>
>Hung-Hsun
>
>-----------------------------------------------------------------------------------------------------------
>Sincerely,
>
>Hung-Hsun Su
>
>Ph.D.
>Student, UPC Group Leader, Research Assistant, Teaching Assistant
>High-performance Computing and Simulation (HCS) Research Laboratory
>Dept. of Electrical and Computer Engineering, University of Florida,
>Gainesville, FL 32611-6200
>Email: su_at_hcs_dot_ufl_dot_edu, hunghsun_at_ufl_dot_edu
>------------------------------------------------------------------------------------------------------------
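
As an illustration of the answers to questions 12 and 13, here is a minimal sketch in C of what packing a pointer-to-shared into a 64-bit integer could look like. It is not the actual UPCRI_PACKED_SPTR layout, which is an internal detail that depends on the configuration. The field widths, the field order, and every sketch_* name below are assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: these widths and their order are invented for this sketch.
   They are not the real UPCRI_PACKED_SPTR layout. */
#define SKETCH_ADDR_BITS   44
#define SKETCH_THREAD_BITS 10
#define SKETCH_PHASE_BITS  10   /* 44 + 10 + 10 = 64 */

/* Mask with the low `bits` bits set (valid for bits < 64). */
#define SKETCH_MASK(bits) ((UINT64_C(1) << (bits)) - 1)

/* Hypothetical packed pointer-to-shared: | phase | thread | addr | */
typedef uint64_t sketch_sptr_t;

/* Pack the three components into one 64-bit value. */
static inline sketch_sptr_t sketch_pack(uint64_t addr, uint64_t thread,
                                        uint64_t phase)
{
    /* Each field must fit in its slot, otherwise fields would overlap. */
    assert(addr   <= SKETCH_MASK(SKETCH_ADDR_BITS));
    assert(thread <= SKETCH_MASK(SKETCH_THREAD_BITS));
    assert(phase  <= SKETCH_MASK(SKETCH_PHASE_BITS));
    return (phase  << (SKETCH_ADDR_BITS + SKETCH_THREAD_BITS))
         | (thread << SKETCH_ADDR_BITS)
         | addr;
}

/* Extract the address field (lowest bits). */
static inline uint64_t sketch_addrof(sketch_sptr_t p)
{
    return p & SKETCH_MASK(SKETCH_ADDR_BITS);
}

/* Extract the thread field: the kind of shift-and-mask a threadof
   operation could reduce to with a packed representation. */
static inline uint64_t sketch_threadof(sketch_sptr_t p)
{
    return (p >> SKETCH_ADDR_BITS) & SKETCH_MASK(SKETCH_THREAD_BITS);
}

/* Extract the phase field (highest bits). */
static inline uint64_t sketch_phaseof(sketch_sptr_t p)
{
    return (p >> (SKETCH_ADDR_BITS + SKETCH_THREAD_BITS))
         & SKETCH_MASK(SKETCH_PHASE_BITS);
}

/* Reset the phase to zero, keeping thread and address unchanged. */
static inline sketch_sptr_t sketch_resetphase(sketch_sptr_t p)
{
    return p & SKETCH_MASK(SKETCH_ADDR_BITS + SKETCH_THREAD_BITS);
}
```

With this layout, sketch_pack(0x1000, 3, 7) yields a value from which sketch_threadof() recovers 3 and sketch_phaseof() recovers 7, and sketch_resetphase() clears only the phase field. A phaseless (pshared) pointer has no phase to clear, which is consistent with reset being a no-op in that case.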