Re: shared data and operator

From: Tahar Amari (amari_at_cpht.polytechnique.fr)
Date: Wed Feb 18 2009 - 01:34:35 PST

    Paul,
    
    Thank you very much for this very nice explanation.
    I understand, and you have put it very clearly.
    
    It would indeed be very interesting to have an answer to what you
    suggested below (to avoid doing it "hard-coded" in the MPI way):
    
    > For that reason, our compiler (and others) has a limited ability
    > to discover certain communication patterns (such as fetching an
    > entire row of ghost cells from a neighbor) and move the
    > communication outside of the loop.  I am not the expert on that, but
    > perhaps another member of our group can say something about how to
    > enable this compiler optimization and what patterns can be recognized.
    
    
    If someone else knows how to do this, it would be very helpful.
    
    Many thanks again,
    
    
    
    Tahar
    
    
    
    --------------------------------------------
    T. Amari
    Centre de Physique Theorique
    Ecole Polytechnique
    91128 Palaiseau Cedex France
    tel : 33 1 69 33 42 52
    fax: 33 1 69 33 30 08
    email: <mailto:amari@cpht.polytechnique.fr>
    URL : http://www.cpht.polytechnique.fr/cpht/amari
    
    
    On Feb 17, 2009, at 22:12, Paul H. Hargrove wrote:
    
    > Tahar,
    >
    >   You are correct that in UPC, accessing data that does not have  
    > affinity to the thread issuing the access will result in  
    > communication.  This does not require any matching operation to be  
    > performed by the thread that does have affinity to the data.   
    > This one-sided communication model is one of the key programmer  
    > productivity features of UPC and related language efforts.  So, the  
    > answer to "is it possible to do this" is a resounding YES.
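    A minimal sketch of this fine-grained style, assuming a 1-D block
    distribution (the array names and the size B are illustrative
    assumptions, not from this thread):
    
        #include <upc_relaxed.h>         /* relaxed shared-memory model     */
    
        #define B 1024                   /* elements per thread (example)   */
        shared [B] double u[B*THREADS];  /* one contiguous block per thread */
        shared [B] double lap[B*THREADS];
    
        int main(void) {
            int i;
            upc_barrier;  /* u is assumed initialized before this point */
            /* Each iteration runs on the thread that owns u[i].  The
               reads of u[i-1] and u[i+1] at block boundaries touch a
               neighbor's block; the runtime performs that communication
               implicitly, with no matching operation on the owner side. */
            upc_forall (i = 1; i < B*THREADS - 1; i++; &u[i])
                lap[i] = u[i-1] - 2.0*u[i] + u[i+1];
            upc_barrier;
            return 0;
        }
    
    Using upc_relaxed.h rather than upc_strict.h leaves the compiler free
    to reorder and aggregate the shared reads, which is what the
    optimization discussed below relies on.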
    >  I don't have a simple answer to the question of the performance  
    > "hit".  You are expecting things to be "poor", but in many cases use  
    > of UPC may help performance.  For one thing, it will depend greatly  
    > on the platform where you run.  For instance, for a run using  
    > pthreads on a wide multiprocessor machine, the code should run  
    > faster than MPI would on the same problem.  Similarly, there are  
    > many modern networks (such as InfiniBand) where the UPC code should  
    > run faster than an MPI code that was written in the same "fine- 
    > grained" style (meaning the ghost cells are communicated one at a  
    > time).  However, in general we can expect a "fine-grained" UPC  
    > program will run less efficiently than a "coarse-grained" MPI  
    > program (in which an entire vector of ghost cells is communicated  
    > in a single MPI send/recv pair).  For that reason, our compiler (and  
    > others) has a limited ability to discover certain communication  
    > patterns (such as fetching an entire row of ghost cells from a  
    > neighbor) and move the communication outside of the loop.  I am not  
    > the expert on that, but perhaps another member of our group can say  
    > something about how to enable this compiler optimization and what  
    > patterns can be recognized.  The alternative, as one may guess, is  
    > to manually change fine-grained communication to coarse-grained by  
    > grouping your communication into copies of entire vectors of ghost  
    > cells.  The typical way to do this for contiguous data is to copy  
    > from a shared array to a private array (for instance one allocated  
    > using malloc()), with the upc_memget() function.  For non-contiguous  
    > data, the Berkeley UPC compiler offers the "VIS" extensions (see the  
    > second half of http://upc.gwu.edu/~upc/upcworkshop04/bonachea-memcpy-0904-final.pdf 
    > and Section 5 of http://upc.lbl.gov/publications/upc_memcpy.pdf).   
    > However, those are not (yet) a portable option.
    >
    > -Paul
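    The coarse-grained alternative Paul describes might look roughly like
    the following sketch, assuming a 2-D grid striped by rows across
    threads (the names and the sizes N and R are illustrative assumptions):
    
        #include <upc_relaxed.h>
    
        #define N 512   /* grid columns (example)              */
        #define R 64    /* rows owned by each thread (example) */
    
        /* Row-striped grid: thread t has affinity to the slab g[t]. */
        shared [R*N] double g[THREADS][R][N];
    
        int main(void) {
            double ghost_lo[N], ghost_hi[N];  /* private ghost rows */
    
            upc_barrier;  /* neighbors have finished writing g */
    
            /* One bulk one-sided fetch per ghost row, instead of N
               separate fine-grained remote reads in the stencil loop. */
            if (MYTHREAD > 0)
                upc_memget(ghost_lo, &g[MYTHREAD-1][R-1][0],
                           N * sizeof(double));
            if (MYTHREAD < THREADS - 1)
                upc_memget(ghost_hi, &g[MYTHREAD+1][0][0],
                           N * sizeof(double));
    
            /* The local slab has affinity to MYTHREAD, so it may be cast
               to an ordinary C pointer and the finite-difference operator
               applied entirely in private memory, using ghost_lo/ghost_hi
               at the top and bottom edges. */
            double (*mine)[N] = (double (*)[N]) &g[MYTHREAD][0][0];
            (void) mine;
            return 0;
        }
    
    Each thread then performs just two contiguous transfers per exchange,
    which is the "entire vector of ghost cells" pattern Paul contrasts
    with element-at-a-time access.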
    >
    > Tahar Amari wrote:
    >> Hello,
    >>
    >> A naive question :
    >>
    >> I am trying to understand the manual by way of a typical application.
    >> Suppose that I distribute my data as shared blocks (the data is,  
    >> for example, a 2D grid with a scalar value defined at the center  
    >> of each cell).
    >>
    >> Then I perform some operation that needs information from  
    >> neighbouring blocks (typically, as in finite differences,  
    >> computing a Laplacian in each block).  This operation will access  
    >> some values close to the "interface" of the block, which have no  
    >> "affinity" with the local block.
    >>
    >> With MPI we need to communicate (send and receive) precisely this  
    >> needed data, known as "ghost" cells, and then apply the operator.
    >>
    >> With UPC, if I do not do anything special (unlike in MPI), is it  
    >> possible to do this?
    >>
    >> If yes, I guess that UPC will do the communication, and since  
    >> nothing special says what data to communicate, is this where the  
    >> penalty will be big?
    >>
    >>
    >> Has anyone tested a simple Laplacian on a square grid, with simple  
    >> shared data, and measured how much we lose (or not) compared to  
    >> the method in which one does almost as much work as with MPI?
    >>
    >> Many thanks
    >>
    >> Tahar
    >>
    >>
    >> --------------------------------------------
    >> T. Amari
    >> Centre de Physique Theorique
    >> Ecole Polytechnique
    >> 91128 Palaiseau Cedex France
    >> tel : 33 1 69 33 42 52
    >> fax: 33 1 69 33 30 08
    >> email: <mailto:amari@cpht.polytechnique.fr>
    >> URL : http://www.cpht.polytechnique.fr/cpht/amari
    >>
    >>
    >
    >
    > -- 
    > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > Future Technologies Group                 Tel: +1-510-495-2352
    > HPC Research Department                   Fax: +1-510-486-6900
    > Lawrence Berkeley National Laboratory
    
