Re: shared data and operator

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Feb 17 2009 - 13:12:09 PST

    Tahar,
    
        You are correct that in UPC accessing data that does not have 
    affinity to the thread issuing the access will result in communication.  
    This does not require any matching operation to be performed by the thread 
    that does have affinity to the data.  This one-sided communication 
    model is one of the key programmer productivity features of UPC and 
    related language efforts.  So, the answer to "is it possible to do this" 
    is a resounding YES.
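        To make that concrete, here is a minimal sketch of direct remote 
    access (the array name, block size, and stencil below are invented for 
    illustration, not taken from your code):

            #include <upc.h>

            /* Illustrative sketch only: names and sizes are made up. */
            #define BLOCK 256
            /* Blocked layout: BLOCK consecutive elements per thread. */
            shared [BLOCK] double u[BLOCK*THREADS];

            /* Valid for interior points 0 < i < BLOCK*THREADS - 1. */
            double stencil_at(int i)
            {
                /* Any thread may read u[i-1], u[i], u[i+1] directly, even
                   when those elements have affinity to another thread; the
                   runtime performs the one-sided communication and the
                   owning thread takes no action at all. */
                return u[i-1] - 2.0*u[i] + u[i+1];
            }
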
       I don't have a simple answer to the question of the performance 
    "hit".  You are expecting things to be "poor", but in many cases use of 
    UPC may help performance.  For one thing, it will depend greatly on the 
    platform where you run.  For instance, for a run using pthreads on a 
    wide multiprocessor machine, the code should run faster than MPI would 
    on the same problem.  Similarly, there are many modern networks (such as 
    InfiniBand) where the UPC code should run faster than an MPI code that 
    was written in the same "fine-grained" style (meaning the ghost cells 
    are communicated one at a time).  However, in general we can expect that a 
    "fine-grained" UPC program will run less efficiently than a 
    "coarse-grained" MPI program (in which an entire vector of ghost cells 
    is communicated in a single MPI send/recv pair).  For that reason, our 
    compiler (and others) has a limited ability to discover certain 
    communication patterns (such as fetching an entire row of ghost cells 
    from a neighbor) and move the communication outside of the loop.  I am 
    not the expert on that, but perhaps another member of our group can say 
    something about how to enable this compiler optimization and what 
    patterns can be recognized.  The alternative, as one may guess, is to 
    manually change fine-grained communication to coarse-grained by grouping 
    your communication into copies of entire vectors of ghost cells.  The 
    typical way to do this for contiguous data is to copy from a shared 
    array to a private array (for instance one allocated using malloc()), 
    with the upc_memget() function.  For non-contiguous data, the Berkeley 
    UPC compiler offers the "VIS" extensions (see second half of 
    http://upc.gwu.edu/~upc/upcworkshop04/bonachea-memcpy-0904-final.pdf and 
    Section 5 of http://upc.lbl.gov/publications/upc_memcpy.pdf).  However, 
    those are not (yet) a portable option.
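        For the contiguous case, here is a rough sketch of the upc_memget() 
    approach (the grid layout, names, and sizes are invented for the 
    example; a real 2D decomposition would differ):

            #include <upc.h>

            /* Illustrative sketch only: names, sizes, and the 1D layout
               are made up for the example. */
            #define NX 256                 /* points per thread's block */
            shared [NX] double grid[NX*THREADS];

            /* Copy the entire block owned by 'neighbor' into the private
               buffer 'ghost' (at least NX doubles) with one bulk call,
               instead of reading the NX elements one at a time over the
               network. */
            void fetch_ghosts(double *ghost, int neighbor)
            {
                upc_memget(ghost, &grid[neighbor*NX], NX * sizeof(double));
            }

        A fine-grained loop (ghost[k] = grid[neighbor*NX + k]) is the 
    element-at-a-time version that the compiler may or may not be able to 
    coalesce for you.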
    
    -Paul
    
    Tahar Amari wrote:
    > Hello,
    >
    > A naive question :
    >
    >  I am trying to understand the manual for some typical application.
    > Suppose that I distribute my data as shared blocks (it is supposed 
    > to be, for example, a 
    > 2D grid with a scalar value defined at the center of each cell).
    >
    > Then I perform some operation that needs information from neighbouring 
    > blocks (typically,
    > as in finite differences, computing a Laplacian in each block). This 
    > operation will access
    > some values close to the "interface" of the block, so with no 
    > "affinity" to the local block.
    >
    > With MPI we need to communicate (send and get) this precise type of 
    > needed data, known as "ghost" cells,
    > and then perform the operator.
    >
    > With UPC, if I do not do anything special (unlike in MPI), is it 
    > possible to do this?
    >
    > If yes, I guess that UPC will do the communication, and therefore, if 
    > nothing special tells it
    > what data to communicate, this is where the penalty will be big?
    >
    >
    > Has anyone tested a simple Laplacian on a square grid, with simple 
    > shared data, 
    > and measured how much we lose or not, compared to the method in which
    > one would do almost as much work as with MPI?
    >
    > Many thanks
    >
    > Tahar
    >
    >
    > --------------------------------------------
    >
    > T. Amari
    >
    > Centre de Physique Theorique
    >
    > Ecole Polytechnique
    >
    > 91128 Palaiseau Cedex France
    >
    > tel : 33 1 69 33 42 52
    >
    > fax: 33 1 69 33 30 08
    >
    > email: <mailto:[email protected]>
    >
    > URL : http://www.cpht.polytechnique.fr/cpht/amari
    >
    >
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
