From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Feb 17 2009 - 13:12:09 PST
Tahar,

You are correct that in UPC, accessing data that does not have affinity to the thread issuing the access will result in communication. This does not require any matching operation to be performed by the thread that does have affinity to the data. This one-sided communication model is one of the key programmer-productivity features of UPC and related language efforts. So, the answer to "is it possible to do this" is a resounding YES.

I don't have a simple answer to the question of the performance "hit". You are expecting things to be "poor", but in many cases use of UPC may help performance. For one thing, it will depend greatly on the platform where you run. For instance, for a run using pthreads on a wide multiprocessor machine, the code should run faster than MPI would on the same problem. Similarly, there are many modern networks (such as InfiniBand) where the UPC code should run faster than an MPI code written in the same "fine-grained" style (meaning the ghost cells are communicated one at a time). However, in general we can expect a "fine-grained" UPC program to run less efficiently than a "coarse-grained" MPI program (in which an entire vector of ghost cells is communicated in a single MPI send/recv pair).

For that reason, our compiler (and others) has a limited ability to discover certain communication patterns (such as fetching an entire row of ghost cells from a neighbor) and move the communication outside of the loop. I am not the expert on that, but perhaps another member of our group can say something about how to enable this compiler optimization and what patterns can be recognized.

The alternative, as one may guess, is to manually change fine-grained communication to coarse-grained by grouping your communication into copies of entire vectors of ghost cells. The typical way to do this for contiguous data is to copy from a shared array to a private array (for instance one allocated using malloc()) with the upc_memget() function. For non-contiguous data, the Berkeley UPC compiler offers the "VIS" extensions (see the second half of http://upc.gwu.edu/~upc/upcworkshop04/bonachea-memcpy-0904-final.pdf and Section 5 of http://upc.lbl.gov/publications/upc_memcpy.pdf). However, those are not (yet) a portable option.
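To make the fine-grained case concrete, here is a minimal sketch of a 5-point Laplacian on a shared 2D grid blocked by rows. The grid sizes, array names, and row-blocked layout are just assumptions for illustration, not anything from your code; the point is that the loop body simply indexes the shared array, and the reads that land on a neighbor's rows become implicit one-sided gets with no receive posted by the owner.

  /* laplace_fine.upc -- hypothetical example; sizes and names are made up */
  #include <upc_relaxed.h>

  #define NX 64                  /* rows owned by each thread (assumption)   */
  #define NY 64                  /* columns                                  */

  /* blocked-by-rows layout: thread t has affinity to rows t*NX .. t*NX+NX-1 */
  shared [NX*NY] double u   [NX*THREADS][NY];
  shared [NX*NY] double unew[NX*THREADS][NY];

  int main(void) {
      int i, j;

      /* fine-grained 5-point stencil: on the first and last rows of my block
         the reads of u[i-1][j] and u[i+1][j] are remote, and each one becomes
         a small one-sided get -- the owning thread posts no matching receive */
      upc_forall (i = 1; i < NX*THREADS - 1; i++; &u[i][0]) {
          for (j = 1; j < NY - 1; j++)
              unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                 + u[i][j-1] + u[i][j+1]);
      }
      upc_barrier;
      return 0;
  }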
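And here is a corresponding coarse-grained sketch of just the boundary update, using upc_memget() to pull each neighbouring ghost row into a private buffer in one bulk transfer instead of NY separate gets. Again the names, sizes, and layout are placeholders assumed for illustration; only upc_memget(), MYTHREAD, THREADS, and upc_barrier are standard UPC.

  /* laplace_coarse.upc -- hypothetical example, same assumed layout as above */
  #include <upc_relaxed.h>

  #define NX 64
  #define NY 64

  shared [NX*NY] double u   [NX*THREADS][NY];
  shared [NX*NY] double unew[NX*THREADS][NY];

  int main(void) {
      int j;
      int first = MYTHREAD * NX;            /* first row I own               */
      int last  = first + NX - 1;           /* last row I own                */
      double ghost_lo[NY], ghost_hi[NY];    /* private ghost-row buffers     */

      upc_barrier;   /* make sure u is globally up to date before fetching   */

      /* one bulk get per neighbouring boundary row                          */
      if (MYTHREAD > 0)
          upc_memget(ghost_lo, &u[first - 1][0], NY * sizeof(double));
      if (MYTHREAD < THREADS - 1)
          upc_memget(ghost_hi, &u[last + 1][0], NY * sizeof(double));

      /* the boundary rows of my block now read their "remote" neighbours
         from private memory; interior rows are computed exactly as before   */
      if (MYTHREAD > 0)
          for (j = 1; j < NY - 1; j++)
              unew[first][j] = 0.25 * (ghost_lo[j] + u[first + 1][j]
                                     + u[first][j - 1] + u[first][j + 1]);
      if (MYTHREAD < THREADS - 1)
          for (j = 1; j < NY - 1; j++)
              unew[last][j] = 0.25 * (u[last - 1][j] + ghost_hi[j]
                                     + u[last][j - 1] + u[last][j + 1]);
      return 0;
  }

As you can see, this second style involves roughly the same bookkeeping effort as an MPI ghost-cell exchange, which is exactly the trade-off you were asking about.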
-Paul

Tahar Amari wrote:
> Hello,
>
> A naive question:
>
> I am trying to understand the manual for some typical application.
> Suppose that I distribute my data as shared blocks (it is supposed to be,
> for example, a 2D grid with a scalar number defined at the center of each
> cell).
>
> Then I perform some operation that needs information from neighbouring
> blocks (typically, as in finite differences, computing a Laplacian in each
> block). This operation will access some values close to the "interface" of
> the block, so with no "affinity" to the local block.
>
> With MPI we need to communicate (send and get) this precise type of needed
> data, known as "ghost" cells, and then perform the operator.
>
> With UPC, if I do not do anything special (unlike in MPI), is it possible
> to do this?
>
> If yes, I guess that UPC will do the communication, and therefore if
> nothing special tells it what data to communicate, this is where the
> penalty will be big?
>
> Has anyone tested a simple Laplacian on a square grid, with simple shared
> data, and measured how much we lose or not, compared to the method in
> which one does almost as much effort as with MPI?
>
> Many thanks
>
> Tahar
>
> --------------------------------------------
> T. Amari
> Centre de Physique Theorique
> Ecole Polytechnique
> 91128 Palaiseau Cedex France
> tel : 33 1 69 33 42 52
> fax : 33 1 69 33 30 08
> email: <mailto:[email protected]>
> URL : http://www.cpht.polytechnique.fr/cpht/amari

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory