From: Tahar Amari (amari_at_cpht.polytechnique.fr)
Date: Wed Feb 18 2009 - 01:34:35 PST
Paul, thank you very much for this very nice explanation. I understand, and you put it in a very clear way. It would indeed be very interesting to have an answer to what you suggested below (to avoid doing it "hard coded" in the MPI way):

> For that reason, our compiler (and others) has a limited ability to discover certain communication patterns (such as fetching an entire row of ghost cells from a neighbor) and move the communication outside of the loop. I am not the expert on that, but perhaps another member of our group can say something about how to enable this compiler optimization and what patterns can be recognized.

If someone else knows how to do this, it would be very helpful.

Many thanks again,

Tahar

--------------------------------------------
T. Amari
Centre de Physique Theorique
Ecole Polytechnique
91128 Palaiseau Cedex France
tel : 33 1 69 33 42 52
fax: 33 1 69 33 30 08
email: <mailto:[email protected]>
URL : http://www.cpht.polytechnique.fr/cpht/amari

On Feb 17, 2009, at 22:12, Paul H. Hargrove wrote:

> Tahar,
>
> You are correct that in UPC accessing data that does not have affinity to the thread issuing the access will result in communication. This does not require any matching operation to be performed by the thread that does have affinity to the data. This one-sided communication model is one of the key programmer-productivity features of UPC and related language efforts. So, the answer to "is it possible to do this" is a resounding YES.
>
> I don't have a simple answer to the question of the performance "hit". You are expecting things to be "poor", but in many cases use of UPC may help performance. For one thing, it will depend greatly on the platform where you run. For instance, for a run using pthreads on a wide multiprocessor machine, the code should run faster than MPI would on the same problem. Similarly, there are many modern networks (such as InfiniBand) where the UPC code should run faster than an MPI code that was written in the same "fine-grained" style (meaning the ghost cells are communicated one at a time). However, in general we can expect a "fine-grained" UPC program to run less efficiently than a "coarse-grained" MPI program (in which an entire vector of ghost cells is communicated in a single MPI send/recv pair).
>
> For that reason, our compiler (and others) has a limited ability to discover certain communication patterns (such as fetching an entire row of ghost cells from a neighbor) and move the communication outside of the loop. I am not the expert on that, but perhaps another member of our group can say something about how to enable this compiler optimization and what patterns can be recognized.
>
> The alternative, as one may guess, is to manually change fine-grained communication to coarse-grained by grouping your communication into copies of entire vectors of ghost cells. The typical way to do this for contiguous data is to copy from a shared array to a private array (for instance one allocated using malloc()) with the upc_memget() function. For non-contiguous data, the Berkeley UPC compiler offers the "VIS" extensions (see the second half of http://upc.gwu.edu/~upc/upcworkshop04/bonachea-memcpy-0904-final.pdf and Section 5 of http://upc.lbl.gov/publications/upc_memcpy.pdf). However, those are not (yet) a portable option.
>
> -Paul
>
> Tahar Amari wrote:
>> Hello,
>>
>> A naive question:
>>
>> I am trying to understand the manual for some typical application.
>> Suppose that I distribute my data as shared blocks (it is supposed to be, for example, a 2D grid with a scalar value defined at the center of each cell).
>>
>> Then I perform some operation that needs information from neighbouring blocks (typically, as in finite differences, computing a Laplacian in each block). This operation will access some values close to the "interface" of the block, so with no "affinity" to the local block.
>>
>> With MPI we need to communicate (send and get) this precise type of needed data, known as "ghost" cells, and then apply the operator.
>>
>> With UPC, if I do not do anything special (unlike in MPI), is it possible to do this?
>>
>> If yes, I guess that UPC will do the communication, and therefore, if nothing special says what data to communicate, this is where the penalty will be big?
>>
>> Has anyone tested a simple Laplacian on a square grid, with simple shared data, and measured how much we lose (or not) compared to the method in which one does almost as much work as with MPI?
>>
>> Many thanks
>>
>> Tahar
>>
>> --------------------------------------------
>> T. Amari
>> Centre de Physique Theorique
>> Ecole Polytechnique
>> 91128 Palaiseau Cedex France
>> tel : 33 1 69 33 42 52
>> fax: 33 1 69 33 30 08
>> email: <mailto:[email protected]>
>> URL : http://www.cpht.polytechnique.fr/cpht/amari
>
>
> --
> Paul H. Hargrove              PHHargrove_at_lbl_dot_gov
> Future Technologies Group     Tel: +1-510-495-2352
> HPC Research Department       Fax: +1-510-486-6900
> Lawrence Berkeley National Laboratory
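For concreteness, here is a minimal sketch (not code from this thread) of the "fine-grained" style discussed above, using only standard UPC constructs (shared arrays, upc_forall, upc_barrier). The grid size N, the names grid/lap, and the one-block-of-rows-per-thread layout are illustrative assumptions, and the chosen block size is assumed to fit under the implementation's UPC_MAX_BLOCK_SIZE.

#include <upc.h>

#define N 256  /* illustrative grid width; each thread owns N rows */

/* Block size N*N elements: thread t has affinity to rows [t*N, (t+1)*N). */
shared [N*N] double grid[N*THREADS][N];
shared [N*N] double lap [N*THREADS][N];

void laplacian_fine_grained(void)
{
    int i, j;
    /* Each iteration runs on the thread that owns row i of grid. */
    upc_forall (i = 1; i < N*THREADS - 1; i++; &grid[i][0]) {
        for (j = 1; j < N - 1; j++) {
            /* Near a block boundary, grid[i-1][j] or grid[i+1][j] lives
               on a neighbouring thread; the runtime fetches it with an
               implicit one-sided read, one element at a time. */
            lap[i][j] = grid[i-1][j] + grid[i+1][j]
                      + grid[i][j-1] + grid[i][j+1]
                      - 4.0 * grid[i][j];
        }
    }
    upc_barrier;
}

Nothing here tells the compiler or runtime which remote elements will be needed; each out-of-block access becomes its own small transfer, which is exactly the per-element penalty discussed above.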
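And a correspondingly hedged sketch of the coarse-grained alternative Paul describes: copy each neighbour's boundary row into a private buffer with upc_memget() once per sweep, then run the stencil entirely on local data. The ghost_north/ghost_south buffers and the fetch_ghost_rows() helper are made-up names; upc_memget() itself is the standard UPC library call (declared in upc.h), with the same grid layout assumed as in the previous sketch.

#include <upc.h>

#define N 256
shared [N*N] double grid[N*THREADS][N];

static double ghost_north[N];  /* private (per-thread) copies of remote rows */
static double ghost_south[N];

void fetch_ghost_rows(void)
{
    int first = MYTHREAD * N;      /* first row owned by this thread */
    int last  = first + N - 1;     /* last row owned by this thread  */

    upc_barrier;  /* make sure neighbours have finished writing their rows */

    /* One bulk transfer per neighbour instead of N single-element reads. */
    if (MYTHREAD > 0)
        upc_memget(ghost_north, &grid[first - 1][0], N * sizeof(double));
    if (MYTHREAD < THREADS - 1)
        upc_memget(ghost_south, &grid[last + 1][0], N * sizeof(double));
}

Contiguous rows are the easy case; for non-contiguous ghost data (a column of the same grid, say) one would need either a loop of upc_memget() calls or the Berkeley "VIS" extensions referenced in the links above.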