From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Feb 17 2009 - 13:12:09 PST
Tahar,

You are correct that in UPC, accessing data that does not have affinity to the thread issuing the access will result in communication. This does not require any matching operation to be performed by the thread that does have affinity to the data. This one-sided communication model is one of the key programmer-productivity features of UPC and related language efforts. So, the answer to "is it possible to do this" is a resounding YES.

I don't have a simple answer to the question of the performance "hit". You are expecting things to be "poor", but in many cases use of UPC may help performance. For one thing, it will depend greatly on the platform where you run. For instance, for a run using pthreads on a wide multiprocessor machine, the code should run faster than MPI would on the same problem. Similarly, there are many modern networks (such as InfiniBand) where the UPC code should run faster than an MPI code written in the same "fine-grained" style (meaning the ghost cells are communicated one at a time). However, in general we can expect a "fine-grained" UPC program to run less efficiently than a "coarse-grained" MPI program (in which an entire vector of ghost cells is communicated in a single MPI send/recv pair).

For that reason, our compiler (and others) has a limited ability to discover certain communication patterns (such as fetching an entire row of ghost cells from a neighbor) and move the communication outside of the loop. I am not the expert on that, but perhaps another member of our group can say something about how to enable this compiler optimization and what patterns can be recognized.

The alternative, as one may guess, is to manually change fine-grained communication to coarse-grained by grouping your communication into copies of entire vectors of ghost cells. The typical way to do this for contiguous data is to copy from a shared array to a private array (for instance one allocated using malloc()) with the upc_memget() function. For non-contiguous data, the Berkeley UPC compiler offers the "VIS" extensions (see the second half of http://upc.gwu.edu/~upc/upcworkshop04/bonachea-memcpy-0904-final.pdf and Section 5 of http://upc.lbl.gov/publications/upc_memcpy.pdf). However, those are not (yet) a portable option.
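To make the fine-grained case concrete, here is a minimal sketch of a 5-point Laplacian on a shared 2D grid blocked by rows. The grid sizes, array names, and row-blocked layout are just assumptions for illustration, not anything from your code; the point is that the loop body simply indexes the shared array, and the reads that land on a neighbor's rows become implicit one-sided gets with no receive posted by the owner.

  /* laplace_fine.upc -- hypothetical example; sizes and names are made up */
  #include <upc_relaxed.h>

  #define NX 64                  /* rows owned by each thread (assumption)   */
  #define NY 64                  /* columns                                  */

  /* blocked-by-rows layout: thread t has affinity to rows t*NX .. t*NX+NX-1 */
  shared [NX*NY] double u   [NX*THREADS][NY];
  shared [NX*NY] double unew[NX*THREADS][NY];

  int main(void) {
      int i, j;

      /* fine-grained 5-point stencil: on the first and last rows of my block
         the reads of u[i-1][j] and u[i+1][j] are remote, and each one becomes
         a small one-sided get -- the owning thread posts no matching receive */
      upc_forall (i = 1; i < NX*THREADS - 1; i++; &u[i][0]) {
          for (j = 1; j < NY - 1; j++)
              unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                 + u[i][j-1] + u[i][j+1]);
      }
      upc_barrier;
      return 0;
  }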
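And here is a corresponding coarse-grained sketch of just the boundary update, using upc_memget() to pull each neighbouring ghost row into a private buffer in one bulk transfer instead of NY separate gets. Again the names, sizes, and layout are placeholders assumed for illustration; only upc_memget(), MYTHREAD, THREADS, and upc_barrier are standard UPC.

  /* laplace_coarse.upc -- hypothetical example, same assumed layout as above */
  #include <upc_relaxed.h>

  #define NX 64
  #define NY 64

  shared [NX*NY] double u   [NX*THREADS][NY];
  shared [NX*NY] double unew[NX*THREADS][NY];

  int main(void) {
      int j;
      int first = MYTHREAD * NX;            /* first row I own               */
      int last  = first + NX - 1;           /* last row I own                */
      double ghost_lo[NY], ghost_hi[NY];    /* private ghost-row buffers     */

      upc_barrier;   /* make sure u is globally up to date before fetching   */

      /* one bulk get per neighbouring boundary row                          */
      if (MYTHREAD > 0)
          upc_memget(ghost_lo, &u[first - 1][0], NY * sizeof(double));
      if (MYTHREAD < THREADS - 1)
          upc_memget(ghost_hi, &u[last + 1][0], NY * sizeof(double));

      /* the boundary rows of my block now read their "remote" neighbours
         from private memory; interior rows are computed exactly as before   */
      if (MYTHREAD > 0)
          for (j = 1; j < NY - 1; j++)
              unew[first][j] = 0.25 * (ghost_lo[j] + u[first + 1][j]
                                     + u[first][j - 1] + u[first][j + 1]);
      if (MYTHREAD < THREADS - 1)
          for (j = 1; j < NY - 1; j++)
              unew[last][j] = 0.25 * (u[last - 1][j] + ghost_hi[j]
                                     + u[last][j - 1] + u[last][j + 1]);
      return 0;
  }

As you can see, this second style involves roughly the same bookkeeping effort as an MPI ghost-cell exchange, which is exactly the trade-off you were asking about.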
-Paul

Tahar Amari wrote:
> Hello,
>
> A naive question:
>
> I am trying to understand the manual for some typical application.
> Suppose that I distribute my data as shared blocks (it is supposed to be,
> for example, a 2D grid with a scalar number defined at the center of each
> cell).
>
> Then I perform some operation that needs information from neighbouring
> blocks (typically, as in finite differences, computing a Laplacian in each
> block). This operation will access some values close to the "interface" of
> the block, so with no "affinity" to the local block.
>
> With MPI we need to communicate (send and get) this precise type of needed
> data, known as "ghost" cells, and then perform the operator.
>
> With UPC, if I do not do anything special (unlike in MPI), is it possible
> to do this?
>
> If yes, I guess that UPC will do the communication, and therefore if
> nothing special tells it what data to communicate, this is where the
> penalty will be big?
>
> Has anyone tested a simple Laplacian on a square grid, with simple shared
> data, and measured how much we lose or not, compared to the method in
> which one does almost as much effort as with MPI?
>
> Many thanks
>
> Tahar
>
> --------------------------------------------
> T. Amari
> Centre de Physique Theorique
> Ecole Polytechnique
> 91128 Palaiseau Cedex France
> tel : 33 1 69 33 42 52
> fax : 33 1 69 33 30 08
> email: <mailto:[email protected]>
> URL : http://www.cpht.polytechnique.fr/cpht/amari

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory