From: Tahar Amari (amari_at_cpht.polytechnique.fr)
Date: Wed Feb 18 2009 - 01:34:35 PST
Paul, thank you very much for this very nice explanation. I understand, and you put it in a very clear way. It would indeed be very interesting to have an answer to what you suggested below (to avoid doing it "hard coded" in the MPI way):

> For that reason, our compiler (and others) has a limited ability to discover certain communication patterns (such as fetching an entire row of ghost cells from a neighbor) and move the communication outside of the loop. I am not the expert on that, but perhaps another member of our group can say something about how to enable this compiler optimization and what patterns can be recognized.

If someone else knows how to do this, it would be very helpful.

Many thanks again,

Tahar

--------------------------------------------
T. Amari
Centre de Physique Theorique
Ecole Polytechnique
91128 Palaiseau Cedex France
tel : 33 1 69 33 42 52
fax: 33 1 69 33 30 08
email: <mailto:[email protected]>
URL : http://www.cpht.polytechnique.fr/cpht/amari

On Feb 17, 2009, at 22:12, Paul H. Hargrove wrote:

> Tahar,
>
> You are correct that in UPC accessing data that does not have affinity to the thread issuing the access will result in communication. This does not require any matching operation to be performed by the thread that does have affinity to the data. This one-sided communication model is one of the key programmer-productivity features of UPC and related language efforts. So, the answer to "is it possible to do this" is a resounding YES.
>
> I don't have a simple answer to the question of the performance "hit". You are expecting things to be "poor", but in many cases use of UPC may help performance. For one thing, it will depend greatly on the platform where you run. For instance, for a run using pthreads on a wide multiprocessor machine, the code should run faster than MPI would on the same problem. Similarly, there are many modern networks (such as InfiniBand) where the UPC code should run faster than an MPI code that was written in the same "fine-grained" style (meaning the ghost cells are communicated one at a time). However, in general we can expect a "fine-grained" UPC program to run less efficiently than a "coarse-grained" MPI program (in which an entire vector of ghost cells is communicated in a single MPI send/recv pair).
>
> For that reason, our compiler (and others) has a limited ability to discover certain communication patterns (such as fetching an entire row of ghost cells from a neighbor) and move the communication outside of the loop. I am not the expert on that, but perhaps another member of our group can say something about how to enable this compiler optimization and what patterns can be recognized.
>
> The alternative, as one may guess, is to manually change fine-grained communication to coarse-grained by grouping your communication into copies of entire vectors of ghost cells. The typical way to do this for contiguous data is to copy from a shared array to a private array (for instance one allocated using malloc()) with the upc_memget() function. For non-contiguous data, the Berkeley UPC compiler offers the "VIS" extensions (see the second half of http://upc.gwu.edu/~upc/upcworkshop04/bonachea-memcpy-0904-final.pdf and Section 5 of http://upc.lbl.gov/publications/upc_memcpy.pdf). However, those are not (yet) a portable option.
>
> -Paul
>
> Tahar Amari wrote:
>> Hello,
>>
>> A naive question:
>>
>> I am trying to understand the manual for some typical application.
>> Suppose that I distribute my data as shared blocks (it is supposed to be, for example, a 2D grid with a scalar value defined at the center of each cell).
>>
>> Then I perform some operation that needs information from neighbouring blocks (typically, as in finite differences, computing a Laplacian in each block). This operation will access some values close to the "interface" of the block, so with no "affinity" to the local block.
>>
>> With MPI we need to communicate (send and get) this precise type of needed data, known as "ghost" cells, and then apply the operator.
>>
>> With UPC, if I do not do anything special (unlike in MPI), is it possible to do this?
>>
>> If yes, I guess that UPC will do the communication, and therefore, if nothing special says what data to communicate, this is where the penalty will be big?
>>
>> Has anyone tested a simple Laplacian on a square grid, with simple shared data, and measured how much we lose (or not) compared to the method in which one does almost as much work as with MPI?
>>
>> Many thanks
>>
>> Tahar
>>
>> --------------------------------------------
>> T. Amari
>> Centre de Physique Theorique
>> Ecole Polytechnique
>> 91128 Palaiseau Cedex France
>> tel : 33 1 69 33 42 52
>> fax: 33 1 69 33 30 08
>> email: <mailto:[email protected]>
>> URL : http://www.cpht.polytechnique.fr/cpht/amari
>
>
> --
> Paul H. Hargrove              PHHargrove_at_lbl_dot_gov
> Future Technologies Group     Tel: +1-510-495-2352
> HPC Research Department       Fax: +1-510-486-6900
> Lawrence Berkeley National Laboratory
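For concreteness, here is a minimal sketch (not code from this thread) of the "fine-grained" style discussed above, using only standard UPC constructs (shared arrays, upc_forall, upc_barrier). The grid size N, the names grid/lap, and the one-block-of-rows-per-thread layout are illustrative assumptions, and the chosen block size is assumed to fit under the implementation's UPC_MAX_BLOCK_SIZE.

#include <upc.h>

#define N 256  /* illustrative grid width; each thread owns N rows */

/* Block size N*N elements: thread t has affinity to rows [t*N, (t+1)*N). */
shared [N*N] double grid[N*THREADS][N];
shared [N*N] double lap [N*THREADS][N];

void laplacian_fine_grained(void)
{
    int i, j;
    /* Each iteration runs on the thread that owns row i of grid. */
    upc_forall (i = 1; i < N*THREADS - 1; i++; &grid[i][0]) {
        for (j = 1; j < N - 1; j++) {
            /* Near a block boundary, grid[i-1][j] or grid[i+1][j] lives
               on a neighbouring thread; the runtime fetches it with an
               implicit one-sided read, one element at a time. */
            lap[i][j] = grid[i-1][j] + grid[i+1][j]
                      + grid[i][j-1] + grid[i][j+1]
                      - 4.0 * grid[i][j];
        }
    }
    upc_barrier;
}

Nothing here tells the compiler or runtime which remote elements will be needed; each out-of-block access becomes its own small transfer, which is exactly the per-element penalty discussed above.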
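And a correspondingly hedged sketch of the coarse-grained alternative Paul describes: copy each neighbour's boundary row into a private buffer with upc_memget() once per sweep, then run the stencil entirely on local data. The ghost_north/ghost_south buffers and the fetch_ghost_rows() helper are made-up names; upc_memget() itself is the standard UPC library call (declared in upc.h), with the same grid layout assumed as in the previous sketch.

#include <upc.h>

#define N 256
shared [N*N] double grid[N*THREADS][N];

static double ghost_north[N];  /* private (per-thread) copies of remote rows */
static double ghost_south[N];

void fetch_ghost_rows(void)
{
    int first = MYTHREAD * N;      /* first row owned by this thread */
    int last  = first + N - 1;     /* last row owned by this thread  */

    upc_barrier;  /* make sure neighbours have finished writing their rows */

    /* One bulk transfer per neighbour instead of N single-element reads. */
    if (MYTHREAD > 0)
        upc_memget(ghost_north, &grid[first - 1][0], N * sizeof(double));
    if (MYTHREAD < THREADS - 1)
        upc_memget(ghost_south, &grid[last + 1][0], N * sizeof(double));
}

Contiguous rows are the easy case; for non-contiguous ghost data (a column of the same grid, say) one would need either a loop of upc_memget() calls or the Berkeley "VIS" extensions referenced in the links above.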