From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Jul 24 2009 - 15:00:58 PDT
I have run your code (I needed to provide a gettime.h) and did not see any
errors. I tried it on both an x86 cluster with Myrinet and a Cray XT.

To answer your question: I don't believe that using this data structure
will cause any performance penalty for the collectives, since the
structure is just a "trick" for indexing the block of data. Additionally,
static-vs-dynamic allocation of memory should not affect the performance
of the collectives either.

--Paul

sainath l wrote:
> Hi Paul,
>
> I have attached my code. The first iteration runs until the
> deallocation part, and then the code breaks.
>
> *** Caught a fatal signal: SIGSEGV(11) on node 0/16
> _pmii_daemon(SIGCHLD): PE 0 exit signal Segmentation fault
> [NID 26]Apid 315852: initiated application termination
>
> Also, I would be very happy to know: if I want to write
> micro-benchmarks for the collectives, will using this data structure
> be a problem (overhead incurred by using this data structure)? Or
> should I just declare arrays statically and use them?
>
> In practice, are the source and destination variables of collective
> operations generally dynamically allocated? If yes, will that degrade
> the performance?
>
> Thank you very much.
>
> Cheers,
> Sainath
>
> On Fri, Jul 24, 2009 at 3:54 AM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
>
> > sainath,
> >
> > There is no obvious reason why upc_free() would not work for
> > Gary's data structure. Are you sure you are calling upc_free(a)
> > from exactly one thread, and only after all threads have finished
> > accessing the array? Could you provide more information on how
> > "it breaks"?
> >
> > -Paul
> >
> > sainath l wrote:
> > > Sorry, that was at runtime.
> > >
> > > Cheers
> > > sainath
> > >
> > > On Fri, Jul 24, 2009 at 12:45 AM, sainath l <ls_dot_sainath_at_gmail_dot_com> wrote:
> > >
> > > > It's not possible to deallocate memory using upc_free for the
> > > > derived data type that is given in Gary's example. Although the
> > > > code compiles without any noise at compile time, it breaks at
> > > > runtime. Could someone tell me why this is the case? And is
> > > > there a way to deallocate the memory for the array of structures
> > > > in Gary's example?
> > > >
> > > > cheers,
> > > > sainath
> > > >
> > > > On Fri, Jul 24, 2009 at 12:23 AM, sainath l <ls_dot_sainath_at_gmail_dot_com> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > Thanks again, Gary.
> > > > >
> > > > > Cheers,
> > > > > sainath
> > > > >
> > > > > On Thu, Jul 23, 2009 at 6:19 AM, Gary Funck <gary_at_intrepid_dot_com> wrote:
> > > > >
> > > > > > On 07/23/09 02:00:02, sainath l wrote:
> > > > > > > I am very much interested in knowing any workaround, if
> > > > > > > possible, for dynamically allocating an array with a
> > > > > > > variable block size at runtime.
> > > > > > >
> > > > > > > Let's say I want to know if it is possible to create the
> > > > > > > following array dynamically, where N and M are some
> > > > > > > variables. If yes, then how can we do it?
> > > > > > >
> > > > > > > shared [M] int A[N][M];
> > > > > >
> > > > > > Sainath,
> > > > > > I'm not sure if this is what you're asking about, but attached
> > > > > > is a program that uses a "trick" to ensure that each row of the
> > > > > > array has affinity to a single thread, in a thread-cyclic
> > > > > > fashion.
> > > > > >
> > > > > > The trick is that by placing the row vector 'y' inside of a
> > > > > > struct, we ensure that y is allocated contiguously on a given
> > > > > > thread. And for each a[i+1] (based upon UPC's indexing rules)
> > > > > > we know that it will be allocated on the next thread (in cyclic
> > > > > > order) after the thread that holds a[i].
> > > > > >
> > > > > > $ upc alloc_row_struct.upc -o alloc_row_struct
> > > > > > $ alloc_row_struct -n 4 4 5
> > > > > > threadof a[0].y[0] = 0
> > > > > > threadof a[0].y[1] = 0
> > > > > > threadof a[0].y[2] = 0
> > > > > > threadof a[0].y[3] = 0
> > > > > > threadof a[0].y[4] = 0
> > > > > > threadof a[1].y[0] = 1
> > > > > > threadof a[1].y[1] = 1
> > > > > > threadof a[1].y[2] = 1
> > > > > > threadof a[1].y[3] = 1
> > > > > > threadof a[1].y[4] = 1
> > > > > > threadof a[2].y[0] = 2
> > > > > > threadof a[2].y[1] = 2
> > > > > > threadof a[2].y[2] = 2
> > > > > > threadof a[2].y[3] = 2
> > > > > > threadof a[2].y[4] = 2
> > > > > > threadof a[3].y[0] = 3
> > > > > > threadof a[3].y[1] = 3
> > > > > > threadof a[3].y[2] = 3
> > > > > > threadof a[3].y[3] = 3
> > > > > > threadof a[3].y[4] = 3
> > > > > >
> > > > > > Above, '-n 4' indicates that the program will run on 4 threads.
> > > > > > That number was chosen to agree with the value of N (also 4)
> > > > > > given above, but in fact it could be any number.
> > > > > >
> > > > > > Whether this is the best method, or even a recommended
> > > > > > practice, for accomplishing your objective, I'm not sure.
> > > > > > Perhaps others on the list can offer some comment or suggest
> > > > > > alternative methods?
> > > > > >
> > > > > > - Gary
> >
> > --
> > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
> > Future Technologies Group                 Tel: +1-510-495-2352
> > HPC Research Department                   Fax: +1-510-486-6900
> > Lawrence Berkeley National Laboratory

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory