Re: Defining block size during runtime

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Jul 24 2009 - 15:00:58 PDT

    I have run your code (I needed to provide a gettime.h) and did not see
    any errors. I tried it on both an x86 cluster with Myrinet and a Cray XT.
    
    To answer your question: I don't believe that using this data structure
    will cause any performance penalty for the collectives, since the
    structure is just a "trick" for indexing the block of data. Likewise,
    static vs. dynamic allocation of the memory should not affect the
    performance of the collectives.
    
    --Paul
    
    sainath l wrote:
    > Hi Paul,
    >
    > I have attached my code. The first iteration runs until the
    > deallocation step, and then the code breaks:
    >
    > *** Caught a fatal signal: SIGSEGV(11) on node 0/16
    > _pmii_daemon(SIGCHLD): PE 0 exit signal Segmentation fault
    > [NID 26]Apid 315852: initiated application termination
    >
    >
    > Also, if I want to write micro-benchmarks for the collectives, will
    > using this data structure be a problem (i.e., will it add overhead)?
    > Or should I just declare the arrays statically and use them?
    >
    > In practice, are the source and destination variables of collective
    > operations generally allocated dynamically? If so, will that degrade
    > the performance?
    >
    > Thank you very much.
    >
    > Cheers,
    > Sainath
    >
    > On Fri, Jul 24, 2009 at 3:54 AM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
    >
    >     Sainath,
    >
    >     There is no obvious reason why upc_free() would not work for
    >     Gary's data structure. Are you sure you are calling upc_free(a)
    >     from exactly one thread, and only after all threads have finished
    >     accessing the array? Could you provide more information on how
    >     "it breaks"?
    >
    >     -Paul
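Paul's advice describes a barrier followed by a single free. Assuming a pointer a obtained collectively from upc_all_alloc(), the UPC form would presumably be 'upc_barrier;' followed by 'if (MYTHREAD == 0) upc_free(a);'. Because UPC needs its own compiler, the sketch below illustrates the same discipline with POSIX threads instead, as an analogy only (it is not UPC and not the code discussed in this thread): pthread_barrier_wait() plays the role of upc_barrier, worker 0 plays MYTHREAD == 0, and free() plays upc_free(). The names rows, worker, and run_workers are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* exposes pthread_barrier_t */
#endif

#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

enum { NWORKERS = 4 };

/* Hypothetical stand-ins for the shared array and its bookkeeping. */
static int *rows;
static pthread_barrier_t all_done;
static int free_calls;             /* how many times rows was freed */

static void *worker(void *arg)
{
    int me = (int)(long)arg;

    rows[me] = me;                 /* each worker writes its own element */

    /* Analogue of 'upc_barrier;': nobody frees before all are done. */
    pthread_barrier_wait(&all_done);

    /* Analogue of 'if (MYTHREAD == 0) upc_free(a);': one caller only. */
    if (me == 0) {
        free(rows);
        rows = NULL;
        free_calls++;
    }
    return NULL;
}

/* Runs NWORKERS threads and returns how many times the buffer was freed. */
int run_workers(void)
{
    pthread_t tid[NWORKERS];

    rows = malloc(NWORKERS * sizeof *rows);
    if (rows == NULL)
        abort();
    free_calls = 0;

    if (pthread_barrier_init(&all_done, NULL, NWORKERS) != 0)
        abort();
    for (long t = 0; t < NWORKERS; t++)
        if (pthread_create(&tid[t], NULL, worker, (void *)t) != 0)
            abort();
    for (int t = 0; t < NWORKERS; t++)
        pthread_join(tid[t], NULL);
    pthread_barrier_destroy(&all_done);

    assert(rows == NULL);          /* worker 0 released the buffer */
    return free_calls;
}
```

Calling run_workers() should return 1, since only worker 0 frees the buffer and it does so only after every worker has passed the barrier. On older glibc versions the program may need to be linked with -pthread.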
    >
    >     sainath l wrote:
    >
    >         Sorry, I meant that it breaks at runtime.
    >
    >         Cheers
    >         sainath
    >
    >
    >         On Fri, Jul 24, 2009 at 12:45 AM, sainath l <ls_dot_sainath_at_gmail_dot_com> wrote:
    >
    >            It's not possible to deallocate memory using upc_free for the
    >            derived data type given in Gary's example. Although the code
    >            compiles without any noise, it breaks.
    >            Could someone tell me why this is the case? Also, is there a
    >            way to deallocate the memory for the array of structures in
    >            Gary's example?
    >
    >            cheers,
    >            sainath
    >
    >
    >            On Fri, Jul 24, 2009 at 12:23 AM, sainath l <ls_dot_sainath_at_gmail_dot_com> wrote:
    >
    >                Hello,
    >
    >                Thanks again Gary.
    >
    >
    >                Cheers,
    >                sainath
    >
    >                On Thu, Jul 23, 2009 at 6:19 AM, Gary Funck <gary_at_intrepid_dot_com> wrote:
    >
    >                    On 07/23/09 02:00:02, sainath l wrote:
    >                    >    I am very much interested in knowing any workaround, if
    >                    >    possible, for dynamically allocating an array with a
    >                    >    variable block size at runtime.
    >                    >
    >                    >    Let's say I want to know if it is possible to create the
    >                    >    following array dynamically, where N and M are some
    >                    >    variables. If yes, then how can we do it?
    >                    >
    >                    >    shared [M] int A[N][M];
    >
    >                    Sainath,
    >                    I'm not sure if this is what you're asking about, but
    >                    attached is a program that uses a "trick" to ensure that
    >                    each row of the array has affinity to a single thread,
    >                    in a thread-cyclic fashion.
    >
    >                    The trick is that by placing the row vector 'y' inside
    >                    a struct, we ensure that y is allocated contiguously on
    >                    a given thread. And for each a[i+1] (based upon UPC's
    >                    indexing rules) we know that it will be allocated on the
    >                    next thread (in cyclic order) after the thread holding
    >                    a[i].
    >
    >                    $ upc alloc_row_struct.upc -o alloc_row_struct
    >                    $ alloc_row_struct -n 4 4 5
    >                    threadof a[0].y[0] = 0
    >                    threadof a[0].y[1] = 0
    >                    threadof a[0].y[2] = 0
    >                    threadof a[0].y[3] = 0
    >                    threadof a[0].y[4] = 0
    >                    threadof a[1].y[0] = 1
    >                    threadof a[1].y[1] = 1
    >                    threadof a[1].y[2] = 1
    >                    threadof a[1].y[3] = 1
    >                    threadof a[1].y[4] = 1
    >                    threadof a[2].y[0] = 2
    >                    threadof a[2].y[1] = 2
    >                    threadof a[2].y[2] = 2
    >                    threadof a[2].y[3] = 2
    >                    threadof a[2].y[4] = 2
    >                    threadof a[3].y[0] = 3
    >                    threadof a[3].y[1] = 3
    >                    threadof a[3].y[2] = 3
    >                    threadof a[3].y[3] = 3
    >                    threadof a[3].y[4] = 3
    >
    >                    Above, '-n 4' indicates that the program will run on 4
    >                    threads. That number was chosen to agree with the value
    >                    of N (also 4) given above, but in fact it could be any
    >                    number.
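The affinity pattern in the sample run follows from UPC's default cyclic layout. The sketch below is a plain-C model of that rule, not UPC code and not Gary's attached program. It assumes that a is a shared pointer to a struct holding one row, with the default block size of 1 (for example, memory from upc_all_alloc(N, sizeof(struct row)) cast to 'shared struct row *', where struct row is a hypothetical name). Under that assumption a[i] has affinity to thread i % THREADS, and every a[i].y[j] shares that thread because the whole struct is a single element stored on one thread. The function names row_thread, elem_thread, and print_affinity are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Plain-C model (not UPC). Assumption: a[i] is one struct per row in a
 * shared array with the default block size of 1, so element i has
 * affinity to thread i % THREADS (nthreads stands in for THREADS).
 */
int row_thread(int i, int nthreads)
{
    assert(nthreads > 0 && i >= 0);
    return i % nthreads;
}

/* y[] lives inside a single struct, so a[i].y[j] is on a[i]'s thread. */
int elem_thread(int i, int j, int nthreads)
{
    (void)j;                       /* the column never changes the thread */
    return row_thread(i, nthreads);
}

/* Prints one line per element, in the same format as the sample run. */
void print_affinity(int n, int m, int nthreads)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            printf("threadof a[%d].y[%d] = %d\n", i, j,
                   elem_thread(i, j, nthreads));
}
```

Under these assumptions, print_affinity(4, 5, 4) prints the same 20 lines as the sample run, and print_affinity(4, 5, 3) shows what happens with fewer threads than rows: a[3] wraps back to thread 0.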
    >
    >                    I'm not sure whether this is the best method, or even a
    >                    recommended practice, for accomplishing your objective.
    >                    Perhaps others on the list can comment or suggest
    >                    alternative methods?
    >
    >                    - Gary
    >
    >     -- 
    >     Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    >     Future Technologies Group                 Tel: +1-510-495-2352
    >     HPC Research Department                   Fax: +1-510-486-6900
    >     Lawrence Berkeley National Laboratory    
    >
    >
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
