Re: Defining block size during runtime

From: sainath l (ls.sainath_at_gmail_dot_com)
Date: Fri Jul 24 2009 - 21:37:20 PDT

  • Next message: Paul H. Hargrove: "Re: Defining block size during runtime"
    Hi,
    
    Thank you very much for answering my questions Paul. And extremely sorry for
    not providing the "gettime.h" file. Will make sure that I provide all the
    related files from next time.
    
    The code runs fine on an X4600 SMP node with 16 procs, but it does not run
    on the Cray XT4. When I run it on the XT4, the code breaks during the first
    iteration: the iteration does not complete, and the printf after the
    upc_free(B) call does not execute.
    
    What do you think might be causing this? I get the same error message
    that I pasted in my earlier post.
    
    Thanks in Advance.
    
    Cheers,
    sainath
    
    
    On Fri, Jul 24, 2009 at 11:00 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
    
    > I have run your code (I needed to provide a gettime.h) and did not see any
    > errors.  I tried on both an x86 cluster with Myrinet and a Cray XT.
    >
    > To answer your question: I don't believe that use of this data structure
    > will cause any performance penalty for the collectives, since the structure
    > is just a "trick" for indexing the block of data.  Additionally,
    > static-vs-dynamic allocation of memory should not have an effect on the
    > collectives performance either.
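    >
    > For instance, a micro-benchmark can call a collective on buffers obtained
    > from upc_all_alloc().  A minimal sketch (the element count and buffer
    > names below are placeholders, not taken from your attached code):
    >
    >    #include <upc.h>
    >    #include <upc_collective.h>
    >
    >    #define NELEMS 1024   /* elements broadcast to each thread (example) */
    >
    >    int main(void)
    >    {
    >        size_t nbytes = NELEMS * sizeof(int);
    >        int i;
    >
    >        /* src: one contiguous block with affinity to thread 0 */
    >        shared [] int *src = (shared [] int *) upc_all_alloc(1, nbytes);
    >        /* dst: one block of nbytes per thread */
    >        shared int *dst = (shared int *) upc_all_alloc(THREADS, nbytes);
    >
    >        if (MYTHREAD == 0)                /* fill the source block */
    >            for (i = 0; i < NELEMS; i++)
    >                src[i] = i;
    >
    >        /* IN_ALLSYNC makes the collective wait for the writes above */
    >        upc_all_broadcast(dst, src, nbytes,
    >                          UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC);
    >
    >        upc_barrier;                      /* everyone done before freeing */
    >        if (MYTHREAD == 0) {
    >            upc_free(src);
    >            upc_free(dst);
    >        }
    >        return 0;
    >    }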
    >
    > --Paul
    >
    > sainath l wrote:
    >
    >> Hi Paul,
    >>
    >> I have attached my code. The first iteration runs till the deallocation
    >> part and then the code breaks.
    >>
    >> *** Caught a fatal signal: SIGSEGV(11) on node 0/16
    >> _pmii_daemon(SIGCHLD): PE 0 exit signal Segmentation fault
    >> [NID 26]Apid 315852: initiated application termination
    >>
    >>
    >> Also, I would be very happy to know: if I want to write micro-benchmarks
    >> for the collectives, will using this data structure cause any problem
    >> (i.e., any overhead incurred by using it)?  Or should I just declare the
    >> arrays statically and use them?
    >>
    >> In general, are the source and destination buffers of collective
    >> operations dynamically allocated in practice?  If so, will that degrade
    >> the performance?
    >>
    >> Thank you very much.
    >>
    >> Cheers,
    >> Sainath
    >>
    >>
    >>
    >>
    >>
    >>
    >>
    >>
    >>
    >>
    >>
    >> On Fri, Jul 24, 2009 at 3:54 AM, Paul H. Hargrove
    >> <PHHargrove_at_lbl_dot_gov> wrote:
    >>
    >>    sainath,
    >>
    >>     There is no obvious reason why upc_free() would not work for
    >>    Gary's data structure.  Are you sure you are calling upc_free(a)
    >>    from exactly one thread, and only after all threads have finished
    >>    accessing the array?  Could you provide more information on how
    >>    "it breaks"?
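    >>
    >>    In other words, something like the following sketch (the allocation
    >>    call and sizes here are just placeholders; the point is only the
    >>    barrier-then-free-from-one-thread pattern):
    >>
    >>       #include <upc.h>
    >>
    >>       int main(void)
    >>       {
    >>           /* example allocation: one block of 100 ints per thread */
    >>           shared int *a =
    >>               (shared int *) upc_all_alloc(THREADS, 100 * sizeof(int));
    >>
    >>           /* ... all threads read and write a ... */
    >>
    >>           upc_barrier;      /* wait until every thread is done with a */
    >>           if (MYTHREAD == 0)
    >>               upc_free(a);  /* then exactly one thread frees it */
    >>           return 0;
    >>       }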
    >>
    >>    -Paul
    >>
    >>    sainath l wrote:
    >>
    >>        Sorry, that was at runtime.
    >>
    >>        Cheers
    >>        sainath
    >>
    >>
    >>        On Fri, Jul 24, 2009 at 12:45 AM, sainath l
    >>        <ls_dot_sainath_at_gmail_dot_com> wrote:
    >>
    >>           It is not possible to deallocate memory using upc_free for the
    >>           derived data type given in Gary's example.  Although the code
    >>           compiles without any warnings, it breaks at runtime.
    >>           Could someone tell me why this is the case?  Also, is there a
    >>           way to deallocate the memory for the array of structures in
    >>           Gary's example?
    >>
    >>           cheers,
    >>           sainath
    >>
    >>
    >>           On Fri, Jul 24, 2009 at 12:23 AM, sainath l
    >>           <ls_dot_sainath_at_gmail_dot_com> wrote:
    >>
    >>               Hello,
    >>
    >>               Thanks again, Gary.
    >>
    >>
    >>               Cheers,
    >>               sainath
    >>
    >>
    >>
    >>
    >>
    >>
    >>               On Thu, Jul 23, 2009 at 6:19 AM, Gary Funck
    >>               <gary_at_intrepid_dot_com> wrote:
    >>
    >>                   On 07/23/09 02:00:02, sainath l wrote:
    >>                   >    I am very much interested in knowing any workaround,
    >>                   >    if possible, for dynamically allocating an array
    >>                   >    with a variable block size at runtime.
    >>                   >
    >>                   >    Let's say I want to know whether it is possible to
    >>                   >    create the following array dynamically, where N and
    >>                   >    M are variables.  If yes, then how can we do it?
    >>                   >
    >>                   >    shared [M] int A[N][M];
    >>
    >>                   Sainath,
    >>                   I'm not sure if this is what you're asking about, but
    >>                   attached is a program that uses a "trick" to ensure
    >>                   that each row of the array has affinity to a single
    >>                   thread, in a thread-cyclic fashion.
    >>
    >>                   The trick is that by placing the row vector 'y' inside
    >>                   of a struct, we ensure that y is allocated contiguously
    >>                   on a given thread.  And for each a[i+1] (based upon
    >>                   UPC's indexing rules) we know that it will be allocated
    >>                   on the next thread (in cyclic order) after thread 'i'.
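    >>
    >>                   In outline, the trick looks roughly like the sketch
    >>                   below.  (This is only a sketch, not the attached
    >>                   alloc_row_struct.upc: here the row length is fixed at
    >>                   compile time via MAXM, and only the number of rows N
    >>                   is chosen at run time.)
    >>
    >>                   #include <upc.h>
    >>                   #include <stdio.h>
    >>                   #include <stdlib.h>
    >>
    >>                   #define MAXM 5  /* row length, compile-time (example) */
    >>
    >>                   typedef struct { int y[MAXM]; } row_t;
    >>
    >>                   int main(int argc, char **argv)
    >>                   {
    >>                     int N = (argc > 1) ? atoi(argv[1]) : 4;  /* rows */
    >>                     int i, j;
    >>
    >>                     /* one block per row: a[i] lands on thread i % THREADS */
    >>                     shared row_t *a =
    >>                       (shared row_t *) upc_all_alloc(N, sizeof(row_t));
    >>
    >>                     if (MYTHREAD == 0)
    >>                       for (i = 0; i < N; i++)
    >>                         for (j = 0; j < MAXM; j++)
    >>                           printf("threadof a[%d].y[%d] = %d\n", i, j,
    >>                                  (int) upc_threadof(&a[i].y[j]));
    >>
    >>                     upc_barrier;     /* all threads done with 'a' ...  */
    >>                     if (MYTHREAD == 0)
    >>                       upc_free(a);   /* ... then one thread frees it   */
    >>                     return 0;
    >>                   }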
    >>
    >>                   $ upc alloc_row_struct.upc -o alloc_row_struct
    >>                   $ alloc_row_struct -n 4 4 5
    >>                   threadof a[0].y[0] = 0
    >>                   threadof a[0].y[1] = 0
    >>                   threadof a[0].y[2] = 0
    >>                   threadof a[0].y[3] = 0
    >>                   threadof a[0].y[4] = 0
    >>                   threadof a[1].y[0] = 1
    >>                   threadof a[1].y[1] = 1
    >>                   threadof a[1].y[2] = 1
    >>                   threadof a[1].y[3] = 1
    >>                   threadof a[1].y[4] = 1
    >>                   threadof a[2].y[0] = 2
    >>                   threadof a[2].y[1] = 2
    >>                   threadof a[2].y[2] = 2
    >>                   threadof a[2].y[3] = 2
    >>                   threadof a[2].y[4] = 2
    >>                   threadof a[3].y[0] = 3
    >>                   threadof a[3].y[1] = 3
    >>                   threadof a[3].y[2] = 3
    >>                   threadof a[3].y[3] = 3
    >>                   threadof a[3].y[4] = 3
    >>
    >>                   Above, '-n 4' indicates that the program will run on
    >>                   4 threads.  That number was chosen to agree with the
    >>                   value of N (also 4) given above, but in fact it could
    >>                   be any number.
    >>
    >>                   Whether this is the best method, or even a recommended
    >>                   practice, for accomplishing your objective, I'm not
    >>                   sure.  Perhaps others on the list can offer some
    >>                   comment or suggest alternative methods?
    >>
    >>                   - Gary
    >>
    >>
    >>
    >>
    >>
    >>
    >>    --
    >>    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    >>    Future Technologies Group                 Tel: +1-510-495-2352
    >>    HPC Research Department                   Fax: +1-510-486-6900
    >>    Lawrence Berkeley National Laboratory
    >>
    >>
    >
    > --
    > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > Future Technologies Group                 Tel: +1-510-495-2352
    > HPC Research Department                   Fax: +1-510-486-6900
    > Lawrence Berkeley National Laboratory
    >
    
