Re: Dynamic 2D Allocation

From: María J. Martín (maria.martin.santamaria_at_udc.es)
Date: Thu Dec 03 2009 - 05:44:33 PST

  • Next message: Andreev Nikita: "Clock synchronization among threads in BUPC"
    "[]" specifies an indefinite block size: all the array elements
    have affinity to the same thread. Space allocated with upc_alloc
    always has the indefinite block size.
    If a shared array is declared with an indefinite block size,
    pointer-to-shared arithmetic on it gives the same result as normal
    C pointer arithmetic.
    
    As regards the performance optimization: a generic
    pointer-to-shared contains three fields: thread, block address and
    phase. Pointer arithmetic on a pointer-to-shared has to update all
    three fields, which makes the operation slower than private pointer
    arithmetic. The Berkeley UPC Compiler implements an optimization
    called “phaseless” pointers for the common special cases of cyclic
    and indefinite pointers. Cyclic pointers have a block size of one,
    so their phase is always zero; indefinite pointers have a block
    size of zero, and their phase is also defined to be zero, since all
    elements belong to the same UPC thread. Cyclic and indefinite
    pointers are thus “phaseless”, and the compiler exploits this
    knowledge to generate more efficient operations for them (see
    http://www.gwu.edu/~upc/publications/performance.pdf for more
    details).
    
    Regards,
    
    María
    
    On 03/12/2009, at 11:37, Oliver Perks wrote:
    
    > Thank you for your reply.
    > This works much better. I had actually "fixed" the problem by using:
    >
    > shared [UPC_MAX_BLOCK_SIZE] int *shared * a;
    >
    > Your solution performs much better, so thank you, but I am still
    > unsure what block size it then uses.
    >
    > Regards
    > Oliver
    >
    > María J. Martín wrote:
    >> The pointer a is declared incorrectly.
    >>
    >> Try:
    >>
    >> shared[] int *shared * a;
    >>
    >> a = (shared[] int *shared *)upc_all_alloc(10,sizeof(shared[] int *));
    >>
    >> Regards,
    >>
    >> María
    >>
    >>
    >>
    >> On 02/12/2009, at 11:25, Oliver Perks wrote:
    >>
    >>> I have been trying to get a simple example working whereby a 2D
    >>> array is striped across multiple processors, with each column
    >>> placed on a different processor in a round-robin fashion.
    >>> I assumed that this would be achieved by the code provided by Ben,
    >>> but the results suggest otherwise. Can anyone shed some light on
    >>> what I would have considered a rather simple problem?
    >>>
    >>>
    >>> shared int *shared * a;
    >>>
    >>> a = (shared int *shared *)upc_all_alloc(10,sizeof(shared int*));
    >>> upc_forall(int i = 0; i < 10; i++; i)
    >>> {
    >>>  a[i] = upc_alloc(10*sizeof(shared int));
    >>>  for(int j = 0; j < 10; j++)
    >>>  {
    >>>     a[i][j] = i * j;
    >>>     printf("Owner of %d - %d is %d\n", i, j, (int)upc_threadof(&a[i][j]));
    >>>  }
    >>> }
    >>> return 0;
    >>>
    >>>
    >>> When run on 2 threads:
    >>> I would expect this to put even columns on thread 0 and odd
    >>> columns on thread 1, with each column held entirely by a single
    >>> thread.
    >>>
    >>> 0   1   0   1   0   1   .....
    >>> 0   1   0   1   0   1   .....
    >>> 0   1   0   1   0   1   .....
    >>> 0   1   0   1   0   1   .....
    >>> 0   1   0   1   0   1   .....
    >>> .     .   .   .   .   .   .
    >>>
    >>> But what I actually get is that each column is striped over the
    >>> processors.
    >>>
    >>> 0   1   0   1   0   1   .....
    >>> 1   0   1   0   1   0   .....
    >>> 0   1   0   1   0   1   .....
    >>> 1   0   1   0   1   0   .....
    >>> 0   1   0   1   0   1   .....
    >>> .     .   .   .   .   .   .
    >>>
    >>> Any ideas?
    >>> Regards Oliver
    >>>
    >>>
    >>>
    >>>
    >>
    >
    
