Re: Dynamic 2D Allocation

From: María J. Martín (maria.martin.santamaria_at_udc.es)
Date: Thu Dec 03 2009 - 05:44:33 PST

    "[ ]"  specifies an indefinite block size. All the array elements  
    have affinity to the same thread. Space allocated with upc_alloc
    always has the indefinite block size, with affinity to the calling
    thread. If a shared array is declared with indefinite block size,
    pointer-to-shared arithmetic on it behaves like normal C pointer
    arithmetic.
    
    As for the performance optimization: a generic pointer-to-shared
    contains three fields (thread, block address and phase). Pointer
    arithmetic on a pointer-to-shared may have to update all three
    fields, which makes it slower than private pointer arithmetic. The
    Berkeley UPC Compiler implements an optimization called "phaseless"
    pointers for the common special cases of cyclic and indefinite
    pointers. Cyclic pointers have a block size of one, so their phase
    is always zero; indefinite pointers have a block size of zero, and
    their phase is also defined to be zero since all elements belong to
    the same UPC thread. Cyclic and indefinite pointers are thus
    "phaseless", and the compiler exploits this knowledge to generate
    more efficient operations for them (see
    http://www.gwu.edu/~upc/publications/performance.pdf for more
    details).
    
    Regards,
    
    María
    
    On 03/12/2009, at 11:37, Oliver Perks wrote:
    
    > Thank you for your reply.
    > This works much better. I had actually "fixed" the problem by using:
    >
    > shared [UPC_MAX_BLOCK_SIZE] int *shared * a;
    >
    > Your solution provides much better performance so thank you, but I  
    > am still confused as to what this then uses as the block size?
    >
    > Regards
    > Oliver
    >
    > María J. Martín wrote:
    >> The pointer a is declared incorrectly.
    >>
    >> Try:
    >>
    >> shared[] int *shared * a;
    >>
    >> a = (shared[] int *shared *)upc_all_alloc(10,sizeof(shared int*));
    >>
    >> Regards,
    >>
    >> María
    >>
    >>
    >>
    >> On 02/12/2009, at 11:25, Oliver Perks wrote:
    >>
    >>> I have been trying to get a simple example working whereby a 2D
    >>> array is striped across multiple processors, with each column
    >>> placed on a different processor in a round-robin fashion.
    >>> I assumed that this would be achieved by the code provided by Ben,
    >>> but the results suggest otherwise. Can anyone shed some light on
    >>> what I would have considered a rather simple problem?
    >>>
    >>>
    >>> shared int *shared * a;
    >>>
    >>> a = (shared int *shared *)upc_all_alloc(10,sizeof(shared int*));
    >>> upc_forall(int i = 0; i < 10; i++; i)
    >>> {
    >>>  a[i] = upc_alloc(10*sizeof(shared int));
    >>>  for(int j = 0; j < 10; j++)
    >>>  {
    >>>     a[i][j] = i * j;
    >>>     printf("Owner of %d - %d is %d\n", i, j, upc_threadof(&a[i][j]));
    >>>  }
    >>> }
    >>> return 0;
    >>>
    >>>
    >>> When run on 2 threads, I would expect this to put even columns
    >>> on thread 0 and odd columns on thread 1, with each column
    >>> contained entirely within its thread:
    >>>
    >>> 0   1   0   1   0   1   .....
    >>> 0   1   0   1   0   1   .....
    >>> 0   1   0   1   0   1   .....
    >>> 0   1   0   1   0   1   .....
    >>> 0   1   0   1   0   1   .....
    >>> .   .   .   .   .   .   .
    >>>
    >>> But what I actually get is that each column is striped over the
    >>> processors:
    >>>
    >>> 0   1   0   1   0   1   .....
    >>> 1   0   1   0   1   0   .....
    >>> 0   1   0   1   0   1   .....
    >>> 1   0   1   0   1   0   .....
    >>> 0   1   0   1   0   1   .....
    >>> .   .   .   .   .   .   .
    >>>
    >>> Any ideas?
    >>> Regards Oliver
    >>>
    >>>
    >>>
    >>>
    >>
    >
    
