From: Oliver Perks (o.perks_at_warwick.ac.uk)
Date: Wed Dec 02 2009 - 02:25:40 PST
I have been trying to get a simple example working in which a 2D array 
is striped across multiple threads, with each column placed on a 
different thread in round-robin fashion.
I assumed that this would be achieved by the code provided by Ben, but 
the results suggest otherwise. Can anyone shed some light on what I 
would have considered a rather simple problem?
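#include <stdio.h>
#include <upc.h>

int main(void)
{
  /* a[i] is meant to point to column i, stored on thread i % THREADS */
  shared int *shared * a;
  a = (shared int *shared *)upc_all_alloc(10, sizeof(shared int*));
  upc_forall(int i = 0; i < 10; i++; i)
  {
    a[i] = upc_alloc(10*sizeof(shared int));
    for(int j = 0; j < 10; j++)
    {
       a[i][j] = i * j;
       /* upc_threadof returns size_t, so cast it to match %d */
       printf("Owner of %d - %d is %d\n", i, j, (int)upc_threadof(&a[i][j]));
    }
  }
  return 0;
}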
When run on 2 threads:
I would expect this to put even columns on thread 0 and odd columns on 
thread 1, with each column entirely contained within its thread:
0   1   0   1   0   1   .....
0   1   0   1   0   1   .....
0   1   0   1   0   1   .....
0   1   0   1   0   1   .....
0   1   0   1   0   1   .....
.   .   .   .   .   .
But what I actually get is each column striped across the 
threads:
0   1   0   1   0   1   .....
1   0   1   0   1   0   .....
0   1   0   1   0   1   .....
1   0   1   0   1   0   .....
0   1   0   1   0   1   .....
.   .   .   .   .   .
Any ideas?
Regards Oliver