From: Venelin Mitov (vmitov_at_gmail_dot_com)
Date: Thu Jan 06 2005 - 11:37:30 PST
I need to write quicksort in upc. Here is the pseudocode of the parallelised
quicksort algorithm:
shared[N/THREADS] int P[N]; //the array to sort
//sorts P[begin..end-1] using threads[t0..tn)
void quicksort(int begin, int end, int t0, int tn){
1: if only one thread (tn - t0 == 1) call the serial qsort on
P[begin..end]; return;
//more than one thread
2: t0 selects one of the elements P[begin..end] as pivot
3: each thread t0..tn-1 rearranges it's local part of P[begin..end]
storing the elements smaller than pivot to the left and the larger or
equal elements to the right
4: global rearrangement (divide P[begin..end] on two parts - smallers
than pivot to the left , larger than pivot to the right:
4.1:once the local rearrangement is done by all threads the positions
of the smaller and larger parts in the "global" array (P[begin..end])
are calculated.
4.2: each thread places its smaller and larger parts approprietely in
the global array
5. let P[begin..send) are the elements smaller than pivot and
P[send..end) are the elements larger or equal to pivot.
let roughly said t0,.., tn0-1 are the threads in affinity with the
smaller (left) part
P[begin..send), and tn0...tn-1 are the trheads in affinity with the
larger(right) part -
P[send..end].
if(MYTHREAD <= tn0)
quicksort(begin, send, t0, tn0);
else
quicksort(send, end, tn0, tn);
}
There are synchronisation points for threads t0..tn-1 between the steps.
For example all threads t1..tn-1 must wait until t0 calculates the
pivot (step 2);
the global rearrangement (step 4) cannot start until all threads have finished
the local rearrangement;
The recursive call cannot be made until the global rearrangement is done.
The problem is that t0..tn-1 is a subrange of 0..THREADS-1, so
upc_barrier cannot be used.
Can you propose me an effective solution (maybe something with
upc_locks which works like "upc_barrier for threads t0..tn-1")?
Regards.
Venelin Mitov