From: Venelin Mitov (vmitov_at_gmail_dot_com)
Date: Thu Jan 06 2005 - 11:37:30 PST
I need to write quicksort in upc. Here is the pseudocode of the parallelised quicksort algorithm: shared[N/THREADS] int P[N]; //the array to sort //sorts P[begin..end-1] using threads[t0..tn) void quicksort(int begin, int end, int t0, int tn){ 1: if only one thread (tn - t0 == 1) call the serial qsort on P[begin..end]; return; //more than one thread 2: t0 selects one of the elements P[begin..end] as pivot 3: each thread t0..tn-1 rearranges it's local part of P[begin..end] storing the elements smaller than pivot to the left and the larger or equal elements to the right 4: global rearrangement (divide P[begin..end] on two parts - smallers than pivot to the left , larger than pivot to the right: 4.1:once the local rearrangement is done by all threads the positions of the smaller and larger parts in the "global" array (P[begin..end]) are calculated. 4.2: each thread places its smaller and larger parts approprietely in the global array 5. let P[begin..send) are the elements smaller than pivot and P[send..end) are the elements larger or equal to pivot. let roughly said t0,.., tn0-1 are the threads in affinity with the smaller (left) part P[begin..send), and tn0...tn-1 are the trheads in affinity with the larger(right) part - P[send..end]. if(MYTHREAD <= tn0) quicksort(begin, send, t0, tn0); else quicksort(send, end, tn0, tn); } There are synchronisation points for threads t0..tn-1 between the steps. For example all threads t1..tn-1 must wait until t0 calculates the pivot (step 2); the global rearrangement (step 4) cannot start until all threads have finished the local rearrangement; The recursive call cannot be made until the global rearrangement is done. The problem is that t0..tn-1 is a subrange of 0..THREADS-1, so upc_barrier cannot be used. Can you propose me an effective solution (maybe something with upc_locks which works like "upc_barrier for threads t0..tn-1")? Regards. Venelin Mitov