From: Benjamin Byington (bbyingto_at_soe_dot_ucsc_dot_edu)
Date: Fri May 08 2009 - 22:23:06 PDT
Hello,

My question comes in two parts. First, what is wrong with the toy code below (besides the obvious infinite loop)? When I execute this code with two processors on two separate nodes, the tight loop that thread 0 is performing somehow prevents thread 1 from doing the memory allocation. The first print statement is reached, but never the second. If I either remove the loop or switch things around so that thread 1 is in the loop and thread 0 does the allocation, things proceed as expected and the memory allocation completes.

    #include <upc.h>
    #include <stdio.h>

    int main( int argc, char** argv )
    {
        if(MYTHREAD == 0)
        {
            int len;
            while(1);   /* thread 0 spins forever */
        }
        else if(MYTHREAD == 1)
        {
            fprintf(stderr, "Beginning memory allocation\n");
            shared void * t = upc_alloc(1000000);
            fprintf(stderr, "Finished memory allocation\n");   /* never reached */
        }
        upc_barrier;
        return 0;
    }

The second part of my question is: how should one approach event-driven programming in UPC? The situation above arose when I was trying to write a program that uses dynamic scheduling to control when various tasks get performed. Thread 0 sits in a tight loop monitoring a set of flags, one for each worker processor, and gives a worker new directions any time it detects that one is available. The worker nodes also sit in a tight loop whenever they are idle, monitoring another flag to see whether more work is available. I took care to ensure that all these rapidly accessed flags are local to the processor polling them, so as to avoid a million tiny unnecessary messages, but as my first example demonstrates, that doesn't seem to be enough. All the processors go through some setup code allocating various shared data structures without a problem, but almost as soon as things enter the meat of the program, things hang.
Processor 0 hands off the first job to some worker node, and since at this stage there are no other concurrent tasks until the first one has finished, processor 0 just ends up repeatedly checking all the flags, waiting for the job to be finished. The worker node, however, never completes the task. It always manages to perform a malloc(), a upc_memget(), and a upc_free() without a problem, but the first time it hits a upc_alloc() the program just freezes. (The freezing problem goes away if I tell processor 0 to exit the loop and wait at a barrier, but that of course is useless, since it then can't detect or do anything once the first task is done.)

Is there a better way than my flags to take event-driven action? Is there a reason that processor 0 being in a tight loop affects the execution of other processors?

I just realized that this code works fine on my multicore laptop. While I presume the problem has to do with distributed memory versus shared memory, I figured I should provide what details I can about the hardware this program is failing on, in case there is a key there.

Thanks in advance!
Ben

Processor: dual-core Opterons, 2.2 GHz (I am only using one core per node, though)
Network: InfiniBand (using the vapi protocol)

Output from upcc -version:

This is upcc (the Berkeley Unified Parallel C compiler), v. 2.8.0
----------------------+---------------------------------------------------------
 UPC Runtime          | v. 2.8.0, built on Feb 3 2009 at 14:21:28
----------------------+---------------------------------------------------------
 UPC-to-C translator  | v. 2.8.0, built on Feb 3 2009 at 14:08:02
                      | host jacin04 linux-x86_64/64
----------------------+---------------------------------------------------------
 Translator location  | /usr/common/ftg/upc/builds/stable/translator/install/ta
                      | rg
----------------------+---------------------------------------------------------
 networks supported   | udp mpi smp vapi
----------------------+---------------------------------------------------------
 default network      | vapi
----------------------+---------------------------------------------------------
 pthreads support     | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
 Configured with      | '--with-translator=/usr/common/ftg/upc/builds/stable/tr
                      | anslator/install/targ' '--enable-mpi' '--enable-vapi'
                      | '--with-multiconf=+opt,+dbg,+opt_inst,+dbg_gccupc,
                      | +opt_gccupc' '--with-vapi-spawner=mpi'
                      | '--prefix=/usr/common/ftg/upc/builds/stable/runtime/ins
                      | t/opt' '--with-multiconf-magic=opt'
                      | 'CC=/usr/common/usg/pathscale/3.2/bin/pathcc'
----------------------+---------------------------------------------------------
 Configure features   | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                      | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                      | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                      | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                      | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                      | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_pathscale,
                      | packedsptr
----------------------+---------------------------------------------------------
 Configure id         | jacin04 Tue Feb 3 14:05:53 PST 2009 hargrove
----------------------+---------------------------------------------------------
 Binary interface     | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
 Runtime interface #  | Runtime supports 3.0 -> 3.10: Translator uses 3.6
----------------------+---------------------------------------------------------
                      | --- BACKEND SETTINGS (for vapi network) ---
----------------------+---------------------------------------------------------
 C compiler           | /usr/common/usg/pathscale/3.2/bin/pathcc
                      | PATHSCALE/3.2/3.3.3 (SuSE Linux)
                      | PathScale(TM) Compiler Suite: Version 3.2 Built on:
                      | 2008-06-16 16:41:38 -0700
                      | GNU gcc version 3.3.1 (PathScale 3.2 driver)
----------------------+---------------------------------------------------------
 C compiler flags     | -O3 -Winline
----------------------+---------------------------------------------------------
 linker               | /usr/common/nsg/mvapich/pathscale/mvapich-0.9.5-mlx1.0.
                      | 3/bin/mpicc
                      | PATHSCALE/3.2/3.3.3 (SuSE Linux)
                      | PathScale(TM) Compiler Suite: Version 3.2 Built on:
                      | 2008-06-16 16:41:38 -0700
                      | GNU gcc version 3.3.1 (PathScale 3.2 driver)
                      | mpicc for 1.2.6 (release) of : 2004/08/04 11:10:38
----------------------+---------------------------------------------------------
 linker flags         | -O3 -Winline
                      | -L/usr/common/ftg/upc/builds/stable/runtime/inst/opt/li
                      | b -lupcr-vapi-seq -lumalloc
                      | -L/usr/common/ftg/upc/builds/stable/runtime/inst/opt/li
                      | b -L/usr/local/ibgd/driver/infinihost/lib64
                      | -lgasnet-vapi-seq -lvapi -lmtl_common -lmosal -lmpga
                      | -lpthread -lm
----------------------+---------------------------------------------------------
jacin04 b/bbyingto>
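
For readers following the second part of the question, the flag handshake described in the message (a dispatcher polling per-worker "idle" flags, and each worker polling its own "job" flag) might look roughly like the sketch below. This is not the poster's code and it is not UPC: it is a plain C stand-in that uses pthreads and C11 atomics on a single shared-memory machine, so it only illustrates the protocol and would not be expected to reproduce the hang seen across nodes. All names here (run_dispatch, worker_slot, worker_main, MAX_WORKERS, NO_WORK, STOP) are hypothetical, and the "work" is just a counter increment.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_WORKERS 8                 /* hypothetical upper bound */

/* Sentinel values for the job flag; real job ids are >= 0. */
enum { NO_WORK = -1, STOP = -2 };

typedef struct {
    atomic_int job;    /* written by the dispatcher, polled by the worker */
    atomic_int idle;   /* written by the worker, polled by the dispatcher */
    int completed;     /* jobs finished; read by the dispatcher after join */
} worker_slot;

static worker_slot slots[MAX_WORKERS];

/* Worker: poll the job flag, "run" the job, then report idle again. */
static void *worker_main(void *arg)
{
    worker_slot *s = arg;
    for (;;) {
        int job = atomic_load(&s->job);
        if (job == STOP)
            break;
        if (job == NO_WORK) {
            sched_yield();            /* nothing to do yet; keep polling */
            continue;
        }
        s->completed++;               /* stand-in for the real task */
        atomic_store(&s->job, NO_WORK);
        atomic_store(&s->idle, 1);    /* set last, so the job slot is free */
    }
    return NULL;
}

/* Dispatcher: hand n_jobs jobs to idle workers.
 * Returns the number of jobs completed, or -1 on bad arguments
 * or if no worker thread could be started. */
int run_dispatch(int n_workers, int n_jobs)
{
    pthread_t tid[MAX_WORKERS];
    int started = 0, next = 0, total = 0;

    if (n_workers < 1 || n_workers > MAX_WORKERS)
        return -1;

    for (int i = 0; i < n_workers; i++) {
        atomic_store(&slots[i].job, NO_WORK);
        atomic_store(&slots[i].idle, 1);
        slots[i].completed = 0;
        if (pthread_create(&tid[i], NULL, worker_main, &slots[i]) != 0)
            break;
        started++;
    }
    if (started == 0)
        return -1;

    /* Poll the idle flags and assign the next job to any idle worker. */
    while (next < n_jobs) {
        for (int i = 0; i < started && next < n_jobs; i++) {
            if (atomic_load(&slots[i].idle)) {
                atomic_store(&slots[i].idle, 0);
                atomic_store(&slots[i].job, next++);
            }
        }
        sched_yield();
    }

    /* Wait for each worker to finish its last job, then tell it to stop. */
    for (int i = 0; i < started; i++) {
        while (!atomic_load(&slots[i].idle))
            sched_yield();
        atomic_store(&slots[i].job, STOP);
        pthread_join(tid[i], NULL);
        total += slots[i].completed;
    }
    return total;
}
```

Calling run_dispatch(3, 10) from a main() should return 10 once every job has been handed out and completed. The worker clears its job flag before raising its idle flag, so the dispatcher never overwrites a job that is still in progress. Both polling loops call sched_yield() rather than spinning flat out; that is a choice made for this sketch and is not something the original message describes.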