From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Sun May 10 2009 - 10:13:46 PDT
Ben,

The specific reason you see the "hang" in the upc_alloc() call is that the Berkeley UPC runtime library needs to communicate with a central allocation manager on thread 0 (not for *every* upc_alloc(), but occasionally, to move a "high water mark"). Your "while(1)" is preventing thread 0 from responding to that request in the distributed case. In the shared-memory case on your multicore laptop, thread 1 "knows" it is in the same address space as thread 0 and therefore runs the allocation-management code itself, w/o the need for communication.

There is no clear language in the UPC specification about progress guarantees, and your code demonstrates one of the classes of code that would benefit if there were. Our solution/work-around for progress is the "upc_poll()" function, which ensures progress of the communications library. Your toy code should work with "while(1) upc_poll();" or similar, and adding upc_poll() to your event loop should likewise allow the event-driven version to make progress. So, you probably want something like:

    while (1) {
        upc_poll();
        if (have_work) do_the_work();
    }

Note that I am suggesting making the upc_poll() call every time through the loop, so that incoming work can't "starve" things like upc_alloc() calls on other threads.

Let us know if you need further assistance.

-Paul

Benjamin Byington wrote:
> Hello,
>
> My question comes in two parts. First, what is wrong with the toy code below
> (besides the obvious infinite loop)? When executing this code with two processors
> on two separate nodes, somehow the tight loop thread 0 is performing prevents
> thread 1 from completing its memory allocation. The first print statement is reached,
> but never the second. If I either remove the loop, or simply switch things around
> so that thread 1 is in the loop and thread 0 is doing the allocation, things
> proceed as expected and the memory allocation completes.
>
> #include <upc.h>
> #include <stdio.h>
>
> int main( int argc, char** argv )
> {
>     if(MYTHREAD == 0)
>     {
>         int len;
>         while(1);
>     }
>     else if(MYTHREAD == 1)
>     {
>         fprintf(stderr, "Beginning memory allocation\n");
>         shared void * t = upc_alloc(1000000);
>         fprintf(stderr, "Finished memory allocation\n");
>     }
>
>     upc_barrier;
>
>     return 0;
> }
>
> The second part of my question is: how should one approach event-driven
> programming in UPC? The situation above arose when I was writing a program
> that used dynamic scheduling to control when various tasks get performed. Thread 0
> sits in a tight loop monitoring a set of flags, one per worker processor,
> and gives a worker new directions any time it detects one is available. The worker
> nodes also sit in a tight loop whenever they are idle, monitoring another flag to
> see if there is any more work available. I took care to ensure that all these
> rapidly accessed flags were local to the processor spinning on them, so as to avoid a
> million tiny unnecessary messages, but as my first example demonstrates, that doesn't
> seem to be enough. All the processors get through some setup code allocating various
> shared data structures without a problem, but almost as soon as execution enters the meat
> of the program, things hang. Processor 0 hands off the first job to some worker
> node, and since at this stage there are no other concurrent tasks until the first one
> is finished, processor 0 just ends up repeatedly checking all the flags, waiting for
> the job to finish. The worker node, however, never completes the task. It always
> manages to perform a malloc(), a upc_memget(), and a upc_free() without a problem, but
> the first time it hits a upc_alloc() the program just freezes. (The freezing problem
> goes away if I tell processor 0 to just exit the loop and wait at a barrier, but
> that of course is useless, since then it can't detect or do anything once the first
> task is done.)
> Is there a better way than my flags to take event-driven action? Is there
> a reason processor 0 being in a tight loop affects the execution of other processors?
>
> I just realized this code works on my multicore laptop just fine, and while I presumed
> the problem had to do with distributed memory versus shared memory, I figured I should
> provide what details I can about the hardware this program is failing on, in case there
> is a key there...
>
> Thanks in advance!
> Ben

[...config info removed...]

-- 
Paul H. Hargrove                PHHargrove_at_lbl_dot_gov
Future Technologies Group       HPC Research Department
Tel: +1-510-495-2352            Lawrence Berkeley National Laboratory
Fax: +1-510-486-6900
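P.S. For completeness, here is the toy program with the upc_poll() workaround
folded in. This is only a sketch of what I have in mind (I have not compiled
this exact code); it assumes the Berkeley UPC compiler (upcc) and runtime, and
the upc_free() call at the end is my addition:

    #include <upc.h>
    #include <stdio.h>

    int main( int argc, char** argv )
    {
        if(MYTHREAD == 0)
        {
            /* Spin, but call upc_poll() each iteration so the runtime can
             * service allocation requests arriving from other threads. */
            while(1) upc_poll();
        }
        else if(MYTHREAD == 1)
        {
            fprintf(stderr, "Beginning memory allocation\n");
            shared void * t = upc_alloc(1000000);
            fprintf(stderr, "Finished memory allocation\n");
            upc_free(t);
        }

        upc_barrier;  /* thread 0 never reaches this in the toy version */

        return 0;
    }

With the upc_poll() in place, thread 1's upc_alloc() should complete even in
the distributed case, because thread 0 now gives the runtime a chance to act
on the allocation-manager request while it spins.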