From: Jason Duell (jcduell_at_lbl_dot_gov)
Date: Wed Sep 20 2006 - 10:56:23 PDT
On Tue, Sep 19, 2006 at 10:06:48PM -0300, Konstantin Kleisouris wrote:
> Hi everyone,
>
> I have two concerns with Berkeley UPC.
> I am trying to run a UPC program on a cluster of linux machines.
> However, when I type the command:
>
> upcrun -n 8 a.out
>
> I get the message you see below (see after the dashed line). I have
> compiled my program with -pthreads=4. I really cannot figure out what
> the problem is. I have generated ssh keys so that when you ssh from one
> machine to another you don't have to type your password. Also, I have
> set the UPC_NODES variable to a list of machines in the cluster. I
> believe that the program (a.out) does not even start executing, because
> I am supposed to give some arguments to it, but it does not ask me for
> them (as it should).
>
> Also, if I compile my program with -pthreads=1 and then without
> -pthreads at all (so, in both cases there is only one thread) I see
> difference in performance. In the first case (-pthreads=1) the program
> is slower than if I don't use -pthreads at all. I noticed that even
> portions of the UPC program where threads access only private data (for
> instance arrays that have been generated with malloc) take longer to
> execute. I am measuring time with bupc_ticks_now() and
> bupc_ticks_to_us(). Does anyone know why? Even if I do -pthreads=1 and
> -T=1 this is still slower than if I don't use -pthreads at all.

Konstantin,

There is some additional runtime overhead when pthreads are used, even
with -pthreads=1. In particular, all 'local' global data (i.e., global
variables which are *not* 'shared') need to be virtualized so that each
pthread sees its own copy of the variable. There is some overhead cost
for accessing such a variable. This should not be the case with
malloc'ed data, per se, although if you are pointing at the malloc'ed
memory with a global private pointer, then there will be some overhead
associated with accessing the pointer itself.
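To make the point about global private pointers concrete, here is a minimal sketch in plain C (not UPC) of one possible workaround: copy the global pointer into a local variable once, before the hot loop, so that any per-access cost is paid once rather than on every iteration. The names g_data, g_len, init_data, sum_via_global, sum_via_local and free_data are hypothetical, and the sketch only shows the access pattern. It does not reproduce how the runtime actually virtualizes the variable, and an optimizing compiler may already hoist the load in simple cases like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Global private (non-shared) pointer and length. Hypothetical names.
 * Assumption for illustration: with -pthreads, each access to globals
 * like these may carry a small per-thread lookup cost. */
static double *g_data = NULL;
static size_t g_len = 0;

/* Allocate the private array and fill it with 0, 1, 2, ...
 * Returns 0 on success, -1 if malloc fails. */
int init_data(size_t n)
{
    g_data = malloc(n * sizeof *g_data);
    if (g_data == NULL)
        return -1;
    g_len = n;
    for (size_t i = 0; i < n; i++)
        g_data[i] = (double)i;
    return 0;
}

/* Refers to the global pointer and length inside the loop. */
double sum_via_global(void)
{
    assert(g_data != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < g_len; i++)
        sum += g_data[i];
    return sum;
}

/* Reads the globals once, then works only through local copies. */
double sum_via_local(void)
{
    assert(g_data != NULL);
    double *data = g_data;   /* single read of the global pointer */
    size_t len = g_len;      /* single read of the global length */
    double sum = 0.0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Release the array and reset the globals. */
void free_data(void)
{
    free(g_data);
    g_data = NULL;
    g_len = 0;
}
```

Both functions compute the same result. Timing each one around the loop, with and without -pthreads, would be one way to check whether pointer access accounts for the slowdown in a given program.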

Can you give me some idea of just how much slower pthreads are for you?
Are you seeing a slowdown of 10%, 20%, 90%, etc? We are currently
investigating ways of improving our pthreads performance, so hopefully
we'll have better performance soon, though it's likely that there may
always be *some* cost to using them.

--
Jason Duell               Future Technologies Group
<jcduell_at_lbl_dot_gov>  Computational Research Division
Tel: +1-510-495-2354      Lawrence Berkeley National Laboratory