Re: Problem running on a cluster & performance...

From: Jason Duell (jcduell_at_lbl_dot_gov)
Date: Wed Sep 20 2006 - 10:56:23 PDT

    On Tue, Sep 19, 2006 at 10:06:48PM -0300, Konstantin Kleisouris wrote:
    > Hi everyone,
    >    I have two concerns with Berkeley UPC.
    >    I am trying to run a UPC program on a cluster of linux machines.
    > However, when I type the command:
    > > upcrun -n 8 a.out 
    > I get the message you see below (see after the dashed line). I have
    > compiled my program with -pthreads=4. I really cannot figure out what
    > the problem is. I have generated ssh keys so that when you ssh from one
    > machine to another you don't have to type your password. Also, I have
    > set the UPC_NODES variable to a list of machines in the cluster. I
    > believe that the program (a.out) does not even start executing, because
    > I am supposed to give some arguments to it, but it does not ask me for
    > them (as it should). 
    >     Also, if I compile my program with -pthreads=1 and then without
    > -pthreads at all (so, in both cases there is only one thread) I see a
    > difference in performance. In the first case (-pthreads=1) the program
    > is slower than if I don't use -pthreads at all. I noticed that even
    > portions of the UPC program where threads access only private data (for
    > instance arrays that have been generated with malloc) take longer to
    > execute. I am measuring time with bupc_ticks_now() and
    > bupc_ticks_to_us(). Does anyone know why? Even if I do -pthreads=1 and
    > -T=1 this is still slower than if I don't use -pthreads at all.
    There is some additional runtime overhead when pthreads are used, even with
    -pthreads=1.  In particular, all 'local' global data (i.e., global variables
    that are *not* 'shared') must be virtualized so that each pthread sees its
    own copy of the variable, and accessing such a variable carries some
    overhead.  This should not be the case for malloc'ed data per se, although if
    you are pointing at the malloc'ed memory with a global private pointer, then
    there will be some overhead associated with accessing the pointer itself.
    Can you give me some idea of just how much slower pthreads are for you?  Are you
    seeing a slowdown of 10%, 20%, 90%, etc.?
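    One way to put a number on it: time the same region in both builds (for
    example with the bupc_ticks_now() and bupc_ticks_to_us() calls mentioned
    earlier in this thread) and compute the relative slowdown. The helper
    slowdown_percent() below is hypothetical, a minimal sketch that assumes
    both times are already in microseconds.

```c
#include <assert.h>

/* HYPOTHETICAL helper: relative slowdown of the -pthreads build versus the
 * build without -pthreads, in percent. Both arguments are elapsed times in
 * microseconds, e.g. the difference of two timer readings converted to us. */
double slowdown_percent(double t_plain_us, double t_pthreads_us) {
    assert(t_plain_us > 0.0);  /* baseline time must be positive */
    return 100.0 * (t_pthreads_us - t_plain_us) / t_plain_us;
}
```

    For example, 200 us without pthreads and 240 us with -pthreads=1 gives
    slowdown_percent(200.0, 240.0) == 20.0, i.e. a 20% slowdown.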
    We are currently investigating ways of improving our pthreads performance, so
    hopefully we'll have better performance soon, though there will likely
    always be *some* cost to using them.
    Jason Duell             Future Technologies Group
    <jcduell_at_lbl_dot_gov>       Computational Research Division
    Tel: +1-510-495-2354    Lawrence Berkeley National Laboratory
