From: Samy Bahra (sbahra_at_gwu_dot_edu)
Date: Tue Sep 19 2006 - 20:42:42 PDT
Hi Konstantin,

First, you ask, "if I compile my program with -pthreads=1 and then without -pthreads at all (so, in both cases there is only one thread) I see difference in performance."

Keep in mind that pthreads only run locally, inside a single process on one node. Depending on the scheduler on your Linux machines, the threads may end up sharing the same time-slice. So, for example, if your benchmark is CPU-bound, the threads could be saturating that time-slice. For work that is really CPU-bound you might as well spawn new processes, unless the threads are spread fairly across CPUs (which is the default behavior of the run-time as far as I understand).

Also, note that if the threads are running locally in the same process, malloc() itself will see a lot of lock contention. dlmalloc is not known for scaling well with threads. If you want to avoid too much contention as you scale with threads, please take a look at "jemalloc", a new malloc implementation that FreeBSD is using, optimized for threaded applications. A paper describing it is available at http://www.bsdcan.org/2006/papers/jemalloc.pdf

This is something I would like to look into in the future for UPC's memory allocation stubs (LD_PRELOAD the run-time?).

Not too sure about the GASNet issue.

Regards.

--
Samy Al Bahra
`------ http://samy.kerneled.org/

----- Original Message -----
From: Konstantin Kleisouris <kkonst_at_cs_dot_rutgers_dot_edu>
Date: Tuesday, September 19, 2006 9:07 pm
Subject: Problem running on a cluster & performance...
To: upc-users_at_lbl_dot_gov

> Hi everyone,
>
> I have two concerns with Berkeley UPC.
> I am trying to run a UPC program on a cluster of Linux machines.
> However, when I type the command:
>
> upcrun -n 8 a.out
>
> I get the message you see below (see after the dashed line). I have
> compiled my program with -pthreads=4. I really cannot figure out what
> the problem is. I have generated ssh keys so that when you ssh from one
> machine to another you don't have to type your password.
> Also, I have set the UPC_NODES variable to a list of machines in the
> cluster. I believe that the program (a.out) does not even start
> executing, because I am supposed to give some arguments to it, but it
> does not ask me for them (as it should).
>
> Also, if I compile my program with -pthreads=1 and then without
> -pthreads at all (so, in both cases there is only one thread) I see a
> difference in performance. In the first case (-pthreads=1) the program
> is slower than if I don't use -pthreads at all. I noticed that even
> portions of the UPC program where threads access only private data (for
> instance arrays that have been generated with malloc) take longer to
> execute. I am measuring time with bupc_ticks_now() and
> bupc_ticks_to_us(). Does anyone know why? Even if I do -pthreads=1 and
> -T=1 this is still slower than if I don't use -pthreads at all.
>
> Sincerely,
> Kosta
>
> ----------------------------------------------------------------
>
> AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
>   from function sendPacket
>   at amudp_reqrep.cpp:93
>   reason: Invalid argument
> AMUDP AMUDP_RequestGeneric returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
>   at amudp_reqrep.cpp:1200
>
> GASNet gasnetc_AMRequestShortM encountered an AM Error: AM_ERR_RESOURCE(3)
>   at /home/kkonst/UPC/berkeley_upc-2.2.2/gasnet/udp-conduit/gasnet_core.c:564
> GASNet gasnetc_AMRequestShortM returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
>   at /home/kkonst/UPC/berkeley_upc-2.2.2/gasnet/udp-conduit/gasnet_core.c:568
> *** FATAL ERROR: GASNet encountered an error: GASNET_ERR_RESOURCE(3)
>   while calling: gasnet_AMRequestShort4(peer, gasneti_handleridx(gasnete_ambarrier_notify_reqh), phase, 0, id, flags)
>   at gasnete_barrier_notify() at /home/kkonst/UPC/berkeley_upc-2.2.2/gasnet/extended-ref/gasnet_extended_refbarrier.c:197
> *** Caught a fatal signal: SIGABRT(6) on node 0/2
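
One rough way to check the malloc() contention point from the reply above is to time malloc/free pairs from several threads in one process. The following is only a minimal sketch in plain C with POSIX threads; it does not use the UPC run-time, and the names time_malloc_free, worker_arg and MAX_THREADS are made up for illustration. It assumes a POSIX system that provides clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

#define MAX_THREADS 64 /* hypothetical upper bound for this sketch */

struct worker_arg {
    long iterations;   /* malloc/free pairs this thread should perform */
    size_t block_size; /* bytes requested per allocation */
    long completed;    /* pairs actually completed by the worker */
};

/* Each worker hammers the allocator with small malloc/free pairs. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        volatile char *buf = malloc(arg->block_size);
        if (buf == NULL)
            break;
        buf[0] = (char)i; /* touch the block so the pair is not elided */
        free((void *)buf);
        arg->completed++;
    }
    return NULL;
}

/*
 * Run `nthreads` workers concurrently and return the elapsed wall-clock
 * time in seconds. The total number of completed malloc/free pairs is
 * stored in *total_completed when that pointer is non-NULL.
 */
double time_malloc_free(int nthreads, long iterations, size_t block_size,
                        long *total_completed)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    assert(block_size > 0);

    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    struct timespec start, end;
    int started = 0;
    long total = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int t = 0; t < nthreads; t++) {
        args[t].iterations = iterations;
        args[t].block_size = block_size;
        args[t].completed = 0;
        if (pthread_create(&tids[t], NULL, worker, &args[t]) != 0)
            break; /* stop spawning; join whatever was started */
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].completed;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (total_completed != NULL)
        *total_completed = total;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_malloc_free() from a small main() with 1, 2, 4, ... threads and the same per-thread iteration count gives a crude scaling curve: if the elapsed time grows noticeably with the thread count even though each thread does the same amount of work, the allocator is a likely bottleneck. Running the same binary again with a different allocator loaded through LD_PRELOAD (the library path depends on your installation) allows a side-by-side comparison. Results will vary with the libc, the kernel scheduler and the number of CPUs, so treat the numbers as indicative only.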