From: Alexandre Chauvin (alexandre.chauvin_at_gmail_dot_com)
Date: Fri Nov 03 2006 - 06:13:42 PST
Thank you, Dan, for the quick reply. I will give it a try, although I see
that I have to download the beta version 2.3.16 of UPC to be compatible
with PPW; I have been using version 2.2.2 so far. I will keep you updated.

Another quick question: I am having difficulty allocating more than 1.3GB
of RAM per thread with the vapi conduit. Can you see any explanation for
that?

On 11/3/06, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
> Hi Alexandre -
>
> This is a common experience for UPC programs that are written in a
> shared-memory style without regard for locality, when run for the first
> time in a distributed-memory environment (where locality is extremely
> important for good performance, because communication is orders of
> magnitude more expensive across nodes than within a shared-memory node).
>
> Chances are you have some communication-related performance bugs in your
> application - possibly due to the layout of your main data structures, but
> possibly also just communication "leaks" from accidental or trivial
> sharing of data where things need to be tuned.
>
> Luckily, we now have a very nice performance tool designed specifically to
> help UPC programmers find and fix such problems, called the Parallel
> Performance Wizard:
>
> http://ppw.hcs.ufl.edu/
>
> I strongly encourage you to download it and give it a try. The PPW team is
> very receptive to feedback about the performance tool, and probably would
> even help you to track down your performance issues if they aren't
> immediately obvious in the tool output.
>
> Hope this helps..
>
> Dan
>
> At 01:30 AM 11/3/2006, Alexandre Chauvin wrote:
> >Hello All --
> >
> >I am facing some issues when trying to run a UPC code in an Opteron
> >cluster environment. I am quite new to UPC, so the answer could be very
> >trivial. Could you please have a look?
> >
> >I would like to use my whole cluster to do a sort. But while the code
> >runs very fast when using 8GB memory with pthreads within a single node,
> >it becomes very slow as soon as I try to use multiple nodes.
> >
> >I tried both the vapi and mpi conduits -- infiniband interconnect -- but
> >performance was very bad. It went from 1min on 1 node to more than 20mins
> >when using 2 nodes!
> >
> >Is there something particular I should do to use multiple nodes
> >efficiently?