From: Dan Bonachea (bonachea_at_cs_dot_berkeley_dot_edu)
Date: Fri Nov 03 2006 - 14:31:34 PST
At 06:13 AM 11/3/2006, Alexandre Chauvin wrote:
>Thank you Dan for this quick reply. I will give it a try, even though I saw
>that I have to download the Beta version 2.3.16 of UPC to be compatible with
>PPW. I was trying with version 2.2.2 so far.
>I will keep you updated.

Great - let us know how it goes..

>Another quick question: I am having difficulty allocating more than 1.3GB of
>RAM per thread with the vapi conduit. Can you see any explanation for that?

Did you pass the -shared-heap flag to upcrun to request the amount of shared
memory you need?

If it's still not behaving the way you believe it should, please submit a bug
report with complete information on your setup and how to reproduce the
problem at http://upc-bugs.lbl.gov

Thanks..
Dan

>On 11/3/06, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
>Hi Alexandre -
>
>This is a common experience for UPC programs that are written in a
>shared-memory style without regard for locality, when run for the first time
>in a distributed-memory environment (where locality is extremely important
>for good performance, because communication is orders of magnitude more
>expensive across nodes than within a shared-memory node).
>
>Chances are you have some communication-related performance bugs in your
>application - possibly due to the layout of your main data structures, but
>possibly also just communication "leaks" from accidental or trivial sharing
>of data where things need to be tuned.
>
>Luckily, we now have a very nice performance tool designed specifically to
>help UPC programmers find and fix such problems, called the Parallel
>Performance Wizard:
>
>   http://ppw.hcs.ufl.edu/
>
>I strongly encourage you to download it and give it a try. The PPW team is
>very receptive to feedback about the performance tool, and probably would
>even help you to track down your performance issues if they aren't
>immediately obvious in the tool output.
>
>Hope this helps..
>
>Dan
>
>At 01:30 AM 11/3/2006, Alexandre Chauvin wrote:
> >Hello All --
> >
> >I am facing some issues when trying to run a UPC code in an Opteron
> >cluster environment. I am quite new to UPC, so the answer could be very
> >trivial. Could you please have a look?
> >
> >I would like to use my whole cluster to do a sort. But while the code runs
> >very fast when using 8GB of memory with pthreads within a single node, it
> >becomes very slow as soon as I try to use multiple nodes.
> >
> >I tried both the vapi and mpi conduits -- InfiniBand interconnect -- but
> >performance was very bad. It went from 1 min on 1 node to more than
> >20 mins when using 2 nodes!
> >
> >Is there something particular I should do to use multiple nodes
> >efficiently?
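
The point in the thread about data layout and locality can be made concrete with a small model. The sketch below is plain Python, not UPC, and is not taken from the poster's sort code: it only assumes the standard UPC rule that element i of a shared array with block size B has affinity to thread (i / B) mod THREADS, with B = 1 (cyclic) as the default. The function names `owner` and `remote_accesses` are hypothetical, chosen for illustration. It counts how many element accesses would touch another thread's data when each thread works on its own contiguous chunk of the array.

```python
def owner(index, threads, block_size):
    """Thread with affinity to element `index` of a UPC-style shared array.

    block_size=1 models UPC's default cyclic layout.
    """
    return (index // block_size) % threads


def remote_accesses(n, threads, block_size):
    """Count non-local accesses when thread t reads its contiguous chunk.

    Thread t is assumed to process indices [t*n/threads, (t+1)*n/threads).
    Assumes n is divisible by threads (illustrative simplification).
    """
    chunk = n // threads
    remote = 0
    for t in range(threads):
        for i in range(t * chunk, (t + 1) * chunk):
            if owner(i, threads, block_size) != t:
                remote += 1
    return remote


if __name__ == "__main__":
    n, threads = 1024, 4
    print("cyclic layout :", remote_accesses(n, threads, 1))
    print("blocked layout:", remote_accesses(n, threads, n // threads))
```

With 1024 elements and 4 threads, the cyclic layout makes 768 of the 1024 accesses non-local, while a blocked layout matching the chunk size makes none. Within a single node with pthreads those non-local accesses are cheap memory reads; across nodes each one may become a network operation, which is one plausible way a run could slow down as described, though only a profile of the actual application can confirm the cause.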