From: Dan Bonachea (bonachea_at_cs_dot_berkeley_dot_edu)
Date: Fri Nov 03 2006 - 05:02:40 PST
Hi Alexandre -

This is a common experience for UPC programs that are written in a shared-memory style without regard for locality, when run for the first time in a distributed-memory environment (where locality is extremely important for good performance, because communication is orders of magnitude more expensive across nodes than within a shared-memory node).

Chances are you have some communication-related performance bugs in your application - possibly due to the layout of your main data structures, but possibly also just communication "leaks" from accidental or trivial sharing of data where things need to be tuned.

Luckily, we now have a very nice performance tool designed specifically to help UPC programmers find and fix such problems, called the Parallel Performance Wizard:

http://ppw.hcs.ufl.edu/

I strongly encourage you to download it and give it a try. The PPW team is very receptive to feedback about the performance tool, and would probably even help you track down your performance issues if they aren't immediately obvious in the tool output.

Hope this helps.

Dan

At 01:30 AM 11/3/2006, Alexandre Chauvin wrote:
>Hello All --
>
>I am facing some issues when trying to run a UPC code in an Opteron cluster
>environment. I am quite new to UPC, so the answer could be very trivial.
>Could you please have a look?
>
>I would like to use my whole cluster to do a sort. But while the code runs very
>fast when using 8GB memory with pthreads within a single node, it becomes very
>slow as soon as I try to use multiple nodes.
>
>I tried both vapi and mpi conduits -- infiniband interconnect -- but
>performance was very bad. It went from 1min on 1 node to more than 20mins
>when using 2 nodes!
>
>
>Is there something particular I should do to use multiple nodes
>efficiently?
>