Re: Issue using UPC on Opteron Cluster

From: Dan Bonachea (bonachea_at_cs_dot_berkeley_dot_edu)
Date: Fri Nov 03 2006 - 05:02:40 PST

  • Next message: Alexandre Chauvin: "Re: Issue using UPC on Opteron Cluster"
    Hi Alexandre -
    
       This is a common experience for UPC programs that are written in a 
    shared-memory style without regard for locality, when run for the first time 
    in a distributed-memory environment (where locality is extremely important for 
    good performance, because communication is orders of magnitude more expensive 
    across nodes than within a shared-memory node).
    
       Chances are you have some communication-related performance bugs in your 
    application - possibly due to the layout of your main data structures, but 
    possibly also just communication "leaks", where accidental or trivial sharing 
    of data generates remote accesses that need to be tuned away.
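
    To illustrate how much the layout alone can matter, here is a minimal sketch 
    in plain C (not UPC) that models where UPC places the elements of a shared 
    array and counts how many accesses would be remote. It relies on the UPC rule 
    that element i of an array declared with block size B has affinity to thread 
    (i / B) % THREADS, and that the default block size is 1 (cyclic). The function 
    names owner and count_remote are hypothetical, and the access pattern (each 
    thread sweeping one contiguous slice) is an assumption, not a description of 
    the sort in question.

```c
#include <assert.h>
#include <stddef.h>

/* Thread with affinity to element i of an array laid out with the given
   block size: UPC deals blocks out round-robin across the threads. */
static size_t owner(size_t i, size_t block, size_t threads) {
    return (i / block) % threads;
}

/* Count accesses that land on another thread's memory when each thread
   sweeps one contiguous slice of an n-element shared array.
   Assumption for this sketch: threads divides n evenly. */
static size_t count_remote(size_t n, size_t threads, size_t block) {
    assert(threads > 0 && block > 0 && n % threads == 0);
    size_t chunk = n / threads;   /* elements swept by each thread */
    size_t remote = 0;
    for (size_t t = 0; t < threads; t++) {
        for (size_t i = t * chunk; i < (t + 1) * chunk; i++) {
            if (owner(i, block, threads) != t)
                remote++;         /* would be a network access across nodes */
        }
    }
    return remote;
}
```

    With 2 threads and the default cyclic layout, count_remote(n, 2, 1) reports 
    that half of all accesses are remote, while a blocked layout matching the 
    slices, count_remote(n, 2, n / 2), reports none. Within one shared-memory 
    node the difference is cheap; across nodes each remote access is a network 
    operation.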
    
       Luckily, we now have a very nice performance tool designed specifically to 
    help UPC programmers find and fix such problems, called the Parallel 
    Performance Wizard:
    
                         http://ppw.hcs.ufl.edu/
    
       I strongly encourage you to download it and give it a try. The PPW team is 
    very receptive to feedback about the performance tool, and probably would even 
    help you to track down your performance issues if they aren't immediately 
    obvious in the tool output.
    
    Hope this helps.
    
    Dan
    
    At 01:30 AM 11/3/2006, Alexandre Chauvin wrote:
    >Hello All --
    >
    >I am facing some issues when trying to run a UPC code in an Opteron cluster 
    >environment. I am quite new to UPC, so the answer could be very trivial. 
    >Could you please have a look?
    >
    >I would like to use my whole cluster to do a sort. The code runs very fast 
    >when using 8GB of memory with pthreads within a single node, but it becomes 
    >very slow as soon as I try to use multiple nodes.
    >
    >I tried both the vapi and mpi conduits -- InfiniBand interconnect -- but 
    >performance was very bad. It went from 1 min on 1 node to more than 20 mins 
    >when using 2 nodes!
    >
    >
    >Is there something particular I should do to use multiple nodes 
    >efficiently?
    >
    
