From: jcduell_at_lbl_dot_gov
Date: Tue Dec 06 2005 - 10:06:33 PST

On Tue, Dec 06, 2005 at 10:26:55AM -0500, Eric Frederich wrote:
> I plan on doing some benchmarking of various parts of UPC. My test setup at
> work consists of about 40 or so Linux workstations each with two Opteron
> processors.
> I am wondering what is the best way to run parallel programs on these
> machines?
>
> Should I list each workstation twice in $UPC_NODES or just once?
> Should I pass -pthreads to upcc to enable pthread support?
> What arguments should I use with upcrun (-n -nodes -c and -p)?

Generally, the fastest way to run on a cluster of SMPs is to run 1 process
per machine, using -pthreads. That way you get shared memory between UPC
threads that live on the same node.

So, for a 3-node cluster of 2-way SMPs, try

    upcc -pthreads=2 foo.upc
    export UPC_NODES="node1 node2 node3"
    upcrun -n 6 a.out

Note that the default value of -pthreads is 2, so the "=2" is superfluous
here, but note the general idea (that you can embed a default number of
pthreads per process into the UPC executable). You could also pass '-p 2'
to upcrun to set the number of pthreads per process at run time.

There are some issues with using -pthreads that you may run into. Most
commonly, some math/science libraries aren't pthread-safe, and will break
under -pthreads. See the section "Using pthreaded Berkeley UPC programs"
in our User's Guide.

If you run into these issues, just use 1 UPC thread per process:

    upcc foo.upc
    export UPC_NODES="node1 node1 node2 node2 node3 node3"
    upcrun -n 6 a.out

Cheers,

-- 
Jason Duell               Future Technologies Group
<jcduell_at_lbl_dot_gov>  Computational Research Division
Tel: +1-510-495-2354      Lawrence Berkeley National Laboratory
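
For a larger cluster, typing the node list by hand gets tedious. The two layouts described in the reply (one pthreaded process per node, or one single-threaded process per CPU) differ only in how often each host appears in UPC_NODES. Below is a minimal sketch of a helper that builds that list. The function name make_upc_nodes and the host names are hypothetical and not part of Berkeley UPC; the script only prints the resulting values and does not invoke upcc or upcrun.

```shell
#!/bin/sh
# Hypothetical helper (not part of Berkeley UPC): build a UPC_NODES value
# that lists each host once per process to be started on it.
# usage: make_upc_nodes PROCS_PER_NODE host1 host2 ...
make_upc_nodes() {
    procs_per_node=$1
    shift
    list=""
    for host in "$@"; do
        i=0
        while [ "$i" -lt "$procs_per_node" ]; do
            # Append the host, adding a separating space only if list is non-empty.
            list="${list:+$list }$host"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$list"
}

# Layout 1: one process per node (program built with upcc -pthreads).
UPC_NODES=$(make_upc_nodes 1 node1 node2 node3)
echo "pthreads layout: UPC_NODES=\"$UPC_NODES\""

# Layout 2: two single-threaded processes per node (program built with plain upcc).
UPC_NODES=$(make_upc_nodes 2 node1 node2 node3)
echo "process layout:  UPC_NODES=\"$UPC_NODES\""
```

In either case the total thread count passed to upcrun -n stays the same (6 for three 2-way nodes); only the number of times each host is listed changes.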