From: Eric Frederich (eric.frederich_at_gmail_dot_com)
Date: Tue Nov 22 2005 - 17:42:04 PST
Dan,

First of all, thanks for your quick correspondence.
Attached is a file with a list of commands I ran and their outputs.
Please let me know if there is anything else I can tell you about my set up.

Thanks,
~Eric

On 11/22/05, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
>
> At 03:39 PM 11/22/2005, Eric Frederich wrote:
> >Actually, I went and copied the same file over to the other machine so now
> >that shouldn't be the issue. I am still getting what appears to be the same
> >error, a problem with the requested resource, and something about an invalid
> >argument.
> >
> >I ran md5sum on the binaries and they are exactly the same.
> >
> >Please help me. I don't know what else could be the problem. Why would I be
> >able to launch 2 processes on one machine, be able to launch 2 processes on
> >another machine, but not be able to launch 1 process on each?
>
> Please send the information requested in my previous message:
>
> >Give it another try once you're certain the same binary is present and
> >working on all nodes. If it still fails, try appending "-v" to the upcrun
> >line to see more details about the startup procedure and send us the complete
> >output. Please also send the output of "uname -a" and "cat /proc/cpuinfo" on
> >each node.
>
> Dan
>
> >~Eric
> >
> >
> >eric@penguin27 build $ ./upcrun -n 2 helloWorld
> >UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=17505)
> >UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=17514)
> >Hello World from thread 2 of 2 ! !
> >Hello World from thread 1 of 2 ! !
> >eric@penguin27 build $ ./upcrun -n 2 helloWorld
> >UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=8799)
> >UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=8801)
> >Hello World from thread 1 of 2 ! !
> >Hello World from thread 2 of 2 ! !
> >eric@penguin27 build $ ./upcrun -n 2 helloWorld
> >AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with
> >requested resource)
> >  from function sendPacket
> >  at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:93
> >  reason: Invalid argument
> >AMUDP AMUDP_RequestGeneric returning an error code: AM_ERR_RESOURCE (Problem
> >with requested resource)
> >  at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:1200
> >
> >GASNet gasnetc_AMRequestShortM encountered an AM Error: AM_ERR_RESOURCE(3)
> >  at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:564
> >GASNet gasnetc_AMRequestShortM returning an error code: GASNET_ERR_RESOURCE
> >(Problem with requested resource)
> >  at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:568
> >*** FATAL ERROR:
> >GASNet encountered an error: GASNET_ERR_RESOURCE(3)
> >  while calling: gasnet_AMRequestShort4(peer,
> >  gasneti_handleridx(gasnete_ambarrier_notify_reqh), phase, 0, id, flags)
> >  at gasnete_barrier_notify() at
> >  /home/eric/UPC/berkeley_upc-2.2.1/gasnet/extended-ref/gasnet_extended_refbarrier.c:197
> >*** Caught a fatal signal: SIGABRT(6) on node 1/2
> >eric@penguin27 build $ md5sum /home/eric/UPC/build/helloWorld
> >da6cd645cb5562569d38c82173da2ae3 /home/eric/UPC/build/helloWorld
> >eric@penguin27 build $ ssh myth md5sum /home/eric/UPC/build/helloWorld
> >da6cd645cb5562569d38c82173da2ae3 /home/eric/UPC/build/helloWorld
> >eric@penguin27 build $
> >
> >
> >On 11/22/05, Eric Frederich <eric_dot_frederich_at_gmail_dot_com> wrote:
> >Wow, funny how we need a udp expert to figure out something not related at
> >all.
> >
> >One of those little subtle things. Later on this evening when I am at home I
> >will be able to test it out with the correct executable.
> >Hopefully I will have some good news to report.
> >
> >Thanks a bunch,
> >~Eric
> >
> >
> >On 11/22/05, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
> >Hi Eric - I'm the udp-conduit expert..
> >
> >I'm not sure why you're seeing that particular error message, although based
> >on your message below I suspect you have inconsistent copies of the executable
> >on the two nodes - the penguin27 output is "Hello World from thread 1 of 2"
> >but the myth output is "Hello World" - which probably means the programs are
> >not the same.
> >
> >Berkeley UPC requires all nodes to be running the *exact* same binary
> >executable - if you lack a shared file system then exact copies are fine
> >(although error-prone), but it's not OK to recompile one copy and not the
> >others. Also, udp-conduit requires all copies of the executable to reside at
> >the same absolute pathname on all clients - so make sure the copies are all
> >mounted or mirrored to the same absolute path. Also, if the nodes may differ
> >in things like shared libraries, you should probably link statically (upcc
> >-Wl,-static) just to be safe.
> >
> >Give it another try once you're certain the same binary is present and working
> >on all nodes. If it still fails, try appending "-v" to the upcrun line to see
> >more details about the startup procedure and send us the complete output.
> >Please also send the output of "uname -a" and "cat /proc/cpuinfo" on each
> >node.
> >
> >Hope this helps...
> >Dan
> >
> >At 02:35 PM 11/21/2005, Eric Frederich wrote:
> > > > It is interesting to note that when the upchostsfile looks like
> > > >
> > > > 192.168.1.207
> > > > 192.168.1.207
> > > > 192.168.1.208
> > > >
> > > > and I run it with -n 2 it works fine and I see the following
> > > >
> > > > UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=12356)
> > > > UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=12357)
> > > > Hello World from thread 1 of 2
> > > > Hello World from thread 0 of 2
> > > >
> > > > Also when I have the file say
> > > >
> > > > 192.168.1.208
> > > > 192.168.1.208
> > > > 192.168.1.207
> > > >
> > > > it works fine too and I see the following
> > > >
> > > > UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=10447)
> > > > UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=10446)
> > > > Hello World
> > > > Hello World
> > > >
> >
> >
> >--
> >------------------------
> >Eric L. Frederich
> >
> >
> >
> >
> >--
> >------------------------
> >Eric L. Frederich
>
>
--
------------------------
Eric L. Frederich
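
The requirement Dan describes earlier in this thread (the same binary at the same absolute path on every node) can be checked mechanically. The following is a minimal Python sketch, not part of Berkeley UPC. The function name find_binary_mismatches and the report format are hypothetical, chosen only for illustration; the host names, path, and checksum are the ones from the md5sum output quoted in this thread. Gathering the reports (for example with "ssh <host> md5sum <path>") is left out.

```python
from collections import defaultdict


def find_binary_mismatches(reports):
    """Return a list of problems found in per-host executable reports.

    `reports` maps a host name to an (absolute_path, md5_hex) pair.
    This format is an assumption made for illustration only.
    An empty list means every host reports the same path and checksum.
    """
    hosts_by_path = defaultdict(list)
    hosts_by_digest = defaultdict(list)
    for host in sorted(reports):
        path, digest = reports[host]
        hosts_by_path[path].append(host)
        hosts_by_digest[digest].append(host)

    problems = []
    if len(hosts_by_path) > 1:
        problems.append("absolute path differs: %s" % dict(hosts_by_path))
    if len(hosts_by_digest) > 1:
        problems.append("checksum differs: %s" % dict(hosts_by_digest))
    return problems


# Values copied from the md5sum output in this thread.
reports = {
    "penguin27": ("/home/eric/UPC/build/helloWorld",
                  "da6cd645cb5562569d38c82173da2ae3"),
    "myth": ("/home/eric/UPC/build/helloWorld",
             "da6cd645cb5562569d38c82173da2ae3"),
}

for problem in find_binary_mismatches(reports) or ["no mismatches found"]:
    print(problem)
```

As this thread shows, a clean result only rules out a mismatched executable; it does not by itself explain the AM_ERR_RESOURCE failure.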