From: Eric Frederich (eric.frederich_at_gmail_dot_com)
Date: Tue Nov 22 2005 - 17:42:04 PST
Dan,
First of all, thanks for your quick correspondence. Attached is a file
with a list of commands I ran and their outputs. Please let me know if
there is anything else I can tell you about my setup.
Thanks,
~Eric
On 11/22/05, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
>
> At 03:39 PM 11/22/2005, Eric Frederich wrote:
> >Actually, I went and copied the same file over to the other machine so now
> >that shouldn't be the issue. I am still getting what appears to be the same
> >error, a problem with the requested resource, and something about an invalid
> >argument.
> >
> >I ran md5sum on the binaries and they are exactly the same.
> >
> >Please help me. I don't know what else could be the problem. Why would I be
> >able to launch 2 processes on one machine, be able to launch 2 processes on
> >another machine, but not be able to launch 1 process on each?
>
> Please send the information requested in my previous message:
>
> >Give it another try once you're certain the same binary is present and
> >working on all nodes. If it still fails, try appending "-v" to the upcrun
> >line to see more details about the startup procedure and send us the complete
> >output. Please also send the output of "uname -a" and "cat /proc/cpuinfo" on
> >each node.
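
A minimal sketch of gathering the requested "uname -a" and "cat /proc/cpuinfo" output from every node in one pass. It assumes ssh access to each node from the machine where it runs; `collect_node_info` and `REMOTE_SH` are hypothetical names chosen for illustration and are not part of Berkeley UPC.

```shell
# Sketch only: REMOTE_SH and collect_node_info are hypothetical names,
# not part of Berkeley UPC. REMOTE_SH is the command used to reach a node.
REMOTE_SH=${REMOTE_SH:-ssh}

# usage: collect_node_info host1 host2 ... > nodeinfo.txt
collect_node_info() {
    for host in "$@"; do
        # Label each section so the combined output stays readable.
        echo "===== $host: uname -a ====="
        $REMOTE_SH "$host" uname -a
        echo "===== $host: /proc/cpuinfo ====="
        $REMOTE_SH "$host" cat /proc/cpuinfo
    done
}
```

With the two hosts named in this thread, that would be `collect_node_info penguin27 myth > nodeinfo.txt`, and the resulting file can be attached to a reply.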
>
> Dan
>
>
> >~Eric
> >
> >
> >eric@penguin27 build $ ./upcrun -n 2 helloWorld
> >UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=17505)
> >UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=17514)
> >Hello World from thread 2 of 2 ! !
> >Hello World from thread 1 of 2 ! !
> >eric@penguin27 build $ ./upcrun -n 2 helloWorld
> >UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=8799)
> >UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=8801)
> >Hello World from thread 1 of 2 ! !
> >Hello World from thread 2 of 2 ! !
> >eric@penguin27 build $ ./upcrun -n 2 helloWorld
> >AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
> > from function sendPacket
> > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:93
> > reason: Invalid argument
> >AMUDP AMUDP_RequestGeneric returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
> > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:1200
> >
> >GASNet gasnetc_AMRequestShortM encountered an AM Error: AM_ERR_RESOURCE(3)
> > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:564
> >GASNet gasnetc_AMRequestShortM returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
> > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:568
> >*** FATAL ERROR:
> >GASNet encountered an error: GASNET_ERR_RESOURCE(3)
> > while calling: gasnet_AMRequestShort4(peer,
> > gasneti_handleridx(gasnete_ambarrier_notify_reqh), phase, 0, id, flags)
> > at gasnete_barrier_notify() at
> > /home/eric/UPC/berkeley_upc-2.2.1/gasnet/extended-ref/gasnet_extended_refbarrier.c:197
> >*** Caught a fatal signal: SIGABRT(6) on node 1/2
> >eric@penguin27 build $ md5sum /home/eric/UPC/build/helloWorld
> >da6cd645cb5562569d38c82173da2ae3 /home/eric/UPC/build/helloWorld
> >eric@penguin27 build $ ssh myth md5sum /home/eric/UPC/build/helloWorld
> >da6cd645cb5562569d38c82173da2ae3 /home/eric/UPC/build/helloWorld
> >eric@penguin27 build $
> >
> >
> >On 11/22/05, Eric Frederich <eric_dot_frederich_at_gmail_dot_com> wrote:
> >Wow, funny how we need a udp expert to figure out something not related at
> >all.
> >
> >One of those little subtle things. Later on this evening when I am at home I
> >will be able to test it out with the correct executable.
> >Hopefully I will have some good news to report.
> >
> >Thanks a bunch,
> >~Eric
> >
> >
> >On 11/22/05, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
> >Hi Eric - I'm the udp-conduit expert..
> >
> >I'm not sure why you're seeing that particular error message, although based
> >on your message below I suspect you have inconsistent copies of the
> >executable on the two nodes - the penguin27 output is "Hello World from
> >thread 1 of 2" but the myth output is "Hello World" - which probably means
> >the programs are not the same.
> >
> >Berkeley UPC requires all nodes to be running the *exact* same binary
> >executable - if you lack a shared file system then exact copies are fine
> >(although error-prone), but it's not OK to recompile one copy and not the
> >others. Also, udp-conduit requires all copies of the executable to reside at
> >the same absolute pathname on all clients - so make sure the copies are all
> >mounted or mirrored to the same absolute path. Also, if the nodes may differ
> >in things like shared libraries, you should probably link statically
> >(upcc -Wl,-static) just to be safe.
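
A minimal sketch of checking the two requirements above (identical binary, same absolute path) across nodes with md5sum. It assumes ssh access to every node, including the local one, and that md5sum is installed on each; `check_same_binary` and `REMOTE_SH` are hypothetical names chosen for illustration and are not part of Berkeley UPC.

```shell
# Sketch only: REMOTE_SH and check_same_binary are hypothetical names,
# not part of Berkeley UPC. REMOTE_SH is the command used to reach a node.
REMOTE_SH=${REMOTE_SH:-ssh}

# usage: check_same_binary /absolute/path/to/executable host1 host2 ...
# returns 0 if every host reports the same checksum at that path,
#         1 on a checksum mismatch, 2 if a host cannot checksum the file.
check_same_binary() {
    exe=$1
    shift
    ref=""
    for host in "$@"; do
        # The same absolute path is queried on every host, so a missing
        # copy shows up as a failure here.
        out=$($REMOTE_SH "$host" md5sum "$exe") || {
            echo "$host: cannot checksum $exe" >&2
            return 2
        }
        # md5sum prints "<hash>  <path>"; keep only the hash.
        sum=${out%% *}
        echo "$host $sum"
        if [ -z "$ref" ]; then
            ref=$sum
        elif [ "$sum" != "$ref" ]; then
            echo "MISMATCH on $host" >&2
            return 1
        fi
    done
    return 0
}
```

With the path and hosts from this thread, that would be `check_same_binary /home/eric/UPC/build/helloWorld penguin27 myth`. A matching checksum only shows the files are identical; it says nothing about differences in shared libraries between the nodes.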
> >
> >Give it another try once you're certain the same binary is present and
> >working on all nodes. If it still fails, try appending "-v" to the upcrun
> >line to see more details about the startup procedure and send us the complete
> >output. Please also send the output of "uname -a" and "cat /proc/cpuinfo" on
> >each node.
> >
> >Hope this helps...
> >Dan
> >
> >At 02:35 PM 11/21/2005, Eric Frederich wrote:
> > > > > It is interesting to note that when the upchostsfile looks like
> > > >
> > > > > 192.168.1.207
> > > > > 192.168.1.207
> > > > > 192.168.1.208
> > > >
> > > > and I run it with -n 2 it works fine and I see the following
> > > >
> > > > UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=12356)
> > > > UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=12357)
> > > > Hello World from thread 1 of 2
> > > > Hello World from thread 0 of 2
> > > >
> > > > Also when I have the file say
> > > >
> > > > > 192.168.1.208
> > > > > 192.168.1.208
> > > > > 192.168.1.207
> > > >
> > > > it works fine too and I see the following
> > > >
> > > > UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=10447)
> > > > UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=10446)
> > > > Hello World
> > > > Hello World
> >
> >
> >
> >
> >--
> >------------------------
> >Eric L. Frederich
>
>
--
------------------------
Eric L. Frederich