Re: problems running UPC programs

From: Dan Bonachea (bonachea_at_cs_dot_berkeley_dot_edu)
Date: Tue Nov 22 2005 - 16:04:45 PST

    At 03:39 PM 11/22/2005, Eric Frederich wrote:
    >Actually, I went and copied the same file over to the other machine so now 
    >that shouldn't be the issue.  I am still getting what appears to be the same 
    >error, a problem with the requested resource, and something about an invalid 
    >argument.
    >
    >I ran md5sum on the binaries and they are exactly the same.
    >
    >Please help me.  I don't know what else could be the problem.  Why would I be 
    >able to launch 2 processes on one machine, be able to launch 2 processes on 
    >another machine, but not be able to launch 1 process on each?
    
    Please send the information requested in my previous message:
    
    >Give it another try once you're certain the same binary is present and 
    >working on all nodes. If it still fails, try appending "-v" to the upcrun 
    >line to see more details about the startup procedure and send us the complete 
    >output. Please also send the output of "uname -a" and "cat /proc/cpuinfo" on 
    >each node.
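    A minimal sketch for gathering the requested information in one pass and
    confirming that every node really has a byte-identical binary at the same
    absolute path. It assumes passwordless ssh to each node; the script, its
    variables, and the function names all_equal and check_nodes are hypothetical
    helpers, not part of Berkeley UPC or GASNet. The path and hostnames are the
    ones that appear in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of Berkeley UPC. Assumes passwordless ssh
# to every node and the same absolute path for the binary on each node.

BINARY=/home/eric/UPC/build/helloWorld  # absolute path used in this thread
NODES="penguin27 myth"                   # hostnames seen in this thread

# all_equal: succeed only if every argument is the same string
all_equal() {
    first=$1
    for s in "$@"; do
        [ "$s" = "$first" ] || return 1
    done
    return 0
}

# check_nodes: print "uname -a" and /proc/cpuinfo for each node, then
# compare the md5 checksum of $BINARY across all nodes
check_nodes() {
    sums=""
    for node in $NODES; do
        echo "=== $node ==="
        ssh "$node" uname -a
        ssh "$node" cat /proc/cpuinfo
        sum=$(ssh "$node" md5sum "$BINARY" | cut -d' ' -f1)
        if [ -z "$sum" ]; then
            echo "ERROR: could not checksum $BINARY on $node" >&2
            return 1
        fi
        echo "md5sum: $sum"
        sums="$sums $sum"
    done
    # $sums is intentionally unquoted so each checksum becomes one argument
    if all_equal $sums; then
        echo "OK: identical $BINARY on all nodes"
    else
        echo "MISMATCH: recopy or rebuild $BINARY so every node has the same file" >&2
        return 1
    fi
}
```

    To use it, source the file and run check_nodes; if it reports OK and the
    job still fails, rerun the upcrun line with "-v" appended and send the
    complete output along with the per-node diagnostics.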
    
    Dan
    
    
    >~Eric
    >
    >
    >eric@penguin27 build $ ./upcrun -n 2 helloWorld
    >UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=17505)
    >UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=17514)
    >Hello World from thread 2 of 2 ! !
    >Hello World from thread 1 of 2 ! !
    >eric@penguin27 build $ ./upcrun -n 2 helloWorld
    >UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=8799)
    >UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=8801)
    >Hello World from thread 1 of 2 ! !
    >Hello World from thread 2 of 2 ! !
    >eric@penguin27 build $ ./upcrun -n 2 helloWorld
    >AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
    >   from function sendPacket
    >   at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:93
    >   reason: Invalid argument
    >AMUDP AMUDP_RequestGeneric returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
    >   at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:1200
    >
    >GASNet gasnetc_AMRequestShortM encountered an AM Error: AM_ERR_RESOURCE(3)
    >   at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:564
    >GASNet gasnetc_AMRequestShortM returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
    >   at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:568
    >*** FATAL ERROR:
    >GASNet encountered an error: GASNET_ERR_RESOURCE(3)
    >   while calling: gasnet_AMRequestShort4(peer, gasneti_handleridx(gasnete_ambarrier_notify_reqh), phase, 0, id, flags)
    >   at gasnete_barrier_notify() at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/extended-ref/gasnet_extended_refbarrier.c:197
    >*** Caught a fatal signal: SIGABRT(6) on node 1/2
    >eric@penguin27 build $ md5sum /home/eric/UPC/build/helloWorld
    >da6cd645cb5562569d38c82173da2ae3  /home/eric/UPC/build/helloWorld
    >eric@penguin27 build $ ssh myth md5sum /home/eric/UPC/build/helloWorld
    >da6cd645cb5562569d38c82173da2ae3  /home/eric/UPC/build/helloWorld
    >eric@penguin27 build $
    >
    >
    >On 11/22/05, Eric Frederich <eric_dot_frederich_at_gmail_dot_com> wrote:
    >Wow, funny how we need a udp expert to figure out something not related at 
    >all.
    >
    >One of those little subtle things.  Later on this evening when I am at home I 
    >will be able to test it out with the correct executable.
    >Hopefully I will have some good news to report.
    >
    >Thanks a bunch,
    >~Eric
    >
    >
    >On 11/22/05, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
    >Hi Eric - I'm the udp-conduit expert.
    >
    >I'm not sure why you're seeing that particular error message, although based
    >on your message below I suspect you have inconsistent copies of the executable
    >on the two nodes - the penguin27 output is "Hello World from thread 1 of 2"
    >but the myth output is "Hello World" - which probably means the programs are
    >not the same.
    >
    >Berkeley UPC requires all nodes to be running the *exact* same binary
    >executable - if you lack a shared file system then exact copies are fine
    >(although error-prone), but it's not OK to recompile one copy and not the
    >others. Also, udp-conduit requires all copies of the executable to reside at
    >the same absolute pathname on all clients - so make sure the copies are all
    >mounted or mirrored to the same absolute path. Also, if the nodes may differ
    >in things like shared libraries, you should probably link statically (upcc
    >-Wl,-static) just to be safe.
    >
    >Give it another try once you're certain the same binary is present and working
    >on all nodes. If it still fails, try appending "-v" to the upcrun line to see
    >more details about the startup procedure and send us the complete output.
    >Please also send the output of "uname -a" and "cat /proc/cpuinfo" on each
    >node.
    >
    >Hope this helps...
    >Dan
    >
    >At 02:35 PM 11/21/2005, Eric Frederich wrote:
    > > > It is interesting to note that when the upchostsfile looks like
    > > >
    > > > 192.168.1.207
    > > > 192.168.1.207
    > > > 192.168.1.208
    > > >
    > > > and I run it with -n 2, it works fine and I see the following
    > > >
    > > > UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=12356)
    > > > UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=12357)
    > > > Hello World from thread 1 of 2
    > > > Hello World from thread 0 of 2
    > > >
    > > > Also, when the file looks like
    > > >
    > > > 192.168.1.208
    > > > 192.168.1.208
    > > > 192.168.1.207
    > > >
    > > > it works fine too and I see the following
    > > >
    > > > UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=10447)
    > > > UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=10446)
    > > > Hello World
    > > > Hello World
    >
    >
    >
    >
    >--
    >------------------------
    >Eric L. Frederich
    
