Re: problems running UPC programs

From: Eric Frederich (eric.frederich_at_gmail_dot_com)
Date: Tue Nov 22 2005 - 17:42:04 PST

  • Next message: Dan Bonachea: "Re: problems running UPC programs"
    Dan,
         First of all, thanks for your quick correspondence.  Attached is a file
    with a list of commands I ran and their outputs.  Please let me know if
    there is anything else I can tell you about my setup.
    
    Thanks,
    ~Eric
    
    On 11/22/05, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
    >
    > At 03:39 PM 11/22/2005, Eric Frederich wrote:
    > >Actually, I went and copied the same file over to the other machine so
    > >now that shouldn't be the issue.  I am still getting what appears to be
    > >the same error, a problem with the requested resource, and something
    > >about an invalid argument.
    > >
    > >I ran md5sum on the binaries and they are exactly the same.
    > >
    > >Please help me.  I don't know what else could be the problem.  Why would
    > >I be able to launch 2 processes on one machine, be able to launch 2
    > >processes on another machine, but not be able to launch 1 process on each?
    >
    > Please send the information requested in my previous message:
    >
    > >Give it another try once you're certain the same binary is present and
    > >working on all nodes. If it still fails, try appending "-v" to the upcrun
    > >line to see more details about the startup procedure and send us the
    > >complete output. Please also send the output of "uname -a" and
    > >"cat /proc/cpuinfo" on each node.
    >
    > Dan
    >
    >
    > >~Eric
    > >
    > >
    > >eric@penguin27 build $ ./upcrun -n 2 helloWorld
    > >UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=17505)
    > >UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=17514)
    > >Hello World from thread 2 of 2 ! !
    > >Hello World from thread 1 of 2 ! !
    > >eric@penguin27 build $ ./upcrun -n 2 helloWorld
    > >UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=8799)
    > >UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=8801)
    > >Hello World from thread 1 of 2 ! !
    > >Hello World from thread 2 of 2 ! !
    > >eric@penguin27 build $ ./upcrun -n 2 helloWorld
    > >AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with
    > >requested resource)
    > >   from function sendPacket
    > >   at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:93
    > >   reason: Invalid argument
    > >AMUDP AMUDP_RequestGeneric returning an error code: AM_ERR_RESOURCE
    > >(Problem with requested resource)
    > >   at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:1200
    > >
    > >GASNet gasnetc_AMRequestShortM encountered an AM Error: AM_ERR_RESOURCE(3)
    > >   at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:564
    > >GASNet gasnetc_AMRequestShortM returning an error code: GASNET_ERR_RESOURCE
    > >(Problem with requested resource)
    > >   at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:568
    > >*** FATAL ERROR:
    > >GASNet encountered an error: GASNET_ERR_RESOURCE(3)
    > >   while calling: gasnet_AMRequestShort4(peer,
    > > gasneti_handleridx(gasnete_ambarrier_notify_reqh), phase, 0, id, flags)
    > >   at gasnete_barrier_notify() at
    > > /home/eric/UPC/berkeley_upc-2.2.1/gasnet/extended-ref/gasnet_extended_refbarrier.c:197
    > >*** Caught a fatal signal: SIGABRT(6) on node 1/2
    > >eric@penguin27 build $ md5sum /home/eric/UPC/build/helloWorld
    > >da6cd645cb5562569d38c82173da2ae3  /home/eric/UPC/build/helloWorld
    > >eric@penguin27 build $ ssh myth md5sum /home/eric/UPC/build/helloWorld
    > >da6cd645cb5562569d38c82173da2ae3  /home/eric/UPC/build/helloWorld
    > >eric@penguin27 build $
    > >
    > >
    > >On 11/22/05, Eric Frederich <eric_dot_frederich_at_gmail_dot_com> wrote:
    > >Wow, funny how we need a udp expert to figure out something not related
    > >at all.
    > >
    > >One of those little subtle things.  Later on this evening when I am at
    > >home I will be able to test it out with the correct executable.
    > >Hopefully I will have some good news to report.
    > >
    > >Thanks a bunch,
    > >~Eric
    > >
    > >
    > >On 11/22/05, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
    > >Hi Eric - I'm the udp-conduit expert..
    > >
    > >I'm not sure why you're seeing that particular error message, although
    > >based on your message below I suspect you have inconsistent copies of the
    > >executable on the two nodes - the penguin27 output is "Hello World from
    > >thread 1 of 2" but the myth output is "Hello World" - which probably means
    > >the programs are not the same.
    > >
    > >Berkeley UPC requires all nodes to be running the *exact* same binary
    > >executable - if you lack a shared file system then exact copies are fine
    > >(although error-prone), but it's not OK to recompile one copy and not the
    > >others. Also, udp-conduit requires all copies of the executable to reside
    > >at the same absolute pathname on all clients - so make sure the copies are
    > >all mounted or mirrored to the same absolute path. Also, if the nodes may
    > >differ in things like shared libraries, you should probably link
    > >statically (upcc -Wl,-static) just to be safe.
    > >
    > >Give it another try once you're certain the same binary is present and
    > >working on all nodes. If it still fails, try appending "-v" to the upcrun
    > >line to see more details about the startup procedure and send us the
    > >complete output. Please also send the output of "uname -a" and
    > >"cat /proc/cpuinfo" on each node.
    > >
    > >Hope this helps...
    > >Dan
    > >
    > >At 02:35 PM 11/21/2005, Eric Frederich wrote:
    > > > > It is interesting to note that when the upchostsfile looks like
    > > > >
    > > > > 192.168.1.207
    > > > > 192.168.1.207
    > > > > 192.168.1.208
    > > > >
    > > > > and I run it with -n 2 it works fine and I see the following
    > > > >
    > > > > UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=12356)
    > > > > UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=12357)
    > > > > Hello World from thread 1 of 2
    > > > > Hello World from thread 0 of 2
    > > > >
    > > > > Also when I have the file say
    > > > >
    > > > > 192.168.1.208
    > > > > 192.168.1.208
    > > > > 192.168.1.207
    > > > >
    > > > > it works fine too and I see the following
    > > > >
    > > > > UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=10447)
    > > > > UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=10446)
    > > > > Hello World
    > > > > Hello World
    > >
    > >
    > >
    > >
    > >--
    > >------------------------
    > >Eric L. Frederich
    > >
    > >
    > >
    > >
    > >--
    > >------------------------
    > >Eric L. Frederich
    >
    >
    
    
    --
    ------------------------
    Eric L. Frederich
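
For readers who hit the same problem: the consistency check discussed in the quoted thread (the same binary at the same absolute path on every node) can be sketched as a small Python script. This is only an illustration, not part of Berkeley UPC. The function names are hypothetical, and it assumes passwordless ssh and that `md5sum` is installed on each node, as in the commands quoted above. Matching checksums only rule out one cause; in this thread the checksums matched and the error still occurred.

```python
import hashlib
import subprocess
from collections import Counter


def local_md5(path):
    """Return the hex MD5 digest of a local file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remote_md5(host, path):
    """Run `md5sum <path>` on a remote host over ssh and return the digest."""
    result = subprocess.run(
        ["ssh", host, "md5sum", path],
        check=True, capture_output=True, text=True,
    )
    # md5sum prints "<digest>  <path>"; keep only the digest field.
    return result.stdout.split()[0]


def mismatched_hosts(digests):
    """Given {host: digest}, return the hosts that differ from the most common digest."""
    if not digests:
        return []
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(host for host, d in digests.items() if d != majority)


def check_binary(path, remote_hosts):
    """Compare the local copy of `path` with the copy at the same absolute path on each host."""
    digests = {"localhost": local_md5(path)}
    for host in remote_hosts:
        digests[host] = remote_md5(host, path)
    return mismatched_hosts(digests)
```

With the paths from this thread, `check_binary("/home/eric/UPC/build/helloWorld", ["myth"])` would return an empty list when both copies are identical, and the names of the differing hosts otherwise.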
    
    
    

