From: Eric Frederich (eric.frederich_at_gmail_dot_com)
Date: Mon Nov 21 2005 - 14:35:03 PST
It looks like either my e-mail program or yours appended a <http://192.168.1.20X> after each IP address when I was pasting my file.

There are no firewalls on my network. I was not calling a barrier in my example program.

I think you understood what I said before, but let me reiterate...

I only have UPC installed on one machine on my network (penguin27, or 192.168.1.207). I was able to start two threads on penguin27 from penguin27.

For the purposes of this very first test I have not yet set up a common shared space with samba; I manually copied the executable over to the other machine (myth, or 192.168.1.208). I was able to start two threads on myth from penguin27.

The error occurs when I try starting one process on each machine.

Is there a different code path that gets executed when more than one machine is used? Possibly some initialization or coordination procedures?

Thanks,
~Eric

On 11/21/05, jcduell_at_lbl_dot_gov <jcduell_at_lbl_dot_gov> wrote:
>
> Eric,
>
> Hmm, it is a bit strange that you can run jobs on either machine, but
> not both. I can't tell from the output whether this is an error in our
> UDP layer, or some sort of configuration issue.
>
> Are there any firewall limitations between the two machines? That would
> be good to know, although since you've already gotten to a barrier call,
> I assume basic network connectivity must have been established OK.
>
> I'm going to let our resident UDP expert have a look at this one, too.
>
> So did you get a samba-based shared filesystem working, or are you
> manually copying the executable to the nodes?
>
> > So it is actually starting remote processes and comes back with the name of
> > the machine even though I specified the IP address.
>
> This at least I can explain--we get the machine name to print out from
> calling "hostname()", so we get the DNS name even if you've used raw IP
> addresses in your hosts file.
>
> --
> Jason Duell               Future Technologies Group
> <jcduell_at_lbl_dot_gov>  Computational Research Division
> Tel: +1-510-495-2354      Lawrence Berkeley National Laboratory
>
>
> On Sat, Nov 19, 2005 at 10:21:34AM -0500, Eric Frederich wrote:
> > Hello,
> >
> > I am having trouble now trying to run it on a remote computer. I made a file
> > /home/eric/upchosts which has 192.168.1.207 <http://192.168.1.207> on one
> > line and 192.168.1.208 <http://192.168.1.208> on the next line. Then I did
> > "export UPC_NODEFILE=/home/eric/upchosts".
> > When I run "./upcrun -n 2 hello" I get the following error...
> >
> > $ ./upcrun -n 2 hello
> > AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
> > from function sendPacket
> > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:93
> > reason: Invalid argument
> > AMUDP AMUDP_RequestGeneric returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
> > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:1200
> >
> > GASNet gasnetc_AMRequestShortM encountered an AM Error: AM_ERR_RESOURCE(3)
> > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:564
> > GASNet gasnetc_AMRequestShortM returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
> > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:568
> > *** FATAL ERROR:
> > GASNet encountered an error: GASNET_ERR_RESOURCE(3)
> > while calling: gasnet_AMRequestShort4(peer, gasneti_handleridx(gasnete_ambarrier_notify_reqh), phase, 0, id, flags)
> > at gasnete_barrier_notify() at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/extended-ref/gasnet_extended_refbarrier.c:197
> > *** Caught a fatal signal: SIGABRT(6) on node 1/2
> >
> > It is interesting to note that when the upchosts file looks like
> >
> > 192.168.1.207 <http://192.168.1.207>
> > 192.168.1.207 <http://192.168.1.207>
> > 192.168.1.208 <http://192.168.1.208>
> >
> > and I run it with -n 2 it works fine and I see the following
> >
> > UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=12356)
> > UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=12357)
> > Hello World from thread 1 of 2
> > Hello World from thread 0 of 2
> >
> > Also when I have the file say
> >
> > 192.168.1.208 <http://192.168.1.208>
> > 192.168.1.208 <http://192.168.1.208>
> > 192.168.1.207 <http://192.168.1.207>
> >
> > it works fine too and I see the following
> >
> > UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=10447)
> > UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=10446)
> > Hello World
> > Hello World
> >
> > So it is actually starting remote processes and comes back with the name of
> > the machine even though I specified the IP address.
> >
> > Any ideas why I can create multiple threads on local host, I can create
> > multiple threads on a remote host, but I can't create one on each?
> >
> > Thanks,
> > ~Eric
> >

--
------------------------
Eric L. Frederich
321-246-1854
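
For reference, the two-machine node-file setup described in this thread can be sketched roughly as follows. The two addresses, the UPC_NODEFILE variable and the upcrun command line are taken from the messages above; the temporary file path and the check for stray "<http://...>" text are assumptions added for illustration, and the launch line is commented out because it needs a working Berkeley UPC install.

```shell
#!/bin/sh
# Hypothetical scratch location (the thread used /home/eric/upchosts).
nodefile="${TMPDIR:-/tmp}/upchosts.example"

# One bare address per line, so that one process can be placed on each machine.
printf '%s\n' 192.168.1.207 192.168.1.208 > "$nodefile"

# Point the launcher at the node file, as done in the thread.
export UPC_NODEFILE="$nodefile"

# Illustrative sanity check: make sure no "<http://...>" text was pasted
# into the file by a mail client.
if grep -q '[<>]' "$nodefile"; then
    echo "node file contains stray mail-client markup" >&2
else
    echo "node file OK: $(wc -l < "$nodefile" | tr -d ' ') hosts"
fi

# Launch step from the thread, left commented out here:
# ./upcrun -n 2 hello
```

This only prepares and checks the host list; it does not reproduce or diagnose the AM_ERR_RESOURCE failure itself.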