Re: problems running UPC programs

From: Eric Frederich (eric.frederich_at_gmail_dot_com)
Date: Mon Nov 21 2005 - 14:35:03 PST

  • Next message: Dan Bonachea: "Re: problems running UPC programs"
    It looks like either my e-mail program or yours appended a
    <http://192.168.1.20X> after each IP address when I was pasting my file.
    
    There are no firewalls on my network.
    I was not calling a barrier in my example program.
    I think you understand what I said before, but let me reiterate:
    I only have UPC installed on one machine on my network (penguin27, or
    192.168.1.207).
    I was able to start two threads on penguin27 from penguin27.
    For this very first test I have not yet set up a common shared space
    with Samba, so I manually copied the executable over to the other
    machine (myth, or 192.168.1.208).
    I was able to start two threads on myth from penguin27.
    
    The error occurs when I try starting 1 process on each machine.
    
    Is there a different code path that gets executed when more than one machine
    will be used? Possibly some initialization or coordination procedures?
    
    Thanks,
    ~Eric
    
    
    
    On 11/21/05, jcduell_at_lbl_dot_gov <jcduell_at_lbl_dot_gov> wrote:
    >
    > Eric,
    >
    > Hmm, it is a bit strange that you can run jobs on either machine, but
    > not both. I can't tell from the output whether this is an error in our
    > UDP layer, or some sort of configuration issue.
    >
    > Are there any firewall limitations between the two machines? That would
    > be good to know, although since you've already gotten to a barrier call,
    > I assume basic network connectivity must have been established OK.
    >
    > I'm going to let our resident UDP expert have a look at this one, too.
    >
    > So did you get a samba-based shared filesystem working, or are you
    > manually copying the executable to the nodes?
    >
    > > So it is actually starting remote processes and comes back with the name of
    > > the machine even though I specified the IP address.
    >
    > This at least I can explain--we get the machine name to print out from
    > calling "hostname()", so we get the DNS name even if you've used raw IP
    > addresses in your hosts file.
    >
    > --
    > Jason Duell               Future Technologies Group
    > <jcduell_at_lbl_dot_gov>  Computational Research Division
    > Tel: +1-510-495-2354      Lawrence Berkeley National Laboratory
    >
    >
    > On Sat, Nov 19, 2005 at 10:21:34AM -0500, Eric Frederich wrote:
    > > Hello,
    > >
    > > I am having trouble now trying to run it on a remote computer. I made a file
    > > /home/eric/upchosts which has 192.168.1.207 on one
    > > line and 192.168.1.208 on the next line. Then I did
    > > "export UPC_NODEFILE=/home/eric/upchosts".
    > > When I run "./upcrun -n 2 hello" I get the following error...
    > >
    > > $ ./upcrun -n 2 hello
    > > AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
    > > from function sendPacket
    > > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:93
    > > reason: Invalid argument
    > > AMUDP AMUDP_RequestGeneric returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
    > > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:1200
    > >
    > > GASNet gasnetc_AMRequestShortM encountered an AM Error: AM_ERR_RESOURCE(3)
    > > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:564
    > > GASNet gasnetc_AMRequestShortM returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
    > > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:568
    > > *** FATAL ERROR:
    > > GASNet encountered an error: GASNET_ERR_RESOURCE(3)
    > > while calling: gasnet_AMRequestShort4(peer, gasneti_handleridx(gasnete_ambarrier_notify_reqh), phase, 0, id, flags)
    > > at gasnete_barrier_notify() at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/extended-ref/gasnet_extended_refbarrier.c:197
    > > *** Caught a fatal signal: SIGABRT(6) on node 1/2
    > > *** Caught a fatal signal: SIGABRT(6) on node 1/2
    > >
    > > It is interesting to note that when the upchosts file looks like
    > >
    > > 192.168.1.207
    > > 192.168.1.207
    > > 192.168.1.208
    > >
    > > and I run it with -n 2 it works fine and I see the following
    > >
    > > UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=12356)
    > > UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=12357)
    > > Hello World from thread 1 of 2
    > > Hello World from thread 0 of 2
    > >
    > > Also when I have the file say
    > >
    > > 192.168.1.208
    > > 192.168.1.208
    > > 192.168.1.207
    > >
    > > it works fine too and I see the following
    > >
    > > UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=10447)
    > > UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=10446)
    > > Hello World
    > > Hello World
    > >
    > > So it is actually starting remote processes and comes back with the name of
    > > the machine even though I specified the IP address.
    > >
    > > Any ideas why I can create multiple threads on local host, I can create
    > > multiple threads on a remote host, but I can't create one on each?
    > >
    > > Thanks,
    > > ~Eric
    >
    >
    
    
    --
    ------------------------
    Eric L. Frederich
    321-246-1854
    
