Re: problems running UPC programs

jcduell_at_lbl_dot_gov
Date: Mon Nov 21 2005 - 14:15:57 PST

  • Next message: Eric Frederich: "Re: problems running UPC programs"
    Eric,
    
    Hmm, it is a bit strange that you can run jobs on either machine, but
    not both.  I can't tell from the output whether this is an error in our
    UDP layer, or some sort of configuration issue.
    
    Are there any firewall limitations between the two machines?  That would
    be good to know, although since you've already gotten to a barrier call,
    I assume basic network connectivity must have been established OK.
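
A rough way to check the firewall question independently of UPC is a small UDP echo round trip between the two machines. The sketch below is only an illustration in Python: the function names are hypothetical and nothing in it is part of Berkeley UPC, GASNet, or AMUDP. A real job may use different ports than the one you test, so a pass here only shows that UDP is not blocked outright.

```python
import socket
import threading


def serve_one_echo(sock):
    """Echo back a single datagram received on an already-bound UDP socket."""
    data, addr = sock.recvfrom(1024)
    sock.sendto(data, addr)


def udp_probe(host, port, payload=b"ping", timeout=2.0):
    """Return True if a datagram sent to (host, port) is echoed back in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(payload, (host, port))
            reply, _ = s.recvfrom(1024)
        except OSError:
            # Covers timeouts as well as errors such as "connection refused".
            return False
        return reply == payload


def loopback_self_test():
    """Exercise both halves on 127.0.0.1 using a free port chosen by the OS."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_one_echo, args=(server,), daemon=True)
    worker.start()
    ok = udp_probe("127.0.0.1", port)
    worker.join(timeout=2.0)
    server.close()
    return ok
```

To try it across the two hosts, bind a UDP socket to ("", 50000) on one machine and pass it to serve_one_echo, then call udp_probe("192.168.1.208", 50000) from the other, and repeat in the opposite direction. The port number 50000 is an arbitrary choice for the example.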
    
    I'm going to let our resident UDP expert have a look at this one, too.
    
    So did you get a samba-based shared filesystem working, or are you
    manually copying the executable to the nodes?
    
    > So it is actually starting remote processes and comes back with the name of
    > the machine even though I specified the IP address.
    
    This at least I can explain--we get the machine name to print out from
    calling "hostname()", so we get the DNS name even if you've used raw IP
    addresses in your hosts file.
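
The effect can be sketched in a few lines of Python, assuming the lookup behaves like the POSIX gethostname() call (which Python exposes as socket.gethostname()). The function name reported_name is hypothetical and is not taken from the UPC runtime.

```python
import socket


def reported_name(nodefile_entry):
    """Return the name this machine reports for itself.

    nodefile_entry is deliberately ignored: the printed name comes from the
    local system's own hostname, not from whatever string (name or raw IP)
    was used to reach the machine.
    """
    return socket.gethostname()
```

Calling reported_name("192.168.1.207") on a machine gives the same result as calling it with any other string, which is why the startup lines show a machine name rather than the address from the hosts file.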
    
    -- 
    Jason Duell             Future Technologies Group
    <jcduell_at_lbl_dot_gov>       Computational Research Division
    Tel: +1-510-495-2354    Lawrence Berkeley National Laboratory
    
    
    On Sat, Nov 19, 2005 at 10:21:34AM -0500, Eric Frederich wrote:
    > Hello,
    > 
    > I am having trouble now trying to run it on a remote computer. I made a file
    > /home/eric/upchosts which has 192.168.1.207 on one line and 192.168.1.208 on
    > the next line. Then I did "export UPC_NODEFILE=/home/eric/upchosts".
    > When I run "./upcrun -n 2 hello" I get the following error...
    > 
    > $ ./upcrun -n 2 hello
    > AMUDP sendPacket returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
    > from function sendPacket
    > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:93
    > reason: Invalid argument
    > AMUDP AMUDP_RequestGeneric returning an error code: AM_ERR_RESOURCE (Problem with requested resource)
    > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/other/amudp/amudp_reqrep.cpp:1200
    > 
    > GASNet gasnetc_AMRequestShortM encountered an AM Error: AM_ERR_RESOURCE(3)
    > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:564
    > GASNet gasnetc_AMRequestShortM returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
    > at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/udp-conduit/gasnet_core.c:568
    > *** FATAL ERROR:
    > GASNet encountered an error: GASNET_ERR_RESOURCE(3)
    > while calling: gasnet_AMRequestShort4(peer, gasneti_handleridx(gasnete_ambarrier_notify_reqh), phase, 0, id, flags)
    > at gasnete_barrier_notify() at /home/eric/UPC/berkeley_upc-2.2.1/gasnet/extended-ref/gasnet_extended_refbarrier.c:197
    > *** Caught a fatal signal: SIGABRT(6) on node 1/2
    > 
    > It is interesting to note that when the upchosts file looks like
    > 
    > 192.168.1.207
    > 192.168.1.207
    > 192.168.1.208
    > 
    > and I run it with -n 2 it works fine and I see the following
    > 
    > UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=12356)
    > UPCR: UPC thread 1 of 2 on penguin27 (process 1 of 2, pid=12357)
    > Hello World from thread 1 of 2
    > Hello World from thread 0 of 2
    > 
    > Also when I have the file say
    > 
    > 192.168.1.208
    > 192.168.1.208
    > 192.168.1.207
    > 
    > it works fine too and I see the following
    > 
    > UPCR: UPC thread 0 of 2 on myth (process 0 of 2, pid=10447)
    > UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=10446)
    > Hello World
    > Hello World
    > 
    > So it is actually starting remote processes and comes back with the name of
    > the machine even though I specified the IP address.
    > 
    > Any ideas why I can create multiple threads on local host, I can create
    > multiple threads on a remote host, but I can't create one on each?
    > 
    > Thanks,
    > ~Eric
    
