Re: problems running UPC programs

From: Eric Frederich (eric.frederich_at_gmail_dot_com)
Date: Wed Nov 23 2005 - 14:35:17 PST

  • Next message: Eric Frederich: "upc_barrier"
    Hooray.  That fixed it, I am pretty sure.
    
    Before my /etc/hosts file looked like
    
    127.0.0.1       localhost penguin27.tuxnetwork penguin27
    192.168.1.208   myth.tuxnetwork myth
    
    now it looks like
    
    127.0.0.1       localhost
    192.168.1.207   penguin27.tuxnetwork penguin27
    192.168.1.208   myth.tuxnetwork myth
    
    and I get the following results ;-)
    
    eric@penguin27 build $ ./upcrun -n 2 hello
    UPCR: UPC thread 0 of 2 on penguin27 (process 0 of 2, pid=4762)
    UPCR: UPC thread 1 of 2 on myth (process 1 of 2, pid=11362)
    Hello World from thread 1 of 2 ! !
    Hello World from thread 2 of 2 ! !
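
For anyone who hits the same symptom, the difference between the two /etc/hosts versions above can be checked mechanically. This is only a minimal Python sketch and is not part of UPC or GASNet. The helper name `loopback_aliases` is made up for illustration, and the two sample strings are copied from the hosts files shown above. It lists every name other than plain "localhost" that is mapped to a loopback address, which is the condition that caused trouble here.

```python
import ipaddress


def loopback_aliases(hosts_text):
    """Return names (other than 'localhost') mapped to a loopback address.

    hosts_text is the content of an /etc/hosts-style file.
    Hypothetical helper for illustration; not part of UPC or GASNet.
    """
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines whose first field is not an IP address
        if addr.is_loopback:
            flagged.extend(name for name in fields[1:] if name != "localhost")
    return flagged


# The hosts file before the fix (copied from the message above).
BEFORE = """\
127.0.0.1       localhost penguin27.tuxnetwork penguin27
192.168.1.208   myth.tuxnetwork myth
"""

# The hosts file after the fix.
AFTER = """\
127.0.0.1       localhost
192.168.1.207   penguin27.tuxnetwork penguin27
192.168.1.208   myth.tuxnetwork myth
"""

if __name__ == "__main__":
    print("before:", loopback_aliases(BEFORE))
    print("after: ", loopback_aliases(AFTER))
```

On a real machine one would pass in the contents of /etc/hosts instead of the sample strings. Note that the sketch also flags aliases such as localhost.localdomain, which are usually harmless.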
    
    Thanks a lot.  Now that it is working, I am going away for the holiday
    weekend.  Hopefully the power will stay on and I'll be able to ssh in if
    I get bored and want to play around.
    
    Thanks again,
    ~Eric
    
    
    On 11/23/05, Dan Bonachea <bonachea_at_cs_dot_berkeley_dot_edu> wrote:
    >
    > At 05:42 PM 11/22/2005, Eric Frederich wrote:
    > >Dan,
    > >      First of all, thanks for your quick correspondence.  Attached is a
    > > file with a list of commands I ran and their outputs.  Please let me know
    > > if there is anything else I can tell you about my set up.
    > >
    > >Thanks,
    > >~Eric
    >
    > Hi Eric - the problem is shown in the log snippet below - it appears that
    > one of the nodes (the one local to the spawning console) is binding to the
    > localhost (loopback) ethernet interface (127.0.0.1) instead of to the real
    > external IP interface, and consequently the compute node processes cannot
    > reach each other.
    >
    > I suspect the hostname 'penguin27' is incorrectly resolving to 127.0.0.1
    > when queried from penguin27, instead of resolving to the external IP
    > address on the LAN shared by both compute nodes, as it should. You can
    > confirm this DNS misconfiguration by typing 'ping penguin27' on the
    > penguin27 machine - it should resolve to pinging 192.168.1.207, but I
    > suspect it will instead ping 127.0.0.1.
    >
    > There are several possible solutions to try:
    >
    > 1. fix DNS resolution on penguin27 to resolve to the external interface
    >    (check /etc/hosts)
    > 2. spawn jobs from a console on a third node, which should force both
    >    compute nodes to bind to an external interface in order to reach the
    >    spawning console
    > 3. change USE_NUMERIC_MASTER_ADDR to 1 in
    >    gasnet/other/amudp/amudp_internal.h and recompile the UPC runtime.
    >
    > Hope this helps..
    > Dan
    >
    > system(ssh -f -o 'StrictHostKeyChecking no' -o 'FallBackToRsh
    > no'  192.168.1.207 " echo connected to \$HOST... ; cd
    > '/home/eric/UPC/build' ;
    > './hello' '__AMUDP_SLAVE_PROCESS_VERBOSE__' 'penguin27:33197' "  || ( echo
    > "connection to 192.168.1.207 failed." ; kill 4249 ) &)
    > system(ssh -f -o 'StrictHostKeyChecking no' -o 'FallBackToRsh
    > no'  192.168.1.208 " echo connected to \$HOST... ; cd
    > '/home/eric/UPC/build' ;
    > './hello' '__AMUDP_SLAVE_PROCESS_VERBOSE__' 'penguin27:33197' "  || ( echo
    > "connection to 192.168.1.208 failed." ; kill 4249 ) &)
    > connected to ...
    > slave connecting to 192.168.1.207:33197
    > connected to ...
    > Endpoint table (nproc=2):
    >   P#0:   (192.168.1.208:32795)   tag: 0x7f00000100001099
    >   P#1:   (127.0.0.1:32795)       tag: 0x7f00000100011099
    > Slave 1/2 starting (tag=0x7f00000100011099)...
    > UDP recv buffer successfully set to 139704 bytes
    > slave connecting to 127.0.0.1:33197
    >
    >
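
The telltale line in the quoted log is the endpoint table entry that shows 127.0.0.1. As a rough aid for reading longer logs, here is a small Python sketch that scans verbose output for such entries. It assumes the "P#n: (ip:port)" line format seen in the snippet above, which may differ in other versions, and the name `loopback_endpoints` is made up for illustration.

```python
import ipaddress
import re

# Matches lines such as:
#   P#1:   (127.0.0.1:32795)       tag: 0x7f00000100011099
# The format is taken from the log quoted above and is an assumption.
ENDPOINT_RE = re.compile(r"P#(\d+):\s*\(([\d.]+):(\d+)\)")


def loopback_endpoints(log_text):
    """Return (process number, ip, port) for endpoints bound to loopback."""
    found = []
    for line in log_text.splitlines():
        match = ENDPOINT_RE.search(line)
        if not match:
            continue
        try:
            addr = ipaddress.ip_address(match.group(2))
        except ValueError:
            continue  # not a valid IPv4 address, ignore the line
        if addr.is_loopback:
            found.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return found


# Endpoint table copied from the quoted log.
LOG = """\
Endpoint table (nproc=2):
  P#0:   (192.168.1.208:32795)   tag: 0x7f00000100001099
  P#1:   (127.0.0.1:32795)       tag: 0x7f00000100011099
"""

if __name__ == "__main__":
    for proc, ip, port in loopback_endpoints(LOG):
        print(f"process {proc} is bound to loopback address {ip}:{port}")
```

Any process reported here is advertising an address the other nodes cannot reach, which matches the diagnosis quoted above.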
    
    
    --
    ------------------------
    Eric L. Frederich
    
