Re: upc with mpich2

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed May 07 2008 - 15:02:24 PDT

  • Next message: David J. Biesack: "Re: upc with mpich2"
    David,
  I am sorry things aren't working as expected.
  First, you are correct about needing the %N, etc. in MPIRUN_CMD; I am
sorry I answered too quickly.
      Secondly, you are also correct that MPIRUN_CMD is not mentioned in the
    INSTALL file.  However, it is fully documented in
gasnet/mpi-conduit/README, and INSTALL does suggest reading the
per-conduit README files.  That said, we'll consider adding a more
direct reference to mpi-conduit's README in the section describing
the MPI_CC variable.
    
  As for what you are seeing with respect to mpicc, I am not entirely
certain what could be happening, but I can think of a couple of
possibilities.
    
1) Some versions of autoconf no longer honor the VAR=value settings on
the configure command line.  If that is the case for you, then try
--with-mpi-cc='/acl/usr/local/mpich2/bin/mpicc'
--with-mpirun-cmd='/acl/usr/local/mpich2/bin/mpirun -np %N %P %A' on the
configure command line.
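For example (the paths are the ones from your report; the --with option
names here assume they mirror the MPI_CC and MPIRUN_CMD variables, so
double-check against `configure --help` in your tree):

```shell
# Sketch of a reconfigure using the --with-* forms instead of VAR=value:
./configure CC=cc CXX=c++ \
    --with-mpi-cc='/acl/usr/local/mpich2/bin/mpicc' \
    --with-mpirun-cmd='/acl/usr/local/mpich2/bin/mpirun -np %N %P %A'
```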
    
    2) The settings given at configure time may be cached.  Please ensure
    that "config.cache" and "gasnet/config.cache" are removed before trying
    to reconfigure.  Otherwise the old value might get reused.  We've tried
    to detect this case, but perhaps that detection is failing.
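Concretely, from the top of the build tree (a sketch; adjust the
directory to wherever you ran configure):

```shell
# Remove stale configure caches so the new MPI_CC / MPIRUN_CMD values
# are re-detected rather than silently reused from a previous run.
rm -f config.cache gasnet/config.cache
```

After that, re-run configure with the settings you want.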
    
The value of MPI_CC in particular is not subject to any runtime
overrides such as multiconf.conf or your environment (though MPIRUN_CMD
is taken from the environment if set).  So, if neither of the options
above fixes the problem for you, we'll need to examine the output from
configure to see which mpicc command was selected, and how.
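If it comes to that, config.log in the build tree records every compiler
check configure ran, so grepping it shows which mpicc was chosen.  Here
is a sketch against a stand-in config.log, since we don't have your
build tree (the file contents below are hypothetical; run the same grep
on your real config.log):

```shell
# Stand-in for the real config.log (hypothetical contents for illustration):
printf 'configure:3120: checking for mpicc\nconfigure:3139: result: /usr/bin/mpicc\n' > config.log

# The same grep, run on your real config.log, shows which mpicc was picked:
grep -i 'mpicc' config.log

# Also useful: "ldd ./cpi" on the failing executable shows which MPI
# libraries it actually linked; an mpich2 build should not pull in LAM's.
```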
    
    -Paul
    
    David J. Biesack wrote:
    >> Date: Wed, 07 May 2008 12:23:47 -0700
    >> From: "Paul H. Hargrove" <PHHargrove_at_lbl_dot_gov>
    >> CC: upc-users_at_lbl_dot_gov
    >>
    >> David,
    >>
    >>   Based on your description of the problem, it looks like the correct
    >> "mpicc" *is* getting used, but the incorrect "mpirun". 
    > 
    > I still believe it is using the wrong mpicc.
    > The upcc -v output pretty clearly runs /usr/bin/mpicc :
    > 
>>>   /usr/bin/mpicc   -o 'cpi' ...
    > 
    > and not /acl/usr/local/mpich2/bin/mpicc as configured (or from my path or MPICC)
    > 
>> I believe you simply need to reconfigure adding
>> MPIRUN_CMD=/acl/usr/local/mpich2/bin/mpirun to the configure command.
    >>   Please let us know if that does not solve your problem.
    >>
    >> -Paul
    > 
    > Thanks for the tip. I did not see any information about setting
    > MPIRUN_CMD in http://upc.lbl.gov/download/dist/INSTALL
    > and it is not mentioned in http://upc.lbl.gov/docs/user/upcrun.html either.
    > Perhaps someone can add mention of it?
    > 
    > I tried setting 
    > 
    >   MPIRUN_CMD=/acl/usr/local/mpich2/bin/mpirun
    > 
    > as an environment variable. I got a diagnostic telling me that I must
    > include %P and %A and %N in the command, so I tried:
    > 
    >   $ MPIRUN_CMD="/acl/usr/local/mpich2/bin/mpirun -np %N '%P' '%A'"
    >   $ export MPIRUN_CMD
    > 
    > I'm closer; this tries to run now on four nodes, and I get four
    > diagnostics about LAM/MPI not running. Thus, I think this still
    > indicates bindings introduced by /usr/bin/mpicc instead
    > of /acl/usr/local/mpich2/bin/mpicc  :
    > 
    >   $ upcrun -np 4 cpi
    >   -----------------------------------------------------------------------------
    >   -----------------------------------------------------------------------------
    > 
    >   It seems that there is no lamd running on the host acl211.unx.sas.com.
    > 
    >   This indicates that the LAM/MPI runtime environment is not operating.
    >   The LAM/MPI runtime environment is necessary for MPI programs to run
>   (the MPI program tried to invoke the "MPI_Init" function).
> 
>   Please run the "lamboot" command to start the LAM/MPI runtime
    >   environment.  See the LAM/MPI documentation for how to invoke
    >   "lamboot" across multiple machines.
    >   -----------------------------------------------------------------------------
    > 
    >   It seems that there is no lamd running on the host acl211.unx.sas.com.
    > 
    >   This indicates that the LAM/MPI runtime environment is not operating.
    >   The LAM/MPI runtime environment is necessary for MPI programs to run
>   (the MPI program tried to invoke the "MPI_Init" function).
> 
>   Please run the "lamboot" command to start the LAM/MPI runtime
    >   environment.  See the LAM/MPI documentation for how to invoke
    >   "lamboot" across multiple machines.
    >   -----------------------------------------------------------------------------
    >   -----------------------------------------------------------------------------
    > 
    >   It seems that there is no lamd running on the host acl210.
    > 
    >   This indicates that the LAM/MPI runtime environment is not operating.
    >   The LAM/MPI runtime environment is necessary for MPI programs to run
    >   -----------------------------------------------------------------------------
    > 
    >   It seems that there is no lamd running on the host acl210.
    > 
    >   This indicates that the LAM/MPI runtime environment is not operating.
    >   The LAM/MPI runtime environment is necessary for MPI programs to run
>   (the MPI program tried to invoke the "MPI_Init" function).
> 
>   Please run the "lamboot" command to start the LAM/MPI runtime
    >   environment.  See the LAM/MPI documentation for how to invoke
    >   "lamboot" across multiple machines.
    >   -----------------------------------------------------------------------------
>   (the MPI program tried to invoke the "MPI_Init" function).
> 
>   Please run the "lamboot" command to start the LAM/MPI runtime
    >   environment.  See the LAM/MPI documentation for how to invoke
    >   "lamboot" across multiple machines.
    >   -----------------------------------------------------------------------------
    >   $ 
    > 
    > I also changed the ./configure options to specify the mpich2 mpirun
    > 
    >   configure CC=cc CXX=c++ MPI_CC=/acl/usr/local/mpich2/bin/mpicc MPIRUN_CMD="/acl/usr/local/mpich2/bin/mpirun -np %N '%P' '%A'"
    > 
    > and rebuilt, but I think mpicc is still incorrect; I get the same errors above.
    > 
    > (Note that on my first attempt, I ran
    > 
    >   configure CC=cc CXX=c++ MPI_CC=mpicc
    > 
    > and that appeared to be ignored as well, instead it ran /usr/bin/mpicc.)
    > 
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
