Re: trouble using ibv-conduit

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Jan 11 2008 - 14:12:28 PST

    Steven D. Vormwald wrote:
    > Paul H. Hargrove wrote:
    >> Let us see what GASNet is seeing when it probes the hardware.  Please
    >> run the steps below, sending the output of the final command:
    >>
    >> $ cd [YOUR_BERKELEY_UPC_BUILD_DIR]
    >> $ cd dbg/gasnet/vapi-conduit
    >> $ make testgasnet-seq
    >> [...output omitted...]
    >> $ env GASNET_TRACEMASK=C GASNET_TRACEFILE=stdout
    >> ./contrib/gasnetrun_vapi -n1 ./testgasnet  | grep HCA
    >>
    >> When things are working correctly, you should expect output roughly
    >> like the following:
    >>
    >> GASNet reporting enabled - tracing and statistical output directed to
    >> stdout
    >> 0 0.001157s> (C) Probing HCAs for active ports
    >> 0 0.001887s> (C) Probe found HCA 'mthca0'
    >> 0 0.001976s> (C) Probe found HCA 'mthca0', port 1
    >> 0 0.001985s> (C) Probe found 1 active port(s) on 1 HCA(s)
    >> 0 0.001997s> (C) vapi-conduit HCA properties (1 of 1) = {
    >> 0 0.002004s> (C)   HCA id                   = 'mthca0'
    >> 0 0.002006s> (C)   HCA vendor id            = 0x2c9
    >> 0 0.002008s> (C)   HCA vendor part id       = 0x6274
    >> 0 0.002010s> (C)   HCA hardware version     = 0xa0
    >> 0 0.002012s> (C)   HCA firmware version     =      
    >> -Paul
    > 
    > I should note that these cards (to the best of my knowledge) do not
    > support the Mellanox VAPI, and thus I didn't enable support for it when
    > building the compiler:
    > 
    > [sdvormwa@gilbert vapi-conduit]$ make testgasnet-seq
    > ../other/Makefile-conduit.mak:245: warning: overriding commands for
    > target `Makefile'
    > Makefile:512: warning: ignoring old commands for target `Makefile'
    > make[1]: Entering directory
    > `/usr/local/build/berkeley_upc-2.6.0-dbg/dbg/gasnet/vapi-conduit'
    > ../other/Makefile-conduit.mak:245: warning: overriding commands for
    > target `Makefile'
    > Makefile:512: warning: ignoring old commands for target `Makefile'
    > make[2]: Entering directory
    > `/usr/local/build/berkeley_upc-2.6.0-dbg/dbg/gasnet/vapi-conduit'
    > ../other/Makefile-conduit.mak:245: warning: overriding commands for
    > target `Makefile'
    > Makefile:512: warning: ignoring old commands for target `Makefile'
    > ERROR: vapi-conduit support was not detected at configure time
    >       try re-running configure with --enable-vapi
    > make[2]: *** [do-error] Error 1
    > make[2]: Leaving directory
    > `/usr/local/build/berkeley_upc-2.6.0-dbg/dbg/gasnet/vapi-conduit'
    > make[1]: *** [testgasnet] Error 2
    > make[1]: Leaving directory
    > `/usr/local/build/berkeley_upc-2.6.0-dbg/dbg/gasnet/vapi-conduit'
    > make: *** [testgasnet-seq] Error 2
    > [sdvormwa@gilbert vapi-conduit]$
    > 
    > Running the same series of commands in ibv-conduit produced the following:
    > 
    > [sdvormwa@gilbert ibv-conduit]$ env GASNET_SSH_NODEFILE=~/.mpihosts
    > GASNET_TRACEMASK=C GASNET_TRACEFILE=stdout ./contrib/gasnetrun_ibv -n1
    > ./testgasnet  | grep HCA
    > GASNet reporting enabled - tracing and statistical output directed to
    > stdout
    > libibverbs: Warning: no userspace device-specific driver found for
    > /sys/class/infiniband_verbs/uverbs0
    > GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE
    > (Problem with requested resource)
    >  at /usr/local/src/berkeley_upc-2.6.0/gasnet/vapi-conduit/gasnet_core.c:986
    >  reason: unable to open any HCA ports
    > GASNet gasnet_init_GASNET_SEQFASTdebugtracestatssrclines returning an
    > error code: GASNET_ERR_RESOURCE (Problem with requested resource)
    >  at
    > /usr/local/src/berkeley_upc-2.6.0/gasnet/vapi-conduit/gasnet_core.c:1546
    > ERROR calling: gasnet_init(&argc, &argv)
    > at: /usr/local/src/berkeley_upc-2.6.0/gasnet/tests/testgasnet.c:185
    > error: GASNET_ERR_RESOURCE (Problem with requested resource)
    > 0 0.000897s> (C) Probing HCAs for active ports
    > 0 0.001658s> (C) Probe failed to locate any HCAs
    > gasnet_exit(): ERROR: signal 11 received during exit... goodbye. 
    > [initiating collective exit]
    > Cleaning up orphaned processes...
    > [sdvormwa@gilbert ibv-conduit]$
    > 
    > Steven Vormwald
    
    
    Steven,
    
      I know that the QLogic HCAs don't support the Mellanox VAPI interface.
      My instructions should have asked you to perform those steps in the
    "ibv-conduit" directory, as you have done, not "vapi-conduit".  I
    apologize for any confusion.
    
      The output you provided tells me that the call we make to
    "ibv_get_device_list()" has indicated that there are no devices
    available.  That appears to be in direct contradiction with the
    "ibv_devinfo" output you provided previously.  This leaves me with very
    little to go on.
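
      To help narrow down that contradiction, here is a minimal sketch (not
    part of GASNet or libibverbs) that lists what the kernel exposes under
    sysfs.  It assumes, as the libibverbs warning suggests, that devices
    show up as "uverbsN" entries under /sys/class/infiniband_verbs, and
    that each entry has an "ibdev" file naming the device.  The function
    name and the default path argument are my own choices for illustration.

```python
import os


def list_uverbs_devices(sysfs_root="/sys/class/infiniband_verbs"):
    """Return {uverbs_entry: ibdev_name_or_None} for entries under sysfs_root.

    Assumption: each uverbsN directory contains an 'ibdev' file holding the
    device name. Entries whose 'ibdev' cannot be read map to None.
    """
    found = {}
    if not os.path.isdir(sysfs_root):
        # No verbs class directory at all; nothing to report.
        return found
    for entry in sorted(os.listdir(sysfs_root)):
        if not entry.startswith("uverbs"):
            continue  # skip unrelated files such as abi_version
        ibdev_path = os.path.join(sysfs_root, entry, "ibdev")
        try:
            with open(ibdev_path) as f:
                found[entry] = f.read().strip()
        except OSError:
            found[entry] = None
    return found


if __name__ == "__main__":
    devices = list_uverbs_devices()
    if not devices:
        print("no uverbs entries found")
    for entry, ibdev in devices.items():
        print(entry, "->", ibdev if ibdev is not None else "(ibdev unreadable)")
```

      If this lists a device while GASNet still reports none, the problem
    is more likely in the userspace driver lookup than in the kernel.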
    
      You noted that when trying to troubleshoot the message "libibverbs:
    Warning: no userspace device-specific driver found for
    /sys/class/infiniband_verbs/uverbs0" you verified that "mthca.so"
    existed and was executable.  However, it now occurs to me that this is
    the filename for Mellanox HCAs, and that for the QLogic HCAs you should
    be verifying that "ipathverbs.so" is present and executable.  At least
    in OFED 1.0, this file is part of the "libipathverbs" RPM.
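
      The presence check can be scripted.  The following is only a sketch:
    the helper name is hypothetical, and the directories searched are
    guesses at common library locations, not paths I know to be correct
    for your installation.  Here "executable" means that at least one
    execute permission bit is set on the file.

```python
import os


def find_driver_libs(name, search_dirs):
    """Return a list of (path, has_exec_bit) for shared objects matching name.

    A file matches when its filename contains both `name` and '.so'.
    has_exec_bit is True when any execute permission bit is set.
    """
    hits = []
    for directory in search_dirs:
        if not os.path.isdir(directory):
            continue  # silently skip locations that do not exist
        for fname in sorted(os.listdir(directory)):
            if name in fname and ".so" in fname:
                path = os.path.join(directory, fname)
                has_exec_bit = bool(os.stat(path).st_mode & 0o111)
                hits.append((path, has_exec_bit))
    return hits


if __name__ == "__main__":
    # Hypothetical search locations; adjust for the distribution in use.
    dirs = ["/usr/lib", "/usr/lib64",
            "/usr/lib/infiniband", "/usr/lib64/infiniband"]
    hits = find_driver_libs("ipathverbs", dirs)
    if not hits:
        print("no matching library found in", dirs)
    for path, executable in hits:
        print(path, "executable" if executable else "NOT executable")
```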
    
      I'd also like to see the output from "ls -l
    /sys/class/infiniband_verbs/uverbs0/".
    
    -Paul
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
