Re: trouble using ibv-conduit

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Feb 15 2008 - 10:35:59 PST

    Steven D. Vormwald wrote:
    > Paul H. Hargrove wrote:
    >> Steven,
    >>
    >>  Whatever problem you are encountering is beyond my own experience.  
    >> Attached is a very small program that is intended to list the names 
    >> of your HCAs as queried from the OpenIB verbs interface.  Compile as 
    >> follows, substituting your correct paths for "/opt/ofed" (and 
    >> possibly adding -ldl):
    >>  $ cc -o ibvls ibvls.c -I/opt/ofed/include -L/opt/ofed/lib64 -libverbs
    >> and run with no arguments:
    >>  $ ./ibvls
    >>  ibv_get_device_list: list=0x501eb0 num_hcas=1
    >>  HCA[0] = 'mthca0'
    >>
    >> If this "ibvls" returns a non-empty list of HCAs, then the OpenIB 
    >> verbs support from QLogic is working and GASNet is somehow at fault.  
    >> However, if this simple test program doesn't find any HCAs, then I 
    >> suggest you contact QLogic (or whichever vendor provides support for 
    >> your cluster) for help in getting this small test program working.  
    >> Once this small test program works, I believe GASNet should probably 
    >> work as well.
    >>
    >> -Paul
    >
    > Paul,
    >
    > After working with QLogic, we've gotten the ibvls program working.
    >
    > $ ./ibvls
    > ibv_get_device_list: list=0x501ea0 num_hcas=1
    > HCA[0] = 'ipath0'
    >
    > However, GASNet is still not working with the cards.  Running the 
    > tests that you mentioned earlier is now giving the error "Probe failed 
    > to open HCA 'ipath0'", so it is at least finding the card now.
    >
    > $ env GASNET_TRACEMASK=C GASNET_TRACEFILE=stdout ./contrib/gasnetrun_ibv -n1 ./testgasnet | grep HCA
    > GASNet reporting enabled - tracing and statistical output directed to stdout
    > 0 0.000895s> (C) Probing HCAs for active ports
    > 0 0.002365s> (C) Probe failed to open HCA 'ipath0'
    > GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
    >   at /usr/local/src/berkeley_upc-2.6.0/gasnet/vapi-conduit/gasnet_core.c:986
    >   reason: unable to open any HCA ports
    > GASNet gasnet_init_GASNET_SEQFASTdebugtracestatssrclines returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
    >   at /usr/local/src/berkeley_upc-2.6.0/gasnet/vapi-conduit/gasnet_core.c:1546
    > ERROR calling: gasnet_init(&argc, &argv)
    >  at: /usr/local/src/berkeley_upc-2.6.0/gasnet/tests/testgasnet.c:185
    >  error: GASNET_ERR_RESOURCE (Problem with requested resource)
    > 0 0.002413s> (C) Probe found 0 active port(s) on 0 HCA(s)
    > gasnet_exit(): ERROR: signal 11 received during exit... goodbye. 
    > [initiating collective exit]
    > Abort on node n1 due to MPI_Abort (type 2)
    > $
    >
    > Steven Vormwald
    
    Steven,
      I am glad you've made progress, but sorry to hear that you are stuck
    again.
      I am afraid I don't have time to look into the problem in much detail
    until the middle of next week, but I wanted to reply so you know I am not
    ignoring or forgetting you.

      Until I am able to look in more detail, I'd suggest you verify that
    /dev/infiniband/uverbs0 exists and has permissions like the following:
    crw-rw-rw-  1 root root 231, 192 Feb  6 10:54 uverbs0
    
    -Paul
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
