From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Feb 15 2008 - 10:35:59 PST
Steven D. Vormwald wrote:
> Paul H. Hargrove wrote:
>> Steven,
>>
>> Whatever problem you are encountering is beyond my own experience.
>> Attached is a very small program that is intended to list the names
>> of your HCAs as queried from the OpenIB verbs interface. Compile as
>> follows, substituting your correct paths for "/opt/ofed" (and
>> possibly adding -ldl):
>> $ cc -o ibvls ibvls.c -I/opt/ofed/include -L/opt/ofed/lib64 -libverbs
>> and run with no arguments:
>> $ ./ibvls
>> ibv_get_device_list: list=0x501eb0 num_hcas=1
>> HCA[0] = 'mthca0'
>>
>> If this "ibvls" returns a non-empty list of HCAs, then the OpenIB
>> verbs support from QLogic is working and GASNet is somehow at fault.
>> However, if this simple test program doesn't find any HCAs, then I
>> suggest you contact QLogic (or whichever vendor provides support for
>> your cluster) for help in getting this small test program working.
>> Once this small test program works, I believe GASNet should probably
>> work as well.
>>
>> -Paul
>
> Paul,
>
> After working with QLogic, we've gotten the ibvls program working.
>
> $ ./ibvls
> ibv_get_device_list: list=0x501ea0 num_hcas=1
> HCA[0] = 'ipath0'
>
> However, GASNet is still not working with the cards. Running the
> tests that you mentioned earlier is now giving the error "Probe failed
> to open HCA 'ipath0'", so it is at least finding the card now.
>
> $ env GASNET_TRACEMASK=C GASNET_TRACEFILE=stdout
> ./contrib/gasnetrun_ibv -n1 ./testgasnet | grep HCA
> GASNet reporting enabled - tracing and statistical output directed to
> stdout
> 0 0.000895s> (C) Probing HCAs for active ports
> 0 0.002365s> (C) Probe failed to open HCA 'ipath0'
> GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE
> (Problem with requested resource)
> at
> /usr/local/src/berkeley_upc-2.6.0/gasnet/vapi-conduit/gasnet_core.c:986
> reason: unable to open any HCA ports
> GASNet gasnet_init_GASNET_SEQFASTdebugtracestatssrclines returning an
> error code: GASNET_ERR_RESOURCE (Problem with requested resource)
> at
> /usr/local/src/berkeley_upc-2.6.0/gasnet/vapi-conduit/gasnet_core.c:1546
> ERROR calling: gasnet_init(&argc, &argv)
> at: /usr/local/src/berkeley_upc-2.6.0/gasnet/tests/testgasnet.c:185
> error: GASNET_ERR_RESOURCE (Problem with requested resource)
> 0 0.002413s> (C) Probe found 0 active port(s) on 0 HCA(s)
> gasnet_exit(): ERROR: signal 11 received during exit... goodbye.
> [initiating collective exit]
> Abort on node n1 due to MPI_Abort (type 2)
> $
>
> Steven Vormwald

Steven,

I am glad you've made progress, but sorry to hear that you are stuck
again. I am afraid I don't have time to look into the problem in much
detail until the middle of next week, but wanted to reply so you knew I
was not ignoring or forgetting you.

Until I am able to look in more detail, I'd suggest you verify that
/dev/infiniband/uverbs0 exists and has modes like the following:

crw-rw-rw- 1 root root 231, 192 Feb 6 10:54 uverbs0

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
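
The device-node check suggested above can also be scripted, which may help when the same test has to be repeated on every compute node. The following is only a minimal sketch: the function name check_uverbs is hypothetical, it assumes GNU stat (for the -c option), and it treats exactly mode 666 on a character device as "ok", matching the listing shown in the message rather than any documented GASNet requirement.

```shell
# check_uverbs NODE
# Hypothetical helper: succeeds only if NODE is a character device whose
# permission bits are 666 (crw-rw-rw-), as in the listing in the message.
check_uverbs() {
    node=$1
    if [ ! -c "$node" ]; then
        echo "$node: missing or not a character device"
        return 1
    fi
    # GNU stat: %a prints the permission bits in octal, e.g. 666.
    perms=$(stat -c '%a' "$node") || return 1
    case $perms in
        666) echo "$node: ok (mode $perms)" ;;
        *)   echo "$node: mode $perms, expected 666"
             return 1 ;;
    esac
}
```

After defining the function, run "check_uverbs /dev/infiniband/uverbs0" on the node in question. A failure here points at a driver or device-permission problem rather than at GASNet itself, though a passing check does not guarantee that the HCA can be opened.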