From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Jan 11 2008 - 14:12:28 PST
Steven D. Vormwald wrote:
> Paul H. Hargrove wrote:
>> Let us see what GASNet is seeing when it probes the hardware. Please
>> follow the following steps, sending the output of the final command:
>>
>> $ cd [YOUR_BERKELY_UPC_BUILD_DIR]
>> $ cd dbg/gasnet/vapi-conduit
>> $ make testgasnet-seq
>> [...output omitted...]
>> $ env GASNET_TRACEMASK=C GASNET_TRACEFILE=stdout
>> ./contrib/gasnetrun_vapi -n1 ./testgasnet | grep HCA
>>
>> When things are working correctly, you should expect output roughly
>> like the following:
>>
>> GASNet reporting enabled - tracing and statistical output directed to
>> stdout
>> 0 0.001157s> (C) Probing HCAs for active ports
>> 0 0.001887s> (C) Probe found HCA 'mthca0'
>> 0 0.001976s> (C) Probe found HCA 'mthca0', port 1
>> 0 0.001985s> (C) Probe found 1 active port(s) on 1 HCA(s)
>> 0 0.001997s> (C) vapi-conduit HCA properties (1 of 1) = {
>> 0 0.002004s> (C) HCA id = 'mthca0'
>> 0 0.002006s> (C) HCA vendor id = 0x2c9
>> 0 0.002008s> (C) HCA vendor part id = 0x6274
>> 0 0.002010s> (C) HCA hardware version = 0xa0
>> 0 0.002012s> (C) HCA firmware version =
>> -Paul
>
> I should note that these cards (to the best of my knowledge) do not
> support the Mellanox VAPI, and thus I didn't enable support for it when
> building the compiler:
>
> [sdvormwa@gilbert vapi-conduit]$ make testgasnet-seq
> ../other/Makefile-conduit.mak:245: warning: overriding commands for
> target `Makefile'
> Makefile:512: warning: ignoring old commands for target `Makefile'
> make[1]: Entering directory
> `/usr/local/build/berkeley_upc-2.6.0-dbg/dbg/gasnet/vapi-conduit'
> ../other/Makefile-conduit.mak:245: warning: overriding commands for
> target `Makefile'
> Makefile:512: warning: ignoring old commands for target `Makefile'
> make[2]: Entering directory
> `/usr/local/build/berkeley_upc-2.6.0-dbg/dbg/gasnet/vapi-conduit'
> ../other/Makefile-conduit.mak:245: warning: overriding commands for
> target `Makefile'
> Makefile:512: warning: ignoring old commands for target `Makefile'
> ERROR: vapi-conduit support was not detected at configure time
> try re-running configure with --enable-vapi
> make[2]: *** [do-error] Error 1
> make[2]: Leaving directory
> `/usr/local/build/berkeley_upc-2.6.0-dbg/dbg/gasnet/vapi-conduit'
> make[1]: *** [testgasnet] Error 2
> make[1]: Leaving directory
> `/usr/local/build/berkeley_upc-2.6.0-dbg/dbg/gasnet/vapi-conduit'
> make: *** [testgasnet-seq] Error 2
> [sdvormwa@gilbert vapi-conduit]$
>
> Running the same series of commands in ibv-conduit produced the following:
>
> [sdvormwa@gilbert ibv-conduit]$ env GASNET_SSH_NODEFILE=~/.mpihosts
> GASNET_TRACEMASK=C GASNET_TRACEFILE=stdout ./contrib/gasnetrun_ibv -n1
> ./testgasnet | grep HCA
> GASNet reporting enabled - tracing and statistical output directed to
> stdout
> libibverbs: Warning: no userspace device-specific driver found for
> /sys/class/infiniband_verbs/uverbs0
> GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE
> (Problem with requested resource)
> at /usr/local/src/berkeley_upc-2.6.0/gasnet/vapi-conduit/gasnet_core.c:986
> reason: unable to open any HCA ports
> GASNet gasnet_init_GASNET_SEQFASTdebugtracestatssrclines returning an
> error code: GASNET_ERR_RESOURCE (Problem with requested resource)
> at
> /usr/local/src/berkeley_upc-2.6.0/gasnet/vapi-conduit/gasnet_core.c:1546
> ERROR calling: gasnet_init(&argc, &argv)
> at: /usr/local/src/berkeley_upc-2.6.0/gasnet/tests/testgasnet.c:185
> error: GASNET_ERR_RESOURCE (Problem with requested resource)
> 0 0.000897s> (C) Probing HCAs for active ports
> 0 0.001658s> (C) Probe failed to locate any HCAs
> gasnet_exit(): ERROR: signal 11 received during exit... goodbye.
> [initiating collective exit]
> Cleaning up orphaned processes...
> [sdvormwa@gilbert ibv-conduit]$
>
> Steven Vormwald

Steven,

I know that the QLogic HCAs don't support the Mellanox VAPI interface.
My instructions should have asked you to perform those steps in the
"ibv-conduit" directory, as you have done, not "vapi-conduit". I
apologize for any confusion.

The output you provided tells me that the call we make to
"ibv_get_device_list()" reported that no devices are available. That
appears to directly contradict the "ibv_devinfo" output you provided
previously. This leaves me with very little to go on.

You noted that when trying to troubleshoot the message
"libibverbs: Warning: no userspace device-specific driver found for
/sys/class/infiniband_verbs/uverbs0"
you verified that "mthca.so" existed and was executable. However, it now
occurs to me that this is the filename for Mellanox HCAs, and that for
the QLogic HCAs you should instead verify that "ipathverbs.so" is
present and executable. At least in OFED 1.0, this file is part of the
"libipathverbs" RPM.

I'd also like to see the output from
"ls -l /sys/class/infiniband_verbs/uverbs0/"

-Paul

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
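
The driver check suggested above can be sketched as a small shell function. This is only an illustration: the function name find_verbs_driver and the directories searched in the usage line are assumptions, not part of libibverbs or any OFED release, and the actual install location depends on the distribution.

```shell
#!/bin/sh
# Hypothetical helper (not part of libibverbs or OFED): report whether a
# userspace verbs driver library exists and is executable in any of the
# given directories.
# Usage: find_verbs_driver NAME DIR...
find_verbs_driver() {
    name=$1
    shift
    for dir in "$@"; do
        # Accept both "NAME.so" and "libNAME*.so*" naming styles.
        for f in "$dir"/"$name".so "$dir"/lib"$name"*.so*; do
            if [ -f "$f" ] && [ -x "$f" ]; then
                echo "found: $f"
                return 0
            fi
        done
    done
    echo "not found: $name"
    return 1
}

# The directories below are guesses; adjust them for your installation.
find_verbs_driver ipathverbs /usr/lib /usr/lib64 \
    /usr/lib/infiniband /usr/lib64/infiniband || true
```

If the function reports "not found", installing the "libipathverbs" package would be the first thing to try; if it reports a path, the directory listing requested above is the next useful piece of information.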