From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Mon Jan 14 2008 - 14:01:20 PST
Steven,
Whatever problem you are encountering is beyond my own experience.
Attached is a very small program that is intended to list the names of
your HCAs as queried from the OpenIB verbs interface. Compile as
follows, substituting your correct paths for "/opt/ofed" (and possibly
adding -ldl):
$ cc -o ibvls ibvls.c -I/opt/ofed/include -L/opt/ofed/lib64 -libverbs
and run with no arguments:
$ ./ibvls
ibv_get_device_list: list=0x501eb0 num_hcas=1
HCA[0] = 'mthca0'
If this "ibvls" returns a non-empty list of HCAs, then the OpenIB verbs
support from QLogic is working and GASNet is somehow at fault. However,
if this simple test program doesn't find any HCAs, then I suggest you
contact QLogic (or whichever vendor provides support for your cluster)
for help in getting this small test program working. Once this small
test program works, I believe GASNet should probably work as well.
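
As a complementary check that does not depend on libibverbs at all, the
sketch below walks a sysfs directory such as /sys/class/infiniband_verbs
and prints the contents of each uverbsN/ibdev file, which is where the
kernel reports the device name. It is only an illustration: the helper
names read_ibdev() and list_uverbs() are made up for this example, and
the base directory is passed in as a parameter rather than hard-coded.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of <base>/<entry>/ibdev into
 * name. Returns 0 on success, -1 if the file cannot be read. */
int read_ibdev(const char *base, const char *entry, char *name, size_t len) {
    char path[512];
    FILE *fp;
    int n = snprintf(path, sizeof(path), "%s/%s/ibdev", base, entry);

    if (n < 0 || (size_t)n >= sizeof(path)) {
        return -1;                       /* path would be truncated */
    }
    fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }
    if (fgets(name, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    name[strcspn(name, "\n")] = '\0';    /* drop the trailing newline */
    return 0;
}

/* Hypothetical helper: print one line per uverbsN entry under base.
 * Returns the number of entries printed, or -1 if base cannot be opened. */
int list_uverbs(const char *base) {
    DIR *dir = opendir(base);
    struct dirent *de;
    int count = 0;

    if (dir == NULL) {
        return -1;
    }
    while ((de = readdir(dir)) != NULL) {
        char name[64];
        if (strncmp(de->d_name, "uverbs", 6) != 0) {
            continue;                    /* skip ".", "..", abi_version, ... */
        }
        if (read_ibdev(base, de->d_name, name, sizeof(name)) != 0) {
            continue;                    /* entry without a readable ibdev */
        }
        printf("%s -> '%s'\n", de->d_name, name);
        ++count;
    }
    closedir(dir);
    return count;
}
```

Calling list_uverbs("/sys/class/infiniband_verbs") from a small main()
should print one line per uverbs entry. If that shows a device while
"ibvls" reports none, the kernel side is likely fine and the userspace
driver library is the more probable culprit.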
-Paul
Steven D. Vormwald wrote:
> Paul H. Hargrove wrote:
>> Steven,
>>
>> I know that the QLogic HCAs don't support the Mellanox VAPI interface.
>> My instructions should have asked you to perform those steps in the
>> "ibv-conduit" directory, as you have done, not "vapi-conduit". I
>> apologize for any confusion.
>>
> I expected as much, but it never hurts to make sure.
>
>> You noted that when trying to troubleshoot the message "libibverbs:
>> Warning: no userspace device-specific driver found for
>> /sys/class/infiniband_verbs/uverbs0" you verified that "mthca.so"
>> existed and was executable. However, it now occurs to me that this is
>> the filename for Mellanox HCAs, and that for the QLogic HCAs you should
>> be verifying that "ipathverbs.so" is present and executable. At least
>> in OFED 1.0, this file is part of the "libipathverbs" RPM.
>>
> [sdvormwa@n1 ~]$ file /usr/lib/libipathverbs.so /usr/lib64/libipathverbs.so
> /usr/lib/libipathverbs.so: symbolic link to `libipathverbs-rdmav2.so'
> /usr/lib64/libipathverbs.so: symbolic link to `libipathverbs-rdmav2.so'
> [sdvormwa@n1 ~]$ file /usr/lib/libipathverbs-rdmav2.so /usr/lib64/libipathverbs-rdmav2.so
> /usr/lib/libipathverbs-rdmav2.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), not stripped
> /usr/lib64/libipathverbs-rdmav2.so: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
> [sdvormwa@n1 ~]$ ls -l /usr/lib/libipathverbs-rdmav2.so /usr/lib64/libipathverbs-rdmav2.so
> -rwxr-xr-x 1 root root 67843 Jun 18 2007 /usr/lib64/libipathverbs-rdmav2.so
> -rwxr-xr-x 1 root root 57606 Jun 18 2007 /usr/lib/libipathverbs-rdmav2.so
>
> One thing that seems odd to me is that all the other infiniband
> libraries are 64-bit only, and installed in /usr/ofed/lib64, while
> libipathverbs has both 32 and 64-bit versions installed in /usr/lib[64].
>
>> I'd also like to see the output from "ls -l
>> /sys/class/infiniband_verbs/uverbs0/"
>>
> [sdvormwa@gilbert UPC]$ ls -l /sys/class/infiniband_verbs/uverbs0/
> total 0
> -r--r--r-- 1 root root 4096 Jan 11 20:58 abi_version
> -r--r--r-- 1 root root 4096 Jan 11 20:58 dev
> lrwxrwxrwx 1 root root 0 Jan 11 20:58 device -> ../../../devices/pci0000:00/0000:00:06.0/0000:0a:00.0
> lrwxrwxrwx 1 root root 0 Jan 11 20:58 driver -> ../../../bus/pci/drivers/ib_ipath
> -r--r--r-- 1 root root 4096 Jan 11 20:58 ibdev
>
> Steven Vormwald
--
Paul H. Hargrove PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    struct ibv_device **hca_list;
    int num_hcas;
    int i;

    num_hcas = 0; /* call will overwrite with actual count */
    hca_list = ibv_get_device_list(&num_hcas);
    /* %p expects a void pointer, hence the cast */
    printf("ibv_get_device_list: list=%p num_hcas=%d\n", (void *)hca_list, num_hcas);
    if ((hca_list == NULL) || (num_hcas == 0)) {
        if (hca_list != NULL) {
            ibv_free_device_list(hca_list); /* empty but allocated list */
        }
        return 1;
    }
    for (i=0; i<num_hcas; ++i) {
        const char *hca_name = ibv_get_device_name(hca_list[i]);
        printf("HCA[%d] = '%s'\n", i, hca_name);
    }
    ibv_free_device_list(hca_list); /* release the array returned above */
    return 0;
}
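
Related to the mix of 32-bit and 64-bit libipathverbs builds seen in the
quoted "file" output: a driver library can only be loaded into a process
of the same word size, so a 64-bit "ibvls" needs the 64-bit library. The
sketch below reports which kind a given file is by reading the ELF
identification bytes, roughly what "file" does for these libraries. The
function name elf_bits() is hypothetical, chosen for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: inspect the ELF identification bytes of a file.
 * Returns 32 or 64 for an ELF file of that class, 0 if the file is not
 * a recognizable ELF object, and -1 if it cannot be opened. */
int elf_bits(const char *path) {
    unsigned char ident[5];
    size_t got;
    FILE *fp = fopen(path, "rb");        /* follows symbolic links */

    if (fp == NULL) {
        return -1;
    }
    got = fread(ident, 1, sizeof(ident), fp);
    fclose(fp);

    /* Bytes 0-3 are the magic number 0x7f 'E' 'L' 'F' */
    if (got != sizeof(ident) || memcmp(ident, "\177ELF", 4) != 0) {
        return 0;
    }
    /* Byte 4 (EI_CLASS): 1 = ELFCLASS32, 2 = ELFCLASS64 */
    if (ident[4] == 1) {
        return 32;
    }
    if (ident[4] == 2) {
        return 64;
    }
    return 0;
}
```

Given the listing quoted above, elf_bits("/usr/lib64/libipathverbs-rdmav2.so")
would be expected to return 64 and the /usr/lib copy 32, assuming the
files are unchanged since that output was captured.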