[Gluster-users] RDMA Problems with GlusterFS 3.1.1

Jeremy Stout stout.jeremy at gmail.com
Wed Dec 1 20:30:20 PST 2010


As an update to my situation, I think I have GlusterFS 3.1.1 working
now. I was able to create and mount RDMA volumes without any errors.

To fix the problem, I changed lines 3562 and 3563 of rdma.c to:
options->send_count = 32;
options->recv_count = 32;

Previously, both values were set to 128.

I'll run some tests tomorrow to verify that it is working correctly.
Assuming it is, what would be the expected side effect of changing
the values from 128 to 32? Will there be a decrease in performance?


On Wed, Dec 1, 2010 at 10:07 AM, Jeremy Stout <stout.jeremy at gmail.com> wrote:
> Here are the results of the test:
> submit-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong
>  local address:  LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>  local address:  LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>  local address:  LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>  local address:  LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>  local address:  LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>  local address:  LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>  local address:  LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>  local address:  LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>  local address:  LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>  local address:  LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>  local address:  LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>  local address:  LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>  local address:  LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>  local address:  LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>  local address:  LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>  local address:  LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
>  remote address: LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>  remote address: LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>  remote address: LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>  remote address: LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>  remote address: LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>  remote address: LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>  remote address: LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>  remote address: LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>  remote address: LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>  remote address: LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>  remote address: LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>  remote address: LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>  remote address: LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>  remote address: LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>  remote address: LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>  remote address: LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
> 8192000 bytes in 0.01 seconds = 5917.47 Mbit/sec
> 1000 iters in 0.01 seconds = 11.07 usec/iter
>
> fs-1:/usr/local/glusterfs/3.1.1/var/log/glusterfs # ibv_srq_pingpong submit-1
>  local address:  LID 0x000b, QPN 0x000406, PSN 0x3b644e, GID ::
>  local address:  LID 0x000b, QPN 0x000407, PSN 0x173320, GID ::
>  local address:  LID 0x000b, QPN 0x000408, PSN 0xc105ea, GID ::
>  local address:  LID 0x000b, QPN 0x000409, PSN 0x5e5ff1, GID ::
>  local address:  LID 0x000b, QPN 0x00040a, PSN 0xff15b0, GID ::
>  local address:  LID 0x000b, QPN 0x00040b, PSN 0xf0b152, GID ::
>  local address:  LID 0x000b, QPN 0x00040c, PSN 0x4ced49, GID ::
>  local address:  LID 0x000b, QPN 0x00040d, PSN 0x01da0e, GID ::
>  local address:  LID 0x000b, QPN 0x00040e, PSN 0x69459a, GID ::
>  local address:  LID 0x000b, QPN 0x00040f, PSN 0x197c14, GID ::
>  local address:  LID 0x000b, QPN 0x000410, PSN 0xd50228, GID ::
>  local address:  LID 0x000b, QPN 0x000411, PSN 0xbc9b9b, GID ::
>  local address:  LID 0x000b, QPN 0x000412, PSN 0x0870eb, GID ::
>  local address:  LID 0x000b, QPN 0x000413, PSN 0xfb1fbc, GID ::
>  local address:  LID 0x000b, QPN 0x000414, PSN 0x3eefca, GID ::
>  local address:  LID 0x000b, QPN 0x000415, PSN 0xbd64c6, GID ::
>  remote address: LID 0x0002, QPN 0x000406, PSN 0x703b96, GID ::
>  remote address: LID 0x0002, QPN 0x000407, PSN 0x618cc8, GID ::
>  remote address: LID 0x0002, QPN 0x000408, PSN 0xd62272, GID ::
>  remote address: LID 0x0002, QPN 0x000409, PSN 0x5db5d9, GID ::
>  remote address: LID 0x0002, QPN 0x00040a, PSN 0xc51978, GID ::
>  remote address: LID 0x0002, QPN 0x00040b, PSN 0x05fd7a, GID ::
>  remote address: LID 0x0002, QPN 0x00040c, PSN 0xaa4a51, GID ::
>  remote address: LID 0x0002, QPN 0x00040d, PSN 0xb7a676, GID ::
>  remote address: LID 0x0002, QPN 0x00040e, PSN 0x56bde2, GID ::
>  remote address: LID 0x0002, QPN 0x00040f, PSN 0xa662bc, GID ::
>  remote address: LID 0x0002, QPN 0x000410, PSN 0xee27b0, GID ::
>  remote address: LID 0x0002, QPN 0x000411, PSN 0x89c683, GID ::
>  remote address: LID 0x0002, QPN 0x000412, PSN 0xd025b3, GID ::
>  remote address: LID 0x0002, QPN 0x000413, PSN 0xcec8e4, GID ::
>  remote address: LID 0x0002, QPN 0x000414, PSN 0x37e5d2, GID ::
>  remote address: LID 0x0002, QPN 0x000415, PSN 0x29562e, GID ::
> 8192000 bytes in 0.01 seconds = 7423.65 Mbit/sec
> 1000 iters in 0.01 seconds = 8.83 usec/iter
>
> Based on the output, I believe it ran correctly.
>
> On Wed, Dec 1, 2010 at 9:51 AM, Anand Avati <anand.avati at gmail.com> wrote:
>> Can you verify that ibv_srq_pingpong works from the server where this log
>> file is from?
>>
>> Thanks,
>> Avati
>>
>> On Wed, Dec 1, 2010 at 7:44 PM, Jeremy Stout <stout.jeremy at gmail.com> wrote:
>>>
>>> Whenever I try to start or mount a GlusterFS 3.1.1 volume that uses
>>> RDMA, I'm seeing the following error messages in the log file on the
>>> server:
>>> [2010-11-30 18:37:53.51270] I [nfs.c:652:init] nfs: NFS service started
>>> [2010-11-30 18:37:53.51362] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>> [2010-11-30 18:37:53.51375] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>> [2010-11-30 18:37:53.59628] E [rdma.c:2066:rdma_create_cq]
>>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>> [2010-11-30 18:37:53.59851] E [rdma.c:3771:rdma_get_device]
>>> rpc-transport/rdma: testdir-client-0: could not create CQ
>>> [2010-11-30 18:37:53.59925] E [rdma.c:3957:rdma_init]
>>> rpc-transport/rdma: could not create rdma device for mthca0
>>> [2010-11-30 18:37:53.60009] E [rdma.c:4789:init] testdir-client-0:
>>> Failed to initialize IB Device
>>> [2010-11-30 18:37:53.60030] E [rpc-transport.c:971:rpc_transport_load]
>>> rpc-transport: 'rdma' initialization failed
>>>
>>> On the client, I see:
>>> [2010-11-30 18:43:49.653469] W [io-stats.c:1644:init] testdir:
>>> dangling volume. check volfile
>>> [2010-11-30 18:43:49.653573] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>> [2010-11-30 18:43:49.653607] W [dict.c:1204:data_to_str] dict: @data=(nil)
>>> [2010-11-30 18:43:49.736275] E [rdma.c:2066:rdma_create_cq]
>>> rpc-transport/rdma: testdir-client-0: creation of send_cq failed
>>> [2010-11-30 18:43:49.736651] E [rdma.c:3771:rdma_get_device]
>>> rpc-transport/rdma: testdir-client-0: could not create CQ
>>> [2010-11-30 18:43:49.736689] E [rdma.c:3957:rdma_init]
>>> rpc-transport/rdma: could not create rdma device for mthca0
>>> [2010-11-30 18:43:49.736805] E [rdma.c:4789:init] testdir-client-0:
>>> Failed to initialize IB Device
>>> [2010-11-30 18:43:49.736841] E
>>> [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma'
>>> initialization failed
>>>
>>> This results in an unsuccessful mount.
>>>
>>> I created the volume using the following commands:
>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume create testdir
>>> transport rdma submit-1:/exports
>>> /usr/local/glusterfs/3.1.1/sbin/gluster volume start testdir
>>>
>>> To mount the directory, I use:
>>> mount -t glusterfs submit-1:/testdir /mnt/glusterfs
>>>
>>> I don't think it is an Infiniband problem since GlusterFS 3.0.6 and
>>> GlusterFS 3.1.0 worked on the same systems. For GlusterFS 3.1.0, the
>>> commands listed above produced no error messages.
>>>
>>> If anyone can provide help with debugging these error messages, it
>>> would be appreciated.
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>>
>