[Gluster-users] Why does this setup not survive a node crash?

Burnash, James jburnash at knight.com
Wed Mar 16 11:07:42 PDT 2011

So, answering my own question with the (apparent) solution: the configuration IS correct as shown, and the problems were elsewhere.

The primary cause seems to have been mounting with the gluster native client on a virtual machine WITHOUT the "-O --disable-direct-io-mode" parameter.

So I was mounting like this:

	mount -t glusterfs jc1letgfs5:/test-pfs-ro1 /test-pfs2

When I should have been doing this:

	mount -t glusterfs -O --disable-direct-io-mode jc1letgfs5:/test-pfs-ro1 /test-pfs2
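For anyone scripting the client mounts, here is a minimal Python sketch that assembles the same command as an argument list. The option pair is copied verbatim from the working command above; the helper name `glusterfs_mount_cmd` is hypothetical, and whether the mount helper accepts the option in exactly this form may depend on the Gluster release, so check it against your version.

```python
import shlex


def glusterfs_mount_cmd(server, volume, mountpoint, disable_direct_io=True):
    """Build the native-client mount command used in this thread as an argv list.

    Hypothetical helper. '-O --disable-direct-io-mode' is copied verbatim from
    the command that worked on Gluster 3.1.1; other releases may spell it
    differently.
    """
    cmd = ["mount", "-t", "glusterfs"]
    if disable_direct_io:
        cmd += ["-O", "--disable-direct-io-mode"]
    cmd.append(f"{server}:/{volume}")
    cmd.append(mountpoint)
    return cmd


if __name__ == "__main__":
    argv = glusterfs_mount_cmd("jc1letgfs5", "test-pfs-ro1", "/test-pfs2")
    print(shlex.join(argv))
    # To actually mount (as root): subprocess.run(argv, check=True)
```

Passing `disable_direct_io=False` reproduces the original command that hung on the virtual machine.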

Second, I changed the volume parameter "network.ping-timeout" from its default of 42 seconds to 10 seconds, to get faster recovery from a downed storage node:

	gluster volume set pfs-rw1 network.ping-timeout 10

This configuration now survives the loss of either node in each two-server storage mirror. There is a noticeable delay the first time a command is run on the mount point after one of the nodes has gone down, but after that, commands return as quickly as they did when all nodes were present.
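The delay described above can be illustrated with a toy model. This is a simplification under stated assumptions, not GlusterFS code: the client only gives up on a silently crashed brick after a request to it has gone unanswered for `network.ping-timeout` seconds, anything issued in the meantime waits as well, and once the brick is marked down the surviving replica answers without extra delay. The function name `stall_times` is hypothetical.

```python
def stall_times(op_times, crash_at, ping_timeout):
    """Toy model of the stall (seconds) each operation sees after one replica crashes.

    op_times must be sorted. Assumptions (not GlusterFS internals): the crashed
    brick never answers, the first request after the crash starts a
    ping_timeout countdown, every request issued before the countdown ends
    waits for it, and afterwards the surviving replica answers immediately.
    """
    marked_down_at = None
    stalls = []
    for t in op_times:
        if t < crash_at:
            stalls.append(0.0)  # both replicas still up
            continue
        if marked_down_at is None:
            marked_down_at = t + ping_timeout  # first unanswered request
        stalls.append(max(0.0, marked_down_at - t))
    return stalls


if __name__ == "__main__":
    ops = [0, 5, 6, 20]
    for timeout in (42, 10):
        print(timeout, stall_times(ops, crash_at=4, ping_timeout=timeout))
```

In this model, operations at t = 0, 5, 6 and 20 s with a crash at t = 4 s stall for 0, 10, 9 and 0 s with a 10-second timeout, versus 0, 42, 41 and 27 s with 42 seconds, which is why lowering the timeout shortens the first-command delay.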

Thanks to all who helped, and especially to Anush, who helped me troubleshoot it from a different angle.

James Burnash, Unix Engineering

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Friday, March 11, 2011 11:31 AM
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] Why does this setup not survive a node crash?

Could anyone else please take a peek at this and sanity-check my configuration? I'm quite frankly at a loss and tremendously under the gun ...

Thanks in advance to any kind souls.

James Burnash, Unix Engineering

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Thursday, March 10, 2011 3:55 PM
To: gluster-users at gluster.org
Subject: [Gluster-users] Why does this setup not survive a node crash?

Perhaps someone will see immediately, given the data below, why this configuration will not survive a crash of one node: it appears that crashing any node in this set causes gluster native clients to hang until the node comes back.

Given two initial storage servers (CentOS 5.5, Gluster 3.1.1):

I started out by creating a Distributed-Replicate volume across the pair with this command:
gluster volume create test-pfs-ro1 replica 2 jc1letgfs5:/export/read-only/g01 jc1letgfs6:/export/read-only/g01 jc1letgfs5:/export/read-only/g02 jc1letgfs6:/export/read-only/g02

This ran fine (though I did not attempt to crash one of the pair).

Then I added two more identically configured servers with this command:
gluster volume add-brick test-pfs-ro1 jc1letgfs7:/export/read-only/g01 jc1letgfs8:/export/read-only/g01 jc1letgfs7:/export/read-only/g02 jc1letgfs8:/export/read-only/g02
Add Brick successful

root@jc1letgfs5:~# gluster volume info

Volume Name: test-pfs-ro1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Brick1: jc1letgfs5:/export/read-only/g01
Brick2: jc1letgfs6:/export/read-only/g01
Brick3: jc1letgfs5:/export/read-only/g02
Brick4: jc1letgfs6:/export/read-only/g02
Brick5: jc1letgfs7:/export/read-only/g01
Brick6: jc1letgfs8:/export/read-only/g01
Brick7: jc1letgfs7:/export/read-only/g02
Brick8: jc1letgfs8:/export/read-only/g02
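Why the mirrors can survive a node crash follows from how `replica 2` groups bricks: bricks are paired in the order they are given on the command line, which the volfile confirms (client-0 and client-1 form replicate-0, and so on). A short Python sketch, with hypothetical helper names, that reproduces that grouping and flags any replica set whose copies share a host:

```python
def replica_sets(bricks, replica):
    """Group bricks into replica sets: every `replica` consecutive bricks form one set."""
    if len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_with_shared_host(sets):
    """Return replica sets in which two or more copies sit on the same host."""
    return [s for s in sets if len({b.split(":", 1)[0] for b in s}) < len(s)]


# Brick order exactly as listed by `gluster volume info` above.
BRICKS = [
    "jc1letgfs5:/export/read-only/g01",
    "jc1letgfs6:/export/read-only/g01",
    "jc1letgfs5:/export/read-only/g02",
    "jc1letgfs6:/export/read-only/g02",
    "jc1letgfs7:/export/read-only/g01",
    "jc1letgfs8:/export/read-only/g01",
    "jc1letgfs7:/export/read-only/g02",
    "jc1letgfs8:/export/read-only/g02",
]

if __name__ == "__main__":
    sets = replica_sets(BRICKS, 2)
    for n, pair in enumerate(sets):
        print(f"replicate-{n}: {' + '.join(pair)}")
    print("sets sharing a host:", sets_with_shared_host(sets))
```

For this volume no set shares a host. Listing the bricks as jc1letgfs5:g01, jc1letgfs5:g02, jc1letgfs6:g01, jc1letgfs6:g02 instead would put both copies of each file on one server, and `sets_with_shared_host` would report both sets.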

And here is the volfile, taken from the log file /var/log/glusterfs/etc-glusterd-mount-test-pfs-ro1.log:

[2011-03-10 14:38:26.310807] W [dict.c:1204:data_to_str] dict: @data=(nil)
Given volfile:
volume test-pfs-ro1-client-0
    type protocol/client
    option remote-host jc1letgfs5
    option remote-subvolume /export/read-only/g01
    option transport-type tcp
end-volume

volume test-pfs-ro1-client-1
    type protocol/client
    option remote-host jc1letgfs6
    option remote-subvolume /export/read-only/g01
    option transport-type tcp
end-volume

volume test-pfs-ro1-client-2
    type protocol/client
    option remote-host jc1letgfs5
    option remote-subvolume /export/read-only/g02
    option transport-type tcp
end-volume

volume test-pfs-ro1-client-3
    type protocol/client
    option remote-host jc1letgfs6
    option remote-subvolume /export/read-only/g02
    option transport-type tcp
end-volume

volume test-pfs-ro1-client-4
    type protocol/client
    option remote-host jc1letgfs7
    option remote-subvolume /export/read-only/g01
    option transport-type tcp
end-volume

volume test-pfs-ro1-client-5
    type protocol/client
    option remote-host jc1letgfs8
    option remote-subvolume /export/read-only/g01
    option transport-type tcp
end-volume

volume test-pfs-ro1-client-6
    type protocol/client
    option remote-host jc1letgfs7
    option remote-subvolume /export/read-only/g02
    option transport-type tcp
end-volume

volume test-pfs-ro1-client-7
    type protocol/client
    option remote-host jc1letgfs8
    option remote-subvolume /export/read-only/g02
    option transport-type tcp
end-volume

volume test-pfs-ro1-replicate-0
    type cluster/replicate
    subvolumes test-pfs-ro1-client-0 test-pfs-ro1-client-1
end-volume

volume test-pfs-ro1-replicate-1
    type cluster/replicate
    subvolumes test-pfs-ro1-client-2 test-pfs-ro1-client-3
end-volume

volume test-pfs-ro1-replicate-2
    type cluster/replicate
    subvolumes test-pfs-ro1-client-4 test-pfs-ro1-client-5
end-volume

volume test-pfs-ro1-replicate-3
    type cluster/replicate
    subvolumes test-pfs-ro1-client-6 test-pfs-ro1-client-7
end-volume

volume test-pfs-ro1-dht
    type cluster/distribute
    subvolumes test-pfs-ro1-replicate-0 test-pfs-ro1-replicate-1 test-pfs-ro1-replicate-2 test-pfs-ro1-replicate-3
end-volume

volume test-pfs-ro1-write-behind
    type performance/write-behind
    subvolumes test-pfs-ro1-dht
end-volume

volume test-pfs-ro1-read-ahead
    type performance/read-ahead
    subvolumes test-pfs-ro1-write-behind
end-volume

volume test-pfs-ro1-io-cache
    type performance/io-cache
    subvolumes test-pfs-ro1-read-ahead
end-volume

volume test-pfs-ro1-quick-read
    type performance/quick-read
    subvolumes test-pfs-ro1-io-cache
end-volume

volume test-pfs-ro1-stat-prefetch
    type performance/stat-prefetch
    subvolumes test-pfs-ro1-quick-read
end-volume

volume test-pfs-ro1
    type debug/io-stats
    subvolumes test-pfs-ro1-stat-prefetch
end-volume
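To double-check which bricks actually back each translator, the volfile can be read as a graph: each `volume` stanza names a translator, and `subvolumes` points to its children. Below is a minimal Python parser sketch; the helper names are hypothetical, it handles only the `volume`/`type`/`option`/`subvolumes`/`end-volume` lines seen here, and it tolerates the `NN:` prefixes the log adds to each line.

```python
def parse_volfile(text):
    """Parse volfile stanzas into {name: {"type", "options", "subvolumes"}}."""
    graph, cur = {}, None
    for raw in text.splitlines():
        words = raw.split()
        # Drop a leading "NN:" line number, as printed in the client log.
        if words and words[0].endswith(":") and words[0][:-1].isdigit():
            words = words[1:]
        if not words:
            continue
        key = words[0]
        if key == "volume":
            cur = {"type": None, "options": {}, "subvolumes": []}
            graph[words[1]] = cur
        elif key == "type":
            cur["type"] = words[1]
        elif key == "option":
            cur["options"][words[1]] = " ".join(words[2:])
        elif key == "subvolumes":
            cur["subvolumes"] = words[1:]
        elif key == "end-volume":
            cur = None
        # Anything else (log timestamps, "Given volfile:") is ignored.
    return graph


def leaves_under(graph, name):
    """Return the protocol/client bricks reachable from `name`, as host:path."""
    node = graph[name]
    if node["type"] == "protocol/client":
        opts = node["options"]
        return [f'{opts["remote-host"]}:{opts["remote-subvolume"]}']
    bricks = []
    for sub in node["subvolumes"]:
        bricks += leaves_under(graph, sub)
    return bricks
```

Fed the volfile above, `leaves_under(graph, "test-pfs-ro1-replicate-0")` returns the two g01 bricks on jc1letgfs5 and jc1letgfs6, and walking from `test-pfs-ro1-dht` lists all eight bricks in replica-pair order.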

Any input would be greatly appreciated. I'm working beyond my deadline already, and I'm guessing that I'm not seeing the forest for the trees here.

James Burnash, Unix Engineering

Gluster-users mailing list
Gluster-users at gluster.org