[Gluster-users] Possible new bug in 3.1.5 discovered

Burnash, James jburnash at knight.com
Wed Jun 29 15:48:39 UTC 2011


Sorry - I just discovered that I appended the wrong log lines to my original post - here are the correct ones, for completeness:

[2011-06-29 10:21:32.959956] E [rpc-clnt.c:199:call_bail] 0-pfs-rw1-client-2: bailing out frame type(GlusterFS 3.1) op(ENTRYLK(31)) xid = 0x61874x sent = 2011-06-29 09:51:31.447474. timeout = 1800
[2011-06-29 10:51:34.781215] E [rpc-clnt.c:199:call_bail] 0-pfs-rw1-client-3: bailing out frame type(GlusterFS 3.1) op(ENTRYLK(31)) xid = 0x62358x sent = 2011-06-29 10:21:32.960048. timeout = 1800
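
For anyone reading these two lines later: each ENTRYLK request was abandoned roughly 1800 seconds after it was sent, which matches the timeout printed at the end of the line. A minimal Python sketch that checks this arithmetic follows. The regular expression is inferred from the two sample lines above only, and `parse_call_bail` is a hypothetical helper name, not part of GlusterFS.

```python
import re
from datetime import datetime

# Field layout assumed from the two call_bail samples in this message only.
BAIL_RE = re.compile(
    r"\[(?P<logged>[\d-]+ [\d:.]+)\] E \[rpc-clnt\.c:\d+:call_bail\] "
    r"(?P<client>\S+): bailing out frame type\((?P<prog>[^)]*)\) "
    r"op\((?P<op>\w+)\(\d+\)\) xid = (?P<xid>\S+) "
    r"sent = (?P<sent>[\d-]+ [\d:.]+)\. timeout = (?P<timeout>\d+)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_call_bail(line):
    """Return a dict describing one call_bail line, or None if it does not match."""
    match = BAIL_RE.search(line)
    if match is None:
        return None
    logged = datetime.strptime(match.group("logged"), TS_FORMAT)
    sent = datetime.strptime(match.group("sent"), TS_FORMAT)
    return {
        "client": match.group("client"),
        "op": match.group("op"),
        "timeout": int(match.group("timeout")),
        # Seconds between sending the request and giving up on it.
        "elapsed": (logged - sent).total_seconds(),
    }


SAMPLE = (
    "[2011-06-29 10:21:32.959956] E [rpc-clnt.c:199:call_bail] "
    "0-pfs-rw1-client-2: bailing out frame type(GlusterFS 3.1) "
    "op(ENTRYLK(31)) xid = 0x61874x sent = 2011-06-29 09:51:31.447474. "
    "timeout = 1800"
)

if __name__ == "__main__":
    info = parse_call_bail(SAMPLE)
    print(info["client"], info["op"], info["elapsed"] >= info["timeout"])
```

A request that only returns by timing out like this is consistent with a lock call the other side never answers, rather than a slow network.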

James Burnash
Unix Engineer
Knight Capital Group

From: Anand Avati [mailto:anand.avati at gmail.com]
Sent: Wednesday, June 29, 2011 11:17 AM
To: Burnash, James
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Possible new bug in 3.1.5 discovered

Compatibility was broken between servers running 3.1.4 or earlier and 3.1.5 clients (this results in a hang when the replicate translator is used). The break was "necessary" in order to fix a hang that was present in every 3.1.x release up to that point. New servers should work fine with old clients, so upgrade all your servers before upgrading the clients.
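
The rule above can be turned into a small pre-upgrade check. This is only a sketch in Python that encodes the statement in this message (3.1.5 clients against 3.1.4-or-earlier servers within the 3.1.x series). The function names are hypothetical, the version strings are assumed to be plain dotted numbers, and it says nothing about 3.2.x.

```python
# First 3.1.x release carrying the lock fix, per the message above.
LOCK_FIX_RELEASE = (3, 1, 5)


def parse_version(text):
    """Turn a dotted version string such as '3.1.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def servers_at_risk(client_version, server_versions):
    """List 3.1.x servers that predate the lock fix while the client has it.

    An empty list means this particular rule does not predict a hang.
    """
    client = parse_version(client_version)
    if client[:2] != (3, 1) or client < LOCK_FIX_RELEASE:
        return []  # old clients are expected to work with new servers
    return [
        server
        for server in server_versions
        if parse_version(server)[:2] == (3, 1)
        and parse_version(server) < LOCK_FIX_RELEASE
    ]


if __name__ == "__main__":
    # The combination reported in this thread: 3.1.5 client, 3.1.3 servers.
    print(servers_at_risk("3.1.5", ["3.1.3", "3.1.3"]))
```

Running the check before touching any client makes the ordering explicit: upgrade every server it lists first, then the clients.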

Avati
On Wed, Jun 29, 2011 at 8:23 PM, Burnash, James <jburnash at knight.com> wrote:
I'm sorry - I think I wasn't clear.

The problem is that a 3.1.5 client hangs when it writes a file to a GlusterFS native mount point served by a server running 3.1.3.

Are you saying that the clients are known to not be backward compatible within the 3.1.x series?

James Burnash
Unix Engineer
Knight Capital Group

From: Anand Avati [mailto:anand.avati at gmail.com]
Sent: Wednesday, June 29, 2011 10:46 AM
To: Burnash, James
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Possible new bug in 3.1.5 discovered

James,
  Both 3.1.5 and 3.2.1 include necessary fixes for lock-related hangs. As a side effect, clients and servers hang when used across versions. Please upgrade your clients to 3.1.5 as well. This is a known and hard-to-fix compatibility issue.

Avati
On Wed, Jun 29, 2011 at 8:05 PM, Burnash, James <jburnash at knight.com> wrote:
"May you live in interesting times"

Is this a curse or a blessing? :)

I've just tested a 3.1.5 GlusterFS native client against a 3.1.3 storage pool using this volume:

Volume Name: pfs-rw1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: jc1letgfs16-pfs1:/export/read-write/g01
Brick2: jc1letgfs13-pfs1:/export/read-write/g01
Brick3: jc1letgfs16-pfs1:/export/read-write/g02
Brick4: jc1letgfs13-pfs1:/export/read-write/g02
Options Reconfigured:
performance.cache-size: 2GB
performance.stat-prefetch: 0
network.ping-timeout: 10
diagnostics.client-log-level: ERROR

Any attempt to write to that volume from a native client running version 3.1.5 hangs at the command line, and I can only break out by killing my ssh session into the client. Upon logging back into the same client, I see the process from the write attempt still stuck in uninterruptible sleep (state D):

21172 ?        D      0:00 touch /pfs1/test/junk1
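
The `D` in the third column of that `ps` line is the uninterruptible-sleep state, which is why the `touch` cannot be killed with a signal. A small Python sketch for picking such processes out of saved `ps ax` output follows. It assumes the five-column layout shown above (PID, TTY, STAT, TIME, COMMAND), and `uninterruptible` is a hypothetical helper name.

```python
def uninterruptible(ps_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Expects text in the `ps ax` layout: PID TTY STAT TIME COMMAND.
    """
    hits = []
    for line in ps_output.splitlines():
        # Split into at most five fields so the command keeps its spaces.
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header line, blank line, or unexpected format
        pid, _tty, stat, _time, command = fields
        if stat.startswith("D"):
            hits.append((int(pid), command))
    return hits


SAMPLE_PS = (
    "  PID TTY      STAT   TIME COMMAND\n"
    "21172 ?        D      0:00 touch /pfs1/test/junk1\n"
    "21180 pts/0    Ss     0:00 -bash\n"
)

if __name__ == "__main__":
    for pid, command in uninterruptible(SAMPLE_PS):
        print(pid, command)
```

The header row and the `-bash` line in `SAMPLE_PS` are made up for illustration; only the `touch` line comes from the client.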

Anybody else run into this situation?

Client mount log (/var/log/glusterfs/pfs2.log) below:

[2011-06-29 10:28:07.860519] E [afr-self-heal-metadata.c:522:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-29 10:28:07.860668] E [afr-self-heal-metadata.c:522:afr_sh_metadata_fix] 0-pfs-ro1-replicate-1: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[remaining log lines are truncated in the archive: further repeats of the "Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes" error, interleaved with "data self-heal failed on /"]

James Burnash
Unix Engineer
Knight Capital Group



DISCLAIMER:
This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and
may contain legally privileged and/or confidential information. If you are not the intended recipient of this
e-mail, you are hereby notified that any dissemination, distribution or copying of this e-mail and any attachments
thereto, is strictly prohibited. If you have received this in error, please immediately notify me and permanently
delete the original and any printout thereof. E-mail transmission cannot be guaranteed to be secure or error-free.
The sender therefore does not accept liability for any errors or omissions in the contents of this message which
arise as a result of e-mail transmission.
NOTICE REGARDING PRIVACY AND CONFIDENTIALITY
Knight Capital Group may, at its discretion, monitor and review the content of all e-mail communications.

http://www.knight.com






