[Gluster-users] Files present on the backend but have become invisible from clients

Burnash, James jburnash at knight.com
Fri Jun 10 18:11:11 UTC 2011


Hi Amar.

Is there a projected release date for 3.1.5 and 3.2.1?

From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
Sent: Friday, May 27, 2011 1:42 PM
To: 'Amar Tumballi'
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Files present on the backend but have become invisible from clients

Thank you Amar – this is much appreciated and gives me a better understanding of the meaning of some of these attributes.

I would like to suggest that an explanation at least at this level be added to the Gluster documentation, for future use by others as well as myself (I forget sometimes ☺).

So – as far as what I can do about these incorrect Replicate attributes – it appears that the answer is “nothing until the next release”? Or will triggering a self-heal on those directories specifically clean things up?

Thanks as always,

James

From: Amar Tumballi [mailto:amar at gluster.com]
Sent: Friday, May 27, 2011 12:23 PM
To: Burnash, James
Cc: Mohit Anchlia; gluster-users at gluster.org
Subject: Re: [Gluster-users] Files present on the backend but have become invisible from clients

James,

Replies inline.
The directories are all still visible to the users, but scanning for attributes of 0sAAAAAAAAAAAAAAAA still yielded matches on the set of GlusterFS servers.

http://pastebin.com/mxvFnFj4
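
(For reference, a scan along these lines would surface those matches on a storage server; this is only a sketch, assuming the bricks live under /export/read-only as in the listings above, and it must be run against the backend bricks, not a client mount:)

 # list the files whose trusted.afr changelog attributes carry the all-zero
 # value; 0sAAAAAAAAAAAAAAAA is the base64 encoding of twelve zero bytes
 find /export/read-only -exec getfattr -d -m trusted.afr {} + 2>/dev/null \
   | egrep '^# file:|0sAAAAAAAAAAAAAAAA$' | less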

I tried running this command, but as you can see it wasn't happy, even though the syntax was correct:

root at jc1letgfs17:~# gluster volume rebalance pfs-ro1 fix-layout start
Usage: volume rebalance <VOLNAME> [fix-layout|migrate-data] {start|stop|status}

I suspect this is a bug caused by the "-" in my volume name. I'll test to confirm and file a bug report when I get a chance.

This seems to be a bug with the 'fix-layout' CLI option itself (I assume you are on 3.1.3; it is fixed in 3.1.4+ and 3.2.0). Please use just 'rebalance <VOLNAME> start'.


So I just did the standard rebalance command:
 gluster volume rebalance pfs-ro1 start

and it trundled along for a while, but then at one point when I checked its status, it had failed:
 date; gluster volume rebalance pfs-ro1 status
 Thu May 26 09:02:00 EDT 2011
 rebalance failed

I re-ran it FOUR times, getting a little farther with each attempt, and it eventually completed the layout fix and started the actual file migration part of the rebalance:
 Thu May 26 12:22:25 EDT 2011
 rebalance step 1: layout fix in progress: fixed layout 779
 Thu May 26 12:23:25 EDT 2011
 rebalance step 2: data migration in progress: rebalanced 71 files of size 136518704 (total files scanned 57702)

Now scanning for attributes of 0sAAAAAAAAAAAAAAAA yields fewer results, but some are still present:

Now, doing a 'rebalance' is surely not the way to heal the 'replicate'-related attributes. 'rebalance' is all about fixing the 'distribute'-related layouts and rebalancing the data across the servers.

It could have helped resolve some of the 'replicate' attributes, though, since issuing a rebalance triggers a directory traversal of the volume (which is in fact the same as doing an 'ls -lR' or 'find' on the volume).
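
(If the goal is only to trigger the 'replicate' self-heal rather than to rebalance, a plain traversal from a client mount is enough by itself; a sketch, assuming the volume is mounted at /mnt/pfs-ro1:)

 # run from a GlusterFS client mount, not on the backend bricks;
 # stat-ing every entry makes 'replicate' run its self-heal check on each file
 find /mnt/pfs-ro1 -noleaf -print0 | xargs -0 stat > /dev/null 2>&1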

http://pastebin.com/x4wYq8ic

As a possible sanity check, I ran this command against my Read-Write GlusterFS storage servers (2 boxes, Distributed-Replicate), and got no "bad" attributes:
 jc1ladmin1:~/projects/gluster  loop_check ' getfattr -dm - /export/read-only/g*' jc1letgfs{13,16} | egrep "jc1letgfs|0sAAAAAAAAAAAAAAAA$|file:" | less
 getfattr: /export/read-only/g*: No such file or directory
 getfattr: /export/read-only/g*: No such file or directory
 jc1letgfs13
 jc1letgfs16

One difference between these two storage server groups: the Read-Only group of 4 servers has its backend file systems formatted as XFS, while the Read-Write group of 2 is formatted with EXT4.

Suggestions, critiques, etc. gratefully solicited.

Please, next time when looking at the GlusterFS attributes, use '-e hex' with the 'getfattr' command. Anyway, I think the issue here is mostly due to some sort of bug that resulted in attributes being written which say a 'split-brain' happened; when that attribute is set, the 'replicate' module doesn't heal anything and leaves the file as is (without even fixing the attribute).
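
(A sketch of what that looks like, against a hypothetical file on one of the bricks; '-e hex' prints the values unambiguously, and 0x000000000000000000000000 is the same twelve zero bytes as 0sAAAAAAAAAAAAAAAA:)

 # run on a storage server, against the backend brick (path is hypothetical)
 getfattr -d -m trusted.afr -e hex /export/read-only/g01/path/to/file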

We are working on these metadata self-heal related issues right now and hope to fix many of them by 3.2.1 (and 3.1.5).

Regards,
Amar

James Burnash
Unix Engineer.



