[Gluster-users] Targeted fix-layout?

Wed Jan 16 14:56:09 UTC 2013

On 01/15/2013 08:17 PM, gluster-users-request at gluster.org wrote:
> Date: Tue, 15 Jan 2013 15:17:00 -0500
> From: Jeff Darcy <jdarcy at redhat.com>
> To: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Targeted fix-layout?
> Message-ID: <50F5B93C.5040802 at redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 01/15/2013 01:10 PM, Dan Bretherton wrote:
>> I am running a fix-layout operation on a volume after seeing errors mentioning
>> "anomalies" and "holes" in the logs.  There is a particular directory that is
>> giving trouble and I would like to be able to run the layout fix on that
>> first.  Users are experiencing various I/O errors including "invalid argument"
>> and "Unknown error 526", but after running for a week the volume wide
>> fix-layout doesn't seem to have reached this particular directory yet.
>> Fix-layout takes a long time because there are millions of files in the volume
>> and the CPU load is consistently very high on all the servers while it is
>> running, sometimes over 20. Therefore I really need to find a way to target
>> particular directories or speed up the volume wide fix-layout.
> You should be able to do the following command on a client to fix the layout
> for just one directory (it's the same xattr used by the rebalance command).
>
> 	setfattr -n distribute.fix.layout -v "anything" /bad/directory
>> I have no idea what caused these errors but it could be related to the previous
>> fix-layout operation, which I started following the addition of a new pair of
>> bricks, not having completed successfully.  The problem is that the rebalance
>> operation on one or more servers often fails before completing and there is no
>> way (that I know of) to restart or resume the process on one server.  Every
>> time this happens I stop the fix-layout and start it again, but it has never
>> completed successfully on every server despite sometimes running for several
>> weeks.
>>
>> One other possible cause I can think of is my recent policy of using XFS for
>> new bricks instead of ext4.  The reason I think this might be causing the
>> problem is that none of the other volumes have any XFS bricks yet and they
>> aren't experiencing any I/O errors.  Are there any special mount options
>> required for XFS, and is there any reason why a volume shouldn't contain a
>> mixture of ext4 and XFS bricks?
> It doesn't seem like that should be a problem, but maybe someone else knows
> something about ext4/XFS differences that could shed some light.
Thanks Jeff, I'll give that a try.

Should the xattr name be trusted.distribute.fix.layout by the way? When 
I try with distribute.fix.layout I get the error "Operation not supported".

-Dan.