[Gluster-users] "remove-brick" command SHOULD migrate data

Thu Feb 17 09:57:51 UTC 2011

> Date: Wed, 16 Feb 2011 12:40:35 -0500
> From: "William L. Sebok"<wls at astro.umd.edu>
> Subject: [Gluster-users]  "remove-brick" command SHOULD migrate data
> To: Rahul C S<rahul at gluster.com>
> Cc: Mark Wolfire<mwolfire at astro.umd.edu>, gluster-users at gluster.org,
> 	Kwang-Ho Park<kpark at astro.umd.edu>,	"Derek C. Richardson"
> 	<dcr at astro.umd.edu>,	Randall Perrine<perrine at astro.umd.edu>
> Message-ID:<20110216174035.GA24390 at earth.astro.umd.edu>
>
> On Tue, Feb 15, 2011 at 11:49:06PM -0600, Rahul C S wrote:
>> For the last question,
>> the "remove-brick" command does not migrate data; the data in that brick
>> cannot be accessed from the client, unlike "replace-brick", which actually
>> migrates data from one brick to another.
> As an enhancement, I strongly suggest a version of remove-brick that actually
> migrates the data.  This would be *extremely* useful in dealing with a
> distributed/replicated filesystem with a computer and/or brick that is dead or
> likely to be down for an extended period (I configure bricks to be replicated
> between different computers). On startup, the remove-brick command could
> estimate whether the data would fit on the remaining bricks. If, after the
> migration started, it turned out that the data really did not all fit, there
> would still not be any loss as long as the last file movement had not been
> completely committed.  The command could then be aborted.
>
> This would be no different in concept and risks (and usefulness) from reducing
> the size of a partition with gparted.  I would make the remove-brick command
> remove a brick without migrating its data only if some "force" option were in
> effect.  I have trouble seeing why one would otherwise want to use the
> remove-brick command and throw away the data except in some dire emergency.
>
> Bill Sebok      Computer Software Manager, Univ. of Maryland, Astronomy
> 	Internet: wls at astro.umd.edu	URL: http://furo.astro.umd.edu/
>
>
> -----------------------------

Hello All-
I would like to add my support to this feature request. At first I assumed
that remove-brick did migrate data, until I looked into it more
carefully.  Unless I have got the wrong end of the stick, the only way
to shrink a distributed or distributed-replicated volume at the moment
is to perform the following steps.

1) Tell the users to stop using the volume, even though they will still 
be able to mount it and write to it
2) Remove the brick (and its mirror if appropriate) with remove-brick
3) Remove all the DHT link files (zero-byte files whose only permission
bit is the sticky bit) from the backend filesystem of the brick that has
just been removed, using
"find /brick/path -size 0b -perm 1000 -exec /bin/rm -v {} \;" or similar
4) Copy the files from the backend filesystem to the mount point of the 
volume
5) Tell the users it is safe to carry on using the volume again.
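Steps 3 and 4 above can be sketched in Python. This is a minimal sketch, not a tested migration tool, and the function names are hypothetical. It assumes, like the find command in step 3, that a link file is any empty regular file whose permission bits are exactly 01000; it does not check the `trusted.glusterfs.dht.linkto` extended attribute. Instead of deleting link files it simply skips them, and it refuses to overwrite anything that already exists on the volume. The existence check is not atomic, so this narrows the overwrite risk with a client that is still writing but does not remove it. `shutil.copy2` keeps mode and timestamps but not ownership.

```python
import os
import shutil
import stat


def is_dht_link_file(path):
    """Heuristic matching `find -size 0b -perm 1000`: an empty regular
    file whose permission bits are exactly 01000 (sticky bit only)."""
    st = os.lstat(path)
    return (stat.S_ISREG(st.st_mode)
            and st.st_size == 0
            and stat.S_IMODE(st.st_mode) == 0o1000)


def copy_brick_to_volume(brick_root, mount_point):
    """Copy files from a removed brick's backend directory into the mounted
    volume (hypothetical helper). Link files are skipped, and files that
    already exist on the volume are left untouched and reported.
    Returns (copied, skipped_existing) as paths relative to brick_root."""
    copied, skipped_existing = [], []
    for dirpath, _dirnames, filenames in os.walk(brick_root):
        rel_dir = os.path.relpath(dirpath, brick_root)
        dest_dir = os.path.normpath(os.path.join(mount_point, rel_dir))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            if is_dht_link_file(src):
                continue  # pointer only; the real data lives on another brick
            rel = os.path.normpath(os.path.join(rel_dir, name))
            dest = os.path.join(dest_dir, name)
            if os.path.lexists(dest):
                skipped_existing.append(rel)  # never clobber newer data
                continue
            # copy symlinks as symlinks rather than following them
            shutil.copy2(src, dest, follow_symlinks=False)
            copied.append(rel)
    return copied, skipped_existing
```

Usage would look like `copy_brick_to_volume("/brick/path", "/mnt/volume")` (both paths hypothetical); the `skipped_existing` list shows which names clashed and need a manual decision.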

That would be quite risky in my department because the volumes are
auto-mounted on many different clients, and even if everyone has
bothered to read my email telling them to stop using the volume, they
might accidentally leave a process running that is writing files to it.
If some of those files have the same names as files that were on the 
removed brick, we could end up with a situation where the new files from 
the running process are overwritten by old versions being copied from 
the removed brick.  To avoid this scenario, and other potential 
disasters I haven't thought of yet, I would have to do the following to 
safely shrink a volume.

1) Take the volume offline and then delete it
2) Create a new volume with a temporary name, containing all the bricks 
from the original volume except the brick (and its mirror if 
appropriate) I want to remove
3) Remove the link files from the backend filesystem of the brick that 
has just been removed
4) Copy the files from the backend filesystem to the mount point of the 
temporary volume
5) Delete the temporary volume and re-create it using the original name
so it can be auto-mounted.
6) Put the volume online again.
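Step 2 of that procedure means listing every brick except the one being removed and its mirror. In a distributed-replicated volume, consecutive groups of `replica` bricks (in the order they were given at creation time, which is also the order `gluster volume info` lists them) mirror each other, so the whole group has to go. The sketch below computes the remaining brick list and builds, but does not run, a `gluster volume create` command line. It is a sketch under stated assumptions: the function names, the volume name `tempvol` and the brick paths are hypothetical, and the `replica COUNT transport tcp` syntax should be checked against the CLI documentation for the release in use.

```python
def bricks_without_replica_set(bricks, replica, doomed):
    """Return `bricks` minus the whole replica set that contains `doomed`.

    Consecutive groups of `replica` bricks form one mirrored set."""
    if replica < 1 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    idx = bricks.index(doomed)  # raises ValueError if the brick is absent
    start = idx - idx % replica  # first brick of the set containing `doomed`
    return bricks[:start] + bricks[start + replica:]


def create_command(volname, bricks, replica):
    """Build the argv for `gluster volume create` (printed, not executed)."""
    argv = ["gluster", "volume", "create", volname]
    if replica > 1:
        argv += ["replica", str(replica)]
    argv += ["transport", "tcp"]
    return argv + list(bricks)


if __name__ == "__main__":
    # Hypothetical 2 x 2 distributed-replicated volume.
    bricks = ["srv1:/export/b1", "srv2:/export/b1",
              "srv3:/export/b1", "srv4:/export/b1"]
    remaining = bricks_without_replica_set(bricks, 2, "srv4:/export/b1")
    print(" ".join(create_command("tempvol", remaining, 2)))
    # prints: gluster volume create tempvol replica 2 transport tcp srv1:/export/b1 srv2:/export/b1
```

Printing the command rather than running it leaves a human to check the brick list before anything is created.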

If there is an easier way of safely shrinking a distributed or 
distributed-replicated volume please let me know.

Regards,
-Dan.