[Gluster-users] Gluster scalability [Was: Adding new storage nodes to existing GlusterFS network]

Emmanuel Noobadmin centos.admin at gmail.com
Tue Sep 28 17:26:25 UTC 2010


After following Roland's thread
(http://gluster.org/pipermail/gluster-users/2010-September/005311.html),
I'm wondering if this means there's a limit to how scalable gluster is
if we start small.

It seems that every time a new brick is added, the scale and defrag
script must be run. Since we're going over the network, for those of
us starting on a low-budget interconnect, i.e. Gigabit Ethernet, it
would take a long while.

Let's say I'm using 4x1.5TB drives for a 4.5TB RAID 5 storage brick,
starting with four bricks in replicate/distribute. So effectively 9TB of
space for the gluster network. Now if we hit 90% capacity and add four
new 4.5TB bricks, am I correct in understanding that the scale and defrag
script would cause around 6TB of data to be spread around, twice
since it's replicate, assuming the remaining 2TB gets to stay where
it was?
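
For reference, the capacity figures above can be reproduced with a small
sketch. It assumes RAID 5 gives up one drive's worth of space to parity
and that replicate keeps two copies of every file; the function names are
purely illustrative and not part of any Gluster tool.

```python
def raid5_usable_tb(drives, drive_tb):
    """Usable capacity of a RAID 5 array: one drive's worth goes to parity."""
    return (drives - 1) * drive_tb


def volume_usable_tb(bricks, brick_tb, replicas=2):
    """Usable space of a distribute/replicate volume (assumes 2 copies)."""
    return bricks * brick_tb / replicas


brick = raid5_usable_tb(4, 1.5)      # 4.5 TB per brick
volume = volume_usable_tb(4, brick)  # 9.0 TB usable across four bricks
used = 0.9 * volume                  # about 8.1 TB at 90% full
print(brick, volume, round(used, 1))
```

The roughly 8TB in use is what the 6TB moved plus 2TB staying in place
adds up to.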

If the network were able to sustain 30MB/s, that would take around 48
hours of continuous operation to complete. Since the cluster is
unlikely to be idle and there is bound to be some overhead, would
that be closer to 72hrs in reality?
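
A quick back-of-envelope check of that kind of figure, as a sketch only.
It assumes decimal units (1TB = 10^6 MB), a steady 30MB/s with no
protocol overhead, and treats "copies" as a hypothetical knob for
whether both replicas have to cross the same link.

```python
def transfer_hours(data_tb, rate_mb_s, copies=1):
    """Hours to move data_tb terabytes at rate_mb_s, written `copies` times."""
    total_mb = data_tb * 1_000_000 * copies  # decimal units: 1 TB = 10^6 MB
    return total_mb / rate_mb_s / 3600


single = transfer_hours(6, 30)            # one copy of the 6TB
double = transfer_hours(6, 30, copies=2)  # both replicas over the same link
print(f"{single:.1f} h, {double:.1f} h")
```

Under these assumptions a single 6TB pass works out to a bit over two
days, and twice that if the replica traffic shares the link, so the
estimate above is the right order of magnitude either way.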

Now it seems to me that since the scale and defrag would redistribute
the chunks all over the new nodes, the next set of four would take 2x
as long (97~145hrs) since there is more data and more files by then.
The group of four after that would take 3x as long (146~220hrs), or
about a week.
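
The growth I'm assuming can be written down as a tiny sketch. The model
itself is only my guess (each expansion round takes n times the first
one, using the 48~72hr range as the base), not anything taken from the
Gluster documentation.

```python
def rebalance_hours(round_number, base_low=48, base_high=72):
    """Assumed model: expansion round n takes n times the first round."""
    return round_number * base_low, round_number * base_high


for n in (1, 2, 3):
    low, high = rebalance_hours(n)
    print(f"round {n}: {low}~{high} hrs (up to {high / 24:.1f} days)")
```

With those inputs the third round comes out at 144~216 hrs, i.e. six to
nine days, which is roughly the week mentioned above.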

At some point, it seems that adding a new set of nodes may cause a
scale/defrag time so long that the organisation may have to add a new
set before it finishes?

That doesn't seem to make sense, so what am I actually getting wrong?


