[Gluster-users] small write speed problem on EBS, distributed replica

Mohit Anchlia mohitanchlia at gmail.com
Wed Mar 23 17:51:00 UTC 2011


It would be great if the Gluster developers could pitch in here; it would
be good to know how to troubleshoot this.

What did you use to measure latency? I was thinking of either a ping test
with a large packet size, or iperf.
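
A minimal sketch of what I had in mind (hostnames and packet size are just
examples, adjust them to your setup):

  # round-trip latency with a large payload (1400 bytes stays under a 1500 MTU)
  ping -c 20 -s 1400 dfs01

  # raw TCP throughput from the client to a brick server
  iperf -s                      # on dfs01
  iperf -c dfs01 -t 30 -i 5     # on the client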

How did you get these numbers?

>>> Bandwidths:
>>> dfs01: 54 MB/s
>>> dfs02: 62.5 MB/s
>>> dfs03: 64 MB/s
>>> dfs04: 91.5 MB/s

Are you using NFS or the Gluster native (FUSE) client?

Please look at:

http://www.gluster.com/community/documentation/index.php/Guide_to_Optimizing_GlusterFS

You might want to try some of the settings there. Worst case, you can run
tcpdump during the 1 GB write and see where most of the time is being spent.
At the very least it will tell you whether or not the bottleneck is outside
the client.
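
For example, something along these lines on the client while the 1 GB write
is running (interface name and capture path are only placeholders):

  tcpdump -i eth0 -s 96 -w /tmp/gluster-1gb.pcap \
      host dfs01 or host dfs02 or host dfs03 or host dfs04

Then open the capture in wireshark and look at where the gaps are between a
write request and its reply.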

Also look at: http://www.gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options
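
For example, the write-behind settings are the usual first knobs to try for
write speed (volume name and values here are only illustrations, not
recommendations):

  gluster volume set EBSVolume performance.write-behind-window-size 4MB
  gluster volume set EBSVolume performance.flush-behind on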


On Wed, Mar 23, 2011 at 10:26 AM, karol skocik <karol.skocik at gmail.com> wrote:
> With FIO, raw write speed to EBS volume is like this:
>
> test: (g=0): rw=write, bs=128K-128K/128K-128K, ioengine=sync, iodepth=8
> Starting 1 process
> Jobs: 1 (f=1): [W] [100.0% done] [0K/43124K /s] [0 /329  iops] [eta 00m:00s]
> test: (groupid=0, jobs=1): err= 0: pid=6406
>  write: io=1024.0MB, bw=37118KB/s, iops=289 , runt= 28250msec
>    clat (usec): min=58 , max=2222 , avg=78.20, stdev=25.17
>     lat (usec): min=59 , max=2223 , avg=78.89, stdev=25.19
>    bw (KB/s) : min= 7828, max=60416, per=104.72%, avg=38870.65, stdev=10659.43
>
> Average bandwidth: 38.8 MB/s,
> average completion latency per IO request: 78 microseconds.
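>
> (The job above roughly corresponds to an FIO command line like this; the
> target directory is just an example:
>
> fio --name=test --rw=write --bs=128k --size=1g --ioengine=sync \
>     --iodepth=8 --directory=/mnt/ebs )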
>
> Since the FUSE module uses the same block size (128 KB) as I set up in
> FIO, I would say the bandwidth for the 2x2 replica should be around 15
> MB/s or more when one client writes a 1 GB file.
>
> Currently, Gluster can go up to 22 MB/s without replication, with 1 client.
> But with the distributed replica 2x2 on 4 machines, the number for one
> client writing a 1 GB file drops to 6.5 MB/s - that's the part I don't
> understand.
>
>> I also suggest calculating network latency.
>
> I measured individual latencies to server machines here:
> dfs01: 402 microseconds
> dfs02: 322 microseconds
> dfs03: 445 microseconds
> dfs04: 378 microseconds
>
> I guess you mean something else - a cumulative latency across the set of
> nodes? In that case, how do I calculate it?
>
> Karol
>
> On Wed, Mar 23, 2011 at 5:56 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>> What were you really expecting the numbers to be? What numbers do you get
>> when you write directly to the ext3 filesystem, bypassing GlusterFS?
>>
>> I also suggest calculating network latency.
>>
>> On Wed, Mar 23, 2011 at 4:17 AM, karol skocik <karol.skocik at gmail.com> wrote:
>>> I see my email to the list was truncated - sending it again.
>>>
>>> Hi,
>>>  here are the measurements - the client machine is KS, and server
>>> machines are DFS0[1-4].
>>> First, the setup now is:
>>>
>>> Volume Name: EBSOne
>>> Type: Distribute
>>> Status: Started
>>> Number of Bricks: 1
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: dfs01:/mnt/ebs
>>>
>>> With just one client machine writing a 1 GB file to EBSOne, averaged over 3 runs:
>>>
>>> Bandwidth (mean): 22441.84 KB/s
>>> Bandwidth (deviation): 6059.24 KB/s
>>> Completion latency (mean): 1274.47 usec
>>> Completion latency (deviation): 1814.58 usec
>>>
>>> Now, the latencies:
>>>
>>> From KS (client machine) to DFS (server machines), averages of 3 runs.
>>>
>>> Latencies:
>>> dfs01: 402 microseconds
>>> dfs02: 322 microseconds
>>> dfs03: 445 microseconds
>>> dfs04: 378 microseconds
>>>
>>> Bandwidths:
>>> dfs01: 54 MB/s
>>> dfs02: 62.5 MB/s
>>> dfs03: 64 MB/s
>>> dfs04: 91.5 MB/s
>>>
>>> Every server machine has just 1 EBS drive, with an ext3 filesystem,
>>> kernel 2.6.18-xenU-ec2-v1.0 and the CFQ IO scheduler.
>>>
>>> Any ideas? Given the numbers above, does it make any sense to try
>>> software RAID0 with mdadm, or perhaps another filesystem?
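>>>
>>> (If RAID0 is worth trying, I imagine something like this on each server,
>>> with the device names and filesystem only as examples:
>>>
>>> mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdg /dev/sdh
>>> mkfs.xfs /dev/md0
>>> mount /dev/md0 /mnt/ebs )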
>>>
>>> Thank you for your help.
>>> Regards, Karol
>>>
>>> On Wed, Mar 23, 2011 at 11:31 AM, karol skocik <karol.skocik at gmail.com> wrote:
>>>>
>>>> On Tue, Mar 22, 2011 at 6:08 PM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:
>>>>> Can you first run some tests with no replica and see what results you
>>>>> get? Also, can you look at the network latency from the client to each
>>>>> of your 4 servers and post the results?
>>>>>
>>>>> On Mon, Mar 21, 2011 at 1:27 AM, karol skocik <karol.skocik at gmail.com> wrote:
>>>>>> Hi,
>>>>>>  I am in the process of evaluating Gluster for a major BI company,
>>>>>> but I was surprised by the very low write performance on Amazon EBS.
>>>>>> Our setup is Gluster 3.1.2, a distributed replica 2x2 on 64-bit m1.large
>>>>>> instances. Every server node has 1 EBS volume attached to it.
>>>>>> The configuration of the distributed replica is the default one, apart
>>>>>> from my small attempts to improve performance (io-threads, disabled
>>>>>> io-stats and latency-measurement - the CLI form is noted after the
>>>>>> volfile below):
>>>>>>
>>>>>> volume EBSVolume-posix
>>>>>>    type storage/posix
>>>>>>    option directory /mnt/ebs
>>>>>> end-volume
>>>>>>
>>>>>> volume EBSVolume-access-control
>>>>>>    type features/access-control
>>>>>>    subvolumes EBSVolume-posix
>>>>>> end-volume
>>>>>>
>>>>>> volume EBSVolume-locks
>>>>>>    type features/locks
>>>>>>    subvolumes EBSVolume-access-control
>>>>>> end-volume
>>>>>>
>>>>>> volume EBSVolume-io-threads
>>>>>>    type performance/io-threads
>>>>>>    option thread-count 4
>>>>>>    subvolumes EBSVolume-locks
>>>>>> end-volume
>>>>>>
>>>>>> volume /mnt/ebs
>>>>>>    type debug/io-stats
>>>>>>    option log-level NONE
>>>>>>    option latency-measurement off
>>>>>>    subvolumes EBSVolume-io-threads
>>>>>> end-volume
>>>>>>
>>>>>> volume EBSVolume-server
>>>>>>    type protocol/server
>>>>>>    option transport-type tcp
>>>>>>    option auth.addr./mnt/ebs.allow *
>>>>>>    subvolumes /mnt/ebs
>>>>>> end-volume
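>>>>>>
>>>>>> (In 3.1 CLI terms those tweaks correspond roughly to the following; the
>>>>>> volume name is taken from the translator names above, adjust to your own:
>>>>>>
>>>>>> gluster volume set EBSVolume performance.io-thread-count 4
>>>>>> gluster volume set EBSVolume diagnostics.latency-measurement off )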
>>>>>>
>>>>>> In our test, all clients start writing to different 1 GB files at the same time.
>>>>>> The measured write bandwidth, with 2x2 servers:
>>>>>>
>>>>>> 1 client: 6.5 MB/s
>>>>>> 2 clients: 4.1 MB/s
>>>>>> 3 clients: 2.4 MB/s
>>>>>> 4 clients: 4.3 MB/s
>>>>>>
>>>>>> This is not acceptable for our needs. With PVFS2 (I know it uses
>>>>>> striping, which is very different from replication) we can get up to
>>>>>> 35 MB/s.
>>>>>> Being 2-3 times slower than that would be understandable, but 5-15 times
>>>>>> slower is not, and I would like to know whether there is something we
>>>>>> could try out.
>>>>>>
>>>>>> Could anybody share their write speeds on a similar setup, and tips on
>>>>>> how to achieve better performance?
>>>>>>
>>>>>> Thank you,
>>>>>>  Karol
>>>>>>
>>>>>
>>>>
>>>
>>
>


