[Gluster-users] gluster local vs local = gluster x4 slower

Steven Truelove truelove at array.ca
Tue Mar 30 16:33:28 UTC 2010


What you are likely seeing is the OS holding dirty pages in the page cache 
before writing them out.  If you were untarring a file significantly larger 
than the available memory on the server, the server would be forced to write 
to disk and performance would likely fall more into line with the results 
you get when you call sync.
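
You can watch this happen on the server while the untar runs.  A rough 
sketch using standard Linux tools (nothing Gluster-specific), and only if 
you're OK with dropping caches on a test box:

# optionally start from a cold cache first
sync; echo 3 > /proc/sys/vm/drop_caches

# watch dirty/writeback page-cache data build up, then drain on sync
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'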

Gluster is probably flushing data to disk more aggressively than the OS 
would on its own.  This may be intended to reduce data loss in server 
failure scenarios.  Someone on the Gluster team can probably comment on any 
settings that exist for controlling Gluster's data flushing behaviour.
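
For what it's worth, and someone from the Gluster team should correct me if 
these don't apply to your version, two knobs usually mentioned in this 
context are the flush-behind option of the client-side write-behind 
translator and the direct-io-mode setting of the fuse mount.  A rough 
sketch, not tested here, based on the ghome.vol you posted:

volume writebehind
  type performance/write-behind
  option cache-size 1MB
  option flush-behind on    # acknowledge flush/close before data reaches the server
  subvolumes readahead
end-volume

and mounting the client with something like (the mount point is just an 
example path):

glusterfs --direct-io-mode=disable -f /etc/glusterfs/ghome.vol /mnt/ghome

so the kernel page cache can be used on the client side.  Option names may 
differ between releases, so treat this as a pointer rather than a recipe.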

Steven Truelove


On 29/03/2010 5:09 PM, Jeremy Enos wrote:
> I've already determined that && sync brings the values to at least the 
> same order of magnitude (gluster is about 75% of direct disk there).  I 
> could accept that for the benefit of having a parallel filesystem.
> What I'm actually trying to achieve now is exactly what leaving out 
> the && sync yields in perceived performance, which translates to real 
> performance if the user can continue on to another task instead of 
> blocking because Gluster isn't utilizing cache.  How, with Gluster, 
> can I achieve the same cache benefit that direct disk gets?  Will a 
> user ever be able to untar a moderately sized (below physical memory) 
> file on to a Gluster filesystem as fast as to a single disk?  (as I 
> did in my initial comparison)  Is there something fundamentally 
> preventing that in Gluster's design, or am I misconfiguring it?
> thx-
>
>     Jeremy
>
> On 3/29/2010 2:00 PM, Bryan Whitehead wrote:
>> heh, don't forget the && sync
>>
>> :)
>>
>> On Mon, Mar 29, 2010 at 11:21 AM, Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>> Got a chance to run your suggested test:
>>>
>>> ##############GLUSTER SINGLE DISK##############
>>>
>>> [root at ac33 gjenos]# dd bs=4096 count=32768 if=/dev/zero 
>>> of=./filename.test
>>> 32768+0 records in
>>> 32768+0 records out
>>> 134217728 bytes (134 MB) copied, 8.60486 s, 15.6 MB/s
>>> [root at ac33 gjenos]#
>>> [root at ac33 gjenos]# cd /export/jenos/
>>>
>>> ##############DIRECT SINGLE DISK##############
>>>
>>> [root at ac33 jenos]# dd bs=4096 count=32768 if=/dev/zero 
>>> of=./filename.test
>>> 32768+0 records in
>>> 32768+0 records out
>>> 134217728 bytes (134 MB) copied, 0.21915 s, 612 MB/s
>>> [root at ac33 jenos]#
>>>
>>> For anything that can benefit from cache, Gluster's performance can't 
>>> compare.  Is it even using the cache?
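>>>
>>> One thing I may try, to take the page cache out of both numbers, is making
>>> dd wait for the data.  A rough sketch (I haven't run these yet):
>>>
>>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test conv=fdatasync
>>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test oflag=direct
>>>
>>> conv=fdatasync includes the flush-to-disk time in the reported rate, and
>>> oflag=direct should bypass the page cache entirely (assuming the
>>> filesystem supports O_DIRECT).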
>>>
>>> This is the client vol file I used for that test:
>>>
>>> [root at ac33 jenos]# cat /etc/glusterfs/ghome.vol
>>> #-----------IB remotes------------------
>>> volume ghome
>>>   type protocol/client
>>>   option transport-type tcp/client
>>>   option remote-host ac33
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> #------------Performance Options-------------------
>>>
>>> volume readahead
>>>   type performance/read-ahead
>>>   option page-count 4           # 2 is default option
>>>   option force-atime-update off # default is off
>>>   subvolumes ghome
>>> end-volume
>>>
>>> volume writebehind
>>>   type performance/write-behind
>>>   option cache-size 1MB
>>>   subvolumes readahead
>>> end-volume
>>>
>>> volume cache
>>>   type performance/io-cache
>>>   option cache-size 2GB
>>>   subvolumes writebehind
>>> end-volume
>>>
>>>
>>> Any suggestions appreciated.  thx-
>>>
>>>     Jeremy
>>>
>>> On 3/26/2010 6:09 PM, Bryan Whitehead wrote:
>>>> One more thought: it looks like (from your emails) you are always running
>>>> the gluster test first.  Maybe the tar file is being read from disk when
>>>> you do the gluster test, then read from cache when you run the direct-disk
>>>> test.
>>>>
>>>> What if you just pull a chunk of 0's off /dev/zero?
>>>>
>>>> dd bs=4096 count=32768 if=/dev/zero of=./filename.test
>>>>
>>>> or stick the tar in a ramdisk?
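>>>>
>>>> Something like this should work for the ramdisk (paths are just examples):
>>>>
>>>> mkdir -p /mnt/ramdisk
>>>> mount -t tmpfs -o size=1g tmpfs /mnt/ramdisk
>>>> cp /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz /mnt/ramdisk/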
>>>>
>>>> (or run the benchmark 10 times for each, drop the best and the worst,
>>>> and average the remaining 8)
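>>>>
>>>> A rough way to do the repeats (just a sketch; GNU time appends one elapsed
>>>> time per run to times.txt):
>>>>
>>>> for i in $(seq 1 10); do
>>>>   /usr/bin/time -f %e -a -o times.txt \
>>>>     sh -c 'tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync'
>>>> done
>>>> # drop the best and worst runs, then average the middle 8
>>>> sort -n times.txt | head -9 | tail -8 | awk '{s+=$1} END {print s/NR}'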
>>>>
>>>> Would also be curious whether the time would be halved if you add another
>>>> node, and halved again if you add another 2.  I guess that depends on
>>>> whether striping or just replication is being used.  (Unfortunately I
>>>> don't have access to more than 1 test box right now.)
>>>>
>>>> On Wed, Mar 24, 2010 at 11:06 PM, Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>
>>>>> For completeness:
>>>>>
>>>>> ##############GLUSTER SINGLE DISK NO PERFORMANCE OPTIONS##############
>>>>> [root at ac33 gjenos]# time (tar xzf
>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>
>>>>> real    0m41.052s
>>>>> user    0m7.705s
>>>>> sys     0m3.122s
>>>>> ##############DIRECT SINGLE DISK##############
>>>>> [root at ac33 gjenos]# cd /export/jenos
>>>>> [root at ac33 jenos]# time (tar xzf
>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>
>>>>> real    0m22.093s
>>>>> user    0m6.932s
>>>>> sys     0m2.459s
>>>>> [root at ac33 jenos]#
>>>>>
>>>>> The performance options don't appear to be the problem.  So the question
>>>>> stands: how do I get the disk cache advantage through the Gluster-mounted
>>>>> filesystem?  It seems to be the key to the large performance difference.
>>>>>
>>>>>     Jeremy
>>>>>
>>>>> On 3/24/2010 4:47 PM, Jeremy Enos wrote:
>>>>>
>>>>>> Good suggestion- I hadn't tried that yet.  It brings them much 
>>>>>> closer.
>>>>>>
>>>>>> ##############GLUSTER SINGLE DISK##############
>>>>>> [root at ac33 gjenos]# time (tar xzf
>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>
>>>>>> real    0m32.089s
>>>>>> user    0m6.516s
>>>>>> sys     0m3.177s
>>>>>> ##############DIRECT SINGLE DISK##############
>>>>>> [root at ac33 gjenos]# cd /export/jenos/
>>>>>> [root at ac33 jenos]# time (tar xzf
>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>
>>>>>> real    0m25.089s
>>>>>> user    0m6.850s
>>>>>> sys     0m2.058s
>>>>>> ##############DIRECT SINGLE DISK CACHED##############
>>>>>> [root at ac33 jenos]# time (tar xzf
>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz )
>>>>>>
>>>>>> real    0m8.955s
>>>>>> user    0m6.785s
>>>>>> sys     0m1.848s
>>>>>>
>>>>>>
>>>>>> Oddly, I'm also seeing better performance on the gluster system than in
>>>>>> previous tests (it used to be ~39 s).  The direct disk time is obviously
>>>>>> benefiting from cache.  There is still a difference, but most of it
>>>>>> disappears with the cache advantage removed.  That said, the relative
>>>>>> performance issue still exists with Gluster.  What can be done to make it
>>>>>> benefit from cache the same way direct disk does?
>>>>>> thx-
>>>>>>
>>>>>>     Jeremy
>>>>>>
>>>>>> P.S.
>>>>>> I'll be posting results w/ performance options completely removed 
>>>>>> from
>>>>>> gluster as soon as I get a chance.
>>>>>>
>>>>>>     Jeremy
>>>>>>
>>>>>> On 3/24/2010 4:23 PM, Bryan Whitehead wrote:
>>>>>>
>>>>>>> I'd like to see results with this:
>>>>>>>
>>>>>>> time ( tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz && sync )
>>>>>>>
>>>>>>> I've found local filesystems seem to use the cache very heavily.  The
>>>>>>> untarred files could mostly be sitting in RAM with a local fs, vs. going
>>>>>>> through fuse (which might do many more synced flushes to disk?).
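>>>>>>>
>>>>>>> One way to check that would be to watch the brick's disk on the server
>>>>>>> while the untar runs, e.g. (sda is just an example device):
>>>>>>>
>>>>>>> iostat -x sda 1
>>>>>>>
>>>>>>> If the glusterfs run shows steady writes during the untar while the
>>>>>>> local-fs run shows almost none until sync, that would point at extra
>>>>>>> flushing somewhere in the fuse/gluster path.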
>>>>>>>
>>>>>>> On Wed, Mar 24, 2010 at 2:25 AM, Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>>>>
>>>>>>>> I also neglected to mention that the underlying filesystem is 
>>>>>>>> ext3.
>>>>>>>>
>>>>>>>> On 3/24/2010 3:44 AM, Jeremy Enos wrote:
>>>>>>>>
>>>>>>>>> I haven't tried all performance options disabled yet- I can 
>>>>>>>>> try that
>>>>>>>>> tomorrow when the resource frees up.  I was actually asking first
>>>>>>>>> before
>>>>>>>>> blindly trying different configuration matrices in case there's a
>>>>>>>>> clear
>>>>>>>>> direction I should take with it.  I'll let you know.
>>>>>>>>>
>>>>>>>>>     Jeremy
>>>>>>>>>
>>>>>>>>> On 3/24/2010 2:54 AM, Stephan von Krawczynski wrote:
>>>>>>>>>
>>>>>>>>>> Hi Jeremy,
>>>>>>>>>>
>>>>>>>>>> have you tried to reproduce with all performance options disabled?
>>>>>>>>>> They are possibly not a good idea on a local system.
>>>>>>>>>> What local fs do you use?
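>>>>>>>>>>
>>>>>>>>>> A minimal volfile for such a test would be just the protocol/client
>>>>>>>>>> volume from your ghome.vol with nothing stacked on top (untested,
>>>>>>>>>> sketched from the config you posted):
>>>>>>>>>>
>>>>>>>>>> volume ghome
>>>>>>>>>>   type protocol/client
>>>>>>>>>>   option transport-type ib-verbs/client
>>>>>>>>>>   option remote-host acfs
>>>>>>>>>>   option remote-subvolume raid
>>>>>>>>>> end-volume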
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> Regards,
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, 23 Mar 2010 19:11:28 -0500
>>>>>>>>>> Jeremy Enos <jenos at ncsa.uiuc.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Stephan is correct- I primarily did this test to show a
>>>>>>>>>>> demonstrable
>>>>>>>>>>> overhead example that I'm trying to eliminate.  It's pronounced
>>>>>>>>>>> enough
>>>>>>>>>>> that it can be seen on a single disk / single node 
>>>>>>>>>>> configuration,
>>>>>>>>>>> which
>>>>>>>>>>> is good in a way (so anyone can easily repro).
>>>>>>>>>>>
>>>>>>>>>>> My distributed/clustered solution would be ideal if it were 
>>>>>>>>>>> fast
>>>>>>>>>>> enough
>>>>>>>>>>> for small block i/o as well as large block- I was hoping that
>>>>>>>>>>> single
>>>>>>>>>>> node systems would achieve that, hence the single node test.
>>>>>>>>>>>   Because
>>>>>>>>>>> the single node test performed poorly, I eventually reduced 
>>>>>>>>>>> down to
>>>>>>>>>>> single disk to see if it could still be seen, and it clearly 
>>>>>>>>>>> can
>>>>>>>>>>> be.
>>>>>>>>>>> Perhaps it's something in my configuration?  I've pasted my 
>>>>>>>>>>> config
>>>>>>>>>>> files
>>>>>>>>>>> below.
>>>>>>>>>>> thx-
>>>>>>>>>>>
>>>>>>>>>>>       Jeremy
>>>>>>>>>>>
>>>>>>>>>>> ######################glusterfsd.vol######################
>>>>>>>>>>> volume posix
>>>>>>>>>>>     type storage/posix
>>>>>>>>>>>     option directory /export
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> volume locks
>>>>>>>>>>>     type features/locks
>>>>>>>>>>>     subvolumes posix
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> volume disk
>>>>>>>>>>>     type performance/io-threads
>>>>>>>>>>>     option thread-count 4
>>>>>>>>>>>     subvolumes locks
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> volume server-ib
>>>>>>>>>>>     type protocol/server
>>>>>>>>>>>     option transport-type ib-verbs/server
>>>>>>>>>>>     option auth.addr.disk.allow *
>>>>>>>>>>>     subvolumes disk
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> volume server-tcp
>>>>>>>>>>>     type protocol/server
>>>>>>>>>>>     option transport-type tcp/server
>>>>>>>>>>>     option auth.addr.disk.allow *
>>>>>>>>>>>     subvolumes disk
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> ######################ghome.vol######################
>>>>>>>>>>>
>>>>>>>>>>> #-----------IB remotes------------------
>>>>>>>>>>> volume ghome
>>>>>>>>>>>     type protocol/client
>>>>>>>>>>>     option transport-type ib-verbs/client
>>>>>>>>>>> #  option transport-type tcp/client
>>>>>>>>>>>     option remote-host acfs
>>>>>>>>>>>     option remote-subvolume raid
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>>
>>>>>>>>>>> volume readahead
>>>>>>>>>>>     type performance/read-ahead
>>>>>>>>>>>     option page-count 4           # 2 is default option
>>>>>>>>>>>     option force-atime-update off # default is off
>>>>>>>>>>>     subvolumes ghome
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> volume writebehind
>>>>>>>>>>>     type performance/write-behind
>>>>>>>>>>>     option cache-size 1MB
>>>>>>>>>>>     subvolumes readahead
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> volume cache
>>>>>>>>>>>     type performance/io-cache
>>>>>>>>>>>     option cache-size 1GB
>>>>>>>>>>>     subvolumes writebehind
>>>>>>>>>>> end-volume
>>>>>>>>>>>
>>>>>>>>>>> ######################END######################
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/23/2010 6:02 AM, Stephan von Krawczynski wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Tue, 23 Mar 2010 02:59:35 -0600 (CST)
>>>>>>>>>>>> "Tejas N. Bhise"<tejas at gluster.com>         wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Out of curiosity, if you want to do stuff only on one machine,
>>>>>>>>>>>>> why do you want to use a distributed, multi-node, clustered
>>>>>>>>>>>>> file system?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Because what he does is a very good way to show the overhead
>>>>>>>>>>>> produced
>>>>>>>>>>>> only by
>>>>>>>>>>>> glusterfs and nothing else (i.e. no network involved).
>>>>>>>>>>>> A pretty relevant test scenario I would say.
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Am I missing something here?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Tejas.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "Jeremy Enos"<jenos at ncsa.uiuc.edu>
>>>>>>>>>>>>> To: gluster-users at gluster.org
>>>>>>>>>>>>> Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai,
>>>>>>>>>>>>> Kolkata,
>>>>>>>>>>>>> Mumbai, New Delhi
>>>>>>>>>>>>> Subject: [Gluster-users] gluster local vs local = gluster x4
>>>>>>>>>>>>> slower
>>>>>>>>>>>>>
>>>>>>>>>>>>> This test is pretty easy to replicate anywhere: it only takes one
>>>>>>>>>>>>> disk, one machine, and one tarball.  Untarring directly to local disk
>>>>>>>>>>>>> is about 4.5x faster than untarring through gluster.  At first I
>>>>>>>>>>>>> thought this might be due to a slow host (2.4 GHz Opteron), but it's
>>>>>>>>>>>>> not: the same configuration on a much faster machine (dual 3.33 GHz
>>>>>>>>>>>>> Xeon) yields the performance below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK THRU GLUSTER####
>>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>
>>>>>>>>>>>>> real    0m41.290s
>>>>>>>>>>>>> user    0m14.246s
>>>>>>>>>>>>> sys     0m2.957s
>>>>>>>>>>>>>
>>>>>>>>>>>>> ####THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER)####
>>>>>>>>>>>>> [root at ac33 jenos]# cd /export/jenos/
>>>>>>>>>>>>> [root at ac33 jenos]# time tar xzf
>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>>
>>>>>>>>>>>>> real    0m8.983s
>>>>>>>>>>>>> user    0m6.857s
>>>>>>>>>>>>> sys     0m1.844s
>>>>>>>>>>>>>
>>>>>>>>>>>>> ####THESE ARE TEST FILE DETAILS####
>>>>>>>>>>>>> [root at ac33 jenos]# tar tzvf
>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz  |wc -l
>>>>>>>>>>>>> 109
>>>>>>>>>>>>> [root at ac33 jenos]# ls -l
>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>> -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32
>>>>>>>>>>>>> /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz
>>>>>>>>>>>>> [root at ac33 jenos]#
>>>>>>>>>>>>>
>>>>>>>>>>>>> These are the relevant performance options I'm using in my 
>>>>>>>>>>>>> .vol
>>>>>>>>>>>>> file:
>>>>>>>>>>>>>
>>>>>>>>>>>>> #------------Performance Options-------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> volume readahead
>>>>>>>>>>>>>      type performance/read-ahead
>>>>>>>>>>>>>      option page-count 4           # 2 is default option
>>>>>>>>>>>>>      option force-atime-update off # default is off
>>>>>>>>>>>>>      subvolumes ghome
>>>>>>>>>>>>> end-volume
>>>>>>>>>>>>>
>>>>>>>>>>>>> volume writebehind
>>>>>>>>>>>>>      type performance/write-behind
>>>>>>>>>>>>>      option cache-size 1MB
>>>>>>>>>>>>>      subvolumes readahead
>>>>>>>>>>>>> end-volume
>>>>>>>>>>>>>
>>>>>>>>>>>>> volume cache
>>>>>>>>>>>>>      type performance/io-cache
>>>>>>>>>>>>>      option cache-size 1GB
>>>>>>>>>>>>>      subvolumes writebehind
>>>>>>>>>>>>> end-volume
>>>>>>>>>>>>>
>>>>>>>>>>>>> What can I do to improve gluster's performance?
>>>>>>>>>>>>>
>>>>>>>>>>>>>        Jeremy
>>>>>>>>>>>>>
>>>>>
>>>>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>

-- 
Steven Truelove
Array Systems Computing, Inc.
1120 Finch Avenue West, 7th Floor
Toronto, Ontario
M3J 3H7
CANADA
http://www.array.ca
truelove at array.ca
Phone: (416) 736-0900 x307
Fax: (416) 736-4715



