[Gluster-users] very bad performance on small files

Pan, Henry Henry.Pan at ironmountain.com
Sat Jan 15 15:31:05 UTC 2011


Hello Gluster Gurus,

I'm trying to find out what performance numbers you have seen when running an eDiscovery search application against a namespace with over 3 billion small files on GlusterFS...

Thanks & Good w/e

Henry PAN
Sr. Data Storage Eng/Adm
Iron Mountain
650-962-6184 (o)
650-930-6544 (c)
Henry.pan at ironmountain.com


-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of gluster-users-request at gluster.org
Sent: Saturday, January 15, 2011 1:20 AM
To: gluster-users at gluster.org
Subject: Gluster-very bad performance on small files

Today's Topics:

   1. Re: very bad performance on small files (Marcus Bointon)
   2. Re: very bad performance on small files (Joe Landman)
   3. Re: very bad performance on small files (Max Ivanov)
   4. Re: very bad performance on small files (Joe Landman)
   5. Re: very bad performance on small files (Marcus Bointon)
   6. Re: very bad performance on small files (Joe Landman)
   7. Re: very bad performance on small files (Max Ivanov)
   8. Re: very bad performance on small files (Rudi Ahlers)


----------------------------------------------------------------------

Message: 1
Date: Fri, 14 Jan 2011 22:50:37 +0100
From: Marcus Bointon <marcus at synchromedia.co.uk>
Subject: Re: [Gluster-users] very bad performance on small files
To: Gluster General Discussion List <gluster-users at gluster.org>
Message-ID: <C438BF2F-7B15-497B-BA0A-60E1311F43D4 at synchromedia.co.uk>
Content-Type: text/plain; charset=us-ascii

On 14 Jan 2011, at 18:58, Jacob Shucart wrote:

> This kind of thing is fine on local disks, but when you're talking about a
> distributed filesystem the network latency starts to add up since 1
> request to the web server results in a bunch of file requests.

I think the main objection is that it takes a huge amount of network latency to explain a > 1,500% overhead with only 2 machines.

On 14 Jan 2011, at 15:20, Joe Landman wrote:

> MB size or larger

So does gluster become faster abruptly when file sizes cross some threshold? Or are average speeds proportional to file size? It would be good to see a wider spread of values on benchmarks of throughput vs file size for the same overall volume (like Max's data but with more intermediate values).
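
A minimal sketch of the kind of sweep described above: write the same total volume at several file sizes and report throughput for each. The function names, the size range, and the 16 MiB total are illustrative assumptions, not anything used in this thread. It does not fsync, so on a local disk it largely measures the page cache; point base_dir at the mount under test.

```python
import os
import tempfile
import time


def write_files(directory, file_size, total_bytes):
    """Write about total_bytes as files of file_size bytes; return MB/s."""
    payload = b"\0" * file_size
    count = max(1, total_bytes // file_size)
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:08d}")
        with open(path, "wb") as fh:
            fh.write(payload)
    elapsed = time.perf_counter() - start
    return (count * file_size) / elapsed / 1e6


def sweep(base_dir, sizes, total_bytes):
    """Return {file_size: MB/s}, writing the same total volume per size."""
    results = {}
    for size in sizes:
        # Fresh scratch directory per size; removed again when the block exits.
        with tempfile.TemporaryDirectory(dir=base_dir) as scratch:
            results[size] = write_files(scratch, size, total_bytes)
    return results


if __name__ == "__main__":
    # Assumption: replace with the local, FUSE, or NFS mount point to compare.
    base_dir = tempfile.gettempdir()
    sizes = [1 << k for k in range(10, 21, 2)]  # 1 KiB up to 1 MiB
    for size, mbps in sweep(base_dir, sizes, total_bytes=16 << 20).items():
        print(f"{size:>8} bytes/file  {mbps:10.1f} MB/s")
```

Running the same sweep against each mount would show whether throughput changes abruptly or gradually with file size.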

Marcus

------------------------------

Message: 2
Date: Fri, 14 Jan 2011 17:12:01 -0500
From: Joe Landman <landman at scalableinformatics.com>
Subject: Re: [Gluster-users] very bad performance on small files
To: gluster-users at gluster.org
Message-ID: <4D30CA31.9060001 at scalableinformatics.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 01/14/2011 04:50 PM, Marcus Bointon wrote:
> On 14 Jan 2011, at 18:58, Jacob Shucart wrote:
>
>> This kind of thing is fine on local disks, but when you're talking
>> about a distributed filesystem the network latency starts to add up
>> since 1 request to the web server results in a bunch of file
>> requests.
>
> I think the main objection is that it takes a huge amount of network
> latency to explain a > 1,500% overhead with only 2 machines.

If most of your file access times are dominated by latency (e.g. small,
seeky loads), and you are going over a gigabit connection, yeah,
your performance is going to crater on any cluster file system.

Local latency to traverse the storage stack is on the order of 10's of
microseconds.  Physical latency of the disk medium is on the order of
10's of microseconds for RAMdisk, 100's of microseconds for flash/ssd,
and 1000's of microseconds (i.e. milliseconds) for spinning rust.

Now take 1 million small file writes.  Say 1024 bytes.  These million
writes have to traverse the storage stack in the kernel to get to disk.

Now add in a network latency event on the order of 1000's of
microseconds for the remote storage stack and network stack to respond.

I haven't measured it yet in a methodical manner, but I wouldn't be
surprised to see IOP rates within a factor of 2 of the bare metal for a
sufficiently fast network such as Infiniband, and within a factor of 4
or 5 for a slow network like Gigabit.

Our own experience has been generally that you are IOP constrained
because of the stack you have to traverse.  If you add more latency into
this stack, you have more to traverse, and therefore you have to wait
longer.  This has a magnifying effect on times for small IO ops which
are seeky (stat, small writes, random ops).
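
As a rough back-of-the-envelope sketch of the arithmetic above: if every operation is strictly serial and pays each latency in full, the total time is just the operation count times the sum of the latencies. The specific microsecond values below are assumptions picked from the orders of magnitude quoted in this message, not measurements, and real systems overlap and batch operations.

```python
def total_seconds(n_ops, stack_us, media_us, network_us=0.0):
    """Time for n_ops strictly serial ops, each paying every latency in full."""
    return n_ops * (stack_us + media_us + network_us) / 1e6


def slowdown(stack_us, media_us, network_us):
    """Ratio of per-op time with the network hop to per-op time without it."""
    return (stack_us + media_us + network_us) / (stack_us + media_us)


# Assumed values, chosen only to match the orders of magnitude in the text.
N_OPS = 1_000_000      # "1 million small file writes"
STACK_US = 20          # storage stack: "10's of microseconds"
NETWORK_US = 1_000     # remote round trip: "1000's of microseconds"
MEDIA_US = {"ramdisk": 20, "ssd": 200, "spinning disk": 5_000}

if __name__ == "__main__":
    for name, media_us in MEDIA_US.items():
        local = total_seconds(N_OPS, STACK_US, media_us)
        remote = total_seconds(N_OPS, STACK_US, media_us, NETWORK_US)
        factor = slowdown(STACK_US, media_us, NETWORK_US)
        print(f"{name:>14}: local {local:8.0f} s, remote {remote:8.0f} s, "
              f"x{factor:.1f}")
```

In this toy model the relative penalty of the network hop is largest when the medium itself is fast, because the fixed round-trip cost then dominates each operation.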

>
> On 14 Jan 2011, at 15:20, Joe Landman wrote:
>
>> MB size or larger
>
> So does gluster become faster abruptly when file sizes cross some
> threshold? Or are average speeds are proportional to file size? Would

It's a continuous curve, and very much user load specific.  The fewer
seeky operations you do, the better (true of all cluster file systems).

> be good to see a wider spread of values on benchmarks of throughput
> vs file size for the same overall volume (like Max's data but with
> more intermediate values)

I haven't seen Max's data, so I can't comment on this.  Understand that
performance is going to be bound by many things.  One of them is
the speed of the spinning disk, if that's what you use.  Another will be
the network.



--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


------------------------------

Message: 3
Date: Fri, 14 Jan 2011 22:19:58 +0000
From: Max Ivanov <ivanov.maxim at gmail.com>
Subject: Re: [Gluster-users] very bad performance on small files
To: gluster-users at gluster.org
Message-ID:
        <AANLkTi=u6Ycfb_sTWSphGLv2J+9HJMNmf80Zgs84b0fy at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

> I haven't seen Max's data, so I can't comment on this.  Understand that
> performance is going to be bound by many things.  One of many things is the
> speed of the spinning disk if thats what you use.  Another will be network.
>

It is very similar to a kernel source tree - tons of small (2-20kb)
files. 1.1G in total.


------------------------------

Message: 4
Date: Fri, 14 Jan 2011 17:20:58 -0500
From: Joe Landman <landman at scalableinformatics.com>
Subject: Re: [Gluster-users] very bad performance on small files
To: gluster-users at gluster.org
Message-ID: <4D30CC4A.2010907 at scalableinformatics.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 01/14/2011 05:19 PM, Max Ivanov wrote:
>> I haven't seen Max's data, so I can't comment on this.  Understand that
>> performance is going to be bound by many things.  One of many things is the
>> speed of the spinning disk if thats what you use.  Another will be network.
>>
>
> It is very similair to kernel source tree - tons of small (2-20kb)
> files. 1.1G in total.

Ok, worth looking into

------------------------------

Message: 5
Date: Sat, 15 Jan 2011 00:26:53 +0100
From: Marcus Bointon <marcus at synchromedia.co.uk>
Subject: Re: [Gluster-users] very bad performance on small files
To: Gluster General Discussion List <gluster-users at gluster.org>
Message-ID: <2D8604F3-AED5-4C30-AE15-A798D4775765 at synchromedia.co.uk>
Content-Type: text/plain; charset=us-ascii

On 14 Jan 2011, at 23:12, Joe Landman wrote:

> If most of your file access times are dominated by latency (e.g. small, seeky like loads), and you are going over a gigabit connection, yeah, your performance is going to crater on any cluster file system.
>
> Local latency to traverse the storage stack is on the order of 10's of microseconds.  Physical latency of the disk medium is on the order of 10's of microseconds for RAMdisk, 100's of microseconds for flash/ssd, and 1000's of microseconds (e.g. milliseconds) for spinning rust.
>
> Now take 1 million small file writes.  Say 1024 bytes.  These million writes have to traverse the storage stack in the kernel to get to disk.
>
> Now add in a network latency event on the order of 1000's of microseconds for the remote storage stack and network stack to respond.
>
> I haven't measured it yet in a methodical manner, but I wouldn't be surprised to see IOP rates within a factor of 2 of the bare metal for a sufficiently fast network such as Infiniband, and within a factor of 4 or 5 for a slow network like Gigabit.
>
> Our own experience has been generally that you are IOP constrained because of the stack you have to traverse.  If you add more latency into this stack, you have more to traverse, and therefore, you have more you need to wait.  Which will have a magnification effect upon times for small IO ops which are seeky (stat, small writes, random ops).

Sure, and all that applies equally to both NFS and gluster, yet in Max's example NFS was ~50x faster than gluster for an identical small-file workload. So what's gluster doing over and above what NFS is doing that's taking so long, given that network and disk factors are equal? I'd buy a factor of 2 for replication, but not 50.

In case you missed what I'm on about, it was these stats that Max posted:

> Here are the results per command:
>
> Command                                         Native       FUSE         NFS
> dd if=/dev/zero of=M/tmp bs=1M count=16384      69.2 MB/sec  69.2 MB/sec  52 MB/sec
> dd if=/dev/zero of=M/tmp bs=1K count=163840000  88.1 MB/sec  1.1 MB/sec   52.4 MB/sec
> time tar cf - M | pv > /dev/null                15.8 MB/sec  3.48 MB/sec  254 Kb/sec
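
For reference, the ~50x figure can be checked directly from the two dd rows quoted above. This is only a small sketch of that arithmetic; the dictionary layout and the ratio helper are illustrative, and the tar row is left out because its NFS figure is given in a different unit.

```python
# Throughput in MB/sec, copied from the two dd rows quoted above.
RESULTS_MB_S = {
    "dd bs=1M": {"native": 69.2, "fuse": 69.2, "nfs": 52.0},
    "dd bs=1K": {"native": 88.1, "fuse": 1.1, "nfs": 52.4},
}


def ratio(workload, fast, slow):
    """How many times faster `fast` is than `slow` for one workload."""
    row = RESULTS_MB_S[workload]
    return row[fast] / row[slow]


if __name__ == "__main__":
    for workload in RESULTS_MB_S:
        print(f"{workload}: NFS vs FUSE x{ratio(workload, 'nfs', 'fuse'):.1f}, "
              f"native vs FUSE x{ratio(workload, 'native', 'fuse'):.1f}")
```

With large blocks the three access paths are within a factor of about 1.3 of each other; the gap only opens up for the 1K writes.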

In my case I'm running 30kiops SSDs over gigabit. At the moment my problem (running 3.0.6) isn't performance but reliability - files are occasionally reported as 'vanished' by front-end apps (like rsync) even though they are present on both backing stores; no errors in gluster logs, self-heal doesn't help.

Marcus

------------------------------

Message: 6
Date: Fri, 14 Jan 2011 18:51:39 -0500
From: Joe Landman <landman at scalableinformatics.com>
Subject: Re: [Gluster-users] very bad performance on small files
To: gluster-users at gluster.org
Message-ID: <4D30E18B.90500 at scalableinformatics.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 01/14/2011 06:26 PM, Marcus Bointon wrote:

>> Our own experience has been generally that you are IOP constrained
>> because of the stack you have to traverse.  If you add more latency
>> into this stack, you have more to traverse, and therefore, you have
>> more you need to wait.  Which will have a magnification effect upon
>> times for small IO ops which are seeky (stat, small writes, random
>> ops).
>
> Sure, and all that applies equally to both NFS and gluster, yet in
> Max's example NFS was ~50x faster than gluster for an identical
> small-file workload. So what's gluster doing over and above what NFS
> is doing that's taking so long, given that network and disk factors
> are equal? I'd buy a factor of 2 for replication, but not 50.

If the NFS was doing attribute caching and the GlusterFS implementation
had stat prefetch and other caching turned off, this could explain it.
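
To illustrate why attribute caching could make that much difference, here is a toy model that counts round trips for repeated stat calls with and without a time-limited cache. It is a sketch only: the AttrCache class, the 3 second TTL, and the workload of 20 files are assumptions for illustration, and it is not how the NFS client or the GlusterFS stat-prefetch translator is actually implemented.

```python
import time


class AttrCache:
    """Cache stat-like results per path for ttl seconds (toy model)."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch      # callable doing the (slow) remote lookup
        self._ttl = ttl          # seconds an entry stays valid; 0 disables caching
        self._clock = clock
        self._entries = {}       # path -> (timestamp, attrs)
        self.round_trips = 0     # number of times fetch was actually called

    def stat(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: served without a round trip
        self.round_trips += 1
        attrs = self._fetch(path)
        self._entries[path] = (now, attrs)
        return attrs


def count_round_trips(ttl, paths, repeats):
    """Stat every path `repeats` times and return how many lookups went remote."""
    cache = AttrCache(fetch=lambda p: {"path": p}, ttl=ttl)
    for _ in range(repeats):
        for path in paths:
            cache.stat(path)
    return cache.round_trips


if __name__ == "__main__":
    paths = [f"include_{i}.php" for i in range(20)]  # hypothetical file names
    print("no caching:", count_round_trips(0, paths, repeats=100))
    print("3 s TTL   :", count_round_trips(3, paths, repeats=100))
```

Each avoided round trip saves a full network latency, so a workload that stats the same files repeatedly benefits roughly in proportion to its cache hit rate.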

> In case you missed what I'm on about, it was these stats that Max
> posted:
>
>> Here are the results per command:
>>
>> Command                                         Native       FUSE         NFS
>> dd if=/dev/zero of=M/tmp bs=1M count=16384      69.2 MB/sec  69.2 MB/sec  52 MB/sec
>> dd if=/dev/zero of=M/tmp bs=1K count=163840000  88.1 MB/sec  1.1 MB/sec   52.4 MB/sec
>> time tar cf - M | pv > /dev/null                15.8 MB/sec  3.48 MB/sec  254 Kb/sec

Ok, I am not sure if I saw the numbers before.  Thanks.

>
> In my case I'm running 30kiops SSDs over gigabit. At the moment my
> problem (running 3.0.6) isn't performance but reliability - files are
> occasionally reported as 'vanished' by front-end apps (like rsync)
> even though they are present on both backing stores; no errors in
> gluster logs, self-heal doesn't help.

Check your stat-prefetch settings, and your time base.  We've had some
strange issues that seem to be correlated with time bases drifting.
Including files disappearing.  We have a few open tickets on this.

The way we've worked around this problem is to abandon the NFS client
and use the glusterfs client.  Not our preferred option, but it provides
a workaround for the moment.  The NFS translator does appear to have a
few issues.  I am hoping we get more tuning knobs for it soon so we can
see if we can work around this.

Regards,

Joe

------------------------------

Message: 7
Date: Sat, 15 Jan 2011 00:30:15 +0000
From: Max Ivanov <ivanov.maxim at gmail.com>
Subject: Re: [Gluster-users] very bad performance on small files
To: Marcus Bointon <marcus at synchromedia.co.uk>
Cc: Gluster General Discussion List <gluster-users at gluster.org>
Message-ID:
        <AANLkTik+_j1fMW5u+EC9DMQLu2FgYSUk4ZJwqbs6U1Wr at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

> Sure, and all that applies equally to both NFS and gluster, yet in Max's example NFS was ~50x faster than gluster for an identical small-file workload. So what's gluster doing over and above what NFS is doing that's taking so long, given that network and disk factors are equal? I'd buy a factor of 2 for replication, but not 50.
>

Sorry if I didn't make it clear, but the NFS in my tests is not the
well-known classic NFS but glusterfs in NFS mode.


------------------------------

Message: 8
Date: Sat, 15 Jan 2011 11:18:22 +0200
From: Rudi Ahlers <Rudi at SoftDux.com>
Subject: Re: [Gluster-users] very bad performance on small files
To: Jacob Shucart <jacob at gluster.com>
Cc: gluster-users at gluster.org
Message-ID:
        <sig.3996530d0f.AANLkTinY=zUbjGhto470YGTwhd_vzBb6fpj4-WE+m+B- at mail.gmail.com>

Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jan 14, 2011 at 7:58 PM, Jacob Shucart <jacob at gluster.com> wrote:
> For web hosting it is best to put user generated content(images, etc) on
> Gluster but to leave application files like PHP files on the local disk.
> This is because a single application file request could result in 20 other
> file requests since applications like PHP use includes/inherits, etc.
> This kind of thing is fine on local disks, but when you're talking about a
> distributed filesystem the network latency starts to add up since 1
> request to the web server results in a bunch of file requests.
>
> -----Original Message-----
> From: gluster-users-bounces at gluster.org
> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Max Ivanov
> Sent: Friday, January 14, 2011 6:09 AM
> To: Burnash, James
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] very bad performance on small files
>
>> Gluster - and in fact most (all?) parallel filesystems are optimized for
>> very large files. That being the case, small files are not retrieved as
>> efficiently, and result in a larger number of file operations in total
>> because there are a fixed number for each file accessed.
>
>
> Which makes glusterfs performance unacceptable for web hosting purposes =(


So what can one use for webhosting purposes?

We use XEN / KVM virtual machines, hosted on NAS devices, but the NAS
devices don't have an easy upgrade path. We literally have to rsync
all the data to the new device and then shut down all the machines on
the old one and restart them on the new one. They don't provide 100%
uptime either. So I'm looking for something with an easier upgrade path
(GlusterFS can do this) and better uptime (again, GlusterFS can do
this).

But it's clear that GlusterFS isn't made for small files, so what else
could work well for us?
--
Kind Regards
Rudi Ahlers
SoftDux

Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532


------------------------------

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


End of Gluster-users Digest, Vol 33, Issue 23
*********************************************