[Gluster-users] Problem no. 1

Keith Freedman freedman at FreeFormIT.com
Wed Jan 14 08:08:18 PST 2009

At 05:16 AM 1/14/2009, artur.k wrote:
>I have 6 www servers with lighttpd. Gluster resource is mounted on 
>those servers. 2 gluster servers are using AFR. Everything works 
>great until one of the gluster servers goes down. When this happens 
>everything works fine using one glusterfs server but when the other 
>one goes back on-line then after a few hours gluster starts working 
>slowly for 20 - 30 minutes. After that time period everything starts 
>to work normally however lighttpd tends to have problems when files 
>are not available to it "fast enough" (which happens during the 20 - 
>30 minutes time period after the second gluster server is back). 
>Lighttpd simply shows HTTP 500 when it cannot access the file during 
>a certain time frame. What is the problem?

During this time, most likely, gluster is auto-healing the server 
that was down.
Unfortunately, the way it does this seems to have changed in 2.0.
I guess the new approach is more robust, but it's also more time consuming.

Previously, a file was only healed when you accessed that file. Now, 
it seems, files are healed when you access their directory.

So when lighttpd accesses a file x in directory Y,
gluster auto-heals not only file x, but also ALL the other files in Y.
It blocks the I/O request until it has healed the entire directory.
This is the safest thing, but what it should do is heal the file we 
need, return back to the application, then continue auto-healing the 
rest of the files.
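
To make the difference concrete, here is a toy sketch in Python. It is
not GlusterFS code: heal_file, open_blocking and open_prioritized are
names I made up for illustration, and "healing" is reduced to appending
a file name to a list. It only shows the ordering I'd like to see,
under the assumption that healing one file doesn't depend on the others.

```python
import threading

def heal_file(name, healed):
    # Hypothetical stand-in for copying one stale file from the good replica.
    healed.append(name)

def open_blocking(directory, wanted, healed):
    # Behaviour as it appears in 2.0: heal every file in the directory,
    # and only then hand the requested file back to the caller.
    for name in directory:
        heal_file(name, healed)
    return wanted

def open_prioritized(directory, wanted, healed):
    # Suggested behaviour: heal only the requested file, return to the
    # application, and let a background worker heal the rest.
    heal_file(wanted, healed)
    rest = [name for name in directory if name != wanted]
    worker = threading.Thread(
        target=lambda: [heal_file(name, healed) for name in rest]
    )
    worker.start()
    return wanted, worker
```

With the second version the caller's wait depends only on file x, not
on how many other files in Y happen to be stale.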

I've no idea if they're going to change this or not (or if it's too 
difficult), but it is kind of a pain having processes sit waiting 
while unrelated files are being dealt with.

>glusterfs 2.0.0qa1 built on Jan  9 2009 14:14:17
>Repository revision: glusterfs--mainline--3.0--patch-840
