[Gluster-users] Fwd: files not syncing up with glusterfs 3.1.2

Mon Feb 21 19:06:13 UTC 2011

Thanks to all for responding - at least I know I'm not alone. I am shocked
to learn that so many on this list are having serious, fundamental issues
with GlusterFS - and seemingly have been for a long time. So, without wanting
to troll, my question is: "Is Gluster a serious, stable, general-purpose file
system?" Or is it more of a good "caching system" for a specific, narrow
domain?

I'd really like to hear from any official Gluster people out there - right
now the silence is deafening. Is this issue known? Is it viewed as serious?
Is it being worked on? I'm happy to volunteer to help by sending in a
test case, sending logs, or whatever else is asked of me. I want to believe
Gluster is going to work, as do many other sysadmins I know in the
post/film industry. However, I'm rapidly losing confidence in Gluster with
each passing day of silence...

In hope - Paul

On Mon, Feb 21, 2011 at 6:47 PM, Joe Landman <landman at scalableinformatics.com> wrote:

> On 02/21/2011 01:39 PM, Kon Wilms wrote:
>
>> On Mon, Feb 21, 2011 at 9:45 AM, Steve Wilson<stevew at purdue.edu>  wrote:
>>
>>> We had trouble with reliability for small, actively accessed files on a
>>> distribute-replicate volume in both GlusterFS 3.1.1 and 3.1.2. It seems
>>> that the replicated servers would eventually get out of sync with each
>>> other on these kinds of files. For now, we have dropped replication and
>>> are running the volume as distributed only. This has worked reliably for
>>> the past week or so, without any of the errors we were seeing before:
>>> no such file, invalid argument, etc.
>>>
>>
>> I'm running thousands of small files over NFSv3 through NGINX with
>> distribute and have had the opposite experience. Unfortunately when
>> NGINX can't access a file over NFS it means a customer calling us, so
>> right now gluster is basically sitting idle (posted my output to the
>> list a while back with no response).
>>
>
> We've had lots of issues with files disappearing or being inaccessible
> prior to 3.1.2 with the NFS client and server translator. Since 3.1.2, many
> of these problems *seem* to have been resolved, though all that means in
> this instance is that the customer hasn't submitted a ticket yet.
>
> I originally thought it was a timebase issue ... as we had a minute or
> two of clock drift on some of the nodes (since fixed). But we had a pretty
> consistent error in this regard.
>
> We did open problem reports. Unfortunately, no action so far (they just
> closed them this morning, though nothing has been solved per se; the issue
> simply has not yet resurfaced). I'll leave those reports closed for now.
>
> That said, this error, or one with a very similar signature, has been in
> the code since the 2.x series. I really ... really want to track it down,
> but I can't create a simple reproducer for it to present to the team. If
> you have what you think is a simple reproducer, please email me off-list.
> We'll try it here, and if we can get it down to a very simple reproduction
> case and test, we'll re-open the bugs.
>
> I'd hate to think it's a heisenbug, but that is where I am leaning now.
>
>
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>       http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615