[Gluster-users] Geo-rep failing

Csaba Henk csaba at gluster.com
Thu Jun 30 14:27:09 UTC 2011


It seems that the connection gets dropped (or cannot even be established).
Is the ssh auth set up properly for the second volume?
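
For reference: the EOFError in your log is simply what pickle.load() raises
when the stream it reads from ends prematurely, which is what happens when
the ssh transport to the slave goes away mid-exchange. The later "invalid
load key" UnpicklingError is what you get when something other than pickle
data (an ssh error message, binary noise) arrives on that stream. A minimal
sketch in plain Python, not the gsyncd code itself:

    # Not gsyncd code: just how the two errors in the log arise from
    # pickle reading a broken stream (assumes Python 2.6 or later).
    import pickle
    from io import BytesIO

    # Peer closed the connection before sending a complete message.
    try:
        pickle.load(BytesIO(b""))
    except EOFError:
        print("EOFError, i.e. stream ended: e.g. ssh connection dropped")

    # Peer sent something that is not pickle data (ssh error text, noise).
    try:
        pickle.load(BytesIO(b"\xef\xbf\xbdgarbage"))
    except pickle.UnpicklingError as e:
        print("UnpicklingError: %s" % e)

Either way it points at the transport between master and slave rather than
at the data being crawled.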

Csaba

On Thu, Jun 30, 2011 at 4:22 PM, Adrian Carpenter <tac12 at wbic.cam.ac.uk> wrote:
> Hi Csaba,
>
> I'm now seeing consistent errors with a second volume:
>
> [2011-06-30 06:08:48.299174] I [monitor(monitor):19:set_state] Monitor: new state: OK
> [2011-06-30 09:27:46.875745] E [syncdutils:131:exception] <top>: FAIL:
> Traceback (most recent call last):
>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>    tf(*aa)
>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>    rid, exc, res = recv(self.inf)
>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>    return pickle.load(inf)
> EOFError
> [2011-06-30 09:27:58.413588] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-06-30 09:27:58.413830] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-06-30 09:27:58.479687] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:user-volume -> file:///geo-tank/user-volume
> [2011-06-30 09:28:03.963303] I [master:181:crawl] GMaster: new master is a747062e-1caa-4cb3-9f86-34d03486a842
> [2011-06-30 09:28:03.963587] I [master:187:crawl] GMaster: primary master with volume id a747062e-1caa-4cb3-9f86-34d03486a842 ...
> [2011-06-30 09:34:35.592005] E [syncdutils:131:exception] <top>: FAIL:
> Traceback (most recent call last):
>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>    tf(*aa)
>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>    rid, exc, res = recv(self.inf)
>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>    return pickle.load(inf)
> EOFError
> [2011-06-30 09:34:45.595258] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-06-30 09:34:45.595668] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-06-30 09:34:45.661334] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:user-volume -> file:///geo-tank/user-volume
> [2011-06-30 09:34:51.145607] I [master:181:crawl] GMaster: new master is a747062e-1caa-4cb3-9f86-34d03486a842
> [2011-06-30 09:34:51.145898] I [master:187:crawl] GMaster: primary master with volume id a747062e-1caa-4cb3-9f86-34d03486a842 ...
> [2011-06-30 12:35:54.394453] E [syncdutils:131:exception] <top>: FAIL:
> Traceback (most recent call last):
>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>    tf(*aa)
>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>    rid, exc, res = recv(self.inf)
>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>    return pickle.load(inf)
> UnpicklingError: invalid load key, '�'.
> [2011-06-30 12:36:05.839510] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-06-30 12:36:05.839916] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-06-30 12:36:05.905232] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:user-volume -> file:///geo-tank/user-volume
> [2011-06-30 12:36:11.413764] I [master:181:crawl] GMaster: new master is a747062e-1caa-4cb3-9f86-34d03486a842
> [2011-06-30 12:36:11.414047] I [master:187:crawl] GMaster: primary master with volume id a747062e-1caa-4cb3-9f86-34d03486a842 ...
>
>
> Adrian
> On 28 Jun 2011, at 11:16, Csaba Henk wrote:
>
>> Hi Adrian,
>>
>>
>> On Tue, Jun 28, 2011 at 12:04 PM, Adrian Carpenter <tac12 at wbic.cam.ac.uk> wrote:
>>> Thanks Csaba,
>>>
>>> As far as I am aware nothing has tampered with the xattrs, and all the bricks etc. are time-synchronised. Anyway, I did as you suggested; now for one of the volumes (I have three being geo-replicated) I consistently get this:
>>>
>>> OSError: [Errno 12] Cannot allocate memory
>>
>> Do you get this consistently, or randomly but recurring, or did you
>> see it only once or a few times and then it went away?
>>
>>> File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 26, in _query_xattr
>>>    cls.raise_oserr()
>>> File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr
>>>    raise OSError(errn, os.strerror(errn))
>>> OSError: [Errno 12] Cannot allocate memory
>>
>> If you have seen it more than once, how much does the stack trace vary?
>> Is it exactly the same every time, does it crash in the same function
>> but on a different code path, does it at least stay within the libcxattr
>> module, or is it quite different?
>>
>> What Python version do you use? If you are on Python 2.4.* with an
>> external ctypes, which source did you take ctypes from, and which version?
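>>
>> If it is easier, a couple of lines like these, run under the same python
>> that gsyncd uses, would tell us both; this is just generic stdlib python,
>> nothing gsyncd-specific:
>>
>>     # Prints the python version, where ctypes is loaded from, and
>>     # (if it exposes one) the ctypes version.
>>     import sys, ctypes
>>     print(sys.version)
>>     print(ctypes.__file__)
>>     print(getattr(ctypes, '__version__', 'no __version__ attribute'))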
>>
>> Thanks,
>> Csaba
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>


