[Gluster-users] 4 node replica 2 crash

rickytato rickytato rickytato at r2consulting.it
Wed Jan 12 17:12:02 UTC 2011


This is the stack trace I found in syslog:

Jan 10 18:08:24 www3 kernel: [2773721.043130] INFO: task nginx:22664 blocked for more than 120 seconds.
Jan 10 18:08:24 www3 kernel: [2773721.043152] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 10 18:08:24 www3 kernel: [2773721.043176] nginx         D 00000001108733fc     0 22664   3107 0x00000004
Jan 10 18:08:24 www3 kernel: [2773721.043179]  ffff880058d7db68 0000000000000082 ffff880000000000 0000000000015980
Jan 10 18:08:24 www3 kernel: [2773721.043181]  ffff880058d7dfd8 0000000000015980 ffff880058d7dfd8 ffff880018492dc0
Jan 10 18:08:24 www3 kernel: [2773721.043184]  0000000000015980 0000000000015980 ffff880058d7dfd8 0000000000015980
Jan 10 18:08:24 www3 kernel: [2773721.043186] Call Trace:
Jan 10 18:08:24 www3 kernel: [2773721.043192]  [<ffffffff81241525>] request_wait_answer+0x85/0x240
Jan 10 18:08:24 www3 kernel: [2773721.043196]  [<ffffffff8107f050>] ? autoremove_wake_function+0x0/0x40
Jan 10 18:08:24 www3 kernel: [2773721.043199]  [<ffffffff8124175c>] fuse_request_send+0x7c/0x90
Jan 10 18:08:24 www3 kernel: [2773721.043202]  [<ffffffff81243799>] fuse_dentry_revalidate+0x179/0x2b0
Jan 10 18:08:24 www3 kernel: [2773721.043204]  [<ffffffff8115f414>] do_lookup+0x84/0x280
Jan 10 18:08:24 www3 kernel: [2773721.043206]  [<ffffffff8115febe>] link_path_walk+0x12e/0xab0
Jan 10 18:08:24 www3 kernel: [2773721.043208]  [<ffffffff81161b63>] do_filp_open+0x143/0x660
Jan 10 18:08:24 www3 kernel: [2773721.043212]  [<ffffffff81036db9>] ? default_spin_lock_flags+0x9/0x10
Jan 10 18:08:24 www3 kernel: [2773721.043216]  [<ffffffff8149a321>] ? sys_recvfrom+0xe1/0x170
Jan 10 18:08:24 www3 kernel: [2773721.043220]  [<ffffffff8159f01e>] ? _raw_spin_lock+0xe/0x20
Jan 10 18:08:24 www3 kernel: [2773721.043222]  [<ffffffff8116d22a>] ? alloc_fd+0x10a/0x150
Jan 10 18:08:24 www3 kernel: [2773721.043226]  [<ffffffff811513e9>] do_sys_open+0x69/0x170
Jan 10 18:08:24 www3 kernel: [2773721.043229]  [<ffffffff81151530>] sys_open+0x20/0x30
Jan 10 18:08:24 www3 kernel: [2773721.043232]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
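Reading the trace: nginx (PID 22664) is in state D inside request_wait_answer, called from fuse_request_send during fuse_dentry_revalidate, i.e. a path lookup on the FUSE mount is waiting for the userspace filesystem (here the glusterfs client) to answer. A minimal sketch for pulling that summary out of syslog, assuming the kernel message format shown above; parse_hung_task is a hypothetical helper, and frames marked "?" are dropped because the unwinder flags them as unreliable.

```python
import re

# Hung-task warning, e.g. "INFO: task nginx:22664 blocked for more than 120 seconds."
# \s+ also tolerates lines that a mail client wrapped.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+)\s+blocked\s+for\s+more\s+than\s+(?P<secs>\d+)\s+seconds"
)

# One call-trace frame, e.g. "[<ffffffff81241525>] request_wait_answer+0x85/0x240";
# a "?" before the symbol marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unsure>\?\s+)?(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_task(log_text):
    """Return (comm, pid, seconds, reliable_frames) for the first report, or None."""
    m = HUNG_RE.search(log_text)
    if m is None:
        return None
    # Only look at frames that follow the warning, skipping the "?" ones.
    frames = [f.group("func") for f in FRAME_RE.finditer(log_text, m.end())
              if f.group("unsure") is None]
    return m.group("comm"), int(m.group("pid")), int(m.group("secs")), frames


sample = """INFO: task nginx:22664 blocked for more than 120 seconds.
Call Trace:
 [<ffffffff81241525>] request_wait_answer+0x85/0x240
 [<ffffffff8107f050>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8124175c>] fuse_request_send+0x7c/0x90
 [<ffffffff81243799>] fuse_dentry_revalidate+0x179/0x2b0
"""
print(parse_hung_task(sample))
# → ('nginx', 22664, 120, ['request_wait_answer', 'fuse_request_send', 'fuse_dentry_revalidate'])
```

The top reliable frames are what matter here: a FUSE request that never got an answer points at the glusterfs client or the bricks behind it rather than at Nginx itself.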

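Since the hang shows up as Nginx workers stuck in open(), one way to tell a wedged mount from an Nginx problem is to probe the exported directory with a timeout. A minimal sketch, assuming Python 3 on the web server; path_responds and any mount point you pass it are hypothetical, not part of GlusterFS.

```python
import os
import threading


def path_responds(path, timeout=5.0):
    """Return True if os.stat(path) succeeds within `timeout` seconds.

    A lookup on a wedged FUSE mount can block in the kernel (state D),
    so the stat runs in a daemon thread and we only wait for it.
    A stuck thread is abandoned, not killed.
    """
    result = {}

    def probe():
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    t = threading.Thread(target=probe, daemon=True)
    t.start()
    t.join(timeout)
    # Still alive means the stat is hung; otherwise report whether it succeeded.
    return not t.is_alive() and result.get("ok", False)


# Example (hypothetical mount point): path_responds("/mnt/gluster/static", timeout=10)
```

A thread is used instead of a signal-based timeout because a task in D state, like nginx in the trace above, does not act on signals until the call returns.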

2011/1/12 rickytato rickytato <rickytato at r2consulting.it>

> Some other info:
> OS: Ubuntu 10.10 64-bit
> GlusterFS compiled from source
>
> Client and server are the same machines; they are simple web servers
> running Nginx + PHP-FPM, and only one directory, holding the static content,
> is exported by GlusterFS; the PHP code is local only.
>
> Each server has two 1 Gbit NICs in a bond.
>
> Anything else?
>
> The very strange thing is that Nginx stops responding only about 4 hours
> after the new nodes are added.
>
> Any suggestions?
>
>
> rr
>
> 2011/1/11 rickytato rickytato <rickytato at r2consulting.it>
>
>> Hi,
>> I've been using a simple 2-node replica 2 cluster for about 4 weeks; I'm
>> using glusterfs 3.1.1 built on Dec  9 2010 15:41:32 Repository revision:
>> v3.1.1.
>> I use it to serve images through Nginx.
>> Everything works well.
>>
>> Today I added 2 new bricks and rebalanced the volume. It worked for about
>> 4 hours, then Nginx hung; I rebooted all the servers but nothing helped.
>>
>> When I removed the two bricks, everything went back to normal (I manually
>> copied the files from the "old" bricks to the original ones).
>>
>>
>> What's wrong?
>>
>
>
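With 4 bricks and replica 2 the volume becomes distributed-replicated: GlusterFS groups bricks into replica sets in the order they are listed, so bricks have to be added in multiples of the replica count. A small sketch of that grouping rule; the host names and brick paths are hypothetical (only www3 appears in the log), and replica_sets/colocated_sets are illustrative helpers, not GlusterFS code.

```python
def replica_sets(bricks, replica):
    """Group bricks into replica sets: consecutive runs of `replica` bricks,
    in the order they were listed on the command line."""
    if len(bricks) % replica != 0:
        raise ValueError(f"{len(bricks)} bricks is not a multiple of replica {replica}")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def colocated_sets(sets):
    """Return replica sets in which some host holds more than one copy."""
    return [s for s in sets if len({b.split(":", 1)[0] for b in s}) < len(s)]


# Hypothetical layout: the original pair followed by the two bricks added later.
bricks = ["www1:/data/export", "www2:/data/export",
          "www3:/data/export", "www4:/data/export"]
sets = replica_sets(bricks, 2)
print(sets)
# → [['www1:/data/export', 'www2:/data/export'], ['www3:/data/export', 'www4:/data/export']]
print(colocated_sets(sets))  # → []
```

Listing the bricks in a different order changes which hosts are paired; a set whose two copies sit on one host would lose both together if that host fails, which is what colocated_sets flags.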

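Roughly what the rebalance step does: the distribute layer gives each replica set a slice of a 32-bit hash space for every directory, and a file lives on the set whose slice contains the hash of its name; after add-brick, rebalance updates the layouts and migrates files whose hash now lands on another set. A simplified sketch, assuming evenly split ranges and using zlib.crc32 as a stand-in for GlusterFS's own hash function, so the exact placements are illustrative only.

```python
import zlib

HASH_SPACE = 2 ** 32


def even_layout(n_subvols):
    """Split the 32-bit hash space into n contiguous, near-equal ranges."""
    step = HASH_SPACE // n_subvols
    bounds = [i * step for i in range(n_subvols)] + [HASH_SPACE]
    return [(bounds[i], bounds[i + 1]) for i in range(n_subvols)]


def place(name, layout):
    """Return the index of the range that contains the (stand-in) hash of name."""
    h = zlib.crc32(name.encode())  # stand-in only; not the hash GlusterFS uses
    for i, (lo, hi) in enumerate(layout):
        if lo <= h < hi:
            return i
    raise AssertionError("layout does not cover the hash space")


names = [f"img{i}.jpg" for i in range(1000)]  # hypothetical image names
before = even_layout(1)  # one replica set before the new bricks
after = even_layout(2)   # two replica sets after add-brick
moved = sum(place(n, before) != place(n, after) for n in names)
print(f"{moved} of {len(names)} files change replica set")
```

Under this even split, going from one set to two moves every name whose hash falls in the upper half of the space, so a rebalance touches a large share of the files.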
