[Gluster-users] Recovery of a split brain scenario

Niclas Hughes Niclas.Hughes at securetrading.com
Wed Feb 9 11:40:24 UTC 2011


This has been tested using the most recent build of 3.1.2 (built Jan 18 2011 11:19:54)
System setup:
Volume Name: brick
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: linguest2:/data/exp
Brick2: linguest3:/data/exp
Brick3: linguest4:/data/exp
Brick4: linguest5:/data/exp
Suppose the cluster is in a split-brain situation. file1.txt with contents "content from split 1" is copied to the left side of the split, and file1.txt with contents "contents from split 2" is copied to the right side. When the split is then recovered, each file is left on the machines it was copied to. (That is fine, as Gluster have already said that the new version, 3.1.2, does not cope with split brain anymore.) But if you then read the file on either of the machines, you get this log entry:
[2011-02-09 09:36:15.432679] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] brick-replicate-0: background  data self-heal completed on /file1.txt

This log entry is repeated every time the file is accessed.
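
Since the log gives no hint of a conflict, one way to see what the replicate translator believes is to look at the AFR changelog extended attributes directly on each brick, for example with `getfattr -d -m . -e hex /data/exp/file1.txt`. The sketch below is an illustration only. It assumes the attributes are named along the lines of trusted.afr.brick-client-0 and trusted.afr.brick-client-1, and that each value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry). The function names are hypothetical and the layout should be checked against your build.

```python
import struct


def parse_afr_changelog(hex_value):
    """Decode a hex xattr value such as '0x000000020000000000000000'.

    Assumption: the value is 12 bytes, i.e. three big-endian 32-bit
    counters for pending data, metadata and entry operations.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def looks_like_data_split_brain(a_view_of_b, b_view_of_a):
    """Return True when each replica records pending data changes
    against the other, i.e. each copy blames its peer.

    a_view_of_b: value read on brick A of the xattr that refers to brick B.
    b_view_of_a: value read on brick B of the xattr that refers to brick A.
    """
    return (parse_afr_changelog(a_view_of_b)["data"] > 0
            and parse_afr_changelog(b_view_of_a)["data"] > 0)
```

If both counters are non-zero, neither copy can be treated as the good source, which would be consistent with the files staying different after the split is recovered.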
To try to fix this, I changed file1.txt and copied it to the machine that had been on the left side of the split. My expectation was that this would simply replicate out to all the machines and overwrite the file.
But all that happened was that file1.txt stayed on that machine and was not replicated out and, oddly, the date of the file had been changed to 1970-01-01.
I have also run a rebalance, which did nothing to fix this issue.
I now have two machines that are inconsistent, and I cannot see how to fix this. Nor can I see how I would get a monitoring system to detect it, because there are no errors in the log: "data self-heal completed" can also appear for files in other scenarios.
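
As the log cannot be relied on here, one possible monitoring approach is to compare the copies held by the two bricks of a replica pair directly. The following is a minimal sketch, not a Gluster feature: it assumes a script can read each brick directory (for example /data/exp on linguest2 and on linguest3) and that the resulting digests can be collected in one place. The names checksum_tree and find_mismatches are hypothetical.

```python
import hashlib
import os


def checksum_tree(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def find_mismatches(replica_a, replica_b):
    """Compare two {path: digest} maps from the bricks of one replica pair.

    Returns (differing, missing): paths present on both bricks with
    different contents, and paths present on only one brick.
    """
    common = replica_a.keys() & replica_b.keys()
    differing = sorted(p for p in common if replica_a[p] != replica_b[p])
    missing = sorted(replica_a.keys() ^ replica_b.keys())
    return differing, missing
```

Run checksum_tree on each brick of a pair, pass the two results to find_mismatches, and alert when either list is non-empty. Files that are being written during the scan can show up as false positives, so a mismatch is best confirmed on a second pass.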

Thanks
Nick

