<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi,<br>
<br>
On 03/03/2016 11:14 AM, ABHISHEK PALIWAL wrote:<br>
</div>
<blockquote
cite="mid:CA+15cFNH+218-4AUag=bW2GRDQ8jbd+G6GiuFsZFAXNkjaw3hA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>Hi Ravi,<br>
<br>
</div>
As discussed earlier, I investigated this issue
and found that healing is not triggered because
the "gluster volume heal c_glusterfs info split-brain"
command does not show any entries,
even though the file is in split-brain.<br>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
A couple of observations from the 'commands_output' file.<br>
<br>
getfattr -d -m . -e hex
opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml<br>
The afr xattrs do not indicate that the file is in split brain:<br>
# file:
opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml<br>
trusted.afr.c_glusterfs-client-1=0x000000000000000000000000<br>
trusted.afr.dirty=0x000000000000000000000000<br>
trusted.bit-rot.version=0x000000000000000b56d6dd1d000ec7a9<br>
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae<br>
<br>
<br>
<br>
getfattr -d -m . -e hex
opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml<br>
trusted.afr.c_glusterfs-client-0=0x000000080000000000000000<br>
trusted.afr.c_glusterfs-client-2=0x000000020000000000000000<br>
trusted.afr.c_glusterfs-client-4=0x000000020000000000000000<br>
trusted.afr.c_glusterfs-client-6=0x000000020000000000000000<br>
trusted.afr.dirty=0x000000000000000000000000<br>
trusted.bit-rot.version=0x000000000000000b56d6dcb7000c87e7<br>
trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae<br>
<br>
1. There doesn't seem to be a split-brain going by the trusted.afr*
xattrs.<br>
2. You seem to have re-used the bricks from another volume/setup.
For replica 2, only trusted.afr.c_glusterfs-client-0 and
trusted.afr.c_glusterfs-client-1 should be present, but I see 4 xattrs:
client-0, 2, 4 and 6.<br>
3. On the rebooted node, do you have ssl enabled by any chance?
There is a bug for "Not able to fetch volfile" when ssl is enabled:
<a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1258931">https://bugzilla.redhat.com/show_bug.cgi?id=1258931</a><br>
<br>
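As a rough aid for reading these values: each trusted.afr.* value is 12 bytes, read as three big-endian 32-bit counters of pending data, metadata and entry operations, and a split-brain shows up as both bricks holding a non-zero counter of the same type against each other. The sketch below decodes two of the values posted above. The function names are made up for illustration, and which output came from which brick is an assumption.<br>
<br>
<pre>
```python
import struct

CHANGE_TYPES = ("data", "metadata", "entry")


def decode_afr_xattr(hex_value):
    """Decode one trusted.afr.* value into its three pending counters."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    return dict(zip(CHANGE_TYPES, struct.unpack("!III", raw)))


def mutual_blame(first_on_second, second_on_first):
    """List the change types for which each brick accuses the other."""
    return [t for t in CHANGE_TYPES
            if first_on_second[t] and second_on_first[t]]


# Values copied from the two getfattr outputs above (assumed brick order).
first_brick = decode_afr_xattr("0x000000000000000000000000")   # client-1
second_brick = decode_afr_xattr("0x000000080000000000000000")  # client-0

print(second_brick)
print(mutual_blame(first_brick, second_brick))
```
</pre>
With these inputs the second value decodes to 8 pending data operations against client-0, while the first holds nothing against client-1, so the list of mutually blamed change types is empty. That is consistent with observation 1: the accusation goes in one direction only.<br>
<br>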
Btw, for data and metadata split-brains you can use the gluster
CLI
(<a class="moz-txt-link-freetext" href="https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md">https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md</a>)
instead of modifying the file from the back end.<br>
<br>
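For reference, the CLI route described in that document comes down to two policies: "bigger-file", and "source-brick" with a HOSTNAME:BRICKNAME argument. Below is a minimal sketch that only builds the command lines, without running them. The helper name, the host name "board-a" and the brick root /opt/lvmdir/c2/brick are assumptions for illustration, and the file path is assumed to be given relative to the volume root.<br>
<br>
<pre>
```python
def split_brain_heal_cmd(volume, policy, file_path=None, source_brick=None):
    """Build the argv for one split-brain resolution command."""
    cmd = ["gluster", "volume", "heal", volume, "split-brain"]
    if policy == "bigger-file":
        if file_path is None:
            raise ValueError("bigger-file needs a file path")
        return cmd + ["bigger-file", file_path]
    if policy == "source-brick":
        if source_brick is None:
            raise ValueError("source-brick needs HOSTNAME:BRICKNAME")
        cmd += ["source-brick", source_brick]
        # Without a file, the chosen brick is the source for all such files.
        return cmd + ([file_path] if file_path else [])
    raise ValueError("unknown policy: %s" % policy)


# Assumed: path of the log file as seen from the volume root.
LOG_FILE = "/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml"

by_size = split_brain_heal_cmd("c_glusterfs", "bigger-file", file_path=LOG_FILE)
by_brick = split_brain_heal_cmd(
    "c_glusterfs", "source-brick",
    file_path=LOG_FILE,
    source_brick="board-a:/opt/lvmdir/c2/brick",  # hypothetical host and brick
)

print(" ".join(by_size))
print(" ".join(by_brick))
```
</pre>
These commands only apply once "heal info split-brain" actually lists the file, which is not the case for the xattrs shown above.<br>
<br>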
-Ravi<br>
<blockquote
cite="mid:CA+15cFNH+218-4AUag=bW2GRDQ8jbd+G6GiuFsZFAXNkjaw3hA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div><br>
</div>
So what I did was manually delete the gfid entry
of that file from the .glusterfs directory and follow the
instructions in the following link to trigger the heal:<br>
<br>
<a moz-do-not-send="true"
href="https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md">https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md</a><br>
<br>
</div>
and this worked fine for me.<br>
<br>
</div>
But my question is: why does the split-brain command not show
any file in its output?<br>
<br>
</div>
<div>I am attaching all the logs I collected from the node,
along with the output of the commands from both
boards.<br>
<br>
</div>
<div>The tar file contains two directories:<br>
<br>
</div>
<div>000300 - logs for the board that kept running continuously<br>
</div>
<div>002500 - logs for the board that was rebooted<br>
<br>
</div>
<div>I look forward to your reply; please help me out with this
issue.<br>
<br>
</div>
<div>Thanks in advance.<br>
</div>
<div><br>
</div>
Regards,<br>
</div>
Abhishek<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Feb 26, 2016 at 1:21 PM,
ABHISHEK PALIWAL <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:abhishpaliwal@gmail.com" target="_blank">abhishpaliwal@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><span class="">On Fri, Feb 26,
2016 at 10:28 AM, Ravishankar N <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:ravishankar@redhat.com"
target="_blank">ravishankar@redhat.com</a>&gt;</span>
wrote:<br>
</span>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class="">
<div>On 02/26/2016 10:10 AM, ABHISHEK PALIWAL
wrote:<br>
</div>
<blockquote type="cite">
<p dir="ltr">Yes correct</p>
</blockquote>
<br>
Okay, so when you say the files are not in sync
for some time, are you getting stale data when
accessing them from the mount?<br>
I'm not able to figure out why heal info shows
zero when the files are not in sync, despite all
IO happening from the mounts. Could you provide
the output of getfattr -d -m . -e hex
/brick/file-name from both bricks when you hit
this issue?</span>
<div>
<div><br>
</div>
<div>I'll provide the logs once I get them. Here,
delay means that we power on the second
board after 10 minutes.<br>
</div>
<div>
<div class="h5">
<div> <br>
<br>
<blockquote type="cite">
<div class="gmail_quote">On Feb 26, 2016
9:57 AM, "Ravishankar N" &lt;<a
moz-do-not-send="true"
href="mailto:ravishankar@redhat.com"
target="_blank">ravishankar@redhat.com</a>&gt;
wrote:<br type="attribution">
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF">
<div>Hello,<br>
<br>
On 02/26/2016 08:29 AM, ABHISHEK
PALIWAL wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>Hi Ravi,<br>
<br>
</div>
Thanks for the
response.<br>
<br>
</div>
We are using
GlusterFS 3.7.8<br>
<br>
Here is the use
case:<br>
<br>
<span
style="color:rgb(0,0,0)">We
have a log file that records the events
for every board of a node, and these
files are kept in sync using GlusterFS.
The system runs in replica 2 mode,
which means: <span>When
one brick in a replicated volume goes
offline, the glusterd daemons on the other
nodes keep track of all the files
that are not replicated to
the offline brick. When the
offline brick becomes available again,
the cluster initiates a healing process,
replicating the updated files to
that brick. </span>But
in our case, we see that the log file
of one board is not in sync
and its format is corrupted, which means
the files are not in
sync.</span><br>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Just to understand you correctly:
you have mounted the 2-node
replica-2 volume on both these
nodes and are writing to a log
file from the mounts, right? <br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div><br>
</div>
Even the output of <span><span>#gluster
volume heal
c_glusterfs info
shows that there
are no pending
heals.<br>
<br>
</span></span><span><span>Also,
the log file that is updated
has a fixed size,
and new entries wrap around,
overwriting the
old entries.<br>
<br>
This way we have
seen that after
a few restarts, the
contents of the
same file on the two
bricks differ, but
volume heal
info shows zero
entries.<br>
<br>
</span></span></div>
<span><span>Solution:<br>
<br>
</span></span></div>
<span><span>But when we
added a delay of </span></span><span><span><span><span>
&gt; 5 min</span></span>
before the healing,
everything worked
fine.<br>
<br>
</span></span></div>
<span><span>Regards,<br>
</span></span></div>
<span><span>Abhishek<br>
</span></span> </div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On
Fri, Feb 26, 2016 at 6:35
AM, Ravishankar N <span
dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:ravishankar@redhat.com"
target="_blank">ravishankar@redhat.com</a>&gt;</span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"><span>
<div>On 02/25/2016
06:01 PM, ABHISHEK
PALIWAL wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>Hi,<br>
<br>
</div>
I have
one question
regarding the
time taken by
the healing
process.<br>
</div>
In our current two-node
setup,
when we
reboot one
node, the
self-healing
process starts
within less than
5 minutes on
the board,
which
results in
corruption of
some files'
data.<br>
</div>
</div>
</div>
</blockquote>
<br>
</span> Heal should
start immediately after
the brick process comes
up. What version of
gluster are you using?
What do you mean by
corruption of data?
Also, how did you
observe that the heal
started after 5 minutes?<br>
-Ravi<br>
<blockquote type="cite"><span>
<div dir="ltr">
<div>
<div><br>
</div>
To resolve
it, I searched
on Google and
found the
following link:<br>
<a
moz-do-not-send="true"
href="https://support.rackspace.com/how-to/glusterfs-troubleshooting/"
target="_blank">https://support.rackspace.com/how-to/glusterfs-troubleshooting/</a><br>
<br>
</div>
<div>It mentions
that the healing
process can
take up to 10 minutes
to start.<br>
<br>
</div>
<div>Here is the
statement from
the link:<br>
<br>
"Healing
replicated
volumes <br>
<br>
When any brick
in a replicated
volume goes
offline, the
glusterd daemons
on the remaining
nodes keep track
of all the files
that are not
replicated to
the offline
brick. When the
offline brick
becomes
available again,
the cluster
initiates a
healing process,
replicating the
updated files to
that brick. <b>The
start of this
process can
take up to 10
minutes, based
on
observation.</b>"
<br>
</div>
<div><br>
</div>
<div>After allowing
a delay of more
than 5 minutes, the file
corruption
problem was
resolved.<br>
</div>
<div><br>
</div>
<div>So my
question is:
is there any way
to reduce the
time the healing
process takes to
start?<br>
<br>
</div>
<br>
Regards,<br>
Abhishek Paliwal<br
clear="all">
<br>
<br>
</div>
<br>
<fieldset></fieldset>
<br>
</span>
<pre>_______________________________________________
Gluster-devel mailing list
<a moz-do-not-send="true" href="mailto:Gluster-devel@gluster.org" target="_blank">Gluster-devel@gluster.org</a>
<a moz-do-not-send="true" href="http://www.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://www.gluster.org/mailman/listinfo/gluster-devel</a></pre>
</blockquote>
<br>
<br>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<br>
-- <br>
<div>
<div dir="ltr"><br>
<br>
<br>
<br>
Regards<br>
Abhishek Paliwal<br>
</div>
</div>
</div>
</blockquote>
<br>
<br>
</div>
</blockquote>
</div>
</blockquote>
<br>
<br>
</div>
</div>
</div>
</div>
</div>
<span class="HOEnZb"><font color="#888888">
</font></span></blockquote>
</div>
<span class="HOEnZb"><font color="#888888"><br>
</font></span></div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<br>
</body>
</html>