Hi Geoffrey,

Please find my comments in-line.

<div class="moz-cite-prefix">On Saturday 01 August 2015 04:10 AM,
Geoffrey Letessier wrote:<br>
</div>
> Hello,
>
> As Krutika said, I successfully resolved all the split-brains (more
> than 3450) that appeared after the first data transfer from one
> backup server to my new, fresh volume, but…
>
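> (I resolved them with the split-brain resolution commands that 3.7
> added to the CLI; roughly like the sketch below - the brick and file
> names here are only placeholders:)
>
>   # per file: either keep the bigger copy, or name the winning brick
>   gluster volume heal vol_home split-brain bigger-file <FILE>
>   gluster volume heal vol_home split-brain source-brick \
>       ib-storage1:/export/brick_home/brick1/data <FILE>
>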
> The next step to validate my new volume was to enable the quota on
> it; and now, more than one day after this activation, all the
> results are still completely wrong. Example:
>   # df -h /home/sterpone_team
>   Filesystem                Size  Used Avail Use% Mounted on
>   ib-storage1:vol_home.tcp   14T  3,3T   11T  24% /home
>   # pdsh -w storage[1,3] du -sh /export/brick_home/brick{1,2}/data/sterpone_team
>   storage3: 2,5T    /export/brick_home/brick1/data/sterpone_team
>   storage3: 2,4T    /export/brick_home/brick2/data/sterpone_team
>   storage1: 2,7T    /export/brick_home/brick1/data/sterpone_team
>   storage1: 2,4T    /export/brick_home/brick2/data/sterpone_team
>
> As you can see, the data for this account adds up to around 10TB,
> yet quota reports only 3.3TB used.
>
> Worse:
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;"># pdsh -w storage[1,3] du -sh
/export/brick_home/brick{1,2}/data/baaden_team</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">storage3: 2,9T<span
class="Apple-tab-span" style="white-space:pre"> </span>/export/brick_home/brick1/data/baaden_team</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">storage3: 2,7T<span
class="Apple-tab-span" style="white-space:pre"> </span>/export/brick_home/brick2/data/baaden_team</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">storage1: 3,2T<span
class="Apple-tab-span" style="white-space:pre"> </span>/export/brick_home/brick1/data/baaden_team</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">storage1: 2,8T<span
class="Apple-tab-span" style="white-space:pre"> </span>/export/brick_home/brick2/data/baaden_team</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;"># df -h /home/baaden_team/</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">Filesystem Size Used
Avail Use% Mounted on</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">ib-storage1:vol_home.tcp</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;"> 20T 786G
20T 4% /home</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;"># gluster volume quota vol_home list
/baaden_team</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;"> Path
Hard-limit Soft-limit Used Available Soft-limit
exceeded? Hard-limit exceeded?</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">---------------------------------------------------------------------------------------------------------------------------</span></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(255,
255, 255); background-color: rgb(0, 0, 0);"><span
style="font-size: 9px;">/baaden_team
20.0TB 80% 785.6GB 19.2TB No
No</span></div>
</div>
>
> This account holds around 11.6TB, yet quota detects only 786GB used…

As you mentioned below, some of the bricks were down. 'quota list'
only shows the aggregated value from the bricks that are online.
Could you please check 'quota list' again when all the bricks are up
and running? I suspect the quota initiation (crawl) might not have
completed because of the bricks being down.
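
For example (volume name taken from your output, everything else as
on your cluster):

    # every brick should show 'Y' in the Online column
    gluster volume status vol_home

    # once all bricks are online, re-read the aggregated usage
    gluster volume quota vol_home list /sterpone_team
    gluster volume quota vol_home list /baaden_team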

> Can someone help me fix it? Note that I previously upgraded GlusterFS
> from 3.5.3 to 3.7.2 precisely to solve a similar problem…
>
> For information, in the quotad log file:
>   [2015-07-31 22:13:00.574361] I [MSGID: 114047] [client-handshake.c:1225:client_setvolume_cbk] 0-vol_home-client-7: Server and Client lk-version numbers are not same, reopening the fds
>   [2015-07-31 22:13:00.574507] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vol_home-client-7: Server lk version = 1
>
> Is there any causal connection (client/server version conflict)?
>
> Here is what I noticed in my
> /var/log/glusterfs/quota-mount-vol_home.log file:
>   … <same kind of lines>
>   [2015-07-31 21:26:15.247269] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_home-client-5: changing port to 49162 (from 0)
>   [2015-07-31 21:26:15.250272] E [socket.c:2332:socket_connect_finish] 0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion refusée)
>   [2015-07-31 21:26:19.250545] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_home-client-5: changing port to 49162 (from 0)
>   [2015-07-31 21:26:19.253643] E [socket.c:2332:socket_connect_finish] 0-vol_home-client-5: connection to 10.0.4.2:49162 failed (Connexion refusée)
>   … <same kind of lines>

'Connection refused' here is because the brick is down.
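
If it happens again, you can restart only the dead brick processes
without touching the running ones; a minimal sketch, assuming the
same volume name:

    # bricks that are down show 'N' under Online
    gluster volume status vol_home

    # 'start ... force' only spawns the brick processes that are not
    # running; bricks that are already up are left alone
    gluster volume start vol_home force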

> <A few minutes later:> OK, this was due to one brick that was down.
> It's strange: since I upgraded GlusterFS to 3.7.x I see a lot of
> bricks go down, sometimes a few moments after starting the volume,
> sometimes after a couple of days or weeks… This never happened with
> GlusterFS versions 3.3.1 and 3.5.3.

Could you please provide the brick log? We will check the log for
this issue; once it is fixed, we can initiate quota healing again.
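
The brick logs are under /var/log/glusterfs/bricks/ on each storage
node, named after the brick path; for instance, for the bricks shown
above the file would look like:

    less /var/log/glusterfs/bricks/export-brick_home-brick1-data.log

Once the bricks stay up, one way to redo the quota accounting from
scratch is to disable and re-enable quota, which re-runs the crawl.
Note that this drops the configured limits, so they have to be set
again (the 20TB value below is just the one visible in your 'quota
list' output):

    gluster volume quota vol_home disable
    gluster volume quota vol_home enable
    gluster volume quota vol_home limit-usage /baaden_team 20TB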

> Now, I need to stop and start the volume because I again notice some
> hangs with "gluster volume quota …", "df", etc. Once more, I've never
> seen this kind of hang with the previous versions of GlusterFS I
> used; is it "expected"?

Based on your previous mail we tried to reproduce the hang problem,
but we could not reproduce it.
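
If you hit the hang again, a statedump taken while a command is stuck
would help us locate where the request is blocked; a sketch, assuming
the same volume:

    # writes dump files under /var/run/gluster/ on the servers
    gluster volume statedump vol_home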

> One more time: thank you very much in advance.
>
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> IT manager & systems engineer
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@ibpc.fr
>
> On 31 Jul 2015, at 11:26, Niels de Vos <ndevos@redhat.com> wrote:
>
<blockquote type="cite">On Wed, Jul 29, 2015 at 12:44:38AM
+0200, Geoffrey Letessier wrote:<br>
<blockquote type="cite">OK, thank you Niels for this
explanation. Now, this makes sense.<br>
<br>
And concerning all split-brains appeared during the
back-transfert, do you have an idea where is this coming
from?<br>
</blockquote>
>>
>> Sorry, no, I don't know how that is happening in your environment.
>> I'll try to find someone who understands more about it and can help
>> you with that.
>>
>> Niels
>>
<blockquote type="cite"><br>
Best,<br>
Geoffrey<br>
------------------------------------------------------<br>
Geoffrey Letessier<br>
Responsable informatique & ingénieur système<br>
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique<br>
Institut de Biologie Physico-Chimique<br>
13, rue Pierre et Marie Curie - 75005 Paris<br>
Tel: 01 58 41 50 93 - eMail: <a moz-do-not-send="true"
href="mailto:geoffrey.letessier@ibpc.fr">geoffrey.letessier@ibpc.fr</a><br>
<br>
Le 29 juil. 2015 à 00:02, Niels de Vos <<a
moz-do-not-send="true" href="mailto:ndevos@redhat.com">ndevos@redhat.com</a>>
a écrit :<br>
<br>
<blockquote type="cite">On Tue, Jul 28, 2015 at 03:46:37PM
+0200, Geoffrey Letessier wrote:<br>
<blockquote type="cite">Hi,<br>
<br>
In addition of all split brains reported, is it normal
to notice<br>
thousands and thousands (several tens nay hundreds of
thousands)<br>
broken symlinks browsing the .glusterfs directory on
each brick? <br>
</blockquote>
>>>>
>>>> Yes, I think it is normal. A symlink points to a particular
>>>> filename, possibly in a different directory. If the target file is
>>>> located on a different brick, the symlink points to a non-local
>>>> file.
>>>>
>>>> Consider this example with two bricks in a distributed volume:
>>>> - file: README
>>>> - symlink: IMPORTANT -> README
>>>>
>>>> When the distribution algorithm is done, README 'hashes' to
>>>> brick-A. The symlink 'hashes' to brick-B. This means that README
>>>> will be located on brick-A, and the symlink with name IMPORTANT
>>>> will be located on brick-B. Because README is not on the same
>>>> brick as IMPORTANT, the symlink points to the non-existing file
>>>> README on brick-B.
>>>>
>>>> However, when a Gluster client reads the target of the symlink
>>>> IMPORTANT, the client calculates the location of README and will
>>>> know that README can be found on brick-A.
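>>>>
>>>> You can check this on the bricks directly; a sketch using the
>>>> made-up names from above (the brick and mount paths are only
>>>> placeholders):
>>>>
>>>>   # on brick-B the symlink looks dangling, because README lives
>>>>   # on brick-A
>>>>   readlink /bricks/brick-B/IMPORTANT    # -> README (absent here)
>>>>   ls /bricks/brick-A/README             # the real file
>>>>
>>>>   # through a Gluster mount the client resolves it just fine
>>>>   cat /mnt/vol/IMPORTANT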
>>>>
>>>> I hope that makes sense?
>>>>
>>>> Niels
>>>>
<blockquote type="cite">For the moment, i just
synchronized one remote directory (around 30TB<br>
and a few million files) into my new volume. No other
operations on<br>
files on this volume has yet been done.<br>
How can I fix it? Can I delete these dead-symlinks?
How can I fix all<br>
my split-brains? <br>
<br>
Here is an example of a ls:<br>
>>>>>   [root@cl-storage3 ~]# cd /export/brick_home/brick1/data/.glusterfs/7b/d2/
>>>>>   [root@cl-storage3 d2]# ll
>>>>>   total 8,7M
>>>>>        13706 drwx------   2 root      root            8,0K 26 juil. 17:22 .
>>>>>   2147483784 drwx------ 258 root      root            8,0K 20 juil. 23:07 ..
>>>>>   2148444137 -rwxrwxrwx   2 baaden    baaden_team     173K 22 mai    2008 7bd200dd-1774-4395-9065-605ae30ec18b
>>>>>      1559384 -rw-rw-r--   2 tarus     amyloid_team    4,3K 19 juin   2013 7bd2155c-7a05-4edc-ae77-35ed7e16afbc
>>>>>       287295 lrwxrwxrwx   1 root      root              58 20 juil. 23:38 7bd2370a-100b-411e-89a4-d184da9f0f88 -> ../../a7/59/a759de6f-cdf5-43dd-809a-baf81d103bf7/prop-base
>>>>>   2149090201 -rw-rw-r--   2 tarus     amyloid_team     76K  8 mars   2014 7bd2497f-d24b-4b19-a1c5-80a4956e56a1
>>>>>   2148561174 -rw-r--r--   2 tran      derreumaux_team  575 14 févr. 07:54 7bd25db0-67f5-43e5-a56a-52cf8c4c60dd
>>>>>      1303943 -rw-r--r--   2 tran      derreumaux_team  576 10 févr. 06:06 7bd25e97-18be-4faf-b122-5868582b4fd8
>>>>>      1308607 -rw-r--r--   2 tran      derreumaux_team 414K 16 juin  11:05 7bd2618f-950a-4365-a753-723597ef29f5
>>>>>        45745 -rw-r--r--   2 letessier admin_team       585  5 janv.  2012 7bd265c7-e204-4ee8-8717-e4a0c393fb0f
>>>>>   2148144918 -rw-rw-r--   2 tarus     amyloid_team    107K 28 févr.  2014 7bd26c5b-d48a-481a-9ca6-2dc27768b5ad
>>>>>        13705 -rw-rw-r--   2 tarus     amyloid_team     25K  4 juin   2014 7bd27e4c-46ba-4f21-a766-389bfa52fd78
>>>>>      1633627 -rw-rw-r--   2 tarus     amyloid_team     75K 12 mars   2014 7bd28631-90af-4c16-8ff0-c3d46d5026c6
>>>>>      1329165 -rw-r--r--   2 tran      derreumaux_team  175 15 juin  23:40 7bd2957e-a239-4110-b3d8-b4926c7f060b
>>>>>       797803 lrwxrwxrwx   2 baaden    baaden_team       26  2 avril  2007 7bd29933-1c80-4c6b-ae48-e64e4da874cb -> ../divided/a7/2a7o.pdb1.gz
>>>>>      1532463 -rw-rw-rw-   2 baaden    baaden_team     1,8M  2 nov.   2009 7bd29d70-aeb4-4eca-ac55-fae2d46ba911
>>>>>      1411112 -rw-r--r--   2 sterpone  sterpone_team   3,1K  2 mai    2012 7bd2a5eb-62a4-47fc-b149-31e10bd3c33d
>>>>>   2148865896 -rw-r--r--   2 tran      derreumaux_team 2,1M 15 juin  23:46 7bd2ae9c-18ca-471f-a54a-6e4aec5aea89
>>>>>   2148762578 -rw-rw-r--   2 tarus     amyloid_team    154K 11 mars   2014 7bd2b7d7-7745-4842-b7b4-400791c1d149
>>>>>       149216 -rw-r--r--   2 vamparys  sacquin_team    241K 17 mai    2013 7bd2ba98-6a42-40ea-87ea-acb607d73cb5
>>>>>   2148977923 -rwxr-xr-x   2 murail    baaden_team      23K 18 juin   2012 7bd2cf57-19e7-451c-885d-fd02fd988d43
>>>>>      1176623 -rw-rw-r--   2 tarus     amyloid_team    227K  8 mars   2014 7bd2d92c-7ec8-4af8-9043-49d1908a99dc
>>>>>      1172122 lrwxrwxrwx   2 sterpone  sterpone_team     61 17 avril 12:49 7bd2d96e-e925-45f0-a26a-56b95c084122 -> ../../../../../src/libs/ck-libs/ParFUM-Tops-Dev/ParFUM_TOPS.h
>>>>>      1385933 -rw-r--r--   2 tran      derreumaux_team 2,9M 16 juin  05:29 7bd2df54-17d2-4644-96b7-f8925a67ec1e
>>>>>       745899 lrwxrwxrwx   1 root      root              58 22 juil. 09:50 7bd2df83-ce58-4a17-aca8-a32b71e953d4 -> ../../5c/39/5c39010f-fa77-49df-8df6-8d72cf74fd64/model_009
>>>>>   2149100186 -rw-rw-r--   2 tarus     amyloid_team    494K 17 mars   2014 7bd2e865-a2f4-4d90-ab29-dccebe2e3440
>>>>>
>>>>> Best,
>>>>> Geoffrey
>>>>> ------------------------------------------------------
>>>>> Geoffrey Letessier
>>>>> IT manager & systems engineer
>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@ibpc.fr
>>>>>
>>>>> On 27 Jul 2015, at 22:57, Geoffrey Letessier
>>>>> <geoffrey.letessier@cnrs.fr> wrote:
>>>>>
<blockquote type="cite">Dears,<br>
<br>
For a couple of weeks (more than one month), our
computing production is stopped due to several -but
amazing- troubles with GlusterFS. <br>
<br>
After having noticed a big problem with incorrect
quota size accounted for many many files, i decided
under the guidance of Gluster team support to
upgrade my storage cluster from version 3.5.3 to the
latest (3.7.2-3) because these bugs are
theoretically fixed in this branch. Now, since i’ve
done this upgrade, it’s the amazing mess and i
cannot restart the production.<br>
Indeed :<br>
<span class="Apple-tab-span" style="white-space:pre">
</span>1 - RDMA protocol is not working and hang my
system / shell commands; only TCP protocol (over
Infiniband) is more or less operational - it’s not
a blocking point but… <br>
<span class="Apple-tab-span" style="white-space:pre">
</span>2 - read/write performance relatively low<br>
<span class="Apple-tab-span" style="white-space:pre">
</span>3 - thousands split-brains are appeared.<br>
<br>
So, for the moment, i believe GlusterFS 3.7 is not
actually production ready. <br>
>>>>>>
>>>>>> Concerning the third point: after destroying all my volumes
>>>>>> (RAID re-init, new partitions, GlusterFS volumes, etc.) and
>>>>>> recreating the main one, I tried to transfer my data back from
>>>>>> the archive/backup server into this new volume, and I see a lot
>>>>>> of errors in my mount log file, as you can read in this extract:
>>>>>>   [2015-07-26 22:35:16.962815] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
>>>>>>   [2015-07-26 22:35:16.965896] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>   [2015-07-26 22:35:16.975206] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
>>>>>>   [2015-07-26 22:35:28.719935] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>   [2015-07-26 22:35:29.764891] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 865083fa-984e-44bd-aacf-b8195789d9e0
>>>>>>   [2015-07-26 22:35:29.768339] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <865083fa-984e-44bd-aacf-b8195789d9e0/job.pbs>, e944d444-66c5-40a4-9603-7c190ad86013 on vol_home-client-1 and 820f9bcc-a0f6-40e0-bcec-28a76b4195ea on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>   [2015-07-26 22:35:29.775037] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 29382d8d-c507-4d2e-b74d-dbdcb791ca65
>>>>>>   [2015-07-26 22:35:29.776857] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <29382d8d-c507-4d2e-b74d-dbdcb791ca65/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt>, 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>   [2015-07-26 22:35:29.800535] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:29382d8d-c507-4d2e-b74d-dbdcb791ca65>/res_1BVK_r_u_1IBR_l_u_Cond.1IBR_l_u.1BVK_r_u.UB.global.dat.txt 951c5ffb-ca38-4630-93f3-8e4119ab0bd8 on vol_home-client-1 and 5ae663ca-e896-4b92-8ec5-5b15422ab861 on vol_home-client-0
>>>>>>
>>>>>> And when I try to browse some folders (still in the mount log
>>>>>> file):
>>>>>>   [2015-07-27 09:00:19.005763] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
>>>>>>   [2015-07-27 09:00:22.322316] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>   [2015-07-27 09:00:23.008771] I [afr-self-heal-entry.c:565:afr_selfheal_entry_do] 0-vol_home-replicate-0: performing entry selfheal on 2ac27442-8be0-4985-b48f-3328a86a6686
>>>>>>   [2015-07-27 08:59:50.359187] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012588.gro 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0
>>>>>>   [2015-07-27 09:00:02.500419] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0012590.gro b22aec09-2be3-41ea-a976-7b8d0e6f61f0 on vol_home-client-1 and ec100f9e-ec48-4b29-b75e-a50ec6245de6 on vol_home-client-0
>>>>>>   [2015-07-27 09:00:02.506925] W [MSGID: 108008] [afr-self-heal-name.c:353:afr_selfheal_name_gfid_mismatch_check] 0-vol_home-replicate-0: GFID mismatch for <gfid:2ac27442-8be0-4985-b48f-3328a86a6686>/md0009059.gro 0485c093-11ca-4829-b705-e259668ebd8c on vol_home-client-1 and e83a492b-7f8c-4b32-a76e-343f984142fe on vol_home-client-0
>>>>>>   [2015-07-27 09:00:23.001121] W [MSGID: 108008] [afr-read-txn.c:241:afr_read_txn] 0-vol_home-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
>>>>>>   [2015-07-27 09:00:26.231262] E [afr-self-heal-entry.c:249:afr_selfheal_detect_gfid_and_type_mismatch] 0-vol_home-replicate-0: Gfid mismatch detected for <2ac27442-8be0-4985-b48f-3328a86a6686/md0012588.gro>, 9c635868-054b-4a13-b974-0ba562991586 on vol_home-client-1 and 1943175c-b336-4b33-aa1c-74a1c51f17b9 on vol_home-client-0. Skipping conservative merge on the file.
>>>>>>
>>>>>> And, above all, when browsing folders I get a lot of
>>>>>> input/output errors.
>>>>>>
>>>>>> Currently I have 6.2M inodes and roughly 30TB in my "new" volume.
>>>>>>
>>>>>> For the moment, quota is disabled, to increase the IO performance
>>>>>> during the back-transfer…
>>>>>>
>>>>>> You can also find in the attachments:
>>>>>>   - an "ls" result
>>>>>>   - a split-brain search result
>>>>>>   - the volume information and status
>>>>>>   - a complete volume heal info
>>>>>>
>>>>>> Hoping this can help you to help me fix all my problems and
>>>>>> reopen the computing production.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Geoffrey
>>>>>>
>>>>>> PS: « Erreur d'Entrée/Sortie » = « Input / Output Error »
>>>>>> ------------------------------------------------------
>>>>>> Geoffrey Letessier
>>>>>> IT manager & systems engineer
>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>> Institut de Biologie Physico-Chimique
>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@ibpc.fr
>>>>>>
>>>>>> <ls_example.txt>
>>>>>> <split_brain__20150725.txt>
>>>>>> <vol_home_healinfo.txt>
>>>>>> <vol_home_info.txt>
>>>>>> <vol_home_status.txt>