[Gluster-users] Issue for replicate translator

Zhuo Yin zhuoyin at gmail.com
Fri Oct 23 21:05:23 UTC 2009


Hi, All:

I've run into a problem while unit-testing the replicate translator, and this
essential problem has bothered me for two weeks.

My setup is 4 copies on 4 machines (A, B, C, D); all of A, B, C, D act as both
server and client (the mount point is /home/). There is a fifth machine, E,
which doesn't hold any copy; it acts purely as a client and uses all of
A, B, C, D's disks.

My failure-simulation strategy is:

Step 1. Randomly choose one machine and take down the NIC glusterfs is
listening on (3 copies remain online)
Step 2. Sleep for a while (e.g. 60 seconds)
Step 3. Bring the failed NIC back up (all 4 copies are online again)
Step 4. Run "ls -laR /home" on the machine that was just failed
Step 5. Go to Step 1
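For reference, the loop above can be sketched as a small shell script. The
replica addresses come from my configuration below, but passwordless ssh as
root and the interface name eth0 are assumptions; adjust to your setup:

```shell
#!/bin/sh
# Sketch of the failure-injection loop described above.
# ASSUMPTIONS: passwordless root ssh to each replica, NIC named eth0.
HOSTS="10.106.105.150 10.106.105.151 10.106.105.152 10.106.105.153"
NIC=eth0

# Step 1 helper: pick one replica at random.
pick_random_host() {
    printf '%s\n' $HOSTS | shuf -n 1
}

# The loop itself; run this on a control machine.
failure_loop() {
    while true; do
        target=$(pick_random_host)
        ssh root@"$target" "ifdown $NIC"   # Step 1: fail the NIC (3 copies stay online)
        sleep 60                           # Step 2: wait
        ssh root@"$target" "ifup $NIC"     # Step 3: bring it back (4 copies online)
        # Step 4: look up every file on the previously failed machine
        ssh root@"$target" 'ls -laR /home > /dev/null'
    done                                   # Step 5: repeat
}
```

Calling `failure_loop` runs the cycle indefinitely while machine E does its
repeated `ls -laR` in parallel.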

Simultaneously, I run `ls -laR /home > test_result.txt` on machine E
repeatedly.

I've observed problems like:
1. Missing files, or duplicate names within the same directory, in the ls
output. Below is a small excerpt of two differing `ls -laR` runs, shown as
vimdiff output:
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file84   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file83
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file84   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file84
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file85   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file85
  ---------------------------------------------------------  |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file86
  ---------------------------------------------------------  |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file87
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file88   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file88
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file89   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file89
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file89   |  ---------------------------------------------------------
  -rw-r--r--   1 root root 5004454 2009-10-16 19:20 file9    |  -rw-r--r--   1 root root 5004454 2009-10-16 19:20 file9
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file90   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file90
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file91   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file91
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file91   |  ---------------------------------------------------------
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file92   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file92
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file93   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file93
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file94   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file94
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file95   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file95
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file97   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file96
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file97   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file97
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file98   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file98
  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file99   |  -rw-r--r--   1 root root 5004454 2009-10-16 19:21 file99
    Some files are clearly missing and some names are duplicated, all within
the same directory.

2. Occasionally, ls reports:
    ls: reading directory /home/dir1/dir21: File descriptor in bad state
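To make the inconsistency easier to catch, I compare the directory listings
seen from two different mounts (e.g. machine E's /home vs. a replica's /home).
A small helper for this; the paths passed in are placeholders for illustration:

```shell
#!/bin/sh
# Compare the entry sets of the same directory as seen through two
# mount points. Missing or duplicated names show up in the diff.
compare_listings() {
    a=$1
    b=$2
    # Sorted name lists; a duplicate within one view also surfaces here.
    ls -a "$a" | sort > /tmp/view_a.txt
    ls -a "$b" | sort > /tmp/view_b.txt
    if diff -u /tmp/view_a.txt /tmp/view_b.txt > /tmp/view.diff; then
        echo "listings match"
    else
        echo "listings differ"
        cat /tmp/view.diff
    fi
}
```

Running it against two replicas after a heal cycle quickly shows whether the
views have converged.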

I really hope someone can solve this basic but essential problem.

The glusterfsd.vol I'm using for all 5 machines is:
=========================================================================================
# THIS IS THE SERVER-END CONFIGURATION
# Brick 1
volume posix
        type storage/posix
        option directory /mnt/disk1
end-volume

volume locks
        type features/locks
        subvolumes posix
end-volume

volume brick
        type performance/io-threads
        option thread-count 16
        subvolumes locks
end-volume

# Server
volume server
        type protocol/server
        option transport-type tcp/server
        option transport.socket.bind-address `ifconfig -a | grep "10.106.105." | awk '{print $2}' | awk 'BEGIN {FS=":"};{print $2}'`
        option transport.socket.listen-port 6996
        subvolumes brick
        option auth.addr.brick.allow *
end-volume
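One caveat, stated as an assumption: as far as I know, volfiles are not
shell-interpreted, so the backticked ifconfig pipeline above may be passed
through literally rather than evaluated. A workaround is to template the bind
address into the volfile at deploy time; `@BIND_ADDR@` below is a made-up
placeholder token, not a glusterfs feature:

```shell
#!/bin/sh
# Hypothetical deploy-time templating for the bind address.
# local_bind_addr uses the same pipeline as the volfile above to find
# this host's 10.106.105.x address.
local_bind_addr() {
    ifconfig -a | grep "10.106.105." | awk '{print $2}' | awk 'BEGIN {FS=":"};{print $2}'
}

# Substitute the @BIND_ADDR@ placeholder in a template volfile ($1)
# with a concrete address ($2), writing the result to stdout.
render_volfile() {
    sed "s/@BIND_ADDR@/$2/" "$1"
}
```

With this, each server would run something like
`render_volfile glusterfsd.vol.in "$(local_bind_addr)" > glusterfsd.vol`
before starting glusterfsd.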

# SERVER-END CONFIGURATION ENDS

# THIS IS THE CLIENT-END CONFIGURATION
# 3 Disks Machines
# Machine 1
volume cbrick1
        type protocol/client
        option transport-type tcp
        option remote-host 10.106.105.150
        option remote-port 6996
        option remote-subvolume brick
end-volume

# Machine 2
volume cbrick4
        type protocol/client
        option transport-type tcp
        option remote-host 10.106.105.151
        option remote-port 6996
        option remote-subvolume brick
end-volume

# Machine 3
volume cbrick7
        type protocol/client
        option transport-type tcp
        option remote-host 10.106.105.152
        option remote-port 6996
        option remote-subvolume brick
end-volume

# Machine 4
volume cbrick10
        type protocol/client
        option transport-type tcp
        option remote-host 10.106.105.153
        option remote-port 6996
        option remote-subvolume brick
end-volume

# All brick declarations finished

# Replicate part
volume rep1
        type cluster/replicate
        subvolumes cbrick1 cbrick4 cbrick7 cbrick10
end-volume

# CLIENT END CONFIGURATION ENDS
========================================================================================================



Regards,
Zhuo Yin (917)215-8740
Gentoo Linux Fan - int (*(*(*pFile)())[10])();

