Version 3.3 introduced a new structure to the bricks, the .glusterfs directory. So what is it?
As you’re probably aware, GlusterFS stores metadata in extended attributes. One of these is trusted.gfid, which is, for all intents and purposes, the inode number: a UUID that’s unique to each file across the entire cluster. This worked pretty well for 3.1 and 3.2, but there were always a few weaknesses with regard to AFR (automatic file replication).
The GFID is used to build the structure of the .glusterfs directory. Each file is hardlinked to a path built from its GFID: the first two hex digits name a directory, the next two name a subdirectory inside it, and the complete UUID is the filename.
For instance
    # getfattr -m . -d -e hex /data/glusterfs/d_home/stat.c
    getfattr: Removing leading '/' from absolute path names
    # file: data/glusterfs/d_home/stat.c
    trusted.afr.home-client-10=0x000000000000000000000000
    trusted.afr.home-client-11=0x000000000000000000000000
    trusted.afr.home-client-9=0x000000000000000000000000
    trusted.gfid=0xc62757554baf4a33bc7690c56dac23e0
makes a hardlink to
/data/glusterfs/d_home/.glusterfs/c6/27/c6275755-4baf-4a33-bc76-90c56dac23e0
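The mapping from a trusted.gfid value to its .glusterfs path is mechanical, so it can be sketched in a few lines of Python. This is only an illustration of the layout described above; the function name gfid_to_path is made up for this example and is not part of any GlusterFS tool.

```python
import posixpath
import uuid


def gfid_to_path(brick_root: str, gfid_hex: str) -> str:
    """Map a trusted.gfid value (as printed by getfattr -e hex) to its .glusterfs path.

    Hypothetical helper for illustration; it only does string manipulation
    and never touches the brick.
    """
    # getfattr -e hex prefixes the value with "0x"; drop it if present.
    raw = gfid_hex[2:] if gfid_hex.lower().startswith("0x") else gfid_hex
    # uuid.UUID validates the 32 hex digits and gives the dashed, lowercase form.
    gfid = str(uuid.UUID(hex=raw))
    # First two hex digits, next two hex digits, then the full UUID.
    return posixpath.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


print(gfid_to_path("/data/glusterfs/d_home", "0xc62757554baf4a33bc7690c56dac23e0"))
# /data/glusterfs/d_home/.glusterfs/c6/27/c6275755-4baf-4a33-bc76-90c56dac23e0
```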
The prior method was deficient in several ways, notably for deletes, renames, and hardlinks. If the connection to a replica was lost and a file was renamed, how would we know that it wasn’t just deleted (or vice versa)? This caused issues where duplicate files were created, causing confusion.
Now if a file is deleted, so is its .glusterfs file. The self-heal daemon can walk the tree of the reconnected server and see a file that doesn’t exist on the good server. If the gfid file is also gone from the good server, the file was deleted, so it’s deleted from the stale server too. If the gfid file of the missing file does still exist, the file has been renamed. The filename can be deleted from the stale server, but not the gfid file. Once the self-heal daemon walks to the new filename, that filename is then hardlinked with the data that’s still on the server. This also reduces the need for data transfer to heal a renamed file.
If a file was hardlinked, you were generally screwed. Eventually a disconnect would happen. A file would get stale. When the self-heal happened, the client had no way of knowing that there was another file with the same gfid, so it would create one. This produced lots of unnecessary duplicate files. With the gfid files, each filename is hardlinked to the same gfid file so there’s no waste.
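One way to see the "each filename is hardlinked to the same gfid file" property on a brick is to compare inodes. The sketch below is only an illustration: shares_inode_with_gfid is a hypothetical helper, and it assumes you already know the file's gfid in dashed form (for example from getfattr).

```python
import os


def shares_inode_with_gfid(brick_root: str, rel_path: str, gfid: str) -> bool:
    """Return True if a tree file and its gfid file are the same inode.

    Hypothetical helper: rel_path is relative to the brick root, gfid is the
    dashed UUID form of trusted.gfid.
    """
    tree_file = os.path.join(brick_root, rel_path)
    gfid_file = os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)
    # Either side may be missing on a damaged brick; treat that as "not linked".
    if not (os.path.exists(tree_file) and os.path.exists(gfid_file)):
        return False
    # samefile compares device and inode numbers, i.e. it detects hardlinks.
    return os.path.samefile(tree_file, gfid_file)
```

Every hardlinked name of the same file should return True against the one gfid file, which is why no extra copies of the data are needed.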
Coming soon to GlusterFS is NFSv4 support in which you can have anonymous file descriptors. gfid files allow that to happen by creating the gfid file without creating an entry in the directory tree.
As an admin, that means you now have to manage gfid files as well as tree files with regard to self-heal and split-brain (see the article on healing split-brain). To do that, I thought it might be useful to know how it’s laid out.
To begin with, the root directory of each brick has the gfid of 00000000-0000-0000-0000-000000000001. This puts it in .glusterfs/00/00. Its gfid file is a symlink that points to “../../..”. If it’s not, you’ll get self-heal failures healing “/”. I’m still not sure how I got them, but after creating multiple split-brain scenarios with my replica 3 servers, some of the root gfid files were directories instead of symlinks (bug #859581).
Each directory’s gfid file is a symlink that points to the directory’s own name inside the gfid entry of its parent. So my home directory:
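Checking the root gfid entry by hand on every brick gets tedious, so here is a minimal sketch of that check in Python. The function name root_gfid_ok is invented for this example; it only tests the two conditions described above (the entry is a symlink, and it points to "../../..").

```python
import os

# The root directory's gfid, as described above.
ROOT_GFID = "00000000-0000-0000-0000-000000000001"


def root_gfid_ok(brick_root: str) -> bool:
    """Return True if the brick's root gfid entry is a symlink to '../../..'.

    Hypothetical helper; it reads the brick but never modifies it.
    """
    path = os.path.join(brick_root, ".glusterfs", "00", "00", ROOT_GFID)
    # islink is False for a real directory, which is the broken case described above.
    return os.path.islink(path) and os.readlink(path) == "../../.."
```

Calling root_gfid_ok("/data/glusterfs/d_home") on each brick would flag the ones where the entry has turned into a directory.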
    # getfattr -m . -d -e hex /data/glusterfs/d_home/jjulian
    getfattr: Removing leading '/' from absolute path names
    # file: data/glusterfs/d_home/jjulian
    security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000
    trusted.afr.home-client-10=0x000000000000000000000000
    trusted.afr.home-client-11=0x000000000000000000000000
    trusted.afr.home-client-9=0x000000000000000000000000
    trusted.gfid=0xa0d421e0c3f249d4b2ee64e101c233af
    trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff
creates a symlink like
/data/glusterfs/d_home/.glusterfs/a0/d4/a0d421e0-c3f2-49d4-b2ee-64e101c233af -> ../../00/00/00000000-0000-0000-0000-000000000001/jjulian
The next directory down would point to ../../a0/d4/a0d421e0-c3f2-49d4-b2ee-64e101c233af/${self}, etc.
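The symlink target for a directory follows directly from its parent's gfid and its own name. A small sketch of that rule, using a made-up helper name (dir_gfid_symlink_target) and a made-up subdirectory name ("docs") for the second call:

```python
def dir_gfid_symlink_target(parent_gfid: str, name: str) -> str:
    """Build the relative target of a directory's gfid symlink.

    Hypothetical helper: parent_gfid is the parent's gfid in dashed form,
    name is the directory's own name.
    """
    # "../.." climbs from .glusterfs/xx/yy back to .glusterfs, then descends
    # into the parent's gfid entry and names this directory inside it.
    return f"../../{parent_gfid[0:2]}/{parent_gfid[2:4]}/{parent_gfid}/{name}"


# My home directory, whose parent is the brick root:
print(dir_gfid_symlink_target("00000000-0000-0000-0000-000000000001", "jjulian"))
# ../../00/00/00000000-0000-0000-0000-000000000001/jjulian

# A hypothetical subdirectory "docs" inside it:
print(dir_gfid_symlink_target("a0d421e0-c3f2-49d4-b2ee-64e101c233af", "docs"))
# ../../a0/d4/a0d421e0-c3f2-49d4-b2ee-64e101c233af/docs
```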
A symlink’s gfid file is itself a symlink with the same target, just named by the gfid:
    # ls -l /data/glusterfs/b_home/jjulian/.fedora-upload-ca.cert
    lrwxrwxrwx 2 root root 22 Sep 21 09:42 /data/glusterfs/b_home/jjulian/.fedora-upload-ca.cert -> .fedora-server-ca.cert
    # getfattr -h -n trusted.gfid -e hex /data/glusterfs/b_home/jjulian/.fedora-upload-ca.cert
    getfattr: Removing leading '/' from absolute path names
    # file: data/glusterfs/b_home/jjulian/.fedora-upload-ca.cert
    trusted.gfid=0x4bfc7da690004fe4b54eb0399984b712
    # ls -l /var/spool/glusterfs/b_home/.glusterfs/4b/fc/4bfc7da6-9000-4fe4-b54e-b0399984b712
    lrwxrwxrwx 2 root root 22 Sep 21 09:42 /data/glusterfs/b_home/.glusterfs/4b/fc/4bfc7da6-9000-4fe4-b54e-b0399984b712 -> .fedora-server-ca.cert
If you delete a file from a brick without deleting its gfid hardlink, the filename will be restored as part of the self-heal process and that filename will be linked back with its gfid file. If that gfid file is broken, the filename file will be as well.
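Since a leftover gfid hardlink is what brings a deleted filename back, it can help to list gfid files that no longer have any name in the tree. The sketch below does that by looking for regular files under .glusterfs/xx/yy whose link count is 1. It is an assumption-laden illustration, not a GlusterFS tool: orphaned_gfid_files is a made-up name, and a link count of 1 is only a hint (for example, the anonymous file descriptors mentioned earlier would look the same), so inspect the results before removing anything.

```python
import os
import re
import stat

# Only descend into the two-hex-digit directories that hold gfid entries.
HEX2 = re.compile(r"^[0-9a-f]{2}$")


def orphaned_gfid_files(brick_root: str) -> list:
    """List regular gfid files with no other hardlink (hypothetical helper)."""
    base = os.path.join(brick_root, ".glusterfs")
    found = []
    for d1 in sorted(os.listdir(base)):
        p1 = os.path.join(base, d1)
        if not (HEX2.match(d1) and os.path.isdir(p1)):
            continue
        for d2 in sorted(os.listdir(p1)):
            p2 = os.path.join(p1, d2)
            if not (HEX2.match(d2) and os.path.isdir(p2)):
                continue
            for name in sorted(os.listdir(p2)):
                path = os.path.join(p2, name)
                st = os.lstat(path)  # lstat: do not follow directory/symlink entries
                # Regular file whose only remaining name is the gfid file itself.
                if stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
                    found.append(path)
    return found
```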