GlusterFS cookbook

FIXME: work in progress - suggestions welcome

Introduction

What is GlusterFS

A highly scalable clustered filesystem that runs in userspace (via FUSE). GlusterFS is highly adaptable, feature rich, POSIX compliant, and has a layered design. It works over any type of interconnect: InfiniBand (IB), Gigabit Ethernet, or 10 Gigabit Ethernet.

Why we wrote another filesystem

We wanted to deploy a large filesystem for one of our customers, who needed more than a petabyte of storage. For a volume of that size, we needed a filesystem that addresses reliability, maintainability (ease of use), and scalability. We couldn't find all three in a single filesystem, so we started our own.

Where is the white paper about your filesystem format?

GlusterFS is a clustered network filesystem, not a disk-based filesystem. With GlusterFS, you keep whichever filesystem you prefer underneath the back-end directory. GlusterFS has no on-disk format of its own. A GlusterFS back end can be pictured as an NFS export point, where the directories are kept as they are. Since it is not a new filesystem format, we have spent more time coding than publishing a white paper. We will publish one soon; until then the answer is "No, we don't have one yet!".

Why use GlusterFS

As mentioned earlier, we wrote it to address three major issues seen in large-scale storage solutions.

  1. Scalability
  2. Performance
  3. Manageability

The userspace design approach we took helped us achieve these goals faster and more efficiently.

It is commonly held that filesystems are a core part of the OS and therefore have to live in kernel space. For network filesystems this is not true. The delay caused by network latency is much larger than the context-switch overhead caused by being in userspace. And if network latency is reduced by using InfiniBand or 10 Gigabit Ethernet cards, one can do RDMA from userspace to remote machines, so the question of context switches doesn't arise. Being in userspace is therefore not a performance problem.

Also, because we are in userspace, the development cycle for any feature is much shorter than for a kernel-based filesystem. If there is a problem with the filesystem, an application restart (say, a umount/mount) is enough. No reboots, no nightmares of kernel panics.

Versions

The current GlusterFS version is 1.4.0. GlusterFS uses 'GNU arch' to handle distributed development. There is no relation between GNU arch branch names and the versions of GlusterFS releases.

Release tarballs come in several types, and the type is reflected in the tarball name.

  • Stable Releases

A stable release is made when there are no outstanding known bugs in the code base. A standard stable release tarball looks like 'glusterfs-1.3.12.tar.gz'.

  • QA

QA releases are made by developers for the internal QA/testing team. They are not advised for production deployments and may still have known issues. The release tarball looks like 'glusterfs-1.4.0qa63.tar.gz'.

  • PRE

Releases made prior to the actual release. This is fairly stable code that has no 'major' issues. The release tarball looks like 'glusterfs-1.4.0pre12.tar.gz'.

  • RC (Release Candidate)

Made just before the stable release. A release may look like 'glusterfs-1.4.0rc2.tar.gz'.

  • DOA (Dead on arrival)

A DOA release is made by a developer to test specific features they have implemented, with no guarantee that the package as a whole is free of breakage or errors. Using these release tarballs is highly discouraged. A release may look like 'glusterfs-0.1.DOA2.tar.gz'. The GlusterFS team no longer makes DOA releases.
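
The naming convention above can be checked mechanically. The following is a minimal sketch, assuming the suffixes always appear exactly as in the example names; the function name 'release_type' is made up for illustration and is not part of GlusterFS.

```shell
# Classify a GlusterFS tarball by the suffix in its file name.
# The patterns mirror the example names in this section; they are an
# illustration, not an official naming specification.
release_type() {
  case "$1" in
    glusterfs-*qa[0-9]*.tar.gz)  echo "qa" ;;
    glusterfs-*pre[0-9]*.tar.gz) echo "pre" ;;
    glusterfs-*rc[0-9]*.tar.gz)  echo "rc" ;;
    glusterfs-*DOA[0-9]*.tar.gz) echo "doa" ;;
    # Anything else that starts with a version digit is treated as stable.
    glusterfs-[0-9]*.tar.gz)     echo "stable" ;;
    *)                           echo "unknown" ;;
  esac
}

release_type glusterfs-1.3.12.tar.gz
release_type glusterfs-1.4.0qa63.tar.gz
release_type glusterfs-0.1.DOA2.tar.gz
```

A deployment script could use such a check to refuse anything that is not reported as 'stable'.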

Copyright

GlusterFS is distributed under the GNU General Public License version 3 or later (GPLv3), and its documentation is released under the GNU Free Documentation License 1.2 or later. Gluster.com, or Z Research Inc, owns the copyright of the product. For more questions about usage, contributions, and licensing, see our Legal FAQ section.

Support

Gluster.com, or Z Research Inc, follows a support/subscription business model for the product. There is only one code base, which is given to both community users and paid customers. In fact, users have direct read (checkout) access to our archive, through which they can get new features before enterprise customers do. The version that the developers consider more stable is pushed to paid customers.

Free Software community

  • Gluster developer mailing list

The Gluster developer mailing list hosts mostly technical discussions, such as compilation issues, bugs, and new feature development.

  • Gluster user mailing list

The Gluster user mailing list is open for discussions about configuration and management.

  • #gluster irc channel

The IRC channel #gluster (on irc.gnu.org or irc.freenode.net) can be used as a direct discussion channel to talk with GlusterFS experts. Because the chat log is indexed, you can leave your questions there; they will be answered depending on the availability of developers or users.

  • Roadmap open for requests

The GlusterFS Roadmap page is open for requests from the community. If a feature gets more requests, it is pulled up in the roadmap.

Paid Subscription

Visit the Subscription Request page to get a quote based on your requirements. If you take a subscription package, your Roadmap requests and your queries are given higher priority by the development team.

Features in GlusterFS

It is hard to explain the features of GlusterFS in a few words. To list a few:

  • Fully POSIX compliant.
  • The features are implemented as layers (called Translators), and are very modular.
  • Scales seamlessly to more servers and more capacity.
  • No 'fsck'; errors are self-healed.
  • Files are kept as files and folders in the backend (remember NFS?), so recovering data without GlusterFS is very simple.
  • Works across multiple OSes and different hardware.
  • Cost effective and user friendly.

Translators

  • distribute
  • unify (deprecated in favor of distribute)
  • stripe (raid-0 like)
  • replicate (formerly named 'afr', Automatic File Replication; raid-1 like)
  • dht or Distributed Hash Table
  • ha or High Availability
  • write-behind
  • read-ahead
  • io-threads
  • io-cache
  • posix
  • BDB or Berkeley DB storage backend
  • filter
  • quota
  • posix-locks
  • trash
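
To give a feel for how translators are layered, here is a minimal sketch of a client-side volfile fragment that puts write-behind on top of replicate. The volume names 'client1', 'client2', 'mirror' and 'wb' and the output path are made up for illustration; 'client1' and 'client2' are assumed to be protocol/client volumes defined earlier in the same file. The type names 'cluster/replicate' and 'performance/write-behind' are the ones we would expect for 1.4.x, so check them against your release.

```shell
# Write an illustrative volfile fragment; the path is hypothetical.
volfile="${TMPDIR:-/tmp}/layered-example.vol"

cat > "$volfile" <<'EOF'
# client1 and client2 are assumed to be defined above this point
volume mirror
  type cluster/replicate
  subvolumes client1 client2
end-volume

# write-behind sits on top of the replicated volume
volume wb
  type performance/write-behind
  subvolumes mirror
end-volume
EOF

echo "wrote $volfile"
```

Each volume names the volumes beneath it in 'subvolumes', so reading the file bottom-up gives the order in which a request passes through the layers.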

Protocol

The protocol GlusterFS uses between server and client is specific to GlusterFS and is not compatible with any other existing protocol. That means one can't use an NFS client or a CIFS client against a volume exported by a GlusterFS server.

Earlier versions of GlusterFS used an ASCII-based protocol, which let us concentrate on other features and functionality without worrying much about modifying the protocol syntax. Once the product started gaining wider acceptance, higher performance and lower CPU usage became requirements. GlusterFS is no longer a proof-of-concept, fancy, research-oriented filesystem. Rather, it aims to be an enterprise-class large filesystem that can address a variety of storage problems, from large files to small files. Hence there was a need for a standard binary protocol that reduces the CPU overhead caused by string comparison/conversion and the extra data transferred over the wire. Both of these were major hurdles to achieving the best possible performance. So we now have a binary protocol between the client and server processes with very little header overhead, which makes small-file performance very attractive.

Transports

A common question when you encounter a network-based product is "what interface does this need?". GlusterFS has different transport modules. It works with the TCP/IP stack (both IPv4 and IPv6) and also has a module for IB (InfiniBand) verbs.

The following transport modules are supported in GlusterFS:

  • tcp/ip (both ipv4 and ipv6)
  • ib-verbs (Infiniband native RDMA support)
  • ib sdp (IB socket direct protocol)

Authentication

Currently GlusterFS implements only minimal authentication modules.

Address based

Authentication is done based on IP address.

Username based

Authentication is based on username and password.
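
As a rough sketch of how the two modules appear in a server volfile, the snippet below writes a protocol/server volume that carries both kinds of options. The option names ('auth.addr.<volume>.allow', 'auth.login.<volume>.allow', 'auth.login.<user>.password') are the ones we would expect for 1.4.x and should be checked against your release. The exported volume name 'brick', the address pattern, the user 'gluster', the password and the output path are all made up for illustration.

```shell
# Write an illustrative server volume with authentication options.
# 'brick' is assumed to be a storage volume defined earlier in the file.
auth_volfile="${TMPDIR:-/tmp}/auth-example.vol"

cat > "$auth_volfile" <<'EOF'
volume server
  type protocol/server
  option transport-type tcp
  # address based: clients matching this pattern may attach to 'brick'
  option auth.addr.brick.allow 192.168.1.*
  # username based: user 'gluster' may attach to 'brick' with this password
  option auth.login.brick.allow gluster
  option auth.login.gluster.password secret
  subvolumes brick
end-volume
EOF

echo "wrote $auth_volfile"
```

With username based authentication, the client side would be expected to carry matching 'option username' and 'option password' lines in its protocol/client volume. How the two modules interact when both are configured for the same volume is not covered here.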

Booster

Booster is a shared library that allows applications to bypass the FUSE-based mount point and access the GlusterFS client side stack directly. Some applications can see a significant performance boost, because the FUSE kernel module and the bottleneck of multiple context switches are avoided. An added benefit is that the application does not require any modification, since the library is LD_PRELOAD'ed.

The LD_PRELOAD mechanism allows libraries to intercept libc library and system calls in order to augment the system call's functionality. In booster, we use this ability to redirect system calls, to route them directly into GlusterFS client side stack.

An example of how simple it is to use booster is below:

bash# vi booster.conf
..Create a booster configuration file..
bash# export GLUSTERFS_BOOSTER_CONF=$(pwd)"/booster.conf"
bash# LD_PRELOAD=/usr/lib/glusterfs/glusterfs-booster.so <APPLICATION-BINARY>
bash# 
... All I/O operations over GlusterFS here are done using booster, not FUSE.

Comprehensive instructions on using Booster are available at BoosterConfiguration

libglusterfsclient

Application developers who want to squeeze every bit of performance from their networked storage system can leverage libglusterfsclient to write applications that bypass the FUSE-based in-kernel file system calls and hook directly into the GlusterFS client side stack. The booster module above is a layer built over libglusterfsclient.

mod_glusterfs

A web-embeddable GlusterFS module. It works on Apache versions 1.3.x and 2.2.x, and on Lighty (lighttpd) versions 1.4 and 1.5. The corresponding README.txt files are present in the source tarball.

Installation process

The first thing we recommend to interested users is to try the product themselves to see whether it suits their needs. Installation is the first step to get a feel for the product.

Dependencies

Basic Dependencies

  • FUSE is the primary requirement for GlusterFS to work. Nowadays it is part of most OSes (sorry, it is not yet stable on MS Windows).
  • Extended attribute support in the backend (exporting) filesystem. (This may not be required in all cases, but it is required by some of the key features.)
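
Both basic dependencies can be probed from a shell before installing. This is a minimal sketch for GNU/Linux, assuming the FUSE device shows up as /dev/fuse and that 'setfattr' from the 'attr' package is available; the function names are made up for illustration. It only probes the 'user.' attribute namespace, so treat a positive result as a hint rather than a guarantee for every GlusterFS feature.

```shell
# Report whether the FUSE device node exists.
check_fuse() {
  if [ -e /dev/fuse ]; then
    echo "fuse: present"
  else
    echo "fuse: missing"
  fi
}

# Report whether a directory accepts user extended attributes.
check_xattr() {
  dir="$1"
  if ! command -v setfattr >/dev/null 2>&1; then
    echo "xattr: unknown (setfattr not installed)"
    return 0
  fi
  probe="$dir/.xattr-probe.$$"
  if ! touch "$probe" 2>/dev/null; then
    echo "xattr: unknown (cannot write to $dir)"
    return 0
  fi
  if setfattr -n user.glusterfs.probe -v 1 "$probe" 2>/dev/null; then
    echo "xattr: supported"
  else
    echo "xattr: not supported"
  fi
  rm -f "$probe"
}

check_fuse
# Pass the intended export directory as the first argument; defaults to '.'.
check_xattr "${1:-.}"
```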

Misc/Feature supports

  • OFED stack: required if you have an InfiniBand network.
  • Apache / lighttpd: required if you want 'mod_glusterfs', a web-embeddable module for the web server, which doesn't need the FUSE layer to see the filesystem behind it.
  • Berkeley DB: required for the BDB backend, which stores very small files as records.

Download

Check the download page for the latest version.

Installation

Here are some distro-specific installation steps for GlusterFS. If your OS distribution is not listed below, try compiling from source.

After installation, verify that it succeeded by checking the version of GlusterFS.

bash# glusterfs --version

Generally it is advisable to move to newer releases, as they come with extra features and bug fixes. But you are the best person to judge which version works well for you.

GNU/Linux

'rpm' based distros

One can use the rpm available on the GlusterFS ftp site.

bash# rpm -ivh glusterfs-<version>.rpm

It can be used on Fedora, openSUSE, Red Hat and CentOS distributions.

'deb' based distros

Currently a few contributors maintain the glusterfs Debian package. If the latest version is available in the Debian repository, you just need to run:

bash# apt-cache search glusterfs
bash# apt-get install <glusterfs-*> # whatever the above search shows

If the latest package is not available, you need to install from source, which is described in the following section.

Install from Source

The source tarball is available in the ftp repository of the project. Get the latest version.

bash# tar -xzf glusterfs-<version>.tar.gz
bash# cd glusterfs-<version>
bash# ./configure > /dev/null
GlusterFS configure summary
===========================
FUSE client        : yes
Infiniband verbs   : yes
epoll IO multiplex : yes
Berkeley-DB        : yes
libglusterfsclient : yes
mod_glusterfs      : yes
argp-standalone    : no

bash# make && make install
bash# glusterfs --version

NOTE: By default ./configure uses /usr/local/ as the installation path (prefix). If you want a different path, add --prefix=/<your>/<path> to ./configure.

OS X

On Mac, though the source tarball builds without problems, you may prefer the click-to-install .dmg images available from our ftp site. Click on the .dmg image after download and you will get a glusterfs package, which is installed by clicking on it again. If you are installing on a remote machine from a terminal, you can use the commands below to install glusterfs.

 bash# hdiutil attach glusterfs-<version>.dmg
 bash# installer -pkg /Volumes/glusterfs-<version>/glusterfs-<version>.pkg -target /
 bash# hdiutil detach /Volumes/glusterfs-<version>/

NOTE: Please go through the 'README.MacOS' that comes with the .dmg image for complete installation steps and any version-specific information.

Solaris

Solaris (ZFS) can be used as a storage backend with GlusterFS. The client part is not tested yet (mainly due to FUSE support on Solaris). Follow the steps below to get it working.

Packages needed

  • SUNWarc - system SUNWarc Lint Libraries (usr)
  • SUNWhea - system SUNWhea SunOS Header Files
  • SUNWgcc gcc - The GNU C compiler
  • SUNWbinutils binutils - GNU binutils
  • pkg-get - /opt/csw/bin
  • automake - /opt/csw/bin
  • autoconf - /opt/csw/bin
  • m4 - /opt/csw/bin
  • bison 2.3 - /opt/csw/bin (GlusterFS works with this bison)
  • flex 2.5.4 - /opt/csw/bin (GlusterFS works with this flex)

NOTE: flex also comes with the SUNWflexlex package, which is installed in /usr/sfw/bin, but that flex has been seen to cause linking problems in GlusterFS. So use the flex installed with 'pkg-get install flex'.

Commands to follow

# bash
bash-3.00# export PATH=/usr/sfw/bin:/usr/gnu/bin:/usr/bin:/usr/X11/bin:/usr/sbin:/sbin:/opt/csw/bin
bash-3.00# gunzip -d glusterfs-VERSION.tar.gz
bash-3.00# tar -xf glusterfs-VERSION.tar
bash-3.00# cd glusterfs-VERSION
bash-3.00# ./configure --disable-fuse-client --disable-ibverbs
bash-3.00# make CFLAGS="-g -O0" LDFLAGS="-L/opt/csw/lib"
bash-3.00# make install
bash-3.00# /usr/local/sbin/glusterfs --version

BSD

Only tested on FreeBSD 7 or later.

bash# gunzip glusterfs-<version>.tar.gz
bash# tar -xf glusterfs-<version>.tar
bash# cd glusterfs-<version>
bash# ./configure && make && make install
bash# glusterfs --version

If you find any problems, write to the developer mailing list. For quick help, please provide complete information such as machine type, OS version, and the log messages at the time of failure.

Configuration

We hope your installation process was simple enough. Now let's move to the configuration part, which is also simple, but one needs to understand the design of volume specification files before anything else.

If you don't want to spend time learning this and want to start right away, standard volume specification files are available. Just copy them to the proper path, change the IP addresses to match your network, and you are all set to mount GlusterFS.

Volume file

The volume file (volfile for short) is the mechanism through which GlusterFS understands how it has to behave and how it has to load its translators to give a feature-rich filesystem. In GlusterFS, all translators are dynamically loadable shared object libraries. From the volfile, glusterfs builds a graph of translators that defines its behavior. The art of writing a good volfile can make you a GlusterFS 'GURU'. So our advice is to read this section carefully and understand it :-)
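
For orientation, here is a minimal sketch of a server volfile and a matching client volfile, written out from the shell. The export directory /data/export, the server address 192.168.1.10, the volume names and the output paths are all made up for illustration. 'option transport-type tcp' is the spelling we would expect for 1.4.x; other releases may spell the transport differently, so check the documentation for your version.

```shell
# Hypothetical output paths for the two example volfiles.
server_vol="${TMPDIR:-/tmp}/server-example.vol"
client_vol="${TMPDIR:-/tmp}/client-example.vol"

# Server side: a posix storage volume exported by protocol/server.
cat > "$server_vol" <<'EOF'
volume brick
  type storage/posix
  option directory /data/export
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow 192.168.1.*
  subvolumes brick
end-volume
EOF

# Client side: a protocol/client volume pointing at the server's 'brick'.
cat > "$client_vol" <<'EOF'
volume client
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.1.10
  option remote-subvolume brick
end-volume
EOF

echo "wrote $server_vol and $client_vol"

# To try them out (as root, on the respective machines):
#   glusterfsd -f /path/to/server-example.vol
#   glusterfs  -f /path/to/client-example.vol /mnt/glusterfs
```

The graph here is as small as it gets: one storage translator under the server, one client translator under the mount point. Performance and clustering translators are added as further 'volume' blocks on top.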

Importance

As said earlier, the volfile defines the behavior of the filesystem, which means it can make it a perfect solution or a product that doesn't work at all. Think of a Swiss army knife. When all the tools are folded in, it is a small, heavy piece of metal. When you use the appropriate tool for the task at hand, you realize how many everyday problems it can solve. But if you pull out all the tools at once, you can't even hold it. GlusterFS is similar: with its layered approach and many features, you can solve most of your storage problems, but proper tuning of parameters and proper organization of translators in the volfile are required to get the best performance and stability. Every day, we try to make sure that even the worst possible combination still works functionally.

Just remember that the volume file is important. Please make sure that you send the volfile (which is written to the logfile when the process starts) with your bug report :-)

Syntax

GlusterFS has a strict syntax check on the volfile. The keywords allowed are 'volume', 'end-volume', 'subvolumes', 'type' and 'option'.

A snippet of a volume file looks like this:

volume union
  type cluster/unify
  option scheduler rr
  option namespace ns
  option self-heal on
  subvolumes client1 client2 client3 client4 client5
end-volume
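
The keyword rule can be checked roughly before handing a file to GlusterFS. The following is a minimal sketch using awk, not the parser GlusterFS itself uses: it assumes every non-blank, non-comment line starts with one of the five keywords and that 'volume' and 'end-volume' pair up. The function name 'check_volfile' and the sample path are made up for illustration.

```shell
# Rough volfile sanity check; prints problems and returns non-zero on error.
check_volfile() {
  awk '
    /^[ \t]*(#|$)/ { next }   # skip comment and blank lines
    $1 == "volume" {
      if (in_vol) { printf "line %d: volume inside volume\n", NR; bad = 1 }
      in_vol = 1; next
    }
    $1 == "end-volume" {
      if (!in_vol) { printf "line %d: end-volume without volume\n", NR; bad = 1 }
      in_vol = 0; next
    }
    $1 == "type" || $1 == "option" || $1 == "subvolumes" {
      if (!in_vol) { printf "line %d: %s outside a volume\n", NR, $1; bad = 1 }
      next
    }
    { printf "line %d: unknown keyword %s\n", NR, $1; bad = 1 }
    END {
      if (in_vol) { print "end of file: missing end-volume"; bad = 1 }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}

# Try it on the snippet shown above.
sample="${TMPDIR:-/tmp}/unify-example.vol"
cat > "$sample" <<'EOF'
volume union
  type cluster/unify
  option scheduler rr
  option namespace ns
  option self-heal on
  subvolumes client1 client2 client3 client4 client5
end-volume
EOF

if check_volfile "$sample"; then
  echo "volfile keywords look sane"
fi
```

This only catches structural slips such as a mistyped keyword or a forgotten 'end-volume'. It does not know which translator types or options exist.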

GlusterFS syntax is pretty simple, but errors can still creep in while writing or editing a volfile. Hence the GlusterFS team provides syntax modes for 'emacs' and 'vim'.

'emacs' mode

Download the syntax mode file here - glusterfs-mode.el

Add the following lines to your '~/.emacs' file:

;; (add-to-list 'load-path "<directory path which contains glusterfs-mode.el>")
(add-to-list 'load-path "/usr/share/doc/glusterfs/")
(require 'glusterfs-mode)

Now when you open '*.vol' files in emacs, you see glusterfs syntax in different colors. If the volume file name ends with a different extension (other than .vol), run 'M-x glusterfs-mode' to get syntax highlighting.

'vim' mode

Download the syntax mode file here - glusterfs.vim

You can enable it from the command mode of vim by typing 'source glusterfs.vim'. The volume file is then shown with color coding.

Example

NOTE: The example volume files are for functionality tests only. They may not include any or all of the performance tuning translators.

Management scripts on OS

When you install glusterfs, the 'mount.glusterfs' script is installed in /sbin, so the administrator has the option of listing the glusterfs mount point in /etc/fstab like any other filesystem. This removes a lot of management overhead for admins.
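
As a minimal sketch, an /etc/fstab entry for GlusterFS names the client volfile where other filesystems name a device. The helper function 'fstab_line' and both paths are made up for illustration; review the printed line before adding it to /etc/fstab as root.

```shell
# Print an /etc/fstab line for a GlusterFS mount.
# $1 = client volfile, $2 = mount point (both hypothetical here).
fstab_line() {
  printf '%s  %s  glusterfs  defaults  0  0\n' "$1" "$2"
}

fstab_line /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs

# Equivalent one-off mount through the mount.glusterfs helper (run as root):
#   mount -t glusterfs /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
```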

The above is the best case, which covers 90% of setups. Because GlusterFS is so widely configurable, sometimes a mount point entry in /etc/fstab is not enough (for example, when a server process and a client process run on the same machine), and sometimes an entry in /etc/fstab is not required at all (when a machine acts only as a storage server). In such cases admins may want to use 'init.d' scripts to start the server processes. Check extras/init.d/* in the source tarball for suitable init.d scripts. It is not difficult to write one yourself, either.

NOTE: Refer to the Roadmap, which includes a WebUI for monitoring the network filesystem; it is intended to handle most of the manageability issues.

Exporting over NFS

Because GlusterFS needs just a directory to export on the backend, you can run GlusterFS on top of an NFS mount. But since NFS is not a completely POSIX-compliant filesystem, operations that fail over NFS also fail over GlusterFS. Also, because some versions of NFS don't support extended attributes, AFR/DHT may not work properly.

But if you are exporting GlusterFS over NFS, don't expect very good performance, as NFS may choke under load.

NFS re-export

NFS re-export works fine with GlusterFS, but you may have to go through the 'README.NFS' in the fuse tarball (fuse-2.7.3):

root@space:/tmp/fuse-2.7.3glfs10 # cat README.NFS

FUSE module in official kernels (>= 2.6.14) don't support NFS
exporting.  In this case if you need NFS exporting capability, use the
'--enable-kernel-module' configure option to compile the module from
this package.  And make sure, that the FUSE is not compiled into the
kernel (CONFIG_FUSE_FS must be 'm' or 'n').

You need to add an fsid=NNN option to /etc/exports to make exporting a
FUSE directory work.

You may get ESTALE (Stale NFS file handle) errors with this.  This is
because the current FUSE kernel API and the userspace library cannot
handle a situation where the kernel forgets about an inode which is
still referenced by the remote NFS client.  This problem will be
addressed in a later version.

root@space:/tmp/fuse-2.7.3glfs10 # _

You also need to pass the '--disable-direct-io-mode' option to GlusterFS when starting it.
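
Putting the README and the option above together, a re-export could look roughly like the sketch below. The helper function 'exports_line', the client network, the fsid value and the paths are made up for illustration. The fsid must be a number that is unique among the exports on the host.

```shell
# Print an /etc/exports line for re-exporting a GlusterFS mount point.
# $1 = mount point, $2 = allowed clients, $3 = fsid (all hypothetical here).
exports_line() {
  printf '%s %s(rw,fsid=%s)\n' "$1" "$2" "$3"
}

exports_line /mnt/glusterfs 192.168.1.0/24 10

# Mount the volume with direct I/O disabled before exporting it (run as root):
#   glusterfs --disable-direct-io-mode -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
# Then add the printed line to /etc/exports and reload the export table:
#   exportfs -ra
```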

Re-exporting with Samba for CIFS clients

You can easily re-share a GlusterFS mount point, or subdirectories of that mount point, using Samba. To support Windows ACLs, use vfs_acl_xattr (experimental).

[glusterfs]
   comment = GlusterFS Files
   path = /mnt/gluster
   browsable = yes
   writable = yes
   vfs objects = acl_xattr


To improve performance with Windows clients, compile the samba server with the 'no-utimes' option.

While compiling the samba server package, use:

bash# ac_cv_func_utimes=no ac_cv_func_utime=no ./configure --prefix=/usr
bash# make install

RoadMap

Visit GlusterFS Roadmap.

Highlights are

  • webUI for management
  • hot add/remove of nodes
  • nodup
  • snapshot

and the list goes on..

FAQ

Questions are part of any new initiative, all the more so when a revolutionary concept turns out to be so simple that it is hard to believe. So the number of questions increases. The GlusterFS team is happy to answer your questions about filesystems in general and GlusterFS in particular.

General

GlusterFS FAQ

Legal

What is the license used for Gluster?

GlusterFS is released under GNU General Public License v3 or later. Documentation is released under GNU Free Documentation License 1.2 or later.

What is the relation between Gluster and Z RESEARCH Inc?

Z RESEARCH Inc owns the copyright and trademark of GlusterFS. All the current developers are employed by Z Research Inc. Also, Z Research Inc offers support on the software.

How are external contributions handled?

External contributions remain copyrighted by their respective owners if the work is significant. If the contribution is a very small patch or a minor bug fix, the copyright is currently held by Z Research Inc, as that makes the legal side easier to maintain.

Is there a difference between commercially supported Gluster and community version?

Z RESEARCH Inc does not maintain any proprietary extensions to GlusterFS. Free Software can be commercial too. Z RESEARCH bundles support and services into a commercial subscription package. For more information, please visit the Subscription Package page.


Technical

Technical FAQ

Contribute

GlusterFS is a Free Software project, and all of its developers believe in freedom of mind. You are free to choose how to contribute. Here are a few common options if you would like to contribute.

Develop

As each piece of functionality in GlusterFS is implemented as a layer (or translator), you can write your own functionality layer.

Testing

Again, because of its modularity and layered design, many different combinations are possible with GlusterFS. You can test your own combination of translators, run your application on it, and report any bugs you find.

Porting

Currently GlusterFS is tested on GNU/Linux, Mac OS X, OpenSolaris, FreeBSD 7.0 or later, on x86 architecture. If you are using any other OS, or different hardware architecture, you can help us port GlusterFS to that specific setup.

Documentation

We are trying to keep our documentation up to date, but it may not be complete, so any piece of documentation is helpful for the project. You can post a tutorial of your own, correct grammar, add a picture of your setup, or add a case study.

Spread the word

If you like the product, please let the world know about it, for example by adding an entry to our Who's using GlusterFS page. You can also promote us by writing blogs, speaking at conferences, etc.

Buy Support

Though all of the above contributions can be made by individuals, as an industry/enterprise you can choose to support the product by taking a Subscription, which is a win-win for you and us: increased development speed for us, and prioritized support for you.

Users of GlusterFS are not bound by the steps mentioned above. One can choose to use the product without notifying the team. But when you choose to redistribute, make sure you publish the code under the GPLv3 or later license.

Thanks

To the members of the developer mailing list who helped us test the product and get it to a stable state, and to all of you who helped spread the name.

Contact/Suggestions

In case you have any suggestions or queries please write to:


Gluster Core Team

gluster-users (at) gluster dot org

Copyright © Gluster, Inc. All Rights Reserved.