GlusterFS FAQ
From GlusterDocumentation
General FAQ
What is GlusterFS?
GlusterFS is a clustered file-system capable of scaling to several peta-bytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be made of any commodity hardware such as x86-64 server with SATA-II RAID and Infiniband or GigE interconnect.
What is the problem with DAS / RAID / JBOD / NFS / SAN?
They don't scale both in terms of size and performance. SAN is better than the rest, but it is exorbitantly expensive. But even SAN cannot scale to hundreds of TBs for a large number of clients.
What is the problem with existing Cluster File Systems?
Cluster file systems are still not mature for enterprise market. They are too complex to deploy and maintain, although they are extremely scalable and cheap since they can be entirely built out of commodity OS and hardware.
GlusterFS hopes to solve this problem.
Why is striping bad?
- Increased Overhead
Striping files across multiple bricks and reading/writing them at same time will cause serious disk contention issues and the performance will suffer badly as load increases. If you avoid striping, the underlying filesystem and the I/O scheduler on each brick knows best how to organize the file data into contiguous disk blocks for optimized read and write operations.
- Increased Complexity
Striping complicates the design of clustered filesystems badly. Instead of using the underlying mature filesystem's ability to do block disk management, you will have to implement another clustered file system across multiple underlying filesystems (duplication).
- Increased Risk
Loss of a single node can mean loss of entire file system. Imagine how slow it is to run fsck on hundreds of TBs of data.
In reality, striping introduces more problems than it solves, particularly when a file system scales beyond hundreds of TBs.
Alternatively when files and folders remain as is, they take advantage of the underlying file system to do the real block I/O management. A single file can grow from 4TB to 16TB within a single node. In reality, files are not of TBs in size. When multiple clients access a same file, most likely the blocks are already cached in the RAM and sent via RDMA to the clients. GlusterFS takes advantage of high bandwidth low-latency interconnects such as Infiniband.
The GlusterFS AFR (Automatic File Replication) translator does a striped read to improve performance on mirrored files.
You say striping is bad, but I see a 'cluster/stripe' translator in GlusterFS, why is it?
Well, let's list a few points.
- Implementing the striping feature was a few days work for the GlusterFS team, due to its modular design.
- There are a few people/companies who want a striping feature in the FS as their application uses a single large file (100GB - 2TB).
- If the file is very big (which we don't recommend, though) a single server gets completely loaded when its accessed, so striping helps to distribute the load.
- If one uses 'cluster/afr' translator with 'cluster/stripe' then GlusterFS can provide high availability.
Advantages of GlusterFS
- Designed for O(1) scalability and feature rich.
- Aggregates on top of existing filesystems. User can recover the files and folders even without GlusterFS.
- GlusterFS has no single point of failure. Completely distributed. No centralized meta-data server like Lustre.
- Extensible scheduling interface with modules loaded based on user's storage I/O access pattern.
- Modular and extensible through powerful translator mechanism.
- Supports Infiniband RDMA and TCP/IP.
- Entirely implemented in user-space. Easy to port, debug and maintain.
- Scales on demand.
Can GlusterFS store files redundantly?
GlusterFS automatic-file-replication translator does the job.
Minimum system requirements to use GlusterFS?
On the client side, GlusterFS requires FUSE (Filesystem in Userspace http://fuse.sf.net) kernel support. There are no minimum system requirements as such. It can even run within a system on local loopback. However, here are a few suggestions.
- Storage Cluster:
A cluster of 4 or more x86-64 servers with SATA-II RAID and Infiniband interconnect. - NAS Server:
x86-64 platform, 2GB RAM, SATA-II RAID and gigabit ethernet network interface card.
You can vary the amount of RAM, type of interconnect (say Infiniband or GigE), number of processors, number of bricks, amount of storage capacity per brick, type of disk (say Ultra320 SCSI or SATA II)... all depending upon your application needs.
What operating systems are supported by GlusterFS?
GlusterFS server can run on any POSIX compliant OS. GlusterFS client requires FUSE support in your kernel. As of now Linux, FreeBSD, OpenSolaris (work in progress) and Mac OS X kernels are known to support FUSE. We are also planning to introduce LD_PRELOAD'able GlusterFS client for non-FUSE compliant operating systems.
Linux users with the PaX memory protection kernel patches need to run:
# paxctl -psmer /path/to/glusterfs # paxctl -psmer /path/to/glusterfsd
Currently GlusterFS is successfully tested on:
- GNU/Linux - (both client and server)
- FreeBSD - (both client and server)
- OpenSolaris - (server)
- MAC OS X - (both client and server)
Will a client for Windows be made?
Currently the support is through CIFS, (samba export). But we do have it in mind to port it natively using WinFUSE. The work will be undertaken once WinFUSE is stable
What file system semantics does GlusterFS Support; is it fully POSIX compliant?
GlusterFS is fully POSIX compliant.
Will GlusterFS be upward compatible?
Q: Since GlusterFS is so easy to scale, and after large deployments are setup, when new versions of Gluster come out, will I have to bring down my GlusterFS to upgrade, or will Gluster work in a heterogeneous environment where part of the servers can be upgraded to the newer version and some of the servers run older versions?
A: It's true that with newer versions, we are going to introduce new features. But, considering situations like this, we will be handling such situations (backward compatibility) but the user may miss the new features altogether. Hence we recommend that one upgrades to newer versions. Anyway, we will announce the compatibility break in mailing list.
The GlusterFS protocol has a version checking mechanism, which checks for the compatibility of both server and client before making a successful connection.
NOTE: Its very much advised to use the same version of client and server.
How to interpret GlusterFS versions?
MAJOR.MINOR.PATCH-LEVEL.
- MAJOR changes when compatibility is broken. You can imagine NFSv2, NFSv3 or Ext2, Ext3 and Ext4 filesystems.
- MINOR changes for every significant release and then goes into a maintenance phase.
- PATCH-LEVEL changes within a maintenance phase for bug fixes or insignificant changes. If REVISION string contains RC or DOA, that release should not be used in production.
What is a DOA release?
DOA (Dead On Arrival) releases are only for developers and beta testers. We can only guarantee you for loss of data
What does BENKI mean?
Benki means fire in Kannada language. Entire 1.2.x series of GlusterFS has been codenamed BENKI. Every major series of Gluster will adopt an IRC nick name of one of our contributors. Benki is IRC nickname of Basavana Gowda KG.
What does SUSKE mean?
SUSKE is a comic character. Entire 1.3.x series of GlusterFS has been codenamed SUSKE. Every major series of Gluster will adopt an IRC nick name of one of our contributors. SUSKE is IRC nickname of Balamurugan.
I'm getting bad aggregate performance, how do I tune my performance
Q: I see the throughput I'm getting reading and writing to glusterfs. Now how do I figure out how to improve the performance? I'm using io-threads xlator. I've set it to 8 threads. How do I know if I should increase or decrease that number? How will I know if I will get more advantage by adding another client?
A: As there are so many configuration details, where one can improve the throughput, we recommend you to mail to gluster-devel (at) nongnu.org
Q: what are the DIMENSIONS ( of parameters ) to tune, so I can experiment and discover which are most-significant ( with my hardware ), and which are least-significant?
E.g. obviously # of io-threads is one...
( and then we post such rankings here, to save others' time )
( PS: there's a bug in your Wiki/Edit thingy: it insists I've included a new URL, which is daft; also, giving the choice of reCaptcha & arithmetic would be better than either-or .. I'm arithmetic-disabled, damnit )
Can I add my question here?
If you do not find your question answered here and if you think it is a frequently asked question, you may add your question here. One of us will fill in the answer.
Can I edit these wiki pages?
You are most welcome to contribute to GlusterFS documentation.
Note: Anonymous editing of this wiki has been suspended after seeing wide spread vandalism on this site. So please create an account for yourself before editing these pages.

