Google Summer of Code 2010 Ideas
From GlusterDocumentation
Gluster is applying for participation in Google Summer of Code 2010 program. We are excited to participate for the first time. Here is a list of our ideas ...
Gluster Storage Platform
Automated test suite for UI
Develop automated test framework for Gluster Storage Platform UI testing. Tests should include functionality, accessibility, security and browser compatibility.
Prerequisites: Good understanding of web application testing is ad added advantage. Web application programming knowledge essential.
Gluster File System
/proc-like interface to GlusterFS
This project involves exposing the internal data structures of GlusterFS through a /proc-like interface. Some of the things that can be accomplished using such an interface are:
- Expose other views of the filesystem. For example, each of the backends of a replicate pair can be individually shown, or the backends that are part of a single namespace can be exposed.
- Expose all currently open files and the per-xlator state associated with them.
- Expose the entire inode table and the per-xlator state associated with them.
- Support dynamically changing translator parameters (think "echo 3 > /proc/sys/vm/drop_caches").
- Expose latency and other statistical data about file operations. This would allow profiling tools to easily parse and graph this data.
This is an open-ended project, and the more ideas the better.
Pre-requisites: Good C skills, and a good imagination.
Wireshark plugin for the GlusterFS protocol
Wireshark is a popular open-source protocol analyzer (http://www.wireshark.org/). It would be nice to have a plugin that allows Wireshark to interpret the GlusterFS binary protocol. There are already many plugins available which analyze protocols such as CIFS, NFS, etc. A GlusterFS plugin would be very useful in debugging performance issues and other problems.
Pre-requisites: Good C programming experience. Familiarity with Wireshark will be an advantage.
Implement ACL support for GlusterFS
Access Control Lists (ACL) are a powerful mechanism for fine-grained access control for file systems. This project aims at bringing in support for ACLs into GlusterFS while maintaining compatibility or at least extensibility for supporting CIFS, Posix and NFSv4 style ACLs. The candidate will be required to investigate these existing ACL models, design and implement the selected ACL model for GlusterFS.
This project will expose candidates to access control and security mechanisms not just for disk-based file systems but also on a clustered file system including exposure to global standards like CIFS and NFSv4.
Prerequisites: Good C programming experience. Familiarity with ACL on shared filesystems will be an advantage
Reed Solomon algorithm based M:N redundancy translator
This features adds redundancy ability to GlusterFS such that M nodes worth data is stored across N nodes, where N > M, and N-M nodes worth of data is "redundancy" data such that, any of the N-M nodes can fail at a given time.
Prerequisites: Good C programming experience.
Compression translator
This feature stores data on GlusterFS in a compressed manner and uncompresses on the fly to present data as "normal" files and folders. Different data compression algorithms should be selectable as configuration options.
Prerequisites: Good C programming experience. Familiarity with various compression algorithms is desired.
Encryption translator
Encryption is a critical component of privacy protection measures. This project aims at bringing in encryption support for GlusterFS file system's read-write path. The candidate will have to investigate various cryptographic algorithms and implement or evaluate any existing implementations in order to select the most suitable ones for a clustered file system. Encryption support will be incorporated in a way that allows storage admins to select the privacy levels and protection strength through configurable options and algorithms. Implementation must include key management as well for integration with network- or organization-wide identity management.
Prerequisites: Good C programming experience. Familiarity with encryption algorithms and key management is an added advantage.
Monitoring Tools
Explore a framework to monitor system calls, throughput, availability, diskspace, concurrent clients, server load, CPU/Memory usage etc for GlusterFS. Plugins for various open source monitoring tools such as Nagios, Zenoss, Groundwork Opensource, Munin is essential.
Prerequisites: Good programming skills is essential. Network/Administration experience is an advantage.
System Statistics and Performance Framework
Review GlusterFS code and determine performance bottlenecks by implementing probes (similar to SystemTap or Dprobes). A complete analysis of necessary parameters such as latencies, throughput to improve performance with minimum overhead is essential.
Prerequisites: Good C programming skills.
Migration Toolkit
Develop a set of tools, methodology where migration of data, users, and configuration (like replication) can be performed from Lustre, Hadoop etc to Gluster without compromising on functionality. since it is not always possible to have a complete second set of hardware available all the time, and also not possible to bring the storage down for extended period of time, the methodology should address phased migration. Possible co-existence of both filesystems while the migration is underway is recommended. Tools must include a method to validate successful migration.
Prerequisites: Familiarity with Hadoop, Lustre is an advantage.
GlusterFS native Windows port
The applicant should create a native Windows port of GlusterFS using the Dokan userspace filesystem framework. This consists of the following major sub-tasks:
- a "glue" component to convert in between Windows (path-based) and POSIX (inode-based) filesystem semantics
- make Glusterfs compilable on Windows (possibly relying on POSIX compatibility frameworks like Cygwin)
- make it possible to add Glusterfs as a Windows drive by means of creating a Dokan filesystem server component
Prerequisites: Good programming skills are essential for 1. Familiarity with filesystems is an advantage for 1. Familiarity with development on the Windows platform is an advantage for 2. and 3.
inode based filesystem backend
The applicant is expected to design and implement an experimental low-level filesystem format, based on inodes. That is, in terms of structure it should resemble disk based filesystems; however, unlike disk based filesystems, there is no constraint to implement it within a fixed image device or file (e.g., a directory type inode can be represented by a real directory, inodes contained in the directory can be represented by symlinks in this directory).
- propose a design of the filesystem format, with taking care about atomicity of operations where it's possible, at least, safety of operations where atomicity is not possible (i.e. design should be resilient to data loss), and extensibility
- implement a prototype storage translator for this filesystem format
- implement utilities for migration (from and to) and maintainance (alternatively, such functionality can be included in the translator)
Prerequisites: Good programming skills, insight for design work will be helpful, familiarity with filesystems is an advantage.
Distributed lsof/fuser/showmount command for GlusterFS
Many a times, a process might keep a file open from a client preventing a user from deleting this file or take the volume offline. When we encounter this on a local system, its easy to check the process holding the file open and can be stopped by using the lsof of fuser command.
We want a distributed lsof/fuser/showmount command to do this for GlusterFS since GlusterFS keeps track of client/pid pairs for files.
Prerequisites: Good C programming skills.
Questions? Email us at gsoc@gluster.com or post them on our development mailinglist at gluster-devel mailinglist. You can also find one of us (nick: thom[p]son) on the GSoC IRC channel #gsoc or #gluster on Freenode.


