Global File System
In computing, the Global File System (GFS) is a shared disk file system for Linux computer clusters. This is not to be confused with the Google File System, a proprietary distributed filesystem developed by Google.

GFS and GFS2 differ from distributed file systems (such as AFS, Coda, or InterMezzo) because they allow all nodes to have direct concurrent access to the same shared block storage. In addition, GFS or GFS2 can also be used as a local filesystem.

GFS has no disconnected operating-mode, and no client or server roles. All nodes in a GFS cluster function as peers. Using GFS in a cluster requires hardware to allow access to the shared storage, and a lock manager to control access to the storage.
The lock manager operates as a separate module: thus GFS and GFS2 can use the Distributed Lock Manager (DLM) for cluster configurations and the "nolock" lock manager for local filesystems. Older versions of GFS also support GULM, a server-based lock manager which implements redundancy via failover.

GFS and GFS2 are free software, distributed under the terms of the GNU General Public License.

History

Development of GFS began in 1995, led by University of Minnesota professor Matthew O'Keefe and a group of students. It was originally written for SGI's IRIX operating system, but in 1998 it was ported to Linux, since the open source code provided a more convenient development platform. In late 1999/early 2000 it made its way to Sistina Software, where it lived for a time as an open-source project. Sometime in 2001 Sistina chose to make GFS a commercial product, no longer under an open-source license.

Developers forked OpenGFS from the last public release of GFS and then further enhanced it to include updates allowing it to work with OpenDLM. But OpenGFS and OpenDLM became defunct, since Red Hat purchased Sistina in December 2003 and released GFS and many cluster-infrastructure pieces under the GPL in late June 2004.

Red Hat subsequently financed further development geared towards bug-fixing and stabilization. A further development, GFS2, derives from GFS and was included along with its distributed lock manager (shared with GFS) in Linux 2.6.19. Red Hat Enterprise Linux 5.2 included GFS2 as a kernel module for evaluation purposes. With the 5.3 update, GFS2 became part of the kernel package.

GFS forms part of the Fedora, Red Hat Enterprise Linux 5.3 and upwards and associated CentOS Linux distributions. Users can purchase commercial support to run GFS fully supported on top of Red Hat Enterprise Linux. Since Red Hat Enterprise Linux version 5.3, Red Hat Enterprise Linux Advanced Platform has included support for GFS at no additional cost.

The following list summarizes some version numbers and major features introduced:
  • v1.0 (1996) SGI IRIX only
  • v3.0 Linux port
  • v4 journaling
  • v5 Redundant Lock Manager
  • v6.1 (2005) Distributed Lock Manager
  • Linux 2.6.19 - GFS2 and DLM merged into Linux kernel
  • Red Hat Enterprise Linux 5.3 releases the first fully supported GFS2

Hardware

The design of GFS and of GFS2 targets SAN-like environments. Although it is possible to use them as a single node filesystem, the full feature-set requires a SAN. This can take the form of iSCSI, FibreChannel, AoE, or any other device which can be presented under Linux as a block device shared by a number of nodes, for example a DRBD device.

The DLM requires an IP-based network over which to communicate. This is normally just Ethernet, but again, there are many other possible solutions. Depending upon the choice of SAN, it may be possible to combine this, but normal practice involves separate networks for the DLM and storage.

GFS requires fencing hardware of some kind. This is a requirement of the cluster infrastructure, rather than GFS/GFS2 itself, but it is required for all multi-node clusters. The usual options include power switches and remote access controllers (e.g. DRAC, IPMI, or ILO). Fencing is used to ensure that a node which the cluster believes to be failed cannot suddenly start working again while another node is recovering the journal for the failed node. It can also optionally restart the failed node automatically once the recovery is complete.
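The ordering described above (fence first, then recover the journal, then optionally restart) can be sketched as a toy model. This is only an illustration under stated assumptions: ToyNode, ToyCluster and their methods are hypothetical names invented here, not part of GFS, the DLM, or any real cluster stack, and "fencing" is reduced to flipping flags.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a cluster node; not a real cluster API."""
    name: str
    fenced: bool = False
    running: bool = True


@dataclass
class ToyCluster:
    """Hypothetical recovery coordinator that records the steps it takes."""
    log: list = field(default_factory=list)

    def fence(self, node):
        # A real cluster would use a power switch or remote access
        # controller here; this sketch only flips flags.
        node.fenced = True
        node.running = False
        self.log.append(f"fence {node.name}")

    def replay_journal(self, failed, recoverer):
        # Refuse to recover while the failed node could still write.
        if not failed.fenced:
            raise RuntimeError("journal recovery attempted before fencing")
        self.log.append(f"{recoverer.name} replays journal of {failed.name}")

    def restart(self, node):
        node.fenced = False
        node.running = True
        self.log.append(f"restart {node.name}")

    def recover(self, failed, recoverer, restart_after=True):
        # Fence, then replay, then (optionally) restart, in that order.
        self.fence(failed)
        self.replay_journal(failed, recoverer)
        if restart_after:
            self.restart(failed)
        return self.log
```

The point of the guard in replay_journal is that recovery is only safe once the failed node can no longer issue I/O; the optional restart comes last.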

Differences from a local filesystem

Although the designers of GFS/GFS2 aimed to emulate a local filesystem closely, there are a number of differences to be aware of. Some of these are due to the existing filesystem interfaces not allowing the passing of information relating to the cluster. Some stem from the difficulty of implementing those features efficiently in a clustered manner. For example:
  • The flock system call on GFS/GFS2 is not interruptible by signals.
  • The fcntl F_GETLK system call returns a PID of any blocking lock. Since this is a cluster filesystem, that PID might refer to a process on any of the nodes which have the filesystem mounted. Since the purpose of this interface is to allow a signal to be sent to the blocking process, this is no longer possible.
  • Leases are not supported with the lock_dlm (cluster) lock module, but they are supported when used as a local filesystem
  • dnotify will work on a "same node" basis, but its use with GFS/GFS2 is not recommended
  • inotify will also work on a "same node" basis, and is also not recommended (but it may become supported in the future)
  • splice is supported on GFS2 only
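Because a blocked flock call cannot be broken out of with a signal on GFS/GFS2, an application might prefer to poll with a non-blocking request and its own deadline. The following is a minimal sketch of that idea using Python's standard fcntl module; the helper names try_flock and demo are hypothetical, and the demonstration runs against a local temporary file rather than a GFS mount.

```python
import fcntl
import os
import tempfile
import time


def try_flock(fd, timeout=1.0, poll_interval=0.05):
    """Try to take an exclusive flock without blocking indefinitely.

    Returns True if the lock was acquired before the deadline, else False.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of sleeping.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)


def demo():
    """Show a conflicting lock timing out, then succeeding after release."""
    with tempfile.NamedTemporaryFile() as tmp:
        # Two separate open() calls give independent open file
        # descriptions, so their flock requests conflict.
        fd1 = os.open(tmp.name, os.O_RDWR)
        fd2 = os.open(tmp.name, os.O_RDWR)
        try:
            first = try_flock(fd1)
            second = try_flock(fd2, timeout=0.2)
            fcntl.flock(fd1, fcntl.LOCK_UN)
            third = try_flock(fd2)
            return first, second, third
        finally:
            os.close(fd1)
            os.close(fd2)
```

Polling trades some latency and wasted wake-ups for the ability to give up cleanly; the timeout and interval values are arbitrary choices for the sketch.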


The other main difference, and one that is shared by all similar cluster filesystems, is that the cache control mechanism, known as glocks for GFS/GFS2, has an effect across the whole cluster. Each inode on the filesystem has two glocks associated with it. One (called the iopen glock) keeps track of which processes have the inode open. The other (the inode glock) controls the cache relating to that inode. A glock has four states: UN (unlocked), SH (shared - a read lock), DF (deferred - a read lock incompatible with SH) and EX (exclusive). Each of the four modes maps directly to a DLM lock mode.

When in EX mode, an inode is allowed to cache data and metadata (which might be "dirty", i.e. waiting for write back to the filesystem). In SH mode, the inode can cache data and metadata, but it must not be dirty. In DF mode, the inode is allowed to cache metadata only, and again it must not be dirty. The DF mode is used only for direct I/O. In UN mode, the inode must not cache any metadata.
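The caching rules for the four glock modes can be written down as a small lookup table. This is a sketch for illustration only: the names GlockMode, CACHE_RULES and compatible are hypothetical, and two details are inferences rather than statements from the text, namely that UN caches no data either, and that two DF holders are compatible with each other (DF being described as a read lock).

```python
from enum import Enum


class GlockMode(Enum):
    UN = "unlocked"
    SH = "shared"
    DF = "deferred"
    EX = "exclusive"


# (may cache data, may cache metadata, cached content may be dirty)
CACHE_RULES = {
    # Assumption: the text only says "no metadata" for UN; no data is inferred.
    GlockMode.UN: (False, False, False),
    GlockMode.DF: (False, True, False),   # metadata only, used for direct I/O
    GlockMode.SH: (True, True, False),    # clean data and metadata
    GlockMode.EX: (True, True, True),     # may hold dirty data and metadata
}


def compatible(a, b):
    """Whether two different nodes may hold modes a and b on one glock."""
    if GlockMode.UN in (a, b):
        return True            # an unlocked holder conflicts with nothing
    if GlockMode.EX in (a, b):
        return False           # exclusive excludes every other holder
    # SH with SH is allowed; DF with DF is assumed allowed (inference);
    # SH and DF exclude each other, as stated in the text.
    return a == b
```

For example, compatible(GlockMode.SH, GlockMode.DF) is False, which is why a node doing direct I/O forces other nodes to drop their cached data for that inode.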

In order that operations which change an inode's data or metadata do not interfere with each other, an EX lock is used. This means that certain operations, such as create/unlink of files from the same directory and writes to the same file should be, in general, restricted to one node in the cluster. Of course, doing these operations from multiple nodes will work as expected, but due to the requirement to flush caches frequently, it will not be very efficient.

The single most frequently asked question about GFS/GFS2 performance is why the performance can be poor with email servers. It should be reasonably obvious from the above that the solution is to break up the mail spool into separate directories and to try to keep (so far as is possible) each node reading and writing to a private set of directories.
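One way to keep each node working in a private set of directories is to assign every mailbox a stable "home" node and place its spool under a per-node directory. The sketch below shows that idea only; the function names, the node names and the /mnt/gfs2/spool root are hypothetical, and routing mail delivery to the chosen node is left out.

```python
import hashlib
from pathlib import PurePosixPath


def home_node(mailbox, nodes):
    """Pick a stable owner node for a mailbox by hashing its name."""
    # sha256 is used instead of hash() so the result is the same on
    # every node and across restarts.
    digest = hashlib.sha256(mailbox.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def spool_dir(mailbox, nodes, root="/mnt/gfs2/spool"):
    """Return the per-node spool directory for a mailbox (hypothetical layout)."""
    node = home_node(mailbox, nodes)
    return PurePosixPath(root) / node / mailbox
```

With this layout, creates and unlinks for a given mailbox happen in a directory that only one node normally touches, so the directory's EX glock rarely has to move between nodes. A plain modulo mapping reshuffles most mailboxes when the node list changes; that trade-off is accepted here for brevity.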

Journaling

GFS and GFS2 are both journaled filesystems, and GFS2 supports a similar set of journaling modes to ext3. In data=writeback mode, only metadata is journaled. This is the only mode supported by GFS; however, it is possible to turn on journaling on individual data files, but only when they are of zero size. Journaled files in GFS have a number of restrictions placed upon them, such as no support for the mmap or sendfile system calls; they also use a different on-disk format from regular files. There is also an "inherit-journal" attribute which, when set on a directory, causes all files (and sub-directories) created within that directory to have the journal (or inherit-journal, respectively) flag set. This can be used instead of the data=journal mount option which ext3 supports (and GFS/GFS2 does not).

GFS2 also supports data=ordered mode, which is similar to data=writeback except that dirty data is synced before each journal flush is completed. This ensures that blocks which have been added to an inode will have their content synced back to disk before the metadata is updated to record the new size, and thus prevents uninitialised blocks appearing in a file under node failure conditions. The default journaling mode is data=ordered, to match ext3's default.
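The difference between the two modes can be illustrated with a toy model of a file whose metadata records a size in blocks. This is a deliberately simplified sketch, not how GFS2 is implemented: ToyFile, append_block and the crash_before_data flag are hypothetical, and the "crash" is modelled as a node failing after the journal flush but before the data reaches disk.

```python
class ToyFile:
    """Toy on-disk state: block contents plus the size recorded in metadata."""

    def __init__(self):
        self.blocks = []   # contents on disk; None means uninitialised
        self.size = 0      # number of blocks the metadata claims

    def visible(self):
        # What a reader sees after recovery: the first `size` blocks.
        return self.blocks[: self.size]


def append_block(f, data, mode, crash_before_data=False):
    """Append one block under the given journaling mode."""
    if mode == "ordered":
        if crash_before_data:
            return             # neither data nor metadata was committed
        f.blocks.append(data)  # data is synced first
        f.size += 1            # then the metadata records the new size
    elif mode == "writeback":
        f.blocks.append(None)  # block allocated, content not yet written
        f.size += 1            # metadata committed with no ordering guarantee
        if not crash_before_data:
            f.blocks[-1] = data  # data arrives later, if the node survives
    else:
        raise ValueError(mode)
```

In the writeback case with crash_before_data=True, visible() returns a list containing None, standing in for an uninitialised block exposed by the new size; in the ordered case the file is either unchanged or complete.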

GFS2 does not yet support data=journal mode, but it does (unlike GFS) use the same on-disk format for both regular and journaled files, and it also supports the same journaled and inherit-journal attributes. GFS2 also relaxes the restrictions on when a file may have its journaled attribute changed to any time that the file is not open (also the same as ext3).

For performance reasons, each node in GFS and GFS2 has its own journal. In GFS the journals are disk extents; in GFS2 the journals are just regular files. The number of nodes which may mount the filesystem at any one time is limited by the number of available journals.

Features of GFS2 compared with GFS

GFS2 adds a number of new features which are not in GFS. Here is a summary of those features:
  • The metadata filesystem (really a different root) - see Compatibility and the GFS2 meta filesystem below
  • GFS2 specific trace points have been available since kernel 2.6.32
  • The XFS-style quota interface has been available in GFS2 since kernel 2.6.33
  • Caching ACLs have been available in GFS2 since 2.6.33
  • GFS2 supports the generation of "discard" requests for thin provisioning/SCSI TRIM requests
  • GFS2 supports I/O barriers (on by default, assuming underlying device supports it. Configurable from kernel 2.6.33 and up)
  • FIEMAP ioctl (to query mappings of inodes on disk)
  • splice system call support
  • mmap/splice support for journaled files (enabled by using the same on disk format as for regular files)
  • Far fewer tunables (making set-up less complicated)
  • Ordered write mode (as per ext3, GFS only has writeback mode)

Compatibility and the GFS2 meta filesystem

GFS2 was designed so that upgrading from GFS would be a simple procedure. To this end, most of the on-disk structure has remained the same as GFS, including the big-endian byte ordering. There are a few differences though:
  • GFS2 has a "meta filesystem" through which processes access system files
  • GFS2 uses the same on-disk format for journaled files as for regular files
  • GFS2 uses regular (system) files for journals, whereas GFS uses special extents
  • GFS2 has some other "per_node" system files
  • The layout of the inode is (very slightly) different
  • The layout of indirect blocks differs slightly


The journaling systems of GFS and GFS2 are not compatible with each other. Upgrading is possible by means of a tool (gfs2_convert) which is run with the filesystem off-line to update the metadata. Some spare blocks in the GFS journals are used to create the (very small) per_node files required by GFS2 during the update process. Most of the data remains in place.

The GFS2 "meta filesystem" is not a filesystem in its own right, but an alternate root of the main filesystem. Although it behaves like a "normal" filesystem, its contents are the various system files used by GFS2, and normally users do not need to ever look at it. The GFS2 utilities mount and unmount the meta filesystem as required, behind the scenes.

See also

  • Comparison of file systems
  • GPFS, ZFS
  • Lustre
  • GlusterFS
  • List of file systems
  • Oracle cluster file system (OCFS)
  • QFS
  • SAN file system
  • Fencing


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 