XFS
Encyclopedia
XFS is a high-performance journaling file system
Journaling file system
A journaling file system is a file system that keeps track of the changes that will be made in a journal before committing them to the main file system...

 created by Silicon Graphics, Inc. It is the default file system in IRIX
IRIX
IRIX is a computer operating system developed by Silicon Graphics, Inc. to run natively on their 32- and 64-bit MIPS architecture workstations and servers. It was based on UNIX System V with BSD extensions. IRIX was the first operating system to include the XFS file system.The last major version...

 releases 5.3 and onwards and later ported to the Linux kernel
Linux kernel
The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software....

. XFS is particularly proficient at parallel IO due to its allocation group based design. This enables extreme scalability of IO threads, filesystem bandwidth, file and filesystem size when spanning multiple storage devices. A typical XFS user site, NASA Advanced Supercomputing Division takes advantage of these capabilities, deploying two 300+ terabyte XFS filesystems on two SGI Altix archival storage servers, each direct attached to multiple fiber channel disk arrays.

History

Development of XFS was started by Silicon Graphics in 1993, with first deployment being seen on IRIX
IRIX
IRIX is a computer operating system developed by Silicon Graphics, Inc. to run natively on their 32- and 64-bit MIPS architecture workstations and servers. It was based on UNIX System V with BSD extensions. IRIX was the first operating system to include the XFS file system.The last major version...

 5.3 in 1994. The filesystem was released under the GNU General Public License
GNU General Public License
The GNU General Public License is the most widely used free software license, originally written by Richard Stallman for the GNU Project....

 in May 2000, and ported to Linux, with the first distribution support appearing in 2001. It later became available in almost all GNU/Linux distributions.

XFS was first merged into the mainline Linux kernel in version 2.4 (around 2002), making it almost universally available on Linux systems.
Gentoo Linux
Gentoo Linux
Gentoo Linux is a computer operating system built on top of the Linux kernel and based on the Portage package management system. It is distributed as free and open source software. Unlike a conventional software distribution, the user compiles the source code locally according to their chosen...

 was an early user by mid-2002.
Installation programs for the Arch
Arch Linux
Arch Linux is an independently developed, Linux-based operating system for i686 and x86-64 computers. It is composed predominantly of free and open source software, and supports community involvement....

, Debian
Debian
Debian is a computer operating system composed of software packages released as free and open source software primarily under the GNU General Public License along with other free software licenses. Debian GNU/Linux, which includes the GNU OS tools and Linux kernel, is a popular and influential...

, Fedora
Fedora (operating system)
Fedora is a RPM-based, general purpose collection of software, including an operating system based on the Linux kernel, developed by the community-supported Fedora Project and sponsored by Red Hat...

, openSUSE
OpenSUSE
openSUSE is a general purpose operating system built on top of the Linux kernel, developed by the community-supported openSUSE Project and sponsored by SUSE...

, Kate OS
Kate OS
KateOS is a Linux distribution originally based on Slackware. It is designed for intermediate users. Its package management system uses so called TGZex packages, which unlike Slackware packages support dependency tracking , internationalized descriptions, and are much easier to update...

, Mandriva
Mandriva
Mandriva S.A. is a publicly traded Linux and open source software company with its headquarters in Paris, France and development center in Curitiba, Brazil. Mandriva, S.A...

, Slackware
Slackware
Slackware is a free and open source Linux-based operating system. It was one of the earliest operating systems to be built on top of the Linux kernel and is the oldest currently being maintained. Slackware was created by Patrick Volkerding of Slackware Linux, Inc. in 1993...

, Ubuntu, VectorLinux and Zenwalk Linux distributions all offer XFS as a choice of filesystem, but few of these let the user create XFS for the /boot filesystems due to deficiencies and unpredictable behavior in GRUB
GNU GRUB
GNU GRUB is a boot loader package from the GNU Project. GRUB is the reference implementation of the Multiboot Specification, which provides a user the choice to boot one of multiple operating systems installed on a computer or select a specific kernel configuration available on a particular...

, often the default bootloader. FreeBSD gained read-only
Read-only
In computing, read-only can mean:* Read-only memory , a type of storage media* Read-only access to files or directories in file system permissions...

 support for XFS in December 2005 and in June 2006 experimental write support was introduced; however this is supposed to be used only as an aid in migration from Linux, not to be used as a "main" filesystem. The 64-bit Red Hat Enterprise Linux
Red Hat Enterprise Linux
Red Hat Enterprise Linux is a Linux-based operating system developed by Red Hat and targeted toward the commercial market. Red Hat Enterprise Linux is released in server versions for x86, x86-64, Itanium, PowerPC and IBM System z, and desktop versions for x86 and x86-64...

 5.4 distribution in 2009 had all needed kernel support but did not include command line tools for creating and using xfs filesystems. The tools from CentOS
CentOS
CentOS is a free operating system based on Red Hat Enterprise Linux . It exists to provide a free enterprise class computing platform and strives to maintain 100% binary compatibility with its upstream distribution...

 worked, or were provided to customers on request. It was included in the 2010 release of Red Hat Enterprise Linux 6.0.

Capacity

XFS is a 64-bit
64-bit
64-bit is a word size that defines certain classes of computer architecture, buses, memory and CPUs, and by extension the software that runs on them. 64-bit CPUs have existed in supercomputers since the 1970s and in RISC-based workstations and servers since the early 1990s...

 file system. It supports a maximum file system size of 8 exbibyte
Exbibyte
The exbibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The exbibyte unit symbol is EiB....

s minus one byte, though this is subject to block limits imposed by the host operating system. On 32-bit
32-bit
The range of integer values that can be stored in 32 bits is 0 through 4,294,967,295. Hence, a processor with 32-bit memory addresses can directly access 4 GB of byte-addressable memory....

 Linux systems, this limits the file and file system sizes to 16 tebibyte
Tebibyte
The tebibyte is a standards-based binary multiple of the byte, a unit of digital information storage. The tebibyte unit symbol is TiB....

s.

Journaling

Journaling is an approach to guaranteeing file system consistency even in presence of power failures or system crashes. XFS provides journaling for file system metadata, where file system updates are first written to a serial journal before the actual disk blocks are updated. The journal is a circular buffer of disk blocks that is never read in normal filesystem operation. The XFS journal is limited to a maximum size of both 64k blocks and 128MB with the minimum size dependent upon a calculation of the filesystem block size and directory block size. Placing the journal on an external device larger than the maximum journal size will cause the extra space to be unused. It can be stored within the data section of the filesystem (an internal log), or on a separate device to minimize disk contention. On XFS the journal contains “logical” entries that describe at a high level what operations are being performed (as opposed to a “physical” journal that stores a copy of the blocks modified during each transaction). Journal updates are performed asynchronously to avoid incurring a performance penalty. In the event of a system crash, operations immediately prior to the crash can be redone using data in the journal, which allows XFS to retain file system consistency. Recovery is performed automatically at file system mount time, and the recovery speed is independent of the size of the file system. Where recently modified data has not been flushed to disk before a system crash, XFS ensures that any unwritten data blocks are zeroed on reboot, obviating any possible security issues arising from residual data (as far as access through the filesystem interface is concerned, as distinct from accessing the raw device or raw hardware).

Allocation groups

XFS filesystems are internally partitioned into allocation groups, which are equally sized linear regions within the file system. File
Computer file
A computer file is a block of arbitrary information, or resource for storing information, which is available to a computer program and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished...

s and directories can span allocation groups. Each allocation group manages its own inode
Inode
In computing, an inode is a data structure on a traditional Unix-style file system such as UFS. An inode stores all the information about a regular file, directory, or other file system object, except its data and name....

s and free space separately, providing scalability and parallelism — multiple threads and processes can perform I/O operations on the same filesystem simultaneously. This architecture helps to optimize parallel I/O performance on multiprocessor or multicore systems, as metadata updates are also parallelizable. The internal partitioning provided by allocation groups can be especially beneficial when the file system spans multiple physical devices, allowing for optimal usage of throughput of the underlying storage components.

Striped allocation

If an XFS filesystem is to be created on a striped RAID array, a stripe
Data striping
In computer data storage, data striping is the technique of segmenting logically sequential data, such as a file, in a way that accesses of sequential segments are made to different physical storage devices. Striping is useful when a processing device requests access to data more quickly than a...

 unit
can be specified when the file system is created. This maximises throughput by ensuring that data allocations, inode allocations and the internal log (journal) are aligned with the stripe unit.

Extent based allocation

Blocks used in files stored on XFS filesystems are managed with variable length extents where one extent describes one or more contiguous blocks. This can shorten the list considerably compared to file systems that list all blocks used by a file individually. Also many file systems manage space allocation with one or more block oriented bitmaps — in XFS these structures are replaced with an extent oriented structure consisting of a pair of B+ tree
B+ tree
In computer science, a B+ tree or B plus tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of...

s for each filesystem allocation group (AG). One of the B+ tree
B+ tree
In computer science, a B+ tree or B plus tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of...

s is indexed by the length of the free extents, while the other is indexed by the starting block of the free extents. This dual indexing scheme allows for highly efficient location of free extents for file system operations.

Variable block sizes

The file system block size represents the minimum allocation unit. XFS allows file systems to be created with block sizes ranging between 512 bytes and 64 kilobytes, allowing the file system to be tuned for the expected use. When many small files are expected a small block size would typically maximize capacity, but for a system dealing mainly with large files, a larger block size can provide a performance advantage.

Delayed allocation

XFS makes use of lazy evaluation techniques for file allocation. When a file is written to the buffer cache, rather than allocating extents for the data, XFS simply reserves the appropriate number of file system blocks for the data held in memory. The actual block allocation occurs only when the data is finally flushed to disk. This improves the chance that the file will be written in a contiguous group of blocks, reducing fragmentation
File system fragmentation
In computing, file system fragmentation, sometimes called file system aging, is the inability of a file system to lay out related data sequentially , an inherent phenomenon in storage-backed file systems that allow in-place modification of their contents. It is a special case of data fragmentation...

 problems and increasing performance.

Sparse files

XFS provides a 64-bit sparse address space for each file, which allows both for very large file sizes, and for holes within files for which no disk space is allocated. As the file system uses an extent map for each file, the file allocation map size is kept small. Where the size of the allocation map is too large for it to be stored within the inode, the map is moved into a B+ tree
B+ tree
In computer science, a B+ tree or B plus tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of...

 which allows for rapid access to data anywhere in the 64-bit address space provided for the file.

Extended attributes

XFS provides multiple data streams for files through its implementation of extended attributes
Extended file attributes
Extended file attributes is a file system feature that enables users to associate computer files with metadata not interpreted by the filesystem, whereas regular attributes have a purpose strictly defined by the filesystem...

. These allow the storage of a number of name/value pairs attached to a file. Names are null-terminated printable character strings of up to 256 bytes in length, while their associated values can contain up to 64 KB
Kilobyte
The kilobyte is a multiple of the unit byte for digital information. Although the prefix kilo- means 1000, the term kilobyte and symbol KB have historically been used to refer to either 1024 bytes or 1000 bytes, dependent upon context, in the fields of computer science and information...

 of binary data. They are further subdivided into two namespaces, root and user. Extended attributes stored in the root namespace can be modified only by the superuser, while attributes in the user namespace can be modified by any user with permission to write to the file. Extended attributes can be attached to any kind of XFS inode, including symbolic links, device nodes, directories, etc. The attr program can be used to manipulate extended attributes from the command line, and the xfsdump and xfsrestore utilities are aware of them and will back up and restore their contents. Most other backup systems are not aware of extended attributes.

Direct I/O

For applications requiring high throughput to disk, XFS provides a direct I/O implementation that allows non-cached I/O directly to userspace. Data is transferred between the application's buffer and the disk using DMA
Direct memory access
Direct memory access is a feature of modern computers that allows certain hardware subsystems within the computer to access system memory independently of the central processing unit ....

, which allows access to the full I/O bandwidth of the underlying disk devices.

Guaranteed-rate I/O

The XFS guaranteed-rate I/O system provides an API that allows applications to reserve bandwidth to the filesystem. XFS will dynamically calculate the performance available from the underlying storage devices, and will reserve bandwidth sufficient to meet the requested performance for a specified time. This feature is unique to the XFS file system. Guarantees can be hard or soft, representing a trade off between reliability and performance, though XFS will only allow hard guarantees if the underlying storage subsystem supports it. This facility is most used by real-time applications, such as video-streaming.

DMAPI

XFS implemented the DMAPI
DMAPI
Data Management API is the interface defined in the X/Open document "Systems Management: Data Storage Management API" dated February 1997. XFS, IBM JFS, VxFS, AdvFS and GPFS file systems support DMAPI for Hierarchical Storage Management ....

 interface to support Hierarchical Storage Management
Hierarchical storage management
Hierarchical storage management is a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive than slower devices, such as optical discs and magnetic...

 in IRIX. As of October 2010, the Linux implementation of XFS supported the required on-disk metadata for DMAPI implementation, but the kernel support was reportedly not usable. For some time, SGI hosted a kernel tree which included the DMAPI hooks, but this support has not been adequately maintained, though kernel developers stated an intention to bring it up to date.

Snapshots

XFS does not provide direct support for snapshots, as it expects the snapshot process to be implemented by the volume manager. Taking a snapshot of an XFS filesystem involves freezing I/O to the filesystem using the xfs_freeze utility, having the volume manager perform the actual snapshot, and then unfreezing I/O to resume normal operations. The snapshot can then be mounted read-only for backup purposes. XFS releases on IRIX incorporated an integrated volume manager called XLV. This volume manager has not been ported to Linux and XFS works with standard LVM
Logical Volume Manager (Linux)
LVM is a logical volume manager for the Linux kernel; it manages disk drives and similar mass-storage devices, in particular large ones. The term "volume" refers to a disk drive or partition thereof...

 instead. In recent Linux kernels, the xfs_freeze functionality is implemented in the VFS layer, and happens automatically when the Volume Manager's snapshot functionality is invoked. This was once a valuable advantage as Ext3 system could not be suspended and volume manager was unable to create a consistent 'hot' snapshot to back up a heavily busy database. Fortunately this is no longer the case. Since Linux 2.6.29 ext3, ext4, gfs2 and jfs have the freeze feature as well.

Online defragmentation

Although the extent-based nature of XFS and the delayed allocation strategy it uses significantly improves the file system's resistance to fragmentation problems, XFS provides a filesystem defragmentation
Defragmentation
In the maintenance of file systems, defragmentation is a process that reduces the amount of fragmentation. It does this by physically organizing the contents of the mass storage device used to store files into the smallest number of contiguous regions . It also attempts to create larger regions of...

 utility (xfs_fsr, short for XFS filesystem reorganizer) that can defragment the files on a mounted and active XFS filesystem.

Online resizing

XFS provides the xfs_growfs utility to perform online resizing of XFS file systems. XFS filesystems can be grown provided there is remaining unallocated space on the device holding the filesystem. This feature is typically used in conjunction with volume management, as otherwise the partition
Disk partitioning
Disk partitioning is the act of dividing a hard disk drive into multiple logical storage units referred to as partitions, to treat one physical disk drive as if it were multiple disks. Partitions are also termed "slices" for operating systems based on BSD, Solaris or GNU Hurd...

 holding the filesystem will need enlarging separately. XFS partitions cannot (as of August 2010) be shrunk in place, although several possible workarounds have been discussed.

Native backup/restore utilities

XFS provides the xfsdump and xfsrestore utilities to aid in backup of data held on XFS file systems. The xfsdump utility backs up an XFS filesystem in inode order, and in contrast to traditional UNIX file systems which must be unmounted before dumping to guarantee a consistent dump image, XFS file systems can be dumped while the file system is in use. This is not the same as a snapshot since files are not frozen during the dump. XFS dumps and restores are also resumable, and can be interrupted without difficulty. The multi-threaded operation of xfsdump provides high performance of backup operations by splitting the dump into multiple streams, which can be sent to different dump destinations. The multi stream capabilities have not been fully ported to Linux yet, however.

Atomic disk quotas

Quotas for XFS filesystems are turned on when initially mounted; this fixes a race window that is present with most other filesystems that first require to be mounted and where no quotas are enforced until quotaon(8) is called.

Write barriers

XFS filesystems mount by default with "write barriers" enabled. This feature will cause the write back cache of the underlying storage device to be flushed at appropriate times, particularly on write operations to the XFS log. This feature is intended to assure filesystem consistency, and its implementation is device specific - not all underlying hardware will support cache flush requests. When an XFS filesystem is used on a logical device provided by a hardware RAID controller with battery backed cache, this feature can cause significant performance degradation, as the filesystem code is not aware that the cache is nonvolatile, and if the controller honors the flush requests, data will be written to physical disk more often than necessary. To avoid this problem, where the data in the device cache is protected from power failure or other host problems, the filesystem should be mounted with the "nobarrier" option.

Journal placement

By default, XFS filesystems are created with an "internal" log, which places the filesystem journal on the same block device as the filesystem data. Filesystem writes are preceded by metadata updates to the journal, which can be a cause of disk contention. Under most workloads, the level of contention caused is too low to impact performance, but random-write heavy workloads, such as those seen on busy database servers, can suffer from sub-optimal performance as a result of this I/O contention. An additional factor which can increase the severity of this problem is that writes to the journal are committed synchronously—they must complete successfully before the associated write operation can begin.

Where optimum filesystem performance is required, XFS provides the option of placing the log on a separate physical device, with its own I/O path. This requires little physical space, and if a low-latency path can be provided for synchronous writes, it can provide significant performance enhancements to the operation of the filesystem. The required performance characteristics make this a suitable candidate for the use of an solid-state drive
Solid-state drive
A solid-state drive , sometimes called a solid-state disk or electronic disk, is a data storage device that uses solid-state memory to store persistent data with the intention of providing access in the same manner of a traditional block i/o hard disk drive...

 (SSD) device, or a RAID system with write-back cache, though the latter can reduce data safety in the event of power problems. The use of an external log simply requires the filesystem to be mounted with the logdev option, indicating a suitable journal device.

Disadvantages

  • An XFS file system cannot be shrunk.
  • Metadata operations in XFS have historically been slower than with other file systems, yielding poor performance with operations such as mass deletions of large numbers of files. This metadata performance issue has been addressed by code created by Red Hat XFS developer Dave Chinner. The feature, a mount option known as "delayed logging", increases the performance of metadata operations by many orders of magnitude, by pushing them almost entirely into memory. The patch was included in the mainline kernel as an experimental feature in 2.6.35, is a stable feature of 2.6.37, and is the default journal logging method in Linux 2.6.39. Testing by the developer in 2010 showed performance to be similar to ext4
    Ext4
    The ext4 or fourth extended filesystem is a journaling file system for Linux, developed as the successor to ext3.It was born as a series of backward compatible extensions to ext3, many of them originally developed by Cluster File Systems for the Lustre file system between 2003 and 2006, meant to...

     at low thread counts, and superior to ext4 at high thread counts.

See also

  • Comparison of file systems
    Comparison of file systems
    -General information:-Limits:-Metadata:-Features:-Allocation and layout policies:-Supporting operating systems:-See also:* Comparison of archive formats* Comparison of file archivers* List of archive formats* List of file archivers...

  • CXFS
    CXFS
    The CXFS file system is a proprietary shared disk file system designed by Silicon Graphics specifically to be used in a Storage area network environment....

    , the clustered form of XFS
  • List of file systems

External links

  • XFS.org, community wiki
  • SGI.com, a high-performance journaling filesystem
  • Madpenguin.org, File System Design part 1: XFS by Narayan Newton on madpenguin.org
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK