Copy-on-write
Encyclopedia
Copy-on-write is an optimization
strategy used in computer programming
. The fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, they can all be given pointers to the same resource. This function can be maintained until a caller tries to modify its "copy" of the resource, at which point a true private copy is created to prevent the changes becoming visible to everyone else. All of this happens transparently
to the callers. The primary advantage is that if a caller never makes any modifications, no private copy need ever be created.
operating system
s; when a process creates a copy of itself, the pages in memory
that might be modified by either the process or its copy are marked copy-on-write. When one process modifies the memory, the operating system's kernel intercepts the operation and copies the memory so that changes in one process's memory are not visible to the other.
Another use involves the calloc function. This can be implemented by having a page of physical memory filled with zeros. When the memory is allocated, the pages returned all refer to the page of zeros and are all marked as copy-on-write. This way, the amount of physical memory allocated for the process does not increase until data is written. This is typically only done for larger allocations.
Copy-on-write can be implemented by notifying the MMU
that certain pages in the process's address space are read-only. When data is written to these pages, the MMU raises an exception which is handled by the kernel, which allocates new space in physical memory and makes the page being written to correspond to that new location in physical memory.
One major advantage of COW is the ability to use memory sparsely. Because the usage of physical memory only increases as data is stored in it, very efficient hash table
s can be implemented which only use little more physical memory than is necessary to store the objects they contain. However, such programs run the risk of running out of virtual address space — virtual pages unused by the hash table cannot be used by other parts of the program. The main problem with COW at the kernel level is the complexity it adds, but the concerns are similar to those raised by more basic virtual-memory concerns such as swapping pages to disk; when the kernel writes to pages, it must copy any such pages marked copy-on-write.
, application
and system
code. The string
class provided by the C++ standard library
, for example, was
specifically designed to allow copy-on-write implementations:
In multithreaded systems, COW can be implemented without the use of traditional locking and instead use Compare-and-swap
to increment or decrement the internal reference counter. Since the original resource will never be altered, it can safely be copied by multiple threads (after the reference count was increased) without the need of performance-expensive locking such as mutexes. If the reference counter turns 0, then by definition only 1 thread is holding a reference so the resource can safely be de-allocated from memory, again without the use of performance-expensive locking mechanisms. The benefit of not having to copy the resource (and the resulting performance gain over traditional deep-copying) will therefore be valid in both single- and multithreaded systems.
The COW concept is also used in virtualization/emulation software such as VMware Virtualization, Bochs
, QEMU
, Linux vserver, UML
and VirtualBox
for virtual disk storage. This allows a great reduction in required disk space when multiple VMs can be based on the same hard disk image, as well as increased performance as disk reads can be cached in RAM and subsequent reads served to other VMs out of the cache. This is usually the case.
The COW concept is also used in maintenance of instant snapshot on database servers like Microsoft SQL Server 2005. Instant snapshots preserve a static view of a database by storing a pre-modification copy of data when underlying data are updated. Instant snapshots are used for testing uses or moment-dependent reports and should not be used to replace backups.
COW may also be used as the underlying mechanism for snapshots
provided by logical volume management
and Microsoft Volume Shadow Copy Service.
The copy-on-write technique can be used to emulate a read-write storage on media that require wear levelling
or are physically Write Once Read Many
.
Optimization (computer science)
In computer science, program optimization or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources...
strategy used in computer programming
Computer programming
Computer programming is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a program that performs specific operations or exhibits a...
. The fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, they can all be given pointers to the same resource. This function can be maintained until a caller tries to modify its "copy" of the resource, at which point a true private copy is created to prevent the changes becoming visible to everyone else. All of this happens transparently
Transparency (computing)
Any change in a computing system, such as new feature or new component, is transparent if the system after change adheres to previous external interface as much as possible while changing its internal behaviour. The purpose is to shield from change all systems on the other end of the interface...
to the callers. The primary advantage is that if a caller never makes any modifications, no private copy need ever be created.
Copy-on-write in virtual memory
Copy-on-write finds its main use in virtual memoryVirtual memory
In computing, virtual memory is a memory management technique developed for multitasking kernels. This technique virtualizes a computer architecture's various forms of computer data storage , allowing a program to be designed as though there is only one kind of memory, "virtual" memory, which...
operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...
s; when a process creates a copy of itself, the pages in memory
Computer storage
Computer data storage, often called storage or memory, refers to computer components and recording media that retain digital data. Data storage is one of the core functions and fundamental components of computers....
that might be modified by either the process or its copy are marked copy-on-write. When one process modifies the memory, the operating system's kernel intercepts the operation and copies the memory so that changes in one process's memory are not visible to the other.
Another use involves the calloc function. This can be implemented by having a page of physical memory filled with zeros. When the memory is allocated, the pages returned all refer to the page of zeros and are all marked as copy-on-write. This way, the amount of physical memory allocated for the process does not increase until data is written. This is typically only done for larger allocations.
Copy-on-write can be implemented by notifying the MMU
Memory management unit
A memory management unit , sometimes called paged memory management unit , is a computer hardware component responsible for handling accesses to memory requested by the CPU...
that certain pages in the process's address space are read-only. When data is written to these pages, the MMU raises an exception which is handled by the kernel, which allocates new space in physical memory and makes the page being written to correspond to that new location in physical memory.
One major advantage of COW is the ability to use memory sparsely. Because the usage of physical memory only increases as data is stored in it, very efficient hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
s can be implemented which only use little more physical memory than is necessary to store the objects they contain. However, such programs run the risk of running out of virtual address space — virtual pages unused by the hash table cannot be used by other parts of the program. The main problem with COW at the kernel level is the complexity it adds, but the concerns are similar to those raised by more basic virtual-memory concerns such as swapping pages to disk; when the kernel writes to pages, it must copy any such pages marked copy-on-write.
Other applications of copy-on-write
COW is also used outside the kernel, in libraryLibrary (computer science)
In computer science, a library is a collection of resources used to develop software. These may include pre-written code and subroutines, classes, values or type specifications....
, application
Application software
Application software, also known as an application or an "app", is computer software designed to help the user to perform specific tasks. Examples include enterprise software, accounting software, office suites, graphics software and media players. Many application programs deal principally with...
and system
System software
System software is computer software designed to operate the computer hardware and to provide a platform for running application software.The most basic types of system software are:...
code. The string
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....
class provided by the C++ standard library
C++ standard library
In C++, the C++ Standard Library is a collection of classes and functions, which are written in the core language and part of the C++ ISO Standard itself...
, for example, was
specifically designed to allow copy-on-write implementations:
In multithreaded systems, COW can be implemented without the use of traditional locking and instead use Compare-and-swap
Compare-and-swap
In computer science, the compare-and-swap CPU instruction is a special instruction that atomically compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value...
to increment or decrement the internal reference counter. Since the original resource will never be altered, it can safely be copied by multiple threads (after the reference count was increased) without the need of performance-expensive locking such as mutexes. If the reference counter turns 0, then by definition only 1 thread is holding a reference so the resource can safely be de-allocated from memory, again without the use of performance-expensive locking mechanisms. The benefit of not having to copy the resource (and the resulting performance gain over traditional deep-copying) will therefore be valid in both single- and multithreaded systems.
The COW concept is also used in virtualization/emulation software such as VMware Virtualization, Bochs
Bochs
Bochs is a portable x86 and x86-64 IBM PC compatible emulator and debugger mostly written in C++ and distributed as free software under GNU Lesser General Public License...
, QEMU
QEMU
QEMU is a processor emulator that relies on dynamic binary translation to achieve a reasonable speed while being easy to port on new host CPU architectures....
, Linux vserver, UML
User-mode Linux
User-mode Linux enables multiple virtual Linux systems to run as an application within a normal Linux system...
and VirtualBox
VirtualBox
Oracle VM VirtualBox is an x86 virtualization software package, originally created by software company Innotek GmbH, purchased by Sun Microsystems, and now developed by Oracle Corporation as part of its family of virtualization products...
for virtual disk storage. This allows a great reduction in required disk space when multiple VMs can be based on the same hard disk image, as well as increased performance as disk reads can be cached in RAM and subsequent reads served to other VMs out of the cache. This is usually the case.
The COW concept is also used in maintenance of instant snapshot on database servers like Microsoft SQL Server 2005. Instant snapshots preserve a static view of a database by storing a pre-modification copy of data when underlying data are updated. Instant snapshots are used for testing uses or moment-dependent reports and should not be used to replace backups.
COW may also be used as the underlying mechanism for snapshots
Snapshot (computer storage)
In computer systems, a snapshot is the state of a system at a particular point in time. The term was coined as an analogy to that in photography. It can refer to an actual copy of the state of a system or to a capability provided by certain systems....
provided by logical volume management
Logical volume management
In computer storage, logical volume management or LVM provides a method of allocating space on mass-storage devices that is more flexible than conventional partitioning schemes...
and Microsoft Volume Shadow Copy Service.
The copy-on-write technique can be used to emulate a read-write storage on media that require wear levelling
Wear levelling
Wear leveling is a technique for prolonging the service life of some kinds of erasable computer storage media, such as Flash memory used in solid-state drives and USB Flash drives...
or are physically Write Once Read Many
Write Once Read Many
A Write Once Read Many or WORM drive is a data storage device where information, once written, cannot be modified. On ordinary data storage devices, the number of times data can be modified is not limited, except by the rated lifespan of the device, as modification involves physical changes that...
.
See also
- Allocate-on-flushAllocate-on-flushAllocate-on-flush is a computer file system feature implemented in the HFS+, XFS, Reiser4, ZFS, Btrfs and ext4 file systems...
- ZFSZFSIn computing, ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include data integrity verification against data corruption modes , support for high storage capacities, integration of the concepts of filesystem and volume management,...
- Ext3cowExt3cowExt3cow or third extended filesystem with copy-on-write is an open source, versioning file system based on the ext3 file system. Versioning is implemented through block-level copy-on-write. It shares many of its performance characteristics with ext3....
- BtrfsBtrfsBtrfs is a GPL-licensed copy-on-write file system for Linux.Development began at Oracle Corporation in 2007....
- Write Anywhere File LayoutWrite Anywhere File LayoutThe Write Anywhere File Layout is a file layout that supports large, high-performance RAID arrays, quick restarts without lengthy consistency checks in the event of a crash or power failure , and growing the filesystems size quickly. It was designed by NetApp for use in its storage appliances...
- VxFS
- Logical Volume Manager (Linux)Logical Volume Manager (Linux)LVM is a logical volume manager for the Linux kernel; it manages disk drives and similar mass-storage devices, in particular large ones. The term "volume" refers to a disk drive or partition thereof...
- R1Soft Hot copy
- Shadow Copy