Revision control
Revision control, also known as version control and source control (and an aspect of software configuration management or SCM), is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may change the same files. Changes are usually identified by a number or letter code, termed the "revision number", "revision level", or simply "revision". For example, an initial set of files is "revision 1". When the first change is made, the resulting set is "revision 2", and so on. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and with some types of files, merged.

Version control systems (VCSs, singular VCS) most commonly run as stand-alone applications, but revision control is also embedded in various types of software such as word processors (e.g., Microsoft Word, OpenOffice.org Writer, KWord, Pages, etc.), spreadsheets (e.g., Microsoft Excel, OpenOffice.org Calc, KSpread, Numbers, etc.), and in various content management systems (e.g., Drupal, Joomla, WordPress). Integrated revision control is a key feature of wiki software packages such as MediaWiki, DokuWiki and TWiki. In wikis, revision control makes it possible to revert a page to a previous revision, which is critical for allowing editors to track each other's edits, correct mistakes, and defend public wikis against vandalism and spam.

Software tools for revision control are essential for the organization of multi-developer projects.

Overview

In computer software engineering, revision control is any practice that tracks and provides control over changes to source code. Software developers sometimes use revision control software to maintain documentation and configuration files as well as source code.

As teams design, develop and deploy software, it is common for multiple versions of the same software to be deployed in different sites and for the software's developers to be working simultaneously on updates. Bugs or features of the software are often only present in certain versions (because of the fixing of some problems and the introduction of others as the program develops). Therefore, for the purposes of locating and fixing bugs, it is vitally important to be able to retrieve and run different versions of the software to determine in which version(s) the problem occurs. It may also be necessary to develop two versions of the software concurrently: for instance, one version in which bugs are fixed but no new features are added (a branch), and another in which new features are developed (the trunk).

At the simplest level, developers could retain multiple copies of the different versions of the program and label them appropriately. This simple approach has been used on many large software projects. While this method can work, it is inefficient because many near-identical copies of the program have to be maintained. It also requires considerable self-discipline on the part of developers and often leads to mistakes. Consequently, systems to automate some or all of the revision control process have been developed.

Moreover, in software development, legal and business practice and other environments, it has become increasingly common for a single document or snippet of code to be edited by a team, the members of which may be geographically dispersed and may pursue different and even contrary interests. Sophisticated revision control that tracks and accounts for ownership of changes to documents and code may be extremely helpful or even necessary in such situations.

Revision control may also track changes to configuration files, such as those typically stored in /etc or /usr/local/etc on Unix systems. This gives system administrators another way to easily track changes made and a way to roll back to earlier versions should the need arise.

Specialized strategies

Engineering revision control developed from formalized processes based on tracking revisions of early blueprints or bluelines. This system of control implicitly allowed returning to any earlier state of the design, for cases in which an engineering dead-end was reached in the development of the design. A revision table was used to keep track of the changes made. Additionally, the modified areas of the drawing were highlighted using revision clouds.

Version control is also widespread in business and law. Indeed, "contract redline" and "legal blackline" are some of the earliest forms of revision control, and are still employed in business and law with varying degrees of sophistication. An entire industry has emerged to service the document revision control needs of business and other users, and some of the revision control technology employed in these circles is subtle, powerful, and innovative. The most sophisticated techniques are beginning to be used for the electronic tracking of changes to CAD files (see product data management), supplanting the "manual" electronic implementation of traditional revision control.

Source-management models

Traditional revision control systems use a centralized model where all the revision control functions take place on a shared server. If two developers try to change the same file at the same time, without some method of managing access the developers may end up overwriting each other's work. Centralized revision control systems solve this problem in one of two different "source management models": file locking and version merging.

Atomic operations

Computer scientists speak of atomic operations if the system is left in a consistent state even if the operation is interrupted. The commit operation is usually the most critical in this sense. A commit tells the revision control system to make a group of pending changes final and available to all users. Not all revision control systems have atomic commits; notably, the widely used CVS lacks this feature.
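
The all-or-nothing behaviour of an atomic commit can be sketched as follows. This is a minimal in-memory illustration, not the mechanism of any real system: the Repository class and its commit method are hypothetical. Changes are applied to a private copy, and the copy is published only if every change in the group succeeds.

```python
class Repository:
    """Toy in-memory repository; all names are illustrative only."""

    def __init__(self):
        self.files = {}      # committed state: file name -> contents
        self.revision = 0

    def commit(self, changes):
        """Apply all changes or none of them.

        `changes` maps a file name to new contents (a str),
        or to None to delete the file.
        """
        staged = dict(self.files)          # work on a private copy
        for name, contents in changes.items():
            if contents is None:
                if name not in staged:
                    raise KeyError(f"cannot delete missing file: {name}")
                del staged[name]
            elif not isinstance(contents, str):
                raise TypeError(f"contents of {name} must be str")
            else:
                staged[name] = contents
        # Only reached if every change was valid: publish in one step.
        self.files = staged
        self.revision += 1
        return self.revision


repo = Repository()
repo.commit({"a.txt": "one", "b.txt": "two"})
try:
    # The second change is invalid, so the first must not become visible.
    repo.commit({"a.txt": "changed", "missing.txt": None})
except KeyError:
    pass
```

After the failed second commit, the repository still holds the contents and revision number produced by the first commit.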

File locking

The simplest method of preventing "concurrent access" problems involves locking files so that only one developer at a time has write access to the central "repository" copies of those files. Once one developer "checks out" a file, others can read that file, but no one else may change that file until that developer "checks in" the updated version (or cancels the checkout).

File locking has both merits and drawbacks. It can provide some protection against difficult merge conflicts when a user is making radical changes to many sections of a large file (or group of files). However, if the files are left exclusively locked for too long, other developers may be tempted to bypass the revision control software and change the files locally, leading to more serious problems.
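
The check-out and check-in cycle described above might be modelled as in the following sketch. The class LockingRepository and its method names are assumptions made for illustration; real tools differ in how locks are stored and released.

```python
class LockError(Exception):
    """Raised when a user tries to write without holding the lock."""


class LockingRepository:
    def __init__(self, files):
        self.files = dict(files)
        self.locks = {}   # file name -> user holding the lock

    def check_out(self, user, name):
        """Take the exclusive write lock and return the current contents."""
        holder = self.locks.get(name)
        if holder is not None and holder != user:
            raise LockError(f"{name} is locked by {holder}")
        self.locks[name] = user
        return self.files[name]

    def read(self, name):
        """Reading never requires a lock."""
        return self.files[name]

    def check_in(self, user, name, contents):
        """Store the new contents and release the lock."""
        if self.locks.get(name) != user:
            raise LockError(f"{user} does not hold the lock on {name}")
        self.files[name] = contents
        del self.locks[name]

    def cancel_checkout(self, user, name):
        """Release the lock without changing the file."""
        if self.locks.get(name) == user:
            del self.locks[name]


repo = LockingRepository({"main.c": "v1"})
repo.check_out("alice", "main.c")
try:
    repo.check_out("bob", "main.c")
    bob_blocked = False
except LockError:
    bob_blocked = True
repo.check_in("alice", "main.c", "v2")
bob_copy = repo.check_out("bob", "main.c")
```

In this run the second user is refused while the first holds the lock, and obtains the updated file once it has been checked in.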

Version merging

Most version control systems allow multiple developers to edit the same file at the same time. The first developer to "check in" changes to the central repository always succeeds. The system may provide facilities to merge further changes into the central repository, and preserve the changes from the first developer when other developers check in.

Merging two files can be a very delicate operation, and is usually possible only if the data structure is simple, as in text files. Merging two image files might not produce a valid image file at all. The second developer checking in code will need to take care with the merge, to make sure that the changes are compatible and that the merge operation does not introduce its own logic errors within the files. These problems limit the availability of automatic or semi-automatic merge operations mainly to simple text-based documents, unless a specific merge plugin is available for the file types.
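
For text files, one common way to combine the second developer's changes with the first developer's is a three-way merge against the version both started from. The sketch below is deliberately simplified: it assumes edits only modify lines in place, so all three versions have the same number of lines, and it stops at the first conflict. The function name merge_lines and the exception MergeConflict are hypothetical.

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same line differently."""


def merge_lines(base, ours, theirs):
    """Three-way merge of equal-length line lists.

    Simplification: assumes edits only modify lines in place, so all
    three versions have the same number of lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch handles in-place line edits only")
    merged = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:          # both sides agree (or neither changed)
            merged.append(o)
        elif o == b:        # only "theirs" changed this line
            merged.append(t)
        elif t == b:        # only "ours" changed this line
            merged.append(o)
        else:               # both changed it differently
            raise MergeConflict(f"line {index + 1} changed on both sides")
    return merged


base = ["title", "body", "footer"]
first_checkin = ["Title", "body", "footer"]     # already in the repository
second_checkin = ["title", "body", "Footer"]    # edited from the same base
merged = merge_lines(base, second_checkin, first_checkin)
```

Here the two developers touched different lines, so both edits survive in the merged result. Had both changed the same line in different ways, the function would raise MergeConflict and leave the decision to a person.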

The concept of a reserved edit can provide an optional means to explicitly lock a file for exclusive write access, even when a merging capability exists.

Baselines, labels and tags

Most revision control tools use only one of the similar terms baseline, label, or tag to refer to the action of identifying a snapshot ("label the project") or the record of the snapshot ("try it with baseline X"). In such documentation or discussion the three terms can be considered synonyms.

In most projects some snapshots are more significant than others, such as those used to indicate published releases, branches, or milestones.

When both the term baseline and either of label or tag are used together in the same context, label and tag usually refer to the mechanism within the tool of identifying or making the record of the snapshot, and baseline indicates the increased significance of any given label or tag.

Most formal discussion of configuration management uses the term baseline.

Distributed revision control

Distributed revision control systems (DRCS) take a peer-to-peer approach, as opposed to the client-server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer's working copy of the codebase is a bona fide repository. Distributed revision control conducts synchronization by exchanging patches (change-sets) from peer to peer. This results in some important differences from a centralized system:
  • No canonical, reference copy of the codebase exists by default; only working copies.
  • Common operations (such as commits, viewing history, and reverting changes) are fast, because there is no need to communicate with a central server. Rather, communication is only necessary when pushing or pulling changes to or from other peers.
  • Each working copy effectively functions as a remote backup of the codebase and of its change-history, providing natural protection against data loss.
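
The peer-to-peer exchange of change-sets described above can be sketched as follows. This is a toy model under strong assumptions: every change-set replaces a whole file, conflicting edits to the same file are not detected, and change-sets are replayed in the order each peer learned of them rather than by ancestry. The Peer class and its methods are hypothetical.

```python
import uuid


class Peer:
    def __init__(self, name):
        self.name = name
        self.changesets = {}   # changeset id -> (file name, contents)
        self.order = []        # ids in the order this peer learned of them

    def commit(self, file_name, contents):
        """Record a change locally; no other peer is contacted."""
        changeset_id = uuid.uuid4().hex
        self._store(changeset_id, (file_name, contents))
        return changeset_id

    def _store(self, changeset_id, change):
        if changeset_id not in self.changesets:
            self.changesets[changeset_id] = change
            self.order.append(changeset_id)

    def pull(self, other):
        """Copy every changeset that `other` has and this peer lacks."""
        for changeset_id in other.order:
            self._store(changeset_id, other.changesets[changeset_id])

    def push(self, other):
        """Offer this peer's changesets to `other`."""
        other.pull(self)

    def working_files(self):
        """Replay known changesets in order to build the file contents."""
        files = {}
        for changeset_id in self.order:
            file_name, contents = self.changesets[changeset_id]
            files[file_name] = contents
        return files


alice = Peer("alice")
bob = Peer("bob")
alice.commit("a.txt", "from alice")   # local only
bob.commit("b.txt", "from bob")       # local only
bob.pull(alice)                       # bob now has both changesets
bob.push(alice)                       # alice now has both changesets
```

Commits touch only the local peer; communication happens solely in pull and push, after which both peers hold the full history and either could serve as a backup of the other.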

Integration

Some of the more advanced revision-control tools offer many other facilities, allowing deeper integration with other tools and software-engineering processes. Plugins are often available for IDEs such as Oracle JDeveloper, IntelliJ IDEA, Eclipse and Visual Studio. NetBeans IDE and Xcode come with integrated version control support.

Common vocabulary

Terminology can vary from system to system, but some terms in common usage include:

Baseline : An approved revision of a document or source file from which subsequent changes can be made. See baselines, labels and tags.
Branch : A set of files under version control may be branched or forked at a point in time so that, from that time forward, two copies of those files may develop at different speeds or in different ways independently of each other.
Change : A change (or diff, or delta) represents a specific modification to a document under version control. The granularity of the modification considered a change varies between version control systems.
Change list : On many version control systems with atomic multi-change commits, a changelist, change set, or patch identifies the set of changes made in a single commit. This can also represent a sequential view of the source code, allowing the examination of source "as of" any particular changelist ID.
Checkout : A check-out (or co) is the act of creating a local working copy from the repository. A user may specify a specific revision or obtain the latest. The term 'checkout' can also be used as a noun to describe the working copy.
Commit : A commit (checkin, ci or, more rarely, install, submit or record) is the action of writing or merging the changes made in the working copy back to the repository. The terms 'commit' and 'checkin' can also be used in noun form to describe the new revision that is created as a result of committing.
Conflict : A conflict occurs when different parties make changes to the same document, and the system is unable to reconcile the changes. A user must resolve the conflict by combining the changes, or by selecting one change in favour of the other.
Delta compression : Most revision control software uses delta compression, which retains only the differences between successive versions of files. This allows for more efficient storage of many different versions of files.
Dynamic stream : A stream in which some or all file versions are mirrors of the parent stream's versions.
Export : Exporting is the act of obtaining the files from the repository. It is similar to checking out except that it creates a clean directory tree without the version-control metadata used in a working copy. This is often used prior to publishing the contents, for example.
Head : Also sometimes called tip, this refers to the most recent commit.
Import : Importing is the act of copying a local directory tree (that is not currently a working copy) into the repository for the first time.
Label : See tag.
Mainline : Similar to trunk, but there can be a mainline for each branch.
Merge : A merge or integration is an operation in which two sets of changes are applied to a file or set of files. Some sample scenarios are as follows:
  • A user, working on a set of files, updates or syncs their working copy with changes made, and checked into the repository, by other users.

  • A user tries to check-in files that have been updated by others since the files were checked out, and the revision control software automatically merges the files (typically, after prompting the user if it should proceed with the automatic merge, and in some cases only doing so if the merge can be clearly and reasonably resolved).
  • A set of files is branched, a problem that existed before the branching is fixed in one branch, and the fix is then merged into the other branch.
  • A branch is created, the code in the files is independently edited, and the updated branch is later incorporated into a single, unified trunk.

Promote : The act of copying file content from a less controlled location into a more controlled location. For example, from a user's workspace into a repository, or from a stream to its parent.
Repository : The repository is where files' current and historical data are stored, often on a server. Sometimes also called a depot (for example, by SVK, AccuRev and Perforce).
Resolve : The act of user intervention to address a conflict between different changes to the same document.
Reverse integration : The process of merging different team branches into the main trunk of the versioning system.
Revision : Also version: A version is any change in form. In SVK, a Revision is the state at a point in time of the entire tree in the repository.
Ring: See tag.
Share: The act of making one file or folder available in multiple branches at the same time. When a shared file is changed in one branch, it is changed in other branches.
Stream : A container for branched files that has a known relationship to other such containers. Streams form a hierarchy; each stream can inherit various properties (like versions, namespace, workflow rules, subscribers, etc.) from its parent stream.
Tag : A tag or label refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number. See baselines, labels and tags.
Trunk : The unique line of development that is not a branch (sometimes also called baseline or mainline).
Update : An update (or sync) merges changes made in the repository (by other people, for example) into the local working copy.
Working copy : The working copy is the local copy of files from a repository, at a specific time or revision. All work done to the files in a repository is initially done on a working copy, hence the name. Conceptually, it is a sandbox.
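
The delta compression entry in the vocabulary above can be illustrated with a short sketch that stores a newer version of a text file as a list of instructions relative to the older one. It uses SequenceMatcher from Python's standard difflib module; the delta format and the function names make_delta and apply_delta are assumptions made for this example and do not correspond to the storage format of any particular tool.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` as copies from `old` plus literal inserted lines."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))          # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))       # store new lines only
        # "delete" needs no entry: the old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the newer version from the older one and the delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


v1 = ["alpha", "beta", "gamma", "delta"]
v2 = ["alpha", "BETA", "gamma", "delta", "epsilon"]
delta = make_delta(v1, v2)
rebuilt = apply_delta(v1, delta)
```

Only the lines that differ are stored literally; unchanged runs are recorded as index ranges into the older version, which is why storage grows with the size of the change rather than the size of the file.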

See also

  • Comparison of revision control software
  • Distributed revision control
  • Software configuration management (SCM)
  • Software versioning
  • Versioning file system


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 