File archiver
A file archiver is a computer program that combines a number of files together into one archive file, or a series of archive files, for easier transportation or storage. Many file archivers employ archive formats that provide lossless data compression to reduce the size of the archive, which is often useful for transferring a large number of individual files over a high-latency network like the Internet.

The most basic archivers just take a list of files and concatenate their contents sequentially into the archive. In addition, the archive must record at least the names and lengths of the originals, so that proper reconstruction is possible. Most archivers also store metadata about a file that the operating system provides, such as timestamps, ownership and access control.
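
The concatenate-with-names-and-lengths idea can be sketched in a few lines of Python. This is only an illustration: the header layout (a 2-byte name length followed by an 8-byte content length) and the function names `pack` and `unpack` are assumptions made for this example, not the format used by ar, tar, cpio or any other real archiver.

```python
import struct

# Hypothetical header: 2-byte name length, 8-byte content length, big-endian.
HEADER = struct.Struct(">HQ")

def pack(files):
    """Concatenate a {name: content} mapping into one archive blob."""
    out = bytearray()
    for name, data in files.items():
        encoded_name = name.encode("utf-8")
        out += HEADER.pack(len(encoded_name), len(data))
        out += encoded_name
        out += data
    return bytes(out)

def unpack(blob):
    """Rebuild the {name: content} mapping from an archive blob."""
    files = {}
    pos = 0
    while pos < len(blob):
        name_len, data_len = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        files[name] = blob[pos:pos + data_len]
        pos += data_len
    return files

originals = {"a.txt": b"hello", "dir/b.bin": bytes(range(4))}
archive = pack(originals)
restored = unpack(archive)
```

Without the stored lengths, `unpack` would have no way to tell where one file ends and the next begins, which is why even the simplest archive needs this bookkeeping.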

The process of making an archive file is called archiving or packing. Reconstructing the original files from the archive is termed unarchiving, unpacking or extracting.

Unix Archiver Tools

Unlike integrated archival and compression tools like PKZIP, WinZip, and WinRAR, the Unix tools ar, tar, and cpio (for "archiver", "tape archiver" and "copy in/out" respectively) act as archivers but not compressors. Users of the Unix tools typically add compression by compressing the result of packing (and uncompressing before unpacking), most often using the gzip or bzip2 programs. Modern tar programs can automatically invoke a (de)compression program, giving the appearance that tar itself handles compression and decompression.
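
The two-step workflow described above can be imitated with Python's standard `tarfile` and `gzip` modules. The sketch below is a rough analogue of the command-line tools, not a description of how they are implemented; the helper names `make_tar` and `read_tar` and the sample file names are made up for the example.

```python
import gzip
import io
import tarfile

def make_tar(files):
    """Pack a {name: content} mapping into an uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def read_tar(blob):
    """Unpack an uncompressed tar archive into a {name: content} mapping."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as tar:
        for member in tar.getmembers():
            files[member.name] = tar.extractfile(member).read()
    return files

files = {"a.txt": b"hello\n" * 100, "b.txt": b"world\n" * 100}

# Step 1: archive only. Step 2: compress the finished archive.
tar_bytes = make_tar(files)
tgz_bytes = gzip.compress(tar_bytes)

# Reverse order when extracting: decompress first, then unpack.
restored = read_tar(gzip.decompress(tgz_bytes))
```

Passing `mode="w:gz"` to `tarfile.open` performs both steps in a single call, which is loosely comparable to a modern tar invoking the compressor on the user's behalf.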

This approach has two advantages:
  • It follows the Unix toolbox concept that each program should accomplish a single, well-done task, as opposed to attempting to accomplish everything with one tool. As compression technology progresses, users may use different compression programs without having to modify or abandon their archiver.
  • The archives use solid compression. Unlike an archiver that compresses each file in isolation, an archiver that combines files before compressing them can exploit redundancy across several archived files.
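
The benefit of solid compression can be illustrated with a small experiment using Python's `zlib` module (an implementation of DEFLATE). The three "files" here are synthetic and deliberately share a 4 KiB block of random bytes; the data and variable names are assumptions chosen to make the effect visible, not a benchmark.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so repeated runs use the same data

# Three hypothetical files that share a 4 KiB incompressible block.
shared = bytes(rng.getrandbits(8) for _ in range(4096))
files = [shared + b"unique tail %d" % i for i in range(3)]

# Per-file compression: each file is compressed in isolation,
# so the shared block is stored three times.
per_file_total = sum(len(zlib.compress(f)) for f in files)

# Solid compression: the files are concatenated first, so the second
# and third copies of the shared block can refer back to the first.
solid_total = len(zlib.compress(b"".join(files)))

print("per-file:", per_file_total, "bytes")
print("solid:   ", solid_total, "bytes")
```

The solid total should come out well below the per-file total, because only the first copy of the shared block has to be stored in full.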


Solid compression does have disadvantages compared with compressing each file separately within the archive:
  • Extracting one file requires decompressing all the files that are before the file in the archive. This may take many minutes for a large archive.
  • Modification is even more inconvenient than extraction: changing just a single character of one of the archived files will typically require that the entire archive be uncompressed, updated, and then recompressed.
  • Redundancy between files can be exploited only if the compression window is larger than an individual file. For example, gzip uses DEFLATE, which typically operates with a 32,768-byte window, whereas bzip2 applies a Burrows-Wheeler transform to blocks roughly 30 times bigger.
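
The window-size limitation can be observed by compressing a block of random data followed by an exact copy of itself. The sketch below uses Python's `zlib` and `bz2` modules with their default settings; the sizes (8 KiB and 64 KiB) and the helper names are assumptions picked so that one case fits inside DEFLATE's 32 KiB window and the other does not.

```python
import bz2
import random
import zlib

rng = random.Random(0)  # fixed seed so repeated runs use the same data

def random_bytes(n):
    """Return n pseudo-random (effectively incompressible) bytes."""
    return bytes(rng.getrandbits(8) for _ in range(n))

def doubling_ratio(compress, data):
    """Size of compress(data + data) relative to compress(data).

    Near 1: the second copy was almost free (redundancy was found).
    Near 2: the second copy cost as much as the first (none was found).
    """
    return len(compress(data + data)) / len(compress(data))

small = random_bytes(8 * 1024)   # copy starts 8 KiB back: inside the window
large = random_bytes(64 * 1024)  # copy starts 64 KiB back: outside the window

zlib_small = doubling_ratio(zlib.compress, small)
zlib_large = doubling_ratio(zlib.compress, large)
bz2_large = doubling_ratio(bz2.compress, large)

print("zlib,  8 KiB file:", round(zlib_small, 2))
print("zlib, 64 KiB file:", round(zlib_large, 2))
print("bz2,  64 KiB file:", round(bz2_large, 2))
```

With zlib, the 8 KiB case should give a ratio close to 1 and the 64 KiB case a ratio close to 2. bz2 processes both copies of the 64 KiB file within one block, so its ratio should fall noticeably below 2, although the exact figure depends on the implementation.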

See also

  • Comparison of file archivers
  • Archive format
  • List of archive formats
  • Comparison of archive formats

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 