Parchive
Parchive is an open source software project that emerged in 2001 to develop a parity file format, as conceived by Tobias Rieper and Stefan Wehlus. These parity files use a forward error correction-style system that can be used to perform data verification, and allow recovery when data is lost or corrupted.

The project is currently administered by Ryan Gallagher (binerman), Roger Harrison (kbalore), Willem Monsuwe (monsuwe), and Stefan Wehlus (wehlus).

Overview

Parchive was written to solve the problem of reliably sending large files on Usenet.

Usenet newsgroups were originally designed for informal conversations, and the underlying protocol, NNTP, was not designed to transmit arbitrary binary data. Another limitation, which was acceptable for conversations but not for files, was that messages were normally fairly short and limited to 7-bit ASCII text.

Various techniques were devised to send files over Usenet, such as uuencoding and Base64. Later Usenet software allowed 8-bit Extended ASCII, which permitted new techniques like yEnc. Large files were broken up to reduce the effect of a corrupted download, but the unreliable nature of Usenet remained.

With the introduction of Parchive, parity files could be created and uploaded along with the original data files. If any of the data files were damaged or lost while being propagated between Usenet servers, users could download parity files and use them to reconstruct the damaged or missing files. Parchive included the construction of small index files (*.par in version 1 and *.par2 in version 2) that do not contain any recovery data. These indexes contain file hashes that can be used to quickly identify the target files and verify their integrity.

Because the index files were so small, they minimized the amount of extra data that had to be downloaded from Usenet to verify that the data files were all present and undamaged, or to determine how many parity volumes were required to repair any damage or reconstruct any missing files. They were most useful in version 1 where the parity volumes were much larger than the short index files. These larger parity volumes contain the actual recovery data along with a duplicate copy of the information in the index files (which allows them to be used on their own to verify the integrity of the data files if there is no small index file available).
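The verification step described above can be sketched in a few lines of Python. This is only an illustration: the `index` dictionary is a hypothetical in-memory stand-in for a parsed index file (the real binary layout is defined by the Parchive specifications and is not parsed here), the function names are invented for the example, and MD5 is assumed as the hash purely for demonstration.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(index: dict, directory: Path) -> dict:
    """Classify each indexed file as 'ok', 'damaged', or 'missing'.

    `index` maps file names to expected hex digests. It is a hypothetical
    stand-in for the contents of an index file, not the real format.
    """
    status = {}
    for name, expected in index.items():
        path = directory / name
        if not path.is_file():
            status[name] = "missing"
        elif file_md5(path) == expected:
            status[name] = "ok"
        else:
            status[name] = "damaged"
    return status
```

Counting the entries that are not "ok" gives a rough idea of how much recovery data would have to be fetched, which is the decision the small index file is meant to support.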

History

In July 2001, Tobias Rieper and Stefan Wehlus proposed the Parity Volume Set specification, and with the assistance of other project members, version 1.0 of the specification was published in October 2001. Par1 used Reed-Solomon error correction to create new recovery files. An end user could use any of the recovery files to rebuild a missing file from an incomplete download.

Version 1 became widely used on Usenet, but it did suffer some limitations:
  • It was restricted to handle at most 255 files.
  • The recovery files had to be the size of the largest input file, so it did not work well when the input files were of various sizes. (This limited its usefulness when not paired with the proprietary RAR compression tool.)
  • The recovery algorithm had a bug, due to a flaw in the academic paper on which it was based.
  • It was strongly tied to Usenet and it was felt that a more general tool might have a wider audience.


In January 2002, Howard Fukada proposed that a new PAR2 specification should be devised with two significant changes: data verification and repair should work on blocks of data rather than whole files, and the algorithm should switch to using 16-bit numbers rather than the 8-bit numbers that Par1 used. Michael Nahas and Peter Clements took up these ideas in July 2002, with additional input from Paul Nettle and Ryan Gallagher (who both wrote Par1 clients). Version 2.0 of the Parchive specification was published by Michael Nahas in September 2002.

Peter Clements then went on to write the first two PAR2 implementations: QuickPar and par2cmdline.

Versions

Versions 1 and 2 of the file format are incompatible. (However, many clients support both.)

Version 1

For version 1, given files f1, f2, ..., fn, the Parchive consists of an index file (f.par) and a number of "parity volumes" (f.p01, f.p02, etc.). If one original file is missing (for example, f2), it can be recreated from all of the other original files and any one of the parity volumes. Likewise, two missing files can be recreated from any two of the parity volumes, and so forth.

Version 1 supports up to 256 recovery files. Each recovery file must be the size of the largest input file.
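The single-missing-file case can be illustrated with plain XOR parity. This is a deliberate simplification and not the Par1 algorithm: Par1 uses Reed-Solomon coding, which is what makes several independent parity volumes possible, whereas XOR yields only one. The function names below are invented for the example. The sketch does show why a parity volume has to be as large as the largest input file: shorter files are padded to that length before being combined.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(files: list) -> bytes:
    """Build one parity volume, sized to the largest input file."""
    size = max(len(data) for data in files)
    parity = bytes(size)  # all zero bytes
    for data in files:
        # Shorter files are zero-padded up to the largest file's size.
        parity = xor_bytes(parity, data.ljust(size, b"\0"))
    return parity


def recover_missing(survivors: list, parity: bytes, original_length: int) -> bytes:
    """Rebuild the one missing file from the surviving files and the parity."""
    rebuilt = parity
    for data in survivors:
        rebuilt = xor_bytes(rebuilt, data.ljust(len(parity), b"\0"))
    # Drop the padding that was added when the parity was created.
    return rebuilt[:original_length]
```

The original length of the missing file has to be known in order to strip the padding; in this sketch it is passed in by the caller.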

Version 2

Version 2 files generally use this naming/extension system: filename.vol000+01.PAR2, filename.vol001+02.PAR2, filename.vol003+04.PAR2, filename.vol007+06.PAR2, etc. The number after the plus sign (+01, +02, etc.) indicates how many recovery blocks the file contains, and the number after "vol" (vol000, vol001, vol003, etc.) indicates the number of the first recovery block within the PAR2 file. If the index file of a download states that 4 blocks are missing, the easiest way to repair the files is to download filename.vol003+04.PAR2. However, because any recovery blocks can be used for the repair, filename.vol007+06.PAR2 is also acceptable.
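As a rough sketch of how a client might use this naming convention, the following Python reads the first block number and the block count out of each file name and picks the smallest single volume that covers the damage. The regular expression and function names are assumptions made for this example, based only on the naming pattern shown above; real clients may also combine several smaller volumes.

```python
import re
from typing import Optional, Tuple

# Matches names such as "filename.vol003+04.PAR2" (pattern assumed from the text).
VOLUME_NAME = re.compile(r"\.vol(\d+)\+(\d+)\.par2$", re.IGNORECASE)


def parse_volume(name: str) -> Optional[Tuple[int, int]]:
    """Return (first_block, block_count), or None if the name does not match."""
    match = VOLUME_NAME.search(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def smallest_sufficient(names: list, blocks_needed: int) -> Optional[str]:
    """Pick the volume with the fewest blocks that still covers the damage."""
    candidates = []
    for name in names:
        parsed = parse_volume(name)
        if parsed is not None and parsed[1] >= blocks_needed:
            candidates.append((parsed[1], name))
    return min(candidates)[1] if candidates else None
```

With the four file names from the text and 4 missing blocks, `smallest_sufficient` selects filename.vol003+04.PAR2; with 5 missing blocks it falls back to filename.vol007+06.PAR2.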

Version 2 supports up to 32768 (2^15) recovery blocks. Input files are split into multiple equal-sized blocks so that recovery files do not need to be the size of the largest input file.
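The benefit of block-based recovery can be made concrete with a little arithmetic. The sketch below assumes that each input file is split on its own into blocks of a chosen size, with the last block of a file padded; it ignores any per-file or per-block overhead in the real format. The function names and the sample sizes are invented for the example.

```python
def block_count(file_size: int, block_size: int) -> int:
    """Number of equal-sized blocks needed to hold one file (ceiling division)."""
    return -(-file_size // block_size)


def total_blocks(file_sizes: list, block_size: int) -> int:
    """Total input blocks when every file is split separately (an assumption)."""
    return sum(block_count(size, block_size) for size in file_sizes)


def repair_cost_v1(file_sizes: list, damaged_files: int) -> int:
    """Version 1 style: one recovery file, sized to the largest input, per damaged file."""
    return damaged_files * max(file_sizes)


def repair_cost_v2(block_size: int, damaged_blocks: int) -> int:
    """Version 2 style: one recovery block per damaged block."""
    return damaged_blocks * block_size
```

For input files of 100, 1000, and 250 bytes and a block size of 100 bytes, this gives 14 input blocks. Repairing a single damaged block would then need 100 bytes of recovery data, compared with 1000 bytes for a single damaged file under the version 1 scheme.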

There is no support for Unicode (it is planned for version 3).

Directory support is provided in MultiPar's implementation of PAR2.

Version 3

Version 3 does not officially exist yet, but is planned to:
  • Fix problems related to creating or repairing when the block count or block size is very high.
  • Add directory support.
  • Add file moving and renaming support.
  • Add Unicode support.


An application written for PAR2 will not be able to understand PAR3 files.

Software

  • Windows
      • MultiPar http://www.livebusinesschat.com/smf/index.php?board=396.0 - builds upon QuickPar's features and GUI, with support for PAR3, multithreading, multiple processors, and the ability to recurse subfolders (GPL)
      • QuickPar - freeware, unmaintained since 2004, superseded by MultiPar
      • par2+tbb (a concurrent (multithreaded) version of par2cmdline 0.4, GPLv2 (or later))
      • Par-N-Rar (GPL)
      • phpar2 - advanced par2cmdline with multithreading and highly optimized assembler code (about 66% faster than QuickPar 0.9.1)
      • Rarslave (GPLv2)
      • SmartPAR (no support for PAR2)
  • Mac OS X
      • MacPAR deLuxe 4.2
      • UnRarX
      • par2+tbb (a concurrent (multithreaded) version of par2cmdline 0.4, GPLv2 (or later))
  • Linux
      • PyPar2 1.4
      • GPar2 2.03
      • par2+tbb (a concurrent (multithreaded) version of par2cmdline 0.4, GPLv2 (or later))
  • FreeBSD
      • par2+tbb (a concurrent (multithreaded) version of par2cmdline 0.4, GPLv2 (or later))
  • POSIX
      • Par2 for KDE 4

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 