Parallel (software)
GNU parallel is a command-line driven utility for Linux or other Unix-like operating systems which allows the user to execute shell scripts in parallel. GNU parallel is free software, written in Perl. It is available under the terms of GPLv3.
Usage
The most common usage is to replace the shell loop, for example

    (for x in `cat list` ; do
        do_something $x
    done) | process_output

to the form of

    cat list | parallel do_something | process_output

where the file list contains arguments for do_something and where process_output may be empty.
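As a minimal sketch of this rewrite, the block below runs the same work through a loop and through parallel and keeps both results for comparison. The input values, the echo command standing in for do_something, sort standing in for process_output, and the have_gnu_parallel helper are all assumptions made for illustration; when GNU parallel is not installed, the sketch reuses the loop result so that it still runs.

```shell
# Hypothetical helper: true only when the installed `parallel` is GNU parallel.
have_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

# Illustrative input file: one argument per line.
list=$(mktemp)
printf '%s\n' alpha beta gamma > "$list"

# Loop form: echo stands in for do_something, sort for process_output.
loop_out=$(for x in $(cat "$list"); do
    echo processed "$x"
done | sort)

# Parallel form: each input line becomes the last argument of the command.
if have_gnu_parallel; then
    par_out=$(cat "$list" | parallel echo processed | sort)
else
    par_out=$loop_out   # fallback so the sketch runs without GNU parallel
fi

rm -f "$list"
printf '%s\n' "$par_out"
```

The output is sorted in both cases because jobs started by parallel may finish in any order.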
Scripts using parallel are often easier to read than scripts using pexec.
The program parallel also features:
- grouping of standard output and standard error so that the output of jobs running in parallel does not run together;
- retaining the order of the output so that it matches the order of the input;
- dealing nicely with file names containing special characters such as space, single quote, double quote, ampersand, and UTF-8 encoded characters.
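Two of these features can be sketched in a few lines, assuming the -k (keep order) option of GNU parallel. The delays, the awkward file name, and the have_gnu_parallel helper are hypothetical; without GNU parallel the sketch falls back to plain shell commands that produce the same values.

```shell
# Hypothetical helper: true only when the installed `parallel` is GNU parallel.
have_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

dir=$(mktemp -d)
# A file name with a space, a single quote, double quotes and an ampersand.
file="$dir/it's an \"odd\" & name.txt"
printf 'hello\n' > "$file"

if have_gnu_parallel; then
    # -k prints results in input order even though the shortest sleep
    # would otherwise tend to finish first.
    ordered=$(printf '%s\n' 0.3 0.1 0.2 | parallel -k 'sleep {}; echo {}')
    # The file name is passed as one argument; parallel quotes it for the shell.
    content=$(printf '%s\n' "$file" | parallel cat)
else
    ordered=$(printf '%s\n' 0.3 0.1 0.2)   # fallback without GNU parallel
    content=$(cat "$file")
fi

rm -rf "$dir"
printf '%s\n' "$ordered" "$content"
```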
Older releases of parallel run 9 jobs at a time by default, while newer releases default to one job per CPU core. With -j+0, parallel detects the number of CPUs and runs one job on each of them.
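The job-slot option can be tried out as follows. This is only a sketch: the four numbered inputs and the have_gnu_parallel helper are made up for illustration, and the fallback branch simply prints the same lines when GNU parallel is missing.

```shell
# Hypothetical helper: true only when the installed `parallel` is GNU parallel.
have_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

if have_gnu_parallel; then
    # -j+0: one job slot per detected CPU core.
    all_cores=$(printf '%s\n' 1 2 3 4 | parallel -j+0 echo job | sort)
    # -j2: at most two jobs at a time, whatever the core count.
    two_slots=$(printf '%s\n' 1 2 3 4 | parallel -j2 echo job | sort)
else
    all_cores=$(printf 'job %s\n' 1 2 3 4)   # fallback without GNU parallel
    two_slots=$all_cores
fi

printf '%s\n' "$all_cores"
```

The number of slots changes how many jobs run at once, not which jobs run, so both variables hold the same four lines.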
An introduction video to GNU Parallel can be found on Wikimedia Commons.
Examples
The above is equivalent to:
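The equivalence described here can be sketched as follows, assuming the first command pipes find output into parallel and the second lets find run grep itself through -exec. The directory layout, file contents, and the have_gnu_parallel helper are made up for illustration, and grep is given -H (a common but non-POSIX option) so that both forms print file names in the same way.

```shell
# Hypothetical helper: true only when the installed `parallel` is GNU parallel.
have_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

# Illustrative tree: three .foo files, two of which contain "bar".
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'bar one\n'     > "$dir/a.foo"
printf 'nothing\n'     > "$dir/b.foo"
printf 'bar two\n'     > "$dir/sub/c.foo"
printf 'bar ignored\n' > "$dir/d.txt"

# find runs grep itself, passing many file names per invocation.
exec_out=$(cd "$dir" && find . -name '*.foo' -exec grep -H bar {} + | sort)

# find only lists the files; parallel starts one grep per file name.
if have_gnu_parallel; then
    par_out=$(cd "$dir" && find . -name '*.foo' | parallel grep -H bar | sort)
else
    par_out=$exec_out   # fallback so the sketch runs without GNU parallel
fi

rm -rf "$dir"
printf '%s\n' "$par_out"
```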
This searches in all files in the current directory and its subdirectories which end in .foo for occurrences of the string bar. The parallel command will work as expected unless a file name contains a newline. In order to avoid this limitation one may use:
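One way to write such a newline-safe search, sketched under the assumption that find's -print0 is paired with the -0 option of GNU parallel so that file names are separated by null characters. The file name containing a newline and the have_gnu_parallel helper are hypothetical, and the fallback branch uses find -exec when GNU parallel is not available.

```shell
# Hypothetical helper: true only when the installed `parallel` is GNU parallel.
have_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

dir=$(mktemp -d)
nl='
'
# A single file whose name contains a newline character.
printf 'bar here\n' > "$dir/odd${nl}name.foo"

if have_gnu_parallel; then
    # Null-separated names survive the embedded newline intact.
    matches=$(find "$dir" -name '*.foo' -print0 | parallel -0 grep -c bar)
else
    # Fallback without GNU parallel: find hands the name to grep directly.
    matches=$(find "$dir" -name '*.foo' -exec grep -c bar {} +)
fi

rm -rf "$dir"
printf '%s\n' "$matches"
```

With newline-separated input the same file name would be split into two arguments, neither of which names an existing file.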
The above command uses GNU-specific extensions to find to separate filenames using the null character.
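The next example relies on the {} placeholder. A minimal sketch of that kind of command, assuming the -X option of GNU parallel (which places as many arguments as fit where {} appears) and a made-up trash directory; the file names and the have_gnu_parallel helper are likewise hypothetical.

```shell
# Hypothetical helper: true only when the installed `parallel` is GNU parallel.
have_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

src=$(mktemp -d)
trash=$(mktemp -d)   # illustrative destination, outside the searched tree
touch "$src/a.foo" "$src/b.foo" "$src/keep.txt"

if have_gnu_parallel; then
    # {} is replaced by the file names, so each mv gets "files... destination".
    find "$src" -name '*.foo' | parallel -X mv {} "$trash"
else
    # Fallback without GNU parallel: one mv per file.
    find "$src" -name '*.foo' -exec mv {} "$trash" \;
fi

moved=$(ls "$trash")
left=$(ls "$src")
rm -rf "$src" "$trash"
printf '%s\n' "$moved"
```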
The above command uses {} to tell parallel to replace {} with the argument list.
The command above does the same as:
however, the former command, which uses find/parallel/cp, is more resource efficient and will not halt with an error if the expansion of *.ogg is too large for the shell.
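The contrast between the two copy commands can be sketched as follows, assuming the -X and -r options of GNU parallel (-r skips the command when there is no input). The source and destination directories, the file names, and the have_gnu_parallel helper are made up for illustration; the plain cp with a shell glob serves as the fallback.

```shell
# Hypothetical helper: true only when the installed `parallel` is GNU parallel.
have_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version 2>/dev/null | grep -q 'GNU parallel'
}

src=$(mktemp -d)
dst=$(mktemp -d)   # illustrative destination directory
touch "$src/one.ogg" "$src/two.ogg" "$src/notes.txt"

if have_gnu_parallel; then
    # File names arrive on standard input, so the shell never has to expand
    # a glob, and parallel splits them over as many cp calls as needed.
    find "$src" -maxdepth 1 -type f -name '*.ogg' |
        parallel -X -r cp -p {} "$dst"
else
    # Glob form: the shell expands *.ogg into a single argument list.
    cp -p "$src"/*.ogg "$dst"
fi

copied=$(ls "$dst")
rm -rf "$src" "$dst"
printf '%s\n' "$copied"
```

With only two files both forms behave the same; the difference appears when the glob expands to more arguments than the system allows for one command.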