Shebang (Unix)
In computing, a shebang (also called a hashbang) is the character sequence consisting of the number sign and exclamation point (#!), when it occurs as the first two characters on the first line of a text file. In this case, the program loader in Unix-like operating systems parses the rest of the first line as an interpreter directive and invokes the program specified after the character sequence, passing any command-line options specified there as parameters. The name of the file being executed is passed as the final argument.
For example, a file starting with the line:
#!/bin/sh
invokes the Bourne shell or a compatible shell. This is the standard starting line of a shell script.
The contents of the shebang line will be automatically ignored by the interpreter, because the # character is a comment marker in many scripting languages. Some language interpreters that do not use the hash mark to begin comments, such as Scheme, still may ignore the shebang line.
The name shebang or hashbang is also sometimes applied to state-preserving fragment identifiers in Ajax applications; Google Webmaster Central specifies that fragment identifiers starting with an exclamation point (...url#!state...) are indexed specially by the Googlebot.
Syntax
The syntax of the feature consists of the character sequence #!, i.e. the number sign and an exclamation point character. This initiating character sequence may be followed by whitespace, then by the (absolute) path to the interpreter program that will provide the interpretation. The shebang is looked for and used when a script is invoked directly (as with a regular executable); its main purpose is to make scripts look and act like regular executables, both to the operating system and to the user.
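The syntax described above can be illustrated with a minimal Python sketch. The function name parse_shebang is hypothetical, and the sketch assumes that everything after the interpreter path is kept as a single argument string; real kernels differ in the details and impose length limits that are not modelled here.

```python
from typing import Optional, Tuple


def parse_shebang(first_line: bytes) -> Optional[Tuple[str, Optional[str]]]:
    """Return (interpreter, argument) for a shebang line, or None if absent.

    Hypothetical helper for illustration only; it is not any kernel's parser.
    """
    # The two magic characters must be the very first bytes of the file.
    if not first_line.startswith(b"#!"):
        return None
    # Keep only the first line and drop optional blanks after "#!".
    rest = first_line[2:].split(b"\n", 1)[0].strip(b" \t")
    if not rest:
        return None
    # Assumption: the remainder after the path is one argument string.
    parts = rest.split(None, 1)
    interpreter = parts[0].decode()
    argument = parts[1].decode() if len(parts) > 1 else None
    return interpreter, argument


print(parse_shebang(b"#! /bin/sh\n"))
print(parse_shebang(b"#!/usr/bin/perl -T\n"))
print(parse_shebang(b"echo no directive here\n"))
```

A file whose first two bytes are anything other than #! yields None, which corresponds to the case where the loader does not treat the file as a script with an interpreter directive.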
Etymology and name history
The name shebang comes from an inexact contraction of SHArp bang or haSH bang, referring to the two typical Unix names of the two characters. Unix jargon uses sharp or hash (and sometimes even mesh) to refer to the number sign character and bang to refer to the exclamation point, hence shebang. Another theory holds that the sh in shebang comes from the default shell, sh, which is usually invoked with a shebang.
The initial two characters, "#!", of the interpreter directive have a range of jargon terms. One, "shebang", is representative (with an American bias) but far from universal. An executable file starting with an interpreter directive is simply called a script, often prefaced with the name or general classification of the intended interpreter.
When asked what he would call his feature (i.e. "What do you personally call that first line"), Dennis Ritchie answered:
From: "Ritchie, Dennis M (Dennis)** CTR **"
To: <[redacted]@talisman.org>
Date: Thu, 19 Nov 2009 18:37:37 -0600
Subject: RE: What do -you- call your #!line?
I can't recall that we ever gave it a proper name.
It was pretty late that it went in--I think that I
got the idea from someone at one of the UCB conferences
on Berkeley Unix; I may have been one of the first to
actually install it, but it was an idea that I got
from elsewhere.
As for the name: probably something descriptive like
"hash-bang" though this has a specifically British flavor, but
in any event I don't recall particularly using a pet name
for the construction.
Regards,
Dennis
History
The shebang was introduced by Dennis Ritchie between Edition 7 and 8 at Bell Laboratories. It was also added to the BSD releases from Berkeley's Computer Systems Research Group (present at 2.8BSD and activated by default by 4.2BSD). As AT&T Bell Laboratories Edition 8 Unix, and later editions, were not released to the public, the first widely known appearance of this feature was on BSD.
The lack of an interpreter directive, but support for shell scripts, is apparent in the documentation from Version 7 Unix in 1979, which describes instead a facility of the Bourne shell where files with execute permission would be handled specially by the shell, which would (sometimes depending on initial characters in the script, such as ":" or "#") spawn a subshell to interpret and run the commands contained in the file. In this model, scripts would only behave as other commands if called from within a Bourne shell. An attempt to directly execute such a file via the operating system's own exec system trap would fail, preventing scripts from behaving uniformly as normal system commands.
In later versions of Unix-like systems, this inconsistency was removed. Dennis Ritchie introduced kernel support for interpreter directives in January 1980, for Version 8 Unix, with the following description:
From uucp Thu Jan 10 01:37:58 1980
>From dmr Thu Jan 10 04:25:49 1980 remote from research
The system has been changed so that if a file being executed
begins with the magic characters #! , the rest of the line is understood
to be the name of an interpreter for the executed file.
Previously (and in fact still) the shell did much of this job;
it automatically executed itself on a text file with executable mode
when the text file's name was typed as a command.
Putting the facility into the system gives the following
benefits.
1) It makes shell scripts more like real executable files,
because they can be the subject of 'exec.'
2) If you do a 'ps' while such a command is running, its real
name appears instead of 'sh'.
Likewise, accounting is done on the basis of the real name.
3) Shell scripts can be set-user-ID.
4) It is simpler to have alternate shells available;
e.g. if you like the Berkeley csh there is no question about
which shell is to interpret a file.
5) It will allow other interpreters to fit in more smoothly.
To take advantage of this wonderful opportunity,
put
#! /bin/sh
at the left margin of the first line of your shell scripts.
Blanks after ! are OK. Use a complete pathname (no search is done).
At the moment the whole line is restricted to 16 characters but
this limit will be raised.
Kernel support for interpreter directives spread to other versions of Unix, and one modern implementation can be seen in the Linux kernel source in fs/binfmt_script.c.
This mechanism allows scripts to be used in virtually any context normal compiled programs can be, including as full system programs, and even as interpreters of other scripts. As a caveat, though, some early versions of kernel support limited the length of the interpreter directive to roughly 32 characters (just 16 in its first implementation), would fail to split the interpreter name from any parameters in the directive, or had other quirks. Additionally, some modern systems allow the entire mechanism to be constrained or disabled for security purposes (for example, set-user-id support has been disabled for scripts on many systems).
Note that, even in systems with full kernel support for the #! magic number, some scripts lacking interpreter directives (although usually still requiring execute permission) are still runnable by virtue of the legacy script handling of the Bourne shell, still present in many of its modern descendants.
Examples
Some typical shebang lines:
#!/bin/sh
Execute the file using sh, the Bourne shell, or a compatible shell
#!/bin/csh
Execute the file using csh, the C shell, or a compatible shell
#!/usr/bin/perl -T
Execute using Perl with the option for taint checks
#!/usr/bin/python -O
Execute using Python with optimizations to code
#!/usr/bin/php
Execute the file using the PHP command line interpreter
On many systems, /bin/sh is a symbolic or hard link to bash. When invoked in this manner, many features of bash are disabled, to comply with POSIX.
Shebang lines may include specific options that are passed to the interpreter (see the Perl example above). However, implementations vary in how they parse these options.
Purpose
Interpreter directives allow scripts and data files to be used as system commands, hiding the details of their implementation from users and other programs, by removing the need to prefix scripts with their interpreter on the command line.
Hence, assuming a Bourne shell script in /usr/local/bin/foo with a first line of
#!/bin/sh -x
...which is run from the command line with the following (where "$" is just one possible prompt):
$ foo bar
...the result would be actual command execution equivalent (except for argv[0] being set to the filename) to:
$ /bin/sh -x /usr/local/bin/foo bar
Since sh reads commands from a filename provided on its command line (instead of from the user, as it would normally), the end result is that all the shell commands in /usr/local/bin/foo are run automatically, with bar provided as a parameter, $1, to those commands to use as they see fit.
Since the initial number sign is also the character introducing comments in the Bourne shell and many other interpreters, the interpreter directive itself is considered by the interpreter to be merely a comment, and skipped.
However, it is up to the interpreter to ignore the shebang line; thus a script consisting of the following two lines:
#!/bin/cat
Hello world!
will echo both lines to standard output.
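The rewriting of the command line described in this section can be sketched in Python. The function effective_argv is a hypothetical name, and the sketch assumes a loader that passes at most one option string between the interpreter and the script path; it models the equivalence shown above, not any particular kernel.

```python
from typing import List


def effective_argv(shebang_line: str, script_path: str,
                   user_args: List[str]) -> List[str]:
    """Build the command that a '#!' loader would effectively run.

    Hypothetical helper: interpreter, optional single option string,
    then the script path, then the arguments the user typed.
    """
    if not shebang_line.startswith("#!"):
        raise ValueError("not an interpreter directive")
    body = shebang_line[2:].strip()
    interpreter, _, option = body.partition(" ")
    argv = [interpreter]
    if option.strip():
        argv.append(option.strip())
    argv.append(script_path)
    argv.extend(user_args)
    return argv


# The foo/bar example from the text.
print(effective_argv("#!/bin/sh -x", "/usr/local/bin/foo", ["bar"]))
# prints ['/bin/sh', '-x', '/usr/local/bin/foo', 'bar']

# The /bin/cat example: cat receives the script itself as its input file,
# so it prints every line, including the shebang line.
print(effective_argv("#!/bin/cat", "./hello", []))
# prints ['/bin/cat', './hello']
```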
Portability
Shebangs must specify absolute paths to system executables; this can cause problems on systems that have a non-standard file system layout. Even when systems have fairly standard paths, it is quite possible for variants of the same operating system to have different locations for the desired interpreter. Python, for example, might be in /usr/bin/python, /usr/local/bin/python, or even something like /home/username/bin/python if being tested by a user.
Because of this, it is common to need to edit the shebang line after copying a script from one computer to another: the path coded into the script may not apply on the new machine, depending on how consistently past conventions for placing the interpreter were followed. For this and other reasons, POSIX does not standardize the feature.
Often, the program /usr/bin/env can be used to circumvent this limitation by introducing a level of indirection. #! is followed by /usr/bin/env, followed by the desired command without full path, as in this example:
#!/usr/bin/env sh
This mostly works because the path /usr/bin/env is commonly used for the env utility, and env invokes the first sh found in the user's $PATH, typically /bin/sh, if the user's path is correctly configured.
This approach may introduce vulnerabilities that expose information or allow unauthorized root access, and it does not grant complete portability. There are still some portability issues with OpenServer 5.0.6 and Unicos 9.0.2, which have only /bin/env and no /usr/bin/env.
Another portability problem is the interpretation of the command arguments. Some systems, including Linux, do not split up the arguments; for example, when running a script whose first line is
#!/usr/bin/env python -c
the text python -c will be passed as one argument to /usr/bin/env, rather than as two arguments. Cygwin also behaves this way. Some other systems handle the arguments differently.
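The difference between the two argument-handling behaviours can be made concrete with a short Python sketch. Both function names are hypothetical; argv_unsplit models the behaviour the text attributes to Linux and Cygwin, while argv_split models a system that splits on whitespace.

```python
from typing import List


def argv_unsplit(directive: str, script: str) -> List[str]:
    """Everything after the interpreter path stays one argument."""
    interpreter, _, rest = directive.strip().partition(" ")
    rest = rest.strip()
    return [interpreter] + ([rest] if rest else []) + [script]


def argv_split(directive: str, script: str) -> List[str]:
    """Every whitespace-separated word becomes its own argument."""
    return directive.split() + [script]


directive = "/usr/bin/env python -c"  # text following "#!"
print(argv_unsplit(directive, "./script.py"))
# prints ['/usr/bin/env', 'python -c', './script.py']
print(argv_split(directive, "./script.py"))
# prints ['/usr/bin/env', 'python', '-c', './script.py']
```

In the unsplit case env is asked to find a program literally named "python -c", which is why such a directive fails on those systems.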
Another common problem is scripts containing a carriage return character at the end of the shebang line, perhaps as a result of being edited on a system that uses DOS line breaks, such as Microsoft Windows. Some systems interpret the carriage return character as part of the interpreter command, resulting in an error message.
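A stray carriage return of this kind is easy to check for before a script is deployed. The following is a small sketch; the function name shebang_has_carriage_return is hypothetical, and it inspects only the first line of the file contents.

```python
def shebang_has_carriage_return(data: bytes) -> bool:
    """Return True if the shebang line ends with a carriage return (0x0D)."""
    if not data.startswith(b"#!"):
        return False  # no shebang, nothing to check
    # Everything up to (not including) the first line feed is the first line.
    first_line = data.split(b"\n", 1)[0]
    return first_line.endswith(b"\r")


if __name__ == "__main__":
    dos_style = b"#!/bin/sh\r\necho hello\r\n"
    unix_style = b"#!/bin/sh\necho hello\n"
    print(shebang_has_carriage_return(dos_style))
    print(shebang_has_carriage_return(unix_style))
```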
POSIX requires that sh is a shell capable of a syntax similar to the Bourne shell, although it does not require it to be located at /bin/sh; for example, some systems such as Solaris have the POSIX-compatible shell at /usr/xpg4/bin/sh. In many Linux systems and recent releases of Mac OS X, /bin/sh is a hard or symbolic link to /bin/bash, the Bourne Again shell. Using syntax specific to bash while maintaining a shebang pointing to the Bourne shell is not portable.
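Whether /bin/sh is the same file as /bin/bash on a given machine can be checked rather than assumed. The sketch below uses os.path.samefile, which compares the underlying files and therefore covers both symbolic and hard links; the helper name refers_to_same_file is hypothetical.

```python
import os


def refers_to_same_file(path, candidate):
    """True if both names resolve to one file (via symbolic or hard link)."""
    try:
        return os.path.samefile(path, candidate)
    except OSError:
        # One of the names does not exist on this system.
        return False


if __name__ == "__main__":
    print("/bin/sh resolves to:", os.path.realpath("/bin/sh"))
    print("same file as /bin/bash:", refers_to_same_file("/bin/sh", "/bin/bash"))
```

The result varies between systems, which is the point: a script that starts with #!/bin/sh cannot rely on bash-specific syntax being available.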
Magic number
The shebang is actually a human-readable instance of a magic number in the executable file, the magic byte string being 0x23 0x21, the two-character encoding in ASCII of #!. (Executable files that do not require an interpreter program start with other magic combinations. See File format for more details of magic numbers.)
Nonetheless, interpreted text files using the shebang are still text files, not binary files; a text editor that introduces superfluous leading bytes will break the construction, as the file would no longer start with 0x23 0x21. In particular, UTF-8, the standard character encoding for text files on many Unix-like systems, is ASCII-compatible, assigning all characters in the ASCII character set to the same one-byte codes; but UTF-8 files on Windows usually begin with a three-byte byte order mark (0xEF 0xBB 0xBF). These bytes change the magic number and thus the interpreter will not be run (unless this other magic number is also recognized). For this and other reasons, use of the byte order mark is strongly recommended against on POSIX (Unix-like) systems. A byte order mark is unneeded for UTF-8 (as opposed to UTF-16) since UTF-8 can reliably be recognised as such by a simple algorithm.
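The effect of a byte order mark on the magic number can be seen by comparing the leading bytes directly. This is a minimal sketch; the function name classify_start and its return labels are hypothetical and exist only for this illustration.

```python
SHEBANG_MAGIC = b"\x23\x21"    # ASCII "#!"
UTF8_BOM = b"\xef\xbb\xbf"     # UTF-8 byte order mark


def classify_start(data: bytes) -> str:
    """Classify the first bytes of a file's contents."""
    if data.startswith(SHEBANG_MAGIC):
        return "shebang"
    if data.startswith(UTF8_BOM + SHEBANG_MAGIC):
        # The text looks the same in an editor, but the file no longer
        # begins with 0x23 0x21.
        return "bom-before-shebang"
    return "other"


if __name__ == "__main__":
    clean = "#!/bin/sh\n".encode("utf-8")
    with_bom = "#!/bin/sh\n".encode("utf-8-sig")  # codec that prepends the BOM
    print(classify_start(clean))
    print(classify_start(with_bom))
```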
There have been rumors that some old versions of UNIX look for the normal shebang followed by a space and a slash ("#! /"), but this appears to be untrue.
On Unix-like operating systems, new image files are started by the "exec" family of functions. This is where the operating system detects whether an image file is a script or an executable binary. The presence of the shebang results in the execution of the specified (usually script language) executable. This is described on the Solaris and Linux man page "execve".
Security issues
On some systems, scripts can be marked with the setuid attribute, set-user-ID, a Unix feature which means that a program is executed with the access rights of the program file's owner instead of the rights of the user running it. Although this mechanism may be safe for compiled code, the extra step introduced by the interpreter directive provides an additional window of opportunity for attack along the following lines:
- An attacker makes a symbolic link in, say, /tmp/sneaky to a system shell script with setuid enabled, say /usr/bin/admintool (a hypothetical example).
- The attacker then runs /tmp/sneaky, but pauses its execution immediately.
- If the new process had already gotten as far as opening sneaky, stop and start over; otherwise:
- The new process has already set its user ID to the owner of /usr/bin/admintool, so it's probably now running as root with full system rights (if not, start over).
- The attacker now removes the symbolic link pointing to /usr/bin/admintool.
- The attacker creates a new script at /tmp/sneaky but with his own illicit commands therein.
- The attacker now resumes the paused process, and the shell then opens sneaky and executes the illicit command file with root access rights.
This problem has been corrected on some modern systems, namely those supporting the /dev/fd filesystem: the system opens the script first, producing a file descriptor which is safe from attack, then invokes the interpreter with that safe file descriptor as input. However, the discovery of the problem led many system administrators and developers to the conclusion that scripts couldn't be made secure, a case made more compelling by issues with the shell's internal field separator (also since corrected on modern systems); as a result, setuid functionality is often made unavailable to scripts.
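The reason an already-open file descriptor is safe from this kind of name swap can be shown with ordinary, unprivileged file operations. The sketch below is only an illustration of that property on a Unix-like system, not of how any particular kernel implements the fix; the function name swap_after_open and the file contents are hypothetical.

```python
import os
import tempfile


def swap_after_open(directory):
    """Open a file, swap what its name refers to, then read both ways."""
    path = os.path.join(directory, "sneaky")
    with open(path, "w") as f:
        f.write("trusted commands\n")

    # Step 1: open the script first; the descriptor is bound to this file.
    fd = os.open(path, os.O_RDONLY)
    try:
        # Step 2: simulate the swap, so the same name now refers to a new file.
        os.unlink(path)
        with open(path, "w") as f:
            f.write("illicit commands\n")

        # Reading through the descriptor still yields the original file.
        via_fd = os.read(fd, 1024).decode()
    finally:
        os.close(fd)

    # Re-opening by name yields the replacement instead.
    with open(path) as f:
        via_name = f.read()
    return via_fd, via_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(swap_after_open(d))
```

An interpreter that is handed the descriptor therefore reads the file that was checked, whereas one that re-opens the script by name can be given a different file in between.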
As a result of these issues, setuid scripts are unsafe on older Unix-like systems, which comprise the majority of such installations. Appropriate research into the security implications of setuid scripts is therefore necessary before permitting their use. The sudo command is a widely used alternative for providing similar functionality.
Strengths
When compared to the use of global association lists between command name extensions and the interpreting applications, the interpreter directive method allows users to use interpreters not known at a global system level, and without administrator rights. It also allows specific selection of interpreter, without overloading the filename extension namespace, and allows the implementation language of a script to be changed without changing its invocation syntax by other programs.
See also
- Crunchbang GNU/Linux distribution, commonly referred to as "#!".
- Interpreter directive
- binfmt_misc
- File association
- Special Characters