Brainfuck
Encyclopedia
The brainfuck programming language is an esoteric programming language
noted for its extreme minimalism. It is a Turing tarpit
, designed to challenge and amuse programmer
s, and is not suitable for practical use. It was created in 1993 by Urban Müller.
The name of the language is generally not capitalized except at the start of a sentence, although it is a proper noun
.
, inspired by the 1024-byte compiler for the FALSE
programming language. Several brainfuck compilers have been made smaller than 200 bytes. The classic distribution is Müller's version 2, containing a compiler for the Amiga
, an interpreter, example programs, and a readme document.
The language consists of eight command
s, listed below. A brainfuck program is a sequence of these commands, possibly interspersed with other characters (which are ignored). The commands are executed sequentially, except as noted below; an instruction pointer
begins at the first command, and each command it points to is executed, after which it normally moves forward to the next command. The program terminates when the instruction pointer moves past the last command.
The brainfuck language uses a simple machine model consisting of the program and instruction pointer, as well as an array of at least 30,000 byte
cells initialized to zero; a movable data pointer (initialized to point to the leftmost byte of the array); and two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII
character encoding).
(Alternatively, the ] command may instead be translated as an unconditional jump to the corresponding [ command, or vice versa; programs will behave the same but will run more slowly, due to unnecessary double searching.)
*[ and ] match as parentheses usually do: each [ matches exactly one ] and vice versa, the [ comes first, and there can be no unmatched [ or ] between the two.
Brainfuck programs can be translated into C
using the following substitutions, assuming
As the name suggests, brainfuck programs tend to be difficult to comprehend. This is partly because any mildly complex task requires a long sequence of commands; partly it is because the program's text gives no direct indications of the program's state
. These, as well as brainfuck's inefficiency and its limited input/output capabilities, are some of the reasons it is not used for serious programming. Nonetheless, like any Turing-complete
language, brainfuck is theoretically capable of computing any computable function or simulating any other computational model, if given access to an unlimited amount of memory. A variety of brainfuck programs have been written. Although brainfuck programs, especially complicated ones, are difficult to write, it is quite trivial to write an interpreter for brainfuck in a more typical language such as C due to its simplicity. There even exists a brainfuck interpreter written in the brainfuck language itself.
in 1964. In fact, using six symbols equivalent to the respective brainfuck commands +, -, <, >, [, ], Böhm provided an explicit program for each of the basic functions that together serve to compute any computable function
. So in a very real sense, the first "brainfuck" programs appear in Böhm's 1964 paper – and they were programs sufficient to prove Turing-completeness.
and a newline to the screen:
For readability, this code has been spread across many lines and blanks and comments have been added. Brainfuck ignores all characters except the eight commands
The first line initialises
code for the character 'H'),
The next line moves the array pointer to
As 'l' happens to be the seventh letter after 'e', to output 'll' another seven are added (
'o' is the third letter after 'l', so
The rest of the program goes on in the same way. For the space and capital letters, different array cells are selected and incremented or decremented as needed.
cipher. To do this, it must map characters A-M (ASCII
65-77) to N-Z (78-90), and vice versa. Also it must map a-m (97-109) to n-z (110-122) and vice versa. It must map all other characters to themselves; it reads characters one at a time and outputs their enciphered equivalents until it reads an EOF (here assumed to be represented as either -1 or "no change"), at which point the program terminates.
The basic approach used is as follows. Calling the input character x, we divide x-1 by 32, keeping quotient and remainder. Unless the quotient is 2 or 3, we can just output x, having kept a copy of it during the division. If the quotient is 2 or 3, we can divide the remainder ((x-1) modulo 32) by 13; if the quotient here is 0, we output x+13; if 1, we output x-13; if 2, we output x.
Regarding the division algorithm, when dividing y by z to get a quotient q and remainder r, there is an outer loop which sets q and r first to the quotient and remainder of 1/z, then to those of 2/z, and so on; after it has executed y times, this outer loop terminates, leaving q and r set to the quotient and remainder of y/z. (The dividend y is used as a diminishing counter that controls how many times this loop is executed.) Within the loop, there is code to increment r and decrement y, which is usually sufficient; however, every zth time through the outer loop, it is necessary to zero r and increment q. This is done with a diminishing counter set to the divisor z; each time through the outer loop, this counter is decremented, and when it reaches zero, it is refilled by moving the value from r back into it.
condition from any possible byte value; thus 16-bit cells have also been used. Some implementations have used 32-bit cells, 64-bit cells, or bignum cells with practically unlimited range, but programs that use this extra range are likely to be slow, since storing the value n into a cell requires Ω(n) time as a cell’s value may only be changed by incrementing and decrementing.
In all these variants, the
Fortunately, it is usually easy to write brainfuck programs that do not ever cause integer wraparound or overflow, and therefore don't depend on cell size. Generally this means avoiding increment of +255 (unsigned 8-bit wraparound), or avoiding overstepping the boundaries of [-128, +127] (signed 8-bit wraparound) (since there are no comparison operators, a program cannot distinguish between a signed and unsigned two's complement
fixed-bit-size cell and negativeness of numbers is a matter of interpretation). For more details on integer wraparound, see the Integer overflow
article.
, and the easiest way to make the language Turing-complete is to make the array unlimited on the right.
A few implementations extend the array to the left as well; this is an uncommon feature, and therefore portable brainfuck programs do not depend on it.
When the pointer moves outside the bounds of the array, some implementations will give an error message, some will try to extend the array dynamically, some will not notice and will produce undefined behavior, and a few will move the pointer to the opposite end of the array. Some tradeoffs are involved: expanding the array dynamically to the right is the most user-friendly approach and is good for memory-hungry programs, but it carries a speed penalty. If a fixed-size array is used it is helpful to make it very large, or better yet let the user set the size. Giving an error message for bounds violations is very useful for debugging but even that carries a speed penalty unless it can be handled by the operating system's memory protections.
This assumption is also consistent with most of the world's sample code for C
and other languages, in that they use '\n', or 10, for their newlines. On systems that use CRLF line endings, the C standard library transparently remaps "\n" to "\r\n" on output and "\r\n" to "\n" on input for streams not opened in binary mode.
condition has been encountered varies. Some implementations set the cell at the pointer to 0, some set it to the C constant EOF (in practice this is usually -1), some leave the cell's value unchanged. There is no real consensus; arguments for the three behaviors are as follows.
Setting the cell to 0 avoids the use of negative numbers, and makes it marginally more concise to write a loop that reads characters until EOF occurs. This is a language extension devised by Panu Kalliokoski.
Setting the cell to -1 allows EOF to be distinguished from any byte value (if the cells are larger than bytes), which is necessary for reading non-textual data; also, it is the behavior of the C translation of
Leaving the cell's value unchanged is the behavior of Urban Müller's brainfuck compiler. This behavior can easily coexist with either of the others; for instance, a program that assumes EOF=0 can set the cell to 0 before each
Esoteric programming language
An esoteric programming language is a programming language designed as a test of the boundaries of computer programming language design, as a proof of concept, or as a joke...
noted for its extreme minimalism. It is a Turing tarpit
Turing tarpit
A Turing tarpit is any programming language or computer interface that allows for a great deal of flexibility in function but is difficult to learn and use because it offers little or no support for common tasks...
, designed to challenge and amuse programmer
Programmer
A programmer, computer programmer or coder is someone who writes computer software. The term computer programmer can refer to a specialist in one area of computer programming or to a generalist who writes code for many kinds of software. One who practices or professes a formal approach to...
s, and is not suitable for practical use. It was created in 1993 by Urban Müller.
The name of the language is generally not capitalized except at the start of a sentence, although it is a proper noun
Proper noun
A proper noun or proper name is a noun representing a unique entity , as distinguished from a common noun, which represents a class of entities —for example, city, planet, person or corporation)...
.
Language design
Urban Müller created brainfuck in 1993 with the intention of designing a language which could be implemented with the smallest possible compilerCompiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...
, inspired by the 1024-byte compiler for the FALSE
FALSE
FALSE is an esoteric programming language designed by Wouter van Oortmerssen in 1993, named after his favorite Boolean value. It is a small Forth-like stack-oriented language, with syntax designed to make the code inherently obfuscated, confusing, and unreadable. It is also noteworthy for having a...
programming language. Several brainfuck compilers have been made smaller than 200 bytes. The classic distribution is Müller's version 2, containing a compiler for the Amiga
Amiga
The Amiga is a family of personal computers that was sold by Commodore in the 1980s and 1990s. The first model was launched in 1985 as a high-end home computer and became popular for its graphical, audio and multi-tasking abilities...
, an interpreter, example programs, and a readme document.
The language consists of eight command
Command (computing)
In computing, a command is a directive to a computer program acting as an interpreter of some kind, in order to perform a specific task. Most commonly a command is a directive to some kind of command line interface, such as a shell....
s, listed below. A brainfuck program is a sequence of these commands, possibly interspersed with other characters (which are ignored). The commands are executed sequentially, except as noted below; an instruction pointer
Program counter
The program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence...
begins at the first command, and each command it points to is executed, after which it normally moves forward to the next command. The program terminates when the instruction pointer moves past the last command.
The brainfuck language uses a simple machine model consisting of the program and instruction pointer, as well as an array of at least 30,000 byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...
cells initialized to zero; a movable data pointer (initialized to point to the leftmost byte of the array); and two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
character encoding).
Commands
The eight language commands, each consisting of a single character:Character | Meaning |
---|---|
> |
increment the data pointer (to point to the next cell to the right). |
< |
decrement the data pointer (to point to the next cell to the left). |
+ |
increment (increase by one) the byte at the data pointer. |
- |
decrement (decrease by one) the byte at the data pointer. |
. |
output a character, the ASCII value of which being the byte at the data pointer. |
, |
accept one byte of input, storing its value in the byte at the data pointer. |
[ |
if the byte at the data pointer is zero, then instead of moving the instruction pointer Program counter The program counter , commonly called the instruction pointer in Intel x86 microprocessors, and sometimes called the instruction address register, or just part of the instruction sequencer in some computers, is a processor register that indicates where the computer is in its instruction sequence... forward to the next command, jump Branch (computer science) A branch is sequence of code in a computer program which is conditionally executed depending on whether the flow of control is altered or not . The term can be used when referring to programs in high level languages as well as program written in machine code or assembly language... it forward to the command after the matching ] command*. |
] |
if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command*. |
(Alternatively, the ] command may instead be translated as an unconditional jump to the corresponding [ command, or vice versa; programs will behave the same but will run more slowly, due to unnecessary double searching.)
*[ and ] match as parentheses usually do: each [ matches exactly one ] and vice versa, the [ comes first, and there can be no unmatched [ or ] between the two.
Brainfuck programs can be translated into C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
using the following substitutions, assuming
ptr
is of type unsigned char*
and has been initialized to point to an array of zeroed bytes:
brainfuck command | C equivalent | Perl equivalent |
---|---|---|
> |
++ptr; |
$pointer++; |
< |
--ptr; |
$pointer--; |
+ |
++*ptr; |
$tape[$pointer]++; |
- |
--*ptr; |
$tape[$pointer]--; |
. |
putchar(*ptr); |
print chr$tape[$pointer]; |
, |
*ptr=getchar; |
$tape[$pointer]=ord(<>); |
[ |
while (*ptr) { |
while($tape[$pointer]){ |
] |
} |
} |
As the name suggests, brainfuck programs tend to be difficult to comprehend. This is partly because any mildly complex task requires a long sequence of commands; partly it is because the program's text gives no direct indications of the program's state
State (computer science)
In computer science and automata theory, a state is a unique configuration of information in a program or machine. It is a concept that occasionally extends into some forms of systems programming such as lexers and parsers....
. These, as well as brainfuck's inefficiency and its limited input/output capabilities, are some of the reasons it is not used for serious programming. Nonetheless, like any Turing-complete
Turing completeness
In computability theory, a system of data-manipulation rules is said to be Turing complete or computationally universal if and only if it can be used to simulate any single-taped Turing machine and thus in principle any computer. A classic example is the lambda calculus...
language, brainfuck is theoretically capable of computing any computable function or simulating any other computational model, if given access to an unlimited amount of memory. A variety of brainfuck programs have been written. Although brainfuck programs, especially complicated ones, are difficult to write, it is quite trivial to write an interpreter for brainfuck in a more typical language such as C due to its simplicity. There even exists a brainfuck interpreter written in the brainfuck language itself.
Brainfuck's formal "parent language"
Except for its two I/O commands, brainfuck is a minor variation of the formal programming language P′′ created by Corrado BöhmCorrado Böhm
Corrado Böhm , Professor Emeritus at the University of Rome "La Sapienza", is a computer scientist known especially for his contributions to the theory of structured programming, constructive mathematics, combinatory logic, lambda-calculus, and the semantics and implementation of functional...
in 1964. In fact, using six symbols equivalent to the respective brainfuck commands +, -, <, >, [, ], Böhm provided an explicit program for each of the basic functions that together serve to compute any computable function
Computable function
Computable functions are the basic objects of study in computability theory. Computable functions are the formalized analogue of the intuitive notion of algorithm. They are used to discuss computability without referring to any concrete model of computation such as Turing machines or register...
. So in a very real sense, the first "brainfuck" programs appear in Böhm's 1964 paper – and they were programs sufficient to prove Turing-completeness.
Hello World!
The following program prints "Hello World!"Hello world program
A "Hello world" program is a computer program that outputs "Hello world" on a display device. Because it is typically one of the simplest programs possible in most programming languages, it is by tradition often used to illustrate to beginners the most basic syntax of a programming language, or to...
and a newline to the screen:
For readability, this code has been spread across many lines and blanks and comments have been added. Brainfuck ignores all characters except the eight commands
+-<>[],.
so no special syntax for comments is needed (as long as the comments don't contain the command characters). The code could just as well have been written as:The first line initialises
a[0] = 10
by simply incrementing ten times from 0. The loop from line 2 effectively sets the initial values for the array: a[1] = 70 (close to 72, the ASCIIASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
code for the character 'H'),
a[2] = 100
(close to 101 or 'e'), a[3] = 30
(close to 32, the code for space) and a[4] = 10
(newline). The loop works by multiplying the value of a[0]
, 10
, by 7, 10, 3, and 1, saving the results in other cells. After the loop is finished, a[0] is zero. >++.
then moves the pointer to a[1]
which holds 70
, adds two to it (producing 72 which is the ASCII character code of a capital H), and outputs it.The next line moves the array pointer to
a[2]
and adds one to it, producing 101
, a lower-case 'e', which is then output.As 'l' happens to be the seventh letter after 'e', to output 'll' another seven are added (
+++++++
) to a[2]
and the result is output twice.'o' is the third letter after 'l', so
a[2]
is incremented three more times and output the result.The rest of the program goes on in the same way. For the space and capital letters, different array cells are selected and incremented or decremented as needed.
ROT13
This program enciphers its input with the ROT13ROT13
ROT13 is a simple substitution cipher used in online forums as a means of hiding spoilers, punchlines, puzzle solutions, and offensive materials from the casual glance. ROT13 has been described as the "Usenet equivalent of a magazine printing the answer to a quiz upside down"...
cipher. To do this, it must map characters A-M (ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
65-77) to N-Z (78-90), and vice versa. Also it must map a-m (97-109) to n-z (110-122) and vice versa. It must map all other characters to themselves; it reads characters one at a time and outputs their enciphered equivalents until it reads an EOF (here assumed to be represented as either -1 or "no change"), at which point the program terminates.
The basic approach used is as follows. Calling the input character x, we divide x-1 by 32, keeping quotient and remainder. Unless the quotient is 2 or 3, we can just output x, having kept a copy of it during the division. If the quotient is 2 or 3, we can divide the remainder ((x-1) modulo 32) by 13; if the quotient here is 0, we output x+13; if 1, we output x-13; if 2, we output x.
Regarding the division algorithm, when dividing y by z to get a quotient q and remainder r, there is an outer loop which sets q and r first to the quotient and remainder of 1/z, then to those of 2/z, and so on; after it has executed y times, this outer loop terminates, leaving q and r set to the quotient and remainder of y/z. (The dividend y is used as a diminishing counter that controls how many times this loop is executed.) Within the loop, there is code to increment r and decrement y, which is usually sufficient; however, every zth time through the outer loop, it is necessary to zero r and increment q. This is done with a diminishing counter set to the divisor z; each time through the outer loop, this counter is decremented, and when it reaches zero, it is refilled by moving the value from r back into it.
Portability issues
Partly because Urban Müller did not write a thorough language specification, the many subsequent brainfuck interpreters and compilers have come to use slightly different dialects of brainfuck.Cell size
In the classic distribution, the cells are of 8-bit size (cells are bytes), and this is still the most common size. However, to read non-textual data, a brainfuck program may need to distinguish an end-of-fileEnd-of-file
In computing, end of file is a condition in a computer operating system where no more data can be read from a data source...
condition from any possible byte value; thus 16-bit cells have also been used. Some implementations have used 32-bit cells, 64-bit cells, or bignum cells with practically unlimited range, but programs that use this extra range are likely to be slow, since storing the value n into a cell requires Ω(n) time as a cell’s value may only be changed by incrementing and decrementing.
In all these variants, the
,
and .
commands still read and write data in bytes. In most of them, the cells wrap around, i.e. incrementing a cell which holds its maximal value (with the +
command) will bring it to its minimal value and vice versa. The exceptions are implementations which are distant from the underlying hardware, implementations that use bignums, and implementations that try to enforce portability.Fortunately, it is usually easy to write brainfuck programs that do not ever cause integer wraparound or overflow, and therefore don't depend on cell size. Generally this means avoiding increment of +255 (unsigned 8-bit wraparound), or avoiding overstepping the boundaries of [-128, +127] (signed 8-bit wraparound) (since there are no comparison operators, a program cannot distinguish between a signed and unsigned two's complement
Two's complement
The two's complement of a binary number is defined as the value obtained by subtracting the number from a large power of two...
fixed-bit-size cell and negativeness of numbers is a matter of interpretation). For more details on integer wraparound, see the Integer overflow
Integer overflow
In computer programming, an integer overflow occurs when an arithmetic operation attempts to create a numeric value that is too large to be represented within the available storage space. For instance, adding 1 to the largest value that can be represented constitutes an integer overflow...
article.
Array size
In the classic distribution, the array has 30,000 cells, and the pointer begins at the leftmost cell. Even more cells are needed to store things like the millionth Fibonacci numberFibonacci number
In mathematics, the Fibonacci numbers are the numbers in the following integer sequence:0,\;1,\;1,\;2,\;3,\;5,\;8,\;13,\;21,\;34,\;55,\;89,\;144,\; \ldots\; ....
, and the easiest way to make the language Turing-complete is to make the array unlimited on the right.
A few implementations extend the array to the left as well; this is an uncommon feature, and therefore portable brainfuck programs do not depend on it.
When the pointer moves outside the bounds of the array, some implementations will give an error message, some will try to extend the array dynamically, some will not notice and will produce undefined behavior, and a few will move the pointer to the opposite end of the array. Some tradeoffs are involved: expanding the array dynamically to the right is the most user-friendly approach and is good for memory-hungry programs, but it carries a speed penalty. If a fixed-size array is used it is helpful to make it very large, or better yet let the user set the size. Giving an error message for bounds violations is very useful for debugging but even that carries a speed penalty unless it can be handled by the operating system's memory protections.
End-of-line code
Different operating systems (and sometimes different programming environments) use subtly different versions of ASCII. The most important difference is in the code used for the end of a line of text. MS-DOS and Microsoft Windows use a CRLF, i.e. a 13 followed by a 10, in most contexts. UNIX and its descendants, including Linux and Mac OS X, use just 10, and older Macs use just 13. It would be unfortunate if brainfuck programs had to be rewritten for different operating systems. Fortunately, a unified standard is easy to find. Urban Müller's compiler and his example programs use 10, on both input and output; so do a large majority of existing brainfuck programs; and 10 is also more convenient to use than CRLF. Thus, brainfuck implementations should make sure that brainfuck programs that assume newline=10 will run properly; many do so, but some do not.This assumption is also consistent with most of the world's sample code for C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
and other languages, in that they use '\n', or 10, for their newlines. On systems that use CRLF line endings, the C standard library transparently remaps "\n" to "\r\n" on output and "\r\n" to "\n" on input for streams not opened in binary mode.
End-of-file behavior
The behavior of the,
command when an end-of-fileEnd-of-file
In computing, end of file is a condition in a computer operating system where no more data can be read from a data source...
condition has been encountered varies. Some implementations set the cell at the pointer to 0, some set it to the C constant EOF (in practice this is usually -1), some leave the cell's value unchanged. There is no real consensus; arguments for the three behaviors are as follows.
Setting the cell to 0 avoids the use of negative numbers, and makes it marginally more concise to write a loop that reads characters until EOF occurs. This is a language extension devised by Panu Kalliokoski.
Setting the cell to -1 allows EOF to be distinguished from any byte value (if the cells are larger than bytes), which is necessary for reading non-textual data; also, it is the behavior of the C translation of
,
given in Müller's readme file. However, it is not obvious that those C translations are to be taken as normative.Leaving the cell's value unchanged is the behavior of Urban Müller's brainfuck compiler. This behavior can easily coexist with either of the others; for instance, a program that assumes EOF=0 can set the cell to 0 before each
,
command, and will then work correctly on implementations that do either EOF=0 or EOF="no change". It is so easy to accommodate the "no change" behavior that any brainfuck programmer interested in portability should do so.Miscellaneous dialects
Many people have modified brainfuck in order to produce their own languages, often by adding commands, occasionally by removing them. Many of the resulting languages are included in the brainfuck derivatives list on the Esoteric Languages wiki. However, there are also unnamed minor variants, formed possibly as a result of inattention, of which some of the more common are:- forbidding, rather than ignoring, any non-command characters in brainfuck programs
- introducing a comment marker which comments out the rest of the line
- various alterations of the loop semantics, sometimes destroying Turing completenessTuring completenessIn computability theory, a system of data-manipulation rules is said to be Turing complete or computationally universal if and only if it can be used to simulate any single-taped Turing machine and thus in principle any computer. A classic example is the lambda calculus...
- requiring a special character to mark the end of the program
External links
- The Brainfuck Archive
- Brainfuck on the Esolang (Esoteric Languages) wiki
- Visual brainfuck, a brainfuck IDE compatible with Windows 7 x86 and x64.
- Brainfuck Developer, Brainfuck IDE for Windows (also works with WINE under Linux)
- bf2c, a Brainfuck to C translator.
- ookie, a Brainfuck and Ook! language interpreter
- Ook.jar, another Ook! and Brainfuck language interpreter, this time written in Java.
- BrainForce, compiler (wrap gcc) and C translator (has lots of options to control wrapping values, cell sizes, etc.)
- esolang, a Brainfuck interpreter for iPhone written in objective c.
- Le Brainfuck, a Javascript based optimizing interpreter. Also has many options, including memory dumping.
- Brainfuck C, a Brainfuck interpreter written in C.
- Brainfuck Java, a Brainfuck interpreter written in Java.