Salsa20
Encyclopedia
Salsa20 is a stream cipher
submitted to eSTREAM
by Daniel Bernstein. It is built on a pseudorandom function
based on 32-bit addition, bitwise addition (XOR) and rotation operations, which maps a 256-bit
key
, a 64-bit nonce
(number used once), and a 64-bit stream position to a 512-bit output (a version with a 128-bit key also exists). This gives Salsa20 the unusual advantage that the user can efficiently seek to any position in the output stream. It offers speeds of around 4–14 cycles per byte
in software on modern x86 processors, and reasonable hardware performance. It is not patented, and Bernstein has written several public domain
implementations optimized for common architectures.
232, and constant-distance rotation operations on an internal state of 16 32-bit words. This choice of operations avoids the possibility of timing attack
s in software implementations.
The initial state is made up of 8 words of key, 2 words of stream position, 2 words of nonce (essentially additional stream position bits), and 4 fixed words. 20 rounds of mixing produce 16 words of stream cipher output.
Each round consists of four quarter-round operations, performed on either the columns or the rows of the 16-word state, arranged as a 4×4 matrix. Every 2 rounds, the pattern repeats. Pseudocode for two rounds is as follows. ⊞ is addition modulo 232, <<< is the left-rotate operation, and ^ is exclusive-or.
x[ 4] ^= (x[ 0] ⊞ x[12])<<<7; x[ 9] ^= (x[ 5] ⊞ x[ 1])<<<7;
x[14] ^= (x[10] ⊞ x[ 6])<<<7; x[ 3] ^= (x[15] ⊞ x[11])<<<7;
x[ 8] ^= (x[ 4] ⊞ x[ 0])<<<9; x[13] ^= (x[ 9] ⊞ x[ 5])<<<9;
x[ 2] ^= (x[14] ⊞ x[10])<<<9; x[ 7] ^= (x[ 3] ⊞ x[15])<<<9;
x[12] ^= (x[ 8] ⊞ x[ 4])<<<13; x[ 1] ^= (x[13] ⊞ x[ 9])<<<13;
x[ 6] ^= (x[ 2] ⊞ x[14])<<<13; x[11] ^= (x[ 7] ⊞ x[ 3])<<<13;
x[ 0] ^= (x[12] ⊞ x[ 8])<<<18; x[ 5] ^= (x[ 1] ⊞ x[13])<<<18;
x[10] ^= (x[ 6] ⊞ x[ 2])<<<18; x[15] ^= (x[11] ⊞ x[ 7])<<<18;
x[ 1] ^= (x[ 0] ⊞ x[ 3])<<<7; x[ 6] ^= (x[ 5] ⊞ x[ 4])<<<7;
x[11] ^= (x[10] ⊞ x[ 9])<<<7; x[12] ^= (x[15] ⊞ x[14])<<<7;
x[ 2] ^= (x[ 1] ⊞ x[ 0])<<<9; x[ 7] ^= (x[ 6] ⊞ x[ 5])<<<9;
x[ 8] ^= (x[11] ⊞ x[10])<<<9; x[13] ^= (x[12] ⊞ x[15])<<<9;
x[ 3] ^= (x[ 2] ⊞ x[ 1])<<<13; x[ 4] ^= (x[ 7] ⊞ x[ 6])<<<13;
x[ 9] ^= (x[ 8] ⊞ x[11])<<<13; x[14] ^= (x[13] ⊞ x[12])<<<13;
x[ 0] ^= (x[ 3] ⊞ x[ 2])<<<18; x[ 5] ^= (x[ 4] ⊞ x[ 7])<<<18;
x[10] ^= (x[ 9] ⊞ x[ 8])<<<18; x[15] ^= (x[14] ⊞ x[13])<<<18;
These operations can be performed very efficiently using the x86 SSE2
instruction set.
Salsa20 performs 20 rounds of mixing on its input. However, reduced round variants Salsa20/8 and Salsa20/12 using 8 and 12 rounds respectively have also been introduced. These variants were introduced to complement the original Salsa20, not to replace it, and perform even better in the eSTREAM benchmarks than the already competitive Salsa20.
project, receiving the highest weighted voting score of any Profile 1 algorithm at the end of Phase 2. Salsa20 had previously been selected as Phase 2 Focus design for Profile 1 (software) and as a Phase 2 design for Profile 2 (hardware) by the eSTREAM project , but was not advanced to Phase 3 for Profile 2 because eSTREAM felt that it was probably not a good candidate for extremely resource constrained hardware environments.
. In 2006, Fischer, Meier, Berbain, Biasse, and Robshaw reported an attack on Salsa20/6 with estimated time complexity of 2177, and a related-key attack on Salsa20/7 with estimated time complexity of 2217.
In 2007, Tsunoo et al. announced a cryptanalysis of Salsa20 which breaks 8 out of 20 rounds to recover the 256-bit secret key in 2255 operations, using 211.37 keystream pairs.. However, this attack does not seem to be comparative with the brute force attack.
, Aumasson, Fischer, Khazaei, Meier, and Rechberger reported a cryptanalytic attack against Salsa20/7 with a time complexity of 2153, and they reported the first attack against Salsa20/8 with an estimated time complexity of 2251. This attack makes use of the new concept of probabilistic neutral key bits for probabilistic detection of a truncated differential. The attack can be adapted to break Salsa20/7 with a 128-bit key. There are no published attacks on Salsa20/12 or the full Salsa20/20.
ChaCha replaces the basic Salsa20 round primitive
b ⊕= (a ⊞ c) <<< k;
with the modified computation:
b ⊞= c;
a ⊕= b;
a <<<= k;
The rotation amounts are also updated. A full quarter-round becomes:
a ⊞= b; d ⊕= a; d <<<= 16;
c ⊞= d; b ⊕= c; b <<<= 12;
a ⊞= b; d ⊕= a; d <<<= 8;
c ⊞= d; b ⊕= c; b <<<= 7;
In addition to being more efficient on 2-operand instruction sets (like x86), this updates each word twice per quarter-round.
The fact that two of the rotates are multiple of 8 allows some optimization. Additionally, the input formatting is rearranged to support an efficient SSE
implementation optimization discovered for Salsa20. Rather than alternating rounds down columns and across rows, they are performed down columns and along diagonals.
ChaCha is the basis of the BLAKE hash function
, a finalist in the NIST hash function competition
.
Stream cipher
In cryptography, a stream cipher is a symmetric key cipher where plaintext digits are combined with a pseudorandom cipher digit stream . In a stream cipher the plaintext digits are encrypted one at a time, and the transformation of successive digits varies during the encryption...
submitted to eSTREAM
ESTREAM
eSTREAM is a project to "identify new stream ciphers suitable for widespread adoption", organised by the EU ECRYPT network. It was set up as a result of the failure of all six stream ciphers submitted to the NESSIE project. The call for primitives was first issued in November 2004. The project was...
by Daniel Bernstein. It is built on a pseudorandom function
Pseudorandom function
In cryptography, a pseudorandom function family, abbreviated PRF, is a collection of efficiently-computable functions which emulate a random oracle in the following way: no efficient algorithm can distinguish between a function chosen randomly from the PRF family and a random oracle...
based on 32-bit addition, bitwise addition (XOR) and rotation operations, which maps a 256-bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...
key
Key size
In cryptography, key size or key length is the size measured in bits of the key used in a cryptographic algorithm . An algorithm's key length is distinct from its cryptographic security, which is a logarithmic measure of the fastest known computational attack on the algorithm, also measured in bits...
, a 64-bit nonce
Cryptographic nonce
In security engineering, nonce is an arbitrary number used only once to sign a cryptographic communication. It is similar in spirit to a nonce word, hence the name. It is often a random or pseudo-random number issued in an authentication protocol to ensure that old communications cannot be reused...
(number used once), and a 64-bit stream position to a 512-bit output (a version with a 128-bit key also exists). This gives Salsa20 the unusual advantage that the user can efficiently seek to any position in the output stream. It offers speeds of around 4–14 cycles per byte
Cycles per byte
Cycles per byte is a unit of measurement which indicates the number of clock cycles a microprocessor will perform per byte of data processed in an algorithm. It is commonly used as a partial indicator of real-world performance in cryptographic functions....
in software on modern x86 processors, and reasonable hardware performance. It is not patented, and Bernstein has written several public domain
Public domain
Works are in the public domain if the intellectual property rights have expired, if the intellectual property rights are forfeited, or if they are not covered by intellectual property rights at all...
implementations optimized for common architectures.
Structure
Internally, the cipher uses bitwise addition (exclusive OR), 32-bit addition modModular arithmetic
In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" after they reach a certain value—the modulus....
232, and constant-distance rotation operations on an internal state of 16 32-bit words. This choice of operations avoids the possibility of timing attack
Timing attack
In cryptography, a timing attack is a side channel attack in which the attacker attempts to compromise a cryptosystem by analyzing the time taken to execute cryptographic algorithms...
s in software implementations.
The initial state is made up of 8 words of key, 2 words of stream position, 2 words of nonce (essentially additional stream position bits), and 4 fixed words. 20 rounds of mixing produce 16 words of stream cipher output.
Each round consists of four quarter-round operations, performed on either the columns or the rows of the 16-word state, arranged as a 4×4 matrix. Every 2 rounds, the pattern repeats. Pseudocode for two rounds is as follows. ⊞ is addition modulo 232, <<< is the left-rotate operation, and ^ is exclusive-or.
x ^= y
is an abbreviation for x = x ^ y
.x[ 4] ^= (x[ 0] ⊞ x[12])<<<7; x[ 9] ^= (x[ 5] ⊞ x[ 1])<<<7;
x[14] ^= (x[10] ⊞ x[ 6])<<<7; x[ 3] ^= (x[15] ⊞ x[11])<<<7;
x[ 8] ^= (x[ 4] ⊞ x[ 0])<<<9; x[13] ^= (x[ 9] ⊞ x[ 5])<<<9;
x[ 2] ^= (x[14] ⊞ x[10])<<<9; x[ 7] ^= (x[ 3] ⊞ x[15])<<<9;
x[12] ^= (x[ 8] ⊞ x[ 4])<<<13; x[ 1] ^= (x[13] ⊞ x[ 9])<<<13;
x[ 6] ^= (x[ 2] ⊞ x[14])<<<13; x[11] ^= (x[ 7] ⊞ x[ 3])<<<13;
x[ 0] ^= (x[12] ⊞ x[ 8])<<<18; x[ 5] ^= (x[ 1] ⊞ x[13])<<<18;
x[10] ^= (x[ 6] ⊞ x[ 2])<<<18; x[15] ^= (x[11] ⊞ x[ 7])<<<18;
x[ 1] ^= (x[ 0] ⊞ x[ 3])<<<7; x[ 6] ^= (x[ 5] ⊞ x[ 4])<<<7;
x[11] ^= (x[10] ⊞ x[ 9])<<<7; x[12] ^= (x[15] ⊞ x[14])<<<7;
x[ 2] ^= (x[ 1] ⊞ x[ 0])<<<9; x[ 7] ^= (x[ 6] ⊞ x[ 5])<<<9;
x[ 8] ^= (x[11] ⊞ x[10])<<<9; x[13] ^= (x[12] ⊞ x[15])<<<9;
x[ 3] ^= (x[ 2] ⊞ x[ 1])<<<13; x[ 4] ^= (x[ 7] ⊞ x[ 6])<<<13;
x[ 9] ^= (x[ 8] ⊞ x[11])<<<13; x[14] ^= (x[13] ⊞ x[12])<<<13;
x[ 0] ^= (x[ 3] ⊞ x[ 2])<<<18; x[ 5] ^= (x[ 4] ⊞ x[ 7])<<<18;
x[10] ^= (x[ 9] ⊞ x[ 8])<<<18; x[15] ^= (x[14] ⊞ x[13])<<<18;
These operations can be performed very efficiently using the x86 SSE2
SSE2
SSE2, Streaming SIMD Extensions 2, is one of the Intel SIMD processor supplementary instruction sets first introduced by Intel with the initial version of the Pentium 4 in 2001. It extends the earlier SSE instruction set, and is intended to fully supplant MMX. Intel extended SSE2 to create SSE3...
instruction set.
Salsa20 performs 20 rounds of mixing on its input. However, reduced round variants Salsa20/8 and Salsa20/12 using 8 and 12 rounds respectively have also been introduced. These variants were introduced to complement the original Salsa20, not to replace it, and perform even better in the eSTREAM benchmarks than the already competitive Salsa20.
eSTREAM selection
Salsa20 has been selected as a Phase 3 design for Profile 1 (software) by the eSTREAMESTREAM
eSTREAM is a project to "identify new stream ciphers suitable for widespread adoption", organised by the EU ECRYPT network. It was set up as a result of the failure of all six stream ciphers submitted to the NESSIE project. The call for primitives was first issued in November 2004. The project was...
project, receiving the highest weighted voting score of any Profile 1 algorithm at the end of Phase 2. Salsa20 had previously been selected as Phase 2 Focus design for Profile 1 (software) and as a Phase 2 design for Profile 2 (hardware) by the eSTREAM project , but was not advanced to Phase 3 for Profile 2 because eSTREAM felt that it was probably not a good candidate for extremely resource constrained hardware environments.
Cryptanalysis
In 2005, Paul Crowley reported an attack on Salsa20/5 with an estimated time complexity of 2165, and won Bernstein's US$1000 prize for "most interesting Salsa20 cryptanalysis". This attack, and all subsequent attacks are based on truncated differential cryptanalysisTruncated differential cryptanalysis
In cryptography, truncated differential cryptanalysis is a generalization of differential cryptanalysis, an attack against block ciphers. Lars Knudsen developed the technique in 1994. Whereas ordinary differential cryptanalysis analyzes the full difference between two texts, the truncated variant...
. In 2006, Fischer, Meier, Berbain, Biasse, and Robshaw reported an attack on Salsa20/6 with estimated time complexity of 2177, and a related-key attack on Salsa20/7 with estimated time complexity of 2217.
In 2007, Tsunoo et al. announced a cryptanalysis of Salsa20 which breaks 8 out of 20 rounds to recover the 256-bit secret key in 2255 operations, using 211.37 keystream pairs.. However, this attack does not seem to be comparative with the brute force attack.
, Aumasson, Fischer, Khazaei, Meier, and Rechberger reported a cryptanalytic attack against Salsa20/7 with a time complexity of 2153, and they reported the first attack against Salsa20/8 with an estimated time complexity of 2251. This attack makes use of the new concept of probabilistic neutral key bits for probabilistic detection of a truncated differential. The attack can be adapted to break Salsa20/7 with a 128-bit key. There are no published attacks on Salsa20/12 or the full Salsa20/20.
ChaCha variant
In 2008, Bernstein published the closely related "ChaCha" family of ciphers, which aim to increase the diffusion per round while achieving the same or slightly greater performance. The Aumasson et al. paper also attacks ChaCha, achieving one round less: ChaCha6 with complexity 2140 and ChaCha7 with complexity 2231.ChaCha replaces the basic Salsa20 round primitive
R(a,b,c,k)
b ⊕= (a ⊞ c) <<< k;
with the modified computation:
b ⊞= c;
a ⊕= b;
a <<<= k;
The rotation amounts are also updated. A full quarter-round becomes:
a ⊞= b; d ⊕= a; d <<<= 16;
c ⊞= d; b ⊕= c; b <<<= 12;
a ⊞= b; d ⊕= a; d <<<= 8;
c ⊞= d; b ⊕= c; b <<<= 7;
In addition to being more efficient on 2-operand instruction sets (like x86), this updates each word twice per quarter-round.
The fact that two of the rotates are multiple of 8 allows some optimization. Additionally, the input formatting is rearranged to support an efficient SSE
Streaming SIMD Extensions
In computing, Streaming SIMD Extensions is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series processors as a reply to AMD's 3DNow! . SSE contains 70 new instructions, most of which work on single precision floating point...
implementation optimization discovered for Salsa20. Rather than alternating rounds down columns and across rows, they are performed down columns and along diagonals.
ChaCha is the basis of the BLAKE hash function
BLAKE (hash function)
BLAKE is a cryptographic hash function submitted to the NIST hash function competition by Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan. It is based on Dan Bernstein's ChaCha stream cipher, but a permuted copy of the input block, XORed with some round constants, is added...
, a finalist in the NIST hash function competition
NIST hash function competition
The NIST hash function competition is an open competition held by the US National Institute of Standards and Technology for a new SHA-3 function to replace the older SHA-1 and SHA-2, which was formally announced in the Federal Register on November 2, 2007...
.