Format-Preserving Encryption
Encyclopedia
In cryptography
, format-preserving encryption (FPE) refers to encrypting in such a way that the output (the ciphertext
) is in the same format as the input (the plaintext
). The meaning of "format" varies. Typically only finite domains are discussed, for example:
For such finite domains, and for the purposes of the discussion below, the cipher is equivalent to a permutation of N integers } where N is the size of the domain.
Using AES-128-ECB, this credit card number might get encrypted to the hexadecimal
value
encoded to ensure that it only contains valid characters, the size of the encrypted credit card number increases from 16 bytes to 24 bytes, changing the encrypted credit card number to
Using AES-128-CBC, this credit card number might get encrypted to the hexadecimal value
that is used to initialize the encryption algorithm and is included as part of the encrypted value is different for each encryption operation. Because of this, it is impossible to use data that has been encrypted with the CBC mode as a unique key to identify a row in a database.
Comparison to truly random permutations
Although a truly random permutation is the ideal FPE cipher, for large domains it is infeasible to pre-generate and remember a truly random permutation. So the problem of FPE is to generate a pseudorandom permutation from a secret key, in such a way that the computation time for a single value is small (ideally constant, but most importantly not O(N)).
Comparison to block ciphers
An n-bit block cipher (such as AES) technically is a FPE on the set }. If you need an FPE on one of these standard sized sets (where n=128, 192, 256) you may as well use a block cipher of the right size.
However, in typical usage, a block cipher is used in a mode of operation
that allows it to encrypt arbitrarily long messages, and with an initialization vector
as discussed above. In this mode, a block cipher is not a FPE.
Definition of security for FPE's
In cryptographic literature (see most of the references below), the measure of a "good" FPE is whether an attacker can distinguish the FPE from a truly random permutation. Various types of attackers are postulated, depending on whether they have access to oracles or known ciphertext/plaintext pairs.
FPE algorithms
In most of the approaches listed here, a well-understood block cipher
(such as AES
) is used as a primitive to take the place of an ideal random function. This has the advantage that incorporation of a secret key into the algorithm is easy. Where AES is mentioned in the following discussion, any other good block cipher would work as well.
, which described three ways to do this. They proved that each of these techniques is as secure as the block cipher that is used to construct it. This means that if the AES algorithm is used to create an FPE algorithm, then the resulting FPE algorithm is as secure as AES because an adversary capable of defeating the FPE algorithm can also defeat the AES algorithm. So if we assume that AES is secure, then the FPE algorithms constructed from it are also secure. In all of the following, we use E to denote the AES encryption operation that is used to construct an FPE algorithm and F to denote the FPE encryption operation.
Thus, to create a FPE on the domain {0,1,2,3}, given a key K apply AES(K) to each integer, giving, for example,
Sorting [0,1,2,3] by weight gives [3,1,2,0], so your cipher is
This method is only useful for small values of N. For larger values, the size of the lookup table and the required number of encryptions to initialize the table gets too big to be practical.
The recursion is guaranteed to terminate. (Because P is one-to-one and the domain is finite, repeated application of P forms a cycle, so starting with a point in M the cycle will eventually terminate in M.)
This has the advantage that you don't have to map the elements of M to a consecutive sequence {0,...,N-1} of integers. It has the disadvantage, when M is much smaller than P's domain, that too many iterations might be required for each operation. If P is a block cipher of a fixed size, such as AES, this is a severe restriction on the sizes of M for which this method is efficient.
For example, suppose that we want to encrypt 100-bit values with AES in a way that creates another 100-bit value. With this technique, apply AES-128-ECB encryption until it reaches a value which has all of its 28 highest bits set to 0, which will take an average of 214 iterations to happen.
. A Feistel network needs a source of pseudo-random values for the sub-keys for each round, and the output of the AES algorithm can be used as these pseudo-random values. When this is done, the resulting Feistel construction is good if enough rounds are used.
One way to implement an FPE algorithm using AES and a Feistel network is to use as many bits of AES output as are needed to equal the length of the left or right halves of the Feistel network. If a 24-bit value is needed as a sub-key, for example, it is possible to use the lowest 24 bits of the output of AES for this value.
This may not result in the output of the Feistel network preserving the format of the input, but it is possible to iterate the Feistel network in the same way that the cycle-walking technique does to ensure that format can be preserved. Because it is possible to adjust the size of the inputs to a Feistel network, it is possible to make it very likely that this iteration ends very quickly on average. In the case of credit card numbers, for example, there are 1016 possible 16-digit credit card numbers, and because the 1016 = 253.1, using a 54-bit wide Feistel network along with cycle walking will create an FPE algorithm that encrypts fairly quickly on average.
uses custom constructions (not depending on existing block ciphers as primitives) to encrypt arbitrary finite small domains.
As of February 2010, FFSEM has been superseded by the FFX mode written by Mihir Bellare
, Phillip Rogaway, and Terence Spies. (specification).
Section 8 of the FIPS
74, Federal Information Processing Standards Publication 1981 Guidelines for Implementing and Using the NBS Data Encryption Standard, describes a way to use the DES encryption algorithm in a manner that preserves the format of the data via modulo-n addition followed by an unbiasing operation. This standard was withdrawn on May 19, 2005, so the technique should be considered obsolete in terms of being a formal standard.
Another early mechanism for format-preserving encryption was Peter Gutmann
s "Encrypting data with a restricted range of values" which again performs modulo-n addition on any cipher with some adjustments to make the result uniform, with the resulting encryption being as strong as the underlying encryption algorithm that it's based on.
The paper "Using Datatype-Preserving Encryption to Enhance Data Warehouse Security" by Michael Brightwell and Harry Smith describes a way to use the DES
encryption algorithm in a way that preserves the format of the plaintext. This technique doesn't appear to apply an unbiasing step as do the other modulo-n techniques referenced here.
The paper "Format-Preserving Encryption" by Mihir Bellare
and Thomas Ristenpart describes using "nearly balanced" Feistel networks to create secure FPE algorithms.
The paper "Format Controlling Encryption Using Datatype Preserving Encryption" by Ulf Mattsson describes other ways to create FPE algorithms.
Acceptance of FPE algorithms by standards authorities
An approach based on one of these techniques has been accepted by NIST for consideration as an approved mode of the AES algorithm under the name "The FFX Mode of Operation for Format-Preserving Encryption" (defined in ). FFX is a proposed mode of AES. FFX mode is also in consideration by ANSI X9 (X9.119). VISA
has also issued guidance on the use of Format Preserving Encryption in .
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...
, format-preserving encryption (FPE) refers to encrypting in such a way that the output (the ciphertext
Ciphertext
In cryptography, ciphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext that is unreadable by a human or computer without the proper cipher...
) is in the same format as the input (the plaintext
Plaintext
In cryptography, plaintext is information a sender wishes to transmit to a receiver. Cleartext is often used as a synonym. Before the computer era, plaintext most commonly meant message text in the language of the communicating parties....
). The meaning of "format" varies. Typically only finite domains are discussed, for example:
- To encrypt a 16-digit credit card number so that the ciphertext is another 16-digit number.
- To encrypt an English word so that the ciphertext is another English word.
- To encrypt an n-bit number so that the ciphertext is another n-bit number. (That's actually the definition of an n-bit block cipher.)
For such finite domains, and for the purposes of the discussion below, the cipher is equivalent to a permutation of N integers } where N is the size of the domain.
Restricted field lengths or formats
One motivation for using FPE comes from the problems associated with integrating encryption into existing applications. Suppose that you want to encrypt the 16-digit credit card number1234567812345670
. Using other modes of the AES algorithm like ECB or CBC will transform a credit card number into a large, fixed-length, binary value.Using AES-128-ECB, this credit card number might get encrypted to the hexadecimal
Hexadecimal
In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...
value
0x96a45cbcf9c2a9425cde9e274948cb67
, which contains many bytes that are considered invalid when compared to a typical credit card number. If a credit card number is stored in a column of a database whose entries are char
or varchar
data, then the encrypted data cannot be stored in same column without changing the format of the column. If the encryped data is Base64Base64
Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation...
encoded to ensure that it only contains valid characters, the size of the encrypted credit card number increases from 16 bytes to 24 bytes, changing the encrypted credit card number to
lqRcvPnCqUJc3p4nSUjLZw
, for example. In either case, applications that process the credit number may similarly be unable to handle an encrypted value without some modification.Using AES-128-CBC, this credit card number might get encrypted to the hexadecimal value
0xde015724b081ea7003de4593d792fd8b695b39e095c98f3a220ff43522a2df02
. In addition to the problems caused by creating invalid characters and increasing the size of the data, data encrypted using the CBC mode of an encryption algorithm also changes its value when it is decrypted and encrypted again. This happens because the random seed valueInitialization vector
In cryptography, an initialization vector is a fixed-size input to a cryptographic primitive that is typically required to be random or pseudorandom...
that is used to initialize the encryption algorithm and is included as part of the encrypted value is different for each encryption operation. Because of this, it is impossible to use data that has been encrypted with the CBC mode as a unique key to identify a row in a database.
Generating pseudorandom unique tokens
Another practical application is the generation of pseudorandom unique tokens in a fixed format. For example, suppose product serial numbers need to be "randomly" generated, but must be unique and exactly 10 digits long. For this purpose, start with a sequence (0, 1, 2, ...) and apply to each sequence value a FPE cipher on the domain {0000000000,...,9999999999}. The encrypted value becomes the product serial number.Comparison to truly random permutations
Although a truly random permutation is the ideal FPE cipher, for large domains it is infeasible to pre-generate and remember a truly random permutation. So the problem of FPE is to generate a pseudorandom permutation from a secret key, in such a way that the computation time for a single value is small (ideally constant, but most importantly not O(N)).
Comparison to block ciphers
An n-bit block cipher (such as AES) technically is a FPE on the set }. If you need an FPE on one of these standard sized sets (where n=128, 192, 256) you may as well use a block cipher of the right size.
However, in typical usage, a block cipher is used in a mode of operation
Block cipher modes of operation
In cryptography, modes of operation is the procedure of enabling the repeated and secure use of a block cipher under a single key.A block cipher by itself allows encryption only of a single data block of the cipher's block length. When targeting a variable-length message, the data must first be...
that allows it to encrypt arbitrarily long messages, and with an initialization vector
Initialization vector
In cryptography, an initialization vector is a fixed-size input to a cryptographic primitive that is typically required to be random or pseudorandom...
as discussed above. In this mode, a block cipher is not a FPE.
Definition of security for FPE's
In cryptographic literature (see most of the references below), the measure of a "good" FPE is whether an attacker can distinguish the FPE from a truly random permutation. Various types of attackers are postulated, depending on whether they have access to oracles or known ciphertext/plaintext pairs.
FPE algorithms
In most of the approaches listed here, a well-understood block cipher
Block cipher
In cryptography, a block cipher is a symmetric key cipher operating on fixed-length groups of bits, called blocks, with an unvarying transformation. A block cipher encryption algorithm might take a 128-bit block of plaintext as input, and output a corresponding 128-bit block of ciphertext...
(such as AES
Advanced Encryption Standard
Advanced Encryption Standard is a specification for the encryption of electronic data. It has been adopted by the U.S. government and is now used worldwide. It supersedes DES...
) is used as a primitive to take the place of an ideal random function. This has the advantage that incorporation of a secret key into the algorithm is easy. Where AES is mentioned in the following discussion, any other good block cipher would work as well.
The FPE constructions of Black and Rogaway
Implementing FPE with security provably related to that of the underlying block cipher was first undertaken in a paper by cryptographers John Black and Phillip RogawayPhillip Rogaway
Phillip Rogaway is a professor of computer science at the University of California, Davis. He graduated with an BA in computer science from UC Berkeley and completed his PhD in cryptography at MIT, in the Theory of Computation group. He has taught at UC Davis since 1994.Dr...
, which described three ways to do this. They proved that each of these techniques is as secure as the block cipher that is used to construct it. This means that if the AES algorithm is used to create an FPE algorithm, then the resulting FPE algorithm is as secure as AES because an adversary capable of defeating the FPE algorithm can also defeat the AES algorithm. So if we assume that AES is secure, then the FPE algorithms constructed from it are also secure. In all of the following, we use E to denote the AES encryption operation that is used to construct an FPE algorithm and F to denote the FPE encryption operation.
FPE from a prefix cipher
One simple way to create an FPE algorithm on {0,...,N-1} is to assign a pseudorandom weight to each integer, then sort by weight. The weights are defined by applying an existing block cipher to each integer. Black and Rogaway call this technique a "prefix cipher" and showed it was provably as good as the block cipher used.Thus, to create a FPE on the domain {0,1,2,3}, given a key K apply AES(K) to each integer, giving, for example,
weight(0) = 0x56c644080098fc5570f2b329323dbf62
weight(1) = 0x08ee98c0d05e3dad3eb3d6236f23e7b7
weight(2) = 0x47d2e1bf72264fa01fb274465e56ba20
weight(3) = 0x077de40941c93774857961a8a772650d
Sorting [0,1,2,3] by weight gives [3,1,2,0], so your cipher is
F(0) = 3
F(1) = 1
F(2) = 2
F(3) = 0
.This method is only useful for small values of N. For larger values, the size of the lookup table and the required number of encryptions to initialize the table gets too big to be practical.
FPE from cycle walking
If we have a set M of allowed values within the domain of a pseudorandom permutation P (for example P can be a block cipher like AES), we can create an FPE algorithm from the block cipher by repeatedly applying the block cipher until the result is one of the allowed values (within M).CycleWalkingFPE(x)
{
if P(x) is an element of M
return P(x)
else
return CycleWalkingFPE(P(x))
}
The recursion is guaranteed to terminate. (Because P is one-to-one and the domain is finite, repeated application of P forms a cycle, so starting with a point in M the cycle will eventually terminate in M.)
This has the advantage that you don't have to map the elements of M to a consecutive sequence {0,...,N-1} of integers. It has the disadvantage, when M is much smaller than P's domain, that too many iterations might be required for each operation. If P is a block cipher of a fixed size, such as AES, this is a severe restriction on the sizes of M for which this method is efficient.
For example, suppose that we want to encrypt 100-bit values with AES in a way that creates another 100-bit value. With this technique, apply AES-128-ECB encryption until it reaches a value which has all of its 28 highest bits set to 0, which will take an average of 214 iterations to happen.
FPE from a Feistel network
It is also possible to make a FPE algorithm using a Feistel networkFeistel cipher
In cryptography, a Feistel cipher is a symmetric structure used in the construction of block ciphers, named after the German-born physicist and cryptographer Horst Feistel who did pioneering research while working for IBM ; it is also commonly known as a Feistel network. A large proportion of block...
. A Feistel network needs a source of pseudo-random values for the sub-keys for each round, and the output of the AES algorithm can be used as these pseudo-random values. When this is done, the resulting Feistel construction is good if enough rounds are used.
One way to implement an FPE algorithm using AES and a Feistel network is to use as many bits of AES output as are needed to equal the length of the left or right halves of the Feistel network. If a 24-bit value is needed as a sub-key, for example, it is possible to use the lowest 24 bits of the output of AES for this value.
This may not result in the output of the Feistel network preserving the format of the input, but it is possible to iterate the Feistel network in the same way that the cycle-walking technique does to ensure that format can be preserved. Because it is possible to adjust the size of the inputs to a Feistel network, it is possible to make it very likely that this iteration ends very quickly on average. In the case of credit card numbers, for example, there are 1016 possible 16-digit credit card numbers, and because the 1016 = 253.1, using a 54-bit wide Feistel network along with cycle walking will create an FPE algorithm that encrypts fairly quickly on average.
The Thorp shuffle
A Thorp shuffle is like an idealized card-shuffle, or equivalently a maximally-unbalanced Feistel cipher where one side is a single bit. It is easier to prove security for unbalanced Feistel ciphers than for balanced ones.VIL mode
For domain sizes that are a power of two, and an existing block cipher with a smaller block size, a new cipher may be created using VIL mode as described by Bellare, Rogaway.Hasty Pudding Cipher
The Hasty Pudding CipherHasty Pudding Cipher
The Hasty Pudding Cipher is a variable-block-size block cipher designed by Richard Schroeppel, which was an unsuccessful candidate in the competition for selecting the U.S. Advanced Encryption Standard...
uses custom constructions (not depending on existing block ciphers as primitives) to encrypt arbitrary finite small domains.
The FFSEM/FFX mode of AES
The FFSEM mode of AES (specification) that has been accepted for consideration by NIST uses the Feistel network construction of Black and Rogaway described above, with AES for the round function, with one slight modification: a single key is used and is tweaked slightly for each round.As of February 2010, FFSEM has been superseded by the FFX mode written by Mihir Bellare
Mihir Bellare
Mihir Bellare is a cryptographer and professor at the University of California, San Diego. He has published several seminal papers in the field of cryptography , many coauthored with Phillip Rogaway. Bellare has published a number of papers in the field of Format-Preserving Encryption...
, Phillip Rogaway, and Terence Spies. (specification).
Other FPE constructions
Several FPE constructs are based on adding the output of a standard cipher, modulo n, to the data to be encrypted, with various methods of unbiasing the result. The modulo-n addition shared by many of the constructs is the immediately obvious solution to the FPE problem (thus its use in a number of cases), with the main differences being the unbiasing mechanisms used.Section 8 of the FIPS
Federal Information Processing Standard
A Federal Information Processing Standard is a publicly announced standardization developed by the United States federal government for use in computer systems by all non-military government agencies and by government contractors, when properly invoked and tailored on a contract...
74, Federal Information Processing Standards Publication 1981 Guidelines for Implementing and Using the NBS Data Encryption Standard, describes a way to use the DES encryption algorithm in a manner that preserves the format of the data via modulo-n addition followed by an unbiasing operation. This standard was withdrawn on May 19, 2005, so the technique should be considered obsolete in terms of being a formal standard.
Another early mechanism for format-preserving encryption was Peter Gutmann
Peter Gutmann (computer scientist)
Peter Gutmann is a computer scientist in the Department of Computer Science at the University of Auckland, Auckland, New Zealand. He has a Ph.D. in computer science from the University of Auckland. His Ph.D. thesis and a book based on the thesis were about a cryptographic security architecture...
s "Encrypting data with a restricted range of values" which again performs modulo-n addition on any cipher with some adjustments to make the result uniform, with the resulting encryption being as strong as the underlying encryption algorithm that it's based on.
The paper "Using Datatype-Preserving Encryption to Enhance Data Warehouse Security" by Michael Brightwell and Harry Smith describes a way to use the DES
Data Encryption Standard
The Data Encryption Standard is a block cipher that uses shared secret encryption. It was selected by the National Bureau of Standards as an official Federal Information Processing Standard for the United States in 1976 and which has subsequently enjoyed widespread use internationally. It is...
encryption algorithm in a way that preserves the format of the plaintext. This technique doesn't appear to apply an unbiasing step as do the other modulo-n techniques referenced here.
The paper "Format-Preserving Encryption" by Mihir Bellare
Mihir Bellare
Mihir Bellare is a cryptographer and professor at the University of California, San Diego. He has published several seminal papers in the field of cryptography , many coauthored with Phillip Rogaway. Bellare has published a number of papers in the field of Format-Preserving Encryption...
and Thomas Ristenpart describes using "nearly balanced" Feistel networks to create secure FPE algorithms.
The paper "Format Controlling Encryption Using Datatype Preserving Encryption" by Ulf Mattsson describes other ways to create FPE algorithms.
Acceptance of FPE algorithms by standards authorities
An approach based on one of these techniques has been accepted by NIST for consideration as an approved mode of the AES algorithm under the name "The FFX Mode of Operation for Format-Preserving Encryption" (defined in ). FFX is a proposed mode of AES. FFX mode is also in consideration by ANSI X9 (X9.119). VISA
Visa
Visa or VISA may refer to:* Visa , a document issued by a country's government allowing the holder to enter or to leave that country...
has also issued guidance on the use of Format Preserving Encryption in .