Steganalysis
Encyclopedia
Steganalysis is the art and science of detecting messages hidden using steganography
; this is analogous to cryptanalysis
applied to cryptography
.
Unlike cryptanalysis, where it is obvious that intercepted data contains a message (though that message is encrypted), steganalysis generally starts with a pile of suspect data files, but little information about which of the files, if any, contain a payload. The steganalyst is usually something of a forensic statistician, and must start by reducing this set of data files (which is often quite large; in many cases, it may be the entire set of files on a computer) to the subset most likely to have been altered.
and MP3
, they also attempt to look for inconsistencies in the way this data has been compressed. For example, a common artifact in JPEG compression is "edge ringing", where high-frequency components (such as the high-contrast edges of black text on a white background) distort neighboring pixels. This distortion is predictable, and simple steganographic encoding algorithms will produce artifacts that are detectably unlikely.
One case where detection of suspect files is straightforward is when the original, unmodified carrier is available for comparison. Comparing the package against the original file will yield the differences caused by encoding the payload—and, thus, the payload can be extracted.
as closely as possible, rather than analyzing, modeling, and then consistently emulating the actual noise characteristics of the carrier. In particular, many simple steganographic systems simply modify the least-significant bit
(LSB) of a sample; this causes the modified samples to have not only different noise profiles than unmodified samples, but also for their LSBs to have different noise profiles than could be expected from analysis of their higher-order bits, which will still show some amount of noise. Such LSB-only modification can be detected with appropriate algorithms, in some cases detecting encoding densities as low as 1% with reasonable reliability.
s have the desirable property of making the payload appear indistinguishable from uniformly-distributed noise, which can make detection efforts more difficult, and save the steganographic encoding technique the trouble of having to distribute the signal energy evenly (but see above concerning errors emulating the native noise of the carrier).
Steganography
Steganography is the art and science of writing hidden messages in such a way that no one, apart from the sender and intended recipient, suspects the existence of the message, a form of security through obscurity...
; this is analogous to cryptanalysis
Cryptanalysis
Cryptanalysis is the study of methods for obtaining the meaning of encrypted information, without access to the secret information that is normally required to do so. Typically, this involves knowing how the system works and finding a secret key...
applied to cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...
.
Overview
The goal of steganalysis is to identify suspected packages, determine whether or not they have a payload encoded into them, and, if possible, recover that payload.Unlike cryptanalysis, where it is obvious that intercepted data contains a message (though that message is encrypted), steganalysis generally starts with a pile of suspect data files, but little information about which of the files, if any, contain a payload. The steganalyst is usually something of a forensic statistician, and must start by reducing this set of data files (which is often quite large; in many cases, it may be the entire set of files on a computer) to the subset most likely to have been altered.
Basic techniques
The problem is generally handled with statistical analysis. A set of unmodified files of the same type, and ideally from the same source (for example, the same model of digital camera, or if possible, the same digital camera; digital audio from a CD MP3 files have been "ripped" from; etc.) as the set being inspected, are analyzed for various statistics. Some of these are as simple as spectrum analysis, but since most image and audio files these days are compressed with lossy compression algorithms, such as JPEGJPEG
In computing, JPEG . The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality. JPEG typically achieves 10:1 compression with little perceptible loss in image quality....
and MP3
MP3
MPEG-1 or MPEG-2 Audio Layer III, more commonly referred to as MP3, is a patented digital audio encoding format using a form of lossy data compression...
, they also attempt to look for inconsistencies in the way this data has been compressed. For example, a common artifact in JPEG compression is "edge ringing", where high-frequency components (such as the high-contrast edges of black text on a white background) distort neighboring pixels. This distortion is predictable, and simple steganographic encoding algorithms will produce artifacts that are detectably unlikely.
One case where detection of suspect files is straightforward is when the original, unmodified carrier is available for comparison. Comparing the package against the original file will yield the differences caused by encoding the payload—and, thus, the payload can be extracted.
Noise floor consistency analysis
In some cases, such as when only a single image is available, more complicated analysis techniques may be required. In general, steganography attempts to make distortion to the carrier indistinguishable from the carrier's noise floor. In practice, however, this is often improperly simplified to deciding to make the modifications to the carrier resemble white noiseWhite noise
White noise is a random signal with a flat power spectral density. In other words, the signal contains equal power within a fixed bandwidth at any center frequency...
as closely as possible, rather than analyzing, modeling, and then consistently emulating the actual noise characteristics of the carrier. In particular, many simple steganographic systems simply modify the least-significant bit
Least significant bit
In computing, the least significant bit is the bit position in a binary integer giving the units value, that is, determining whether the number is even or odd. The lsb is sometimes referred to as the right-most bit, due to the convention in positional notation of writing less significant digits...
(LSB) of a sample; this causes the modified samples to have not only different noise profiles than unmodified samples, but also for their LSBs to have different noise profiles than could be expected from analysis of their higher-order bits, which will still show some amount of noise. Such LSB-only modification can be detected with appropriate algorithms, in some cases detecting encoding densities as low as 1% with reasonable reliability.
Encrypted payloads
Detecting a probable steganographic payload is often only part of the problem, as the payload may have been encrypted first. Encrypting the payload is not always done solely to make recovery of the payload more difficult. Most strong cipherCipher
In cryptography, a cipher is an algorithm for performing encryption or decryption — a series of well-defined steps that can be followed as a procedure. An alternative, less common term is encipherment. In non-technical usage, a “cipher” is the same thing as a “code”; however, the concepts...
s have the desirable property of making the payload appear indistinguishable from uniformly-distributed noise, which can make detection efforts more difficult, and save the steganographic encoding technique the trouble of having to distribute the signal energy evenly (but see above concerning errors emulating the native noise of the carrier).
Barrage noise
If inspection of a storage device is considered very likely, the steganographer may attempt to barrage a potential analyst with, effectively, misinformation. This may be a large set of files encoded with anything from random data, to white noise, to meaningless drivel, to deliberately misleading information. The encoding density on these files may be slightly higher than the "real" ones; likewise, the possible use of multiple algorithms of varying detectability should be considered. The steganalyst may be forced into checking these decoys first, potentially wasting significant time and computing resources. The downside to this technique is it makes it much more obvious that steganographic software was available, and was used.Conclusions and further action
Obtaining a warrant or taking other action based solely on steganalytic evidence is a very dicey proposition unless a payload has been completely recovered and decrypted, because otherwise all the analyst has is a statistic indicating that a file may have been modified, and that modification may have been the result of steganographic encoding. Because this is likely to frequently be the case, steganalytic suspicions will often have to be backed up with other investigative techniques.See also
- Covert channelCovert channelIn computer security, a covert channel is a type of computer security attack that creates a capability to transfer information objects between processes that are not supposed to be allowed to communicate by the computer security policy...
- Audio watermark detectionAudio watermark detectionWatermarking is the process of embedding information into a signal in a way that is difficult to remove. If the signal is copied, then the information is also carried in the copy. A signal may carry several different watermarks at the same time...
- CryptographyCryptographyCryptography is the practice and study of techniques for secure communication in the presence of third parties...
- SteganographySteganographySteganography is the art and science of writing hidden messages in such a way that no one, apart from the sender and intended recipient, suspects the existence of the message, a form of security through obscurity...
- Data compressionData compressionIn computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....
- Computer forensicsComputer forensicsComputer forensics is a branch of digital forensic science pertaining to legal evidence found in computers and digital storage media...
- Steganography toolsSteganography toolsA steganography software tool implements a subset of the most general digital steganography process, allowing users to insert and extract hidden data into and from carrier files.-Architecture:...
- BPCS-SteganographyBPCS-SteganographyBPCS-Steganography is a type of digital steganography. Digital steganography can hide confidential data very securely by embedding them into some media data called "vessel data." The vessel data is also referred to as "carrier, cover, or dummy data"...
- Steganographic file systemSteganographic file systemSteganographic file systems are a kind of file system first proposed by Ross Anderson, Roger Needham, and Adi Shamir. Their paper proposed two main methods of hiding data: in a series of fixed size files originally consisting of random bits on top of which 'vectors' could be superimposed in such a...
- Steganography detection
External links
- Steganalysis research and papers by Neil F. Johnson addressing attacks against Steganography and Watermarking, and Countermeasures to these attacks.
- Research Group. Ongoing research in Steganalysis.
- Digital Invisible Ink Toolkit An open-source image steganography suite that includes both steganography and steganalysis implementations.
- Steganography - Implementation and detection Short introduction on steganography, discussing several information sources in which information can be stored
- StegSecret. StegSecret is a java-based multiplatform steganalysis tool. This tool allows the detection of hidden information by using the most known steganographic methods. It detects EOF, LSB, DCTs and other techniques. (steganography - stegoanalysis).
- Virtual Steganographic Laboratory (VSL). Contains LSB RS-Analysis and blind (universal) BSM-SVMSupport vector machineA support vector machine is a concept in statistics and computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis...
steganalysis. Application is free, platform-independent graphical block diagramming tool that allows complex using, testing and adjusting of methods both for image steganography and steganalysis. Provides modular, plug-in architecture along with simple GUI.