Executable compression
Executable compression is any means of compressing an executable file and combining the compressed data with decompression code into a single executable. When this compressed executable is run, the decompression code recreates the original code from the compressed data before executing it. In most cases this happens transparently, so the compressed executable can be used in exactly the same way as the original.
A compressed executable can be considered a self-extracting archive, where compressed data is packaged along with the relevant decompression code in an executable file. Some compressed executables can be decompressed to reconstruct the original executable without directly executing it. Two programs that can be used to do this are CUP386 and UNP.
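The packaging idea can be illustrated at script level. The following is a minimal sketch, not how any particular packer works: it assumes the "executable" is Python source text, uses zlib and Base64 as stand-ins for a packer's own format, and the names `STUB`, `pack` and `unpack` are hypothetical.

```python
import base64
import zlib

# Hypothetical stub template: the decompression code that travels with the payload.
STUB = (
    "import base64, zlib\n"
    "exec(zlib.decompress(base64.b64decode({payload!r})).decode('utf-8'))\n"
)


def pack(source: str) -> str:
    """Return a new script holding `source` in compressed form plus a stub that unpacks and runs it."""
    compressed = zlib.compress(source.encode("utf-8"), 9)
    payload = base64.b64encode(compressed).decode("ascii")
    return STUB.format(payload=payload)


def unpack(packed: str) -> str:
    """Recover the original source from a packed script without executing it."""
    marker = "b64decode('"
    start = packed.index(marker) + len(marker)
    end = packed.index("'", start)
    return zlib.decompress(base64.b64decode(packed[start:end])).decode("utf-8")


if __name__ == "__main__":
    original = "print('hello from the original program')\n"
    packed = pack(original)
    exec(packed)                       # the stub decompresses and runs the original
    assert unpack(packed) == original  # static unpacking, no execution needed
```

Running the packed script behaves like running the original, while `unpack` plays the role of a separate unpacking tool. For very short inputs the packed script is larger than the original, because the stub and the Base64 encoding add overhead.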
Most compressed executables decompress the original code in memory and most require slightly more memory to run (because they need to store the decompressor code, the compressed data and the decompressed code). However, some compressed executables have additional requirements, such as those that write the decompressed executable to the file system before executing it.
Executable compression is not limited to binary executables, but can also be applied to scripts, such as JavaScript. Because most scripting languages are designed to work on human-readable code, which has high redundancy, compression can be very effective and as simple as replacing long names used to identify variables and functions with shorter versions and/or removing white-space.
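As a rough illustration of this kind of redundancy reduction, the sketch below shortens a small JavaScript-like snippet. It is deliberately naive and makes several assumptions: the function name `shrink` and the rename table are hypothetical, the regular expressions do not understand string literals or regular-expression literals, and a real minifier would parse the script instead.

```python
import re


def shrink(source: str, renames: dict[str, str]) -> str:
    """Naively shrink script text: drop // comments, rename identifiers, remove white space."""
    # Remove line comments (naive: would also damage '//' inside string literals).
    text = re.sub(r"//[^\n]*", "", source)
    # Replace each long identifier with its short form, matching whole words only.
    for long_name, short_name in renames.items():
        text = re.sub(rf"\b{re.escape(long_name)}\b", short_name, text)
    # Collapse runs of white space, then drop spaces around punctuation and operators.
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\s*([{}()=+\-*/;,<>])\s*", r"\1", text)
    return text


if __name__ == "__main__":
    script = """
    // add two numbers
    function addNumbers(firstValue, secondValue) {
        return firstValue + secondValue;
    }
    """
    names = {"addNumbers": "a", "firstValue": "b", "secondValue": "c"}
    print(shrink(script, names))  # prints: function a(b,c){return b+c;}
```

The output behaves the same as the input as long as every use of a renamed identifier is renamed consistently, which is why real tools restrict renaming to names whose every use they can see.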
Advantages and disadvantages
Software distributors use executable compression for a variety of reasons, primarily to reduce the secondary storage requirements of their software; as executable compressors are specifically designed to compress executable code, they often achieve a better compression ratio than standard data compression facilities such as gzip, zip or bzip2. This allows software distributors to stay within the constraints of their chosen distribution media (such as CD-ROM, DVD-ROM, or floppy disk), or to reduce the time and bandwidth customers require to access software distributed via the Internet.
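To compare a packer against the general-purpose facilities named above, one needs a baseline. The sketch below measures that baseline only; it assumes Python's standard `zlib` module (DEFLATE, the algorithm behind gzip and zip) and `bz2` module as stand-ins for the command-line tools, and the function name `baseline_ratios` is hypothetical. It does not implement an executable compressor.

```python
import bz2
import zlib


def baseline_ratios(data: bytes) -> dict[str, float]:
    """Return uncompressed size divided by compressed size for two general-purpose codecs."""
    compressed = {
        "deflate": zlib.compress(data, 9),  # algorithm used by gzip and zip
        "bzip2": bz2.compress(data, 9),
    }
    return {name: len(data) / len(blob) for name, blob in compressed.items()}


if __name__ == "__main__":
    # Highly repetitive sample data; a real measurement would read an executable file.
    sample = b"mov eax, ebx\n" * 1000
    for name, ratio in baseline_ratios(sample).items():
        print(f"{name}: {ratio:.1f}x")
```

A ratio above 1 means the data shrank. An executable compressor would be judged by the same figure, with the size of its decompression stub counted in the compressed size.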
Executable compression is also frequently used to deter reverse engineering or to obfuscate the contents of the executable (for example, to hide the presence of malware from antivirus scanners) by proprietary methods of compression and/or added encryption. Executable compression can be used to prevent direct disassembly, mask string literals and modify signatures. Although this does not eliminate the chance of reverse engineering, it can make the process more costly.
A compressed executable requires less storage space in the file system, thus less time to transfer data from the file system into memory. On the other hand, it requires some time to decompress the data before execution begins. However, the speed of various storage media has not kept up with average processor speeds, so the storage is very often the bottleneck. Thus the compressed executable will load faster on most common systems. On modern desktop computers, this is rarely noticeable unless the executable is unusually big, so loading speed is not a primary reason for or against compressing an executable.
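The trade-off between reading less data and spending time on decompression can be put into a back-of-the-envelope model. The sketch below assumes load time is simply bytes read divided by storage throughput, plus bytes produced divided by decompressor throughput; the function names and all figures are hypothetical, and effects such as caching or seek time are ignored.

```python
def plain_load_time(size: float, read_rate: float) -> float:
    """Seconds to read an uncompressed executable of `size` bytes at `read_rate` bytes/s."""
    return size / read_rate


def packed_load_time(size: float, ratio: float, read_rate: float,
                     decompress_rate: float) -> float:
    """Seconds to read the compressed image and then decompress it back to `size` bytes."""
    return (size / ratio) / read_rate + size / decompress_rate


def break_even_decompress_rate(ratio: float, read_rate: float) -> float:
    """Decompressor throughput above which the packed executable loads faster."""
    return read_rate / (1.0 - 1.0 / ratio)


if __name__ == "__main__":
    size = 10e6    # 10 MB executable (hypothetical)
    ratio = 2.0    # compressed to half its size (hypothetical)
    disk = 50e6    # storage delivers 50 MB/s (hypothetical)
    cpu = 200e6    # decompressor emits 200 MB/s (hypothetical)
    print(plain_load_time(size, disk))
    print(packed_load_time(size, ratio, disk, cpu))
    print(break_even_decompress_rate(ratio, disk))
```

With these figures the packed executable loads in about 0.15 s against 0.2 s for the plain one. The break-even point is a decompressor throughput of 100 MB/s, and below that rate compression would slow loading down.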
On operating systems which read executable images on demand from the disk (see virtual memory), compressed executables make this process less efficient. The decompressor stub allocates a block of memory to hold the decompressed data, which stays allocated as long as the executable stays loaded, whether it is used or not, competing for memory resources with other applications all along. If the operating system uses a swap file, the decompressed data has to be written to it to free up the memory; unused data blocks cannot simply be discarded and reloaded from the executable image when needed again. This is usually not noticeable, but it becomes a problem when an executable is loaded more than once at the same time: the operating system cannot reuse data blocks it has already loaded, so the data has to be decompressed into a new memory block for each instance, and each copy will be swapped out independently if not used. Because of these additional storage and time requirements, it has to be weighed carefully whether to compress executables which are typically run more than once at the same time.
Another disadvantage is that some utilities can no longer identify run-time library dependencies, as only the statically linked extractor stub is visible.
Also, some older virus scanners simply report all compressed executables as viruses because the decompressor stubs share some characteristics with viruses. Most modern virus scanners can unpack several different executable compression layers to check the actual executable inside, but some popular anti-virus and anti-malware scanners have had trouble with false alarms on compressed executables.
Executable compression used to be more popular when computers were limited to the storage capacity of floppy disks and small hard drives; it allowed the computer to store more software in the same amount of space, without the inconvenience of having to manually unpack an archive file every time the user wanted to use the software. However, executable compression has become less popular because of increased storage capacity on computers.
Portable Executable
Name | Software license | Win64 |
---|---|---|
Armadillo Packer | ||
ASPack | ||
ASPR (ASProtect) | ||
BeRoEXEPacker | ||
BoxedApp Packer | ||
CExe | ||
Enigma Protector | ||
EXE Bundle | ||
EXE Stealth | ||
exe32pack | ||
EXECryptor | ||
eXPressor | ||
FSG (Fast Small Good) | ||
kkrunchy | ||
MEW | ||
MPRESS | ||
Npack | ||
NeoLite | ||
Obsidium | ||
PECompact | ||
PELock | ||
PEPack | ||
PESpin | ||
PEtite | ||
PKLite32 | ||
RLPack Basic | ||
Smart Packer Pro | ||
tElock | ||
Themida | ||
UniKey Enveloper | ||
UPX | ||
VMProtect | ||
WWPack | ||
XComp/XPack | ||
DOS executable
- 32LiTE
- 624
- AINEXE
- aPACK
- DIET
- HASP Envelope
- LGLZ
- LZEXE – First widely publicly used executable compressor for microcomputers.
- PKLite
- PMWLITE
- UCEXE
- UPX
- WDOSX
- WWpack
- XE
.NET assembly files
- .NETZ
- NsPack
- Mpress
- HASP Envelope
- .netshrink
- Exepack.NET
- DotProtect: Commercial protector/packer for .NET and Mono. Features online verification and "industry standard encryption".
JavaScript scripts
There are two types of compression that can be applied to scripts:
- Reduce the redundancy in the script (by removing comments and white space and shortening variable and function names). This does not alter the behavior of the script.
- Compress the original script and create a new script that contains decompression code and compressed data. This is similar to binary executable compression.
Self-decompressing compressors
These compress the original script and output a new script that has a decompressor and compressed data.
Redundancy-reducing compressors
These remove white space and comments and shorten variable and function names, but do not alter the behavior of the script.
See also
- Data compression
- Disk compression
- Executable
- Kolmogorov complexity
- UPX
- Self-extracting archive