Portable Executable
Encyclopedia
The Portable Executable (PE) format is a file format
File format
A file format is a particular way that information is encoded for storage in a computer file.Since a disk drive, or indeed any computer storage, can store only bits, the computer must have some way of converting information to 0s and 1s and vice-versa. There are different kinds of formats for...

 for executable
Executable
In computing, an executable file causes a computer "to perform indicated tasks according to encoded instructions," as opposed to a data file that must be parsed by a program to be meaningful. These instructions are traditionally machine code instructions for a physical CPU...

s, object code
Object file
An object file is a file containing relocatable format machine code that is usually not directly executable. Object files are produced by an assembler, compiler, or other language translator, and used as input to the linker....

 and DLL
Dynamic-link library
Dynamic-link library , or DLL, is Microsoft's implementation of the shared library concept in the Microsoft Windows and OS/2 operating systems...

s, used in 32-bit and 64-bit versions of Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

 operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

s. The term "portable" refers to the format's versatility in numerous environments of operating system software architecture. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code. This includes dynamic library references for linking, API
Application programming interface
An application programming interface is a source code based specification intended to be used as an interface by software components to communicate with each other...

 export and import tables, resource management data and thread-local storage
Thread-local storage
Thread-local storage is a computer programming method that uses static or global memory local to a thread.This is sometimes needed because normally all threads in a process share the same address space, which is sometimes undesirable...

 (TLS) data. On NT
Windows NT
Windows NT is a family of operating systems produced by Microsoft, the first version of which was released in July 1993. It was a powerful high-level-language-based, processor-independent, multiprocessing, multiuser operating system with features comparable to Unix. It was intended to complement...

 operating systems, the PE format is used for EXE
EXE
EXE is the common filename extension denoting an executable file in the DOS, OpenVMS, Microsoft Windows, Symbian, and OS/2 operating systems....

, DLL
Dynamic-link library
Dynamic-link library , or DLL, is Microsoft's implementation of the shared library concept in the Microsoft Windows and OS/2 operating systems...

, SYS (device driver), and other file types. The Extensible Firmware Interface (EFI)
Extensible Firmware Interface
The Unified Extensible Firmware Interface is a specification that defines a software interface between an operating system and platform firmware...

 specification states that PE is the standard executable format in EFI environments.

PE is a modified version of the Unix COFF
COFF
The Common Object File Format is a specification of a format for executable, object code, and shared library computer files used on Unix systems...

 file format. PE/COFF is an alternative term in Windows development.

On Windows NT operating systems, PE currently supports the IA-32
IA-32
IA-32 , also known as x86-32, i386 or x86, is the CISC instruction-set architecture of Intel's most commercially successful microprocessors, and was first implemented in the Intel 80386 as a 32-bit extension of x86 architecture...

, IA-64, and x86-64
X86-64
x86-64 is an extension of the x86 instruction set. It supports vastly larger virtual and physical address spaces than are possible on x86, thereby allowing programmers to conveniently work with much larger data sets. x86-64 also provides 64-bit general purpose registers and numerous other...

 (AMD64/Intel64) instruction set architectures (ISAs). Prior to Windows 2000
Windows 2000
Windows 2000 is a line of operating systems produced by Microsoft for use on personal computers, business desktops, laptops, and servers. Windows 2000 was released to manufacturing on 15 December 1999 and launched to retail on 17 February 2000. It is the successor to Windows NT 4.0, and is the...

, Windows NT (and thus PE) supported the MIPS
MIPS architecture
MIPS is a reduced instruction set computer instruction set architecture developed by MIPS Technologies . The early MIPS architectures were 32-bit, and later versions were 64-bit...

, Alpha
DEC Alpha
Alpha, originally known as Alpha AXP, is a 64-bit reduced instruction set computer instruction set architecture developed by Digital Equipment Corporation , designed to replace the 32-bit VAX complex instruction set computer ISA and its implementations. Alpha was implemented in microprocessors...

, and PowerPC
PowerPC
PowerPC is a RISC architecture created by the 1991 Apple–IBM–Motorola alliance, known as AIM...

 ISAs. Because PE is used on Windows CE
Windows CE
Microsoft Windows CE is an operating system developed by Microsoft for embedded systems. Windows CE is a distinct operating system and kernel, rather than a trimmed-down version of desktop Windows...

, it continues to support several variants of the MIPS, ARM
ARM architecture
ARM is a 32-bit reduced instruction set computer instruction set architecture developed by ARM Holdings. It was named the Advanced RISC Machine, and before that, the Acorn RISC Machine. The ARM architecture is the most widely used 32-bit ISA in numbers produced...

 (including Thumb), and SuperH
SuperH
SuperH is a 32-bit reduced instruction set computer instruction set architecture developed by Hitachi. It is implemented by microcontrollers and microprocessors for embedded systems....

 ISAs.

The main competitors to PE are ELF
Executable and Linkable Format
In computing, the Executable and Linkable Format is a common standard file format for executables, object code, shared libraries, and core dumps. First published in the System V Application Binary Interface specification, and later in the Tool Interface Standard, it was quickly accepted among...

 (used in Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...

 and most other versions of Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

) and Mach-O
Mach-O
Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically-loaded code, and core dumps. A replacement for the a.out format, Mach-O offered more extensibility and faster access to information in the symbol table.Mach-O was once used by...

 (used in Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

).

Brief history

Microsoft migrated to the PE format with the introduction of the Windows NT 3.1 operating system. All later versions of Windows, including Windows 95/98/ME, support the file structure. The format has retained limited legacy support to bridge the gap between DOS-based and NT systems. For example, PE/COFF headers still include an MS-DOS executable program, which is by default a stub
Method stub
A method stub or simply stub in software development is a piece of code used to stand in for some other programming functionality. A stub may simulate the behavior of existing code or be a temporary substitute for yet-to-be-developed code...

 that displays the simple message "This program cannot be run in DOS mode" (or similar). PE also continues to serve the changing Windows platform. Some extensions include the .NET PE format (see below), a 64-bit version called PE32+ (sometimes PE+), and a specification for Windows CE.

Layout

A PE file consists of a number of headers and sections that tell the dynamic linker
Dynamic linker
In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries for an executable when it is executed. The specific operating system and executable format determine how the dynamic linker functions and how it is implemented...

 how to map the file into memory. An executable image consists of several different regions, each of which require different memory protection; so the start of each section must be aligned to a page boundary. For instance, typically the .text section (which holds program code) is mapped as execute/readonly, and the .data section (holding global variables) is mapped as no-execute/readwrite. However, to avoid wasting space, the different sections are not page aligned on disk. Part of the job of the dynamic linker is to map each section to memory individually and assign the correct permissions to the resulting regions, according to the instructions found in the headers.

Import Table

One section of note is the import address table (IAT), which is used as a lookup table when the application is calling a function in a different module. It can be in form of both import by ordinal and import by name. Because a compiled program cannot know the memory location of the libraries it depends upon, an indirect jump is required whenever an API call is made. As the dynamic linker loads modules and joins them together, it writes actual addresses into the IAT slots, so that they point to the memory locations of the corresponding library functions. Though this adds an extra jump over the cost of an intra-module call resulting in a performance penalty, it provides a key benefit: The number of memory pages that need to be copy-on-write changed by the loader is minimized, saving memory and disk I/O time. If the compiler knows ahead of time that a call will be inter-module (via a dllimport attribute) it can produce more optimized code that simply results in an indirect call opcode
Opcode
In computer science engineering, an opcode is the portion of a machine language instruction that specifies the operation to be performed. Their specification and format are laid out in the instruction set architecture of the processor in question...

.

Relocations

PE files do not contain position-independent code
Position-independent code
In computing, position-independent code or position-independent executable is machine instruction code that executes properly regardless of where in memory it resides...

. Instead they are compiled to a preferred base address, and all addresses emitted by the compiler/linker are fixed ahead of time. If a PE file cannot be loaded at its preferred address (because it's already taken by something else), the operating system will rebase it. This involves recalculating every absolute address and modifying the code to use the new values. The loader does this by comparing the preferred and actual load addresses, and calculating a delta
Delta encoding
Delta encoding is a way of storing or transmitting data in the form of differences between sequential data rather than complete files; more generally this is known as data differencing...

 value. This is then added to the preferred address to come up with the new address of the memory location. Base relocations are stored in a list and added, as needed, to an existing memory location. The resulting code is now private to the process and no longer shareable, so many of the memory saving benefits of DLLs are lost in this scenario. It also slows down loading of the module significantly. For this reason rebasing is to be avoided wherever possible, and the DLLs shipped by Microsoft have base addresses pre-computed so as not to overlap. In the no rebase case PE therefore has the advantage of very efficient code, but in the presence of rebasing the memory usage hit can be expensive. This contrasts with ELF
Executable and Linkable Format
In computing, the Executable and Linkable Format is a common standard file format for executables, object code, shared libraries, and core dumps. First published in the System V Application Binary Interface specification, and later in the Tool Interface Standard, it was quickly accepted among...

 which uses fully position independent code and a global offset table, which trades off execution time against memory usage in favor of the latter.

.NET, metadata, and the PE format

Microsoft's .NET Framework
.NET Framework
The .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...

 has extended the PE format with features which support the Common Language Runtime
Common Language Runtime
The Common Language Runtime is the virtual machine component of Microsoft's .NET framework and is responsible for managing the execution of .NET programs. In a process known as just-in-time compilation, the CLR compiles the intermediate language code known as CIL into the machine instructions...

 (CLR). Among the additions are a CLR Header and CLR Data section. Upon loading a binary, the OS loader yields execution to the CLR via a reference in the PE/COFF IMPORT table. The CLR then loads the CLR Header and Data sections.

The CLR Data section contains two important segments: Metadata and Intermediate Language (IL) code:
  • Metadata contains information relevant to the assembly, including the assembly manifest. A manifest describes the assembly in detail including unique identification (via a hash, version number, etc.), data on exported components, extensive type information (supported by the Common Type System (CTS)), external references, and a list of files within the assembly. The CLR environment makes extensive use of metadata.
  • Intermediate Language (IL) code is abstracted, language independent code that satisfies the .NET CLR's Common Intermediate Language
    Common Intermediate Language
    Common Intermediate Language is the lowest-level human-readable programming language defined by the Common Language Infrastructure specification and is used by the .NET Framework and Mono...

     (CIL) requirement. The term "Intermediate" refers to the nature of IL code being cross-language and cross-platform compatible. This intermediate language, similar to Java bytecode
    Java bytecode
    Java bytecode is the form of instructions that the Java virtual machine executes. Each bytecode opcode is one byte in length, although some require parameters, resulting in some multi-byte instructions. Not all of the possible 256 opcodes are used. 51 are reserved for future use...

    , allows platforms and languages to support the common .NET CLR. IL supports object-oriented programming
    Object-oriented programming
    Object-oriented programming is a programming paradigm using "objects" – data structures consisting of data fields and methods together with their interactions – to design applications and computer programs. Programming techniques may include features such as data abstraction,...

     (polymorphism, inheritance, abstract types, etc.), exceptions, events, and various data structures.

Use on other operating systems

The PE format is also used by ReactOS
ReactOS
ReactOS is an open source computer operating system intended to be binary compatible with application software and device drivers made for Microsoft Windows NT versions 5.x and up...

, as ReactOS is intended to be binary-compatible with Windows. It has also historically been used by a number of other operating systems, including SkyOS
SkyOS
SkyOS was a prototype commercial, proprietary, graphical desktop operating system written for the x86 computer architecture. As of January 30, 2009 development has halted and no plans to resume its development have been announced.- History :...

 and BeOS
BeOS
BeOS is an operating system for personal computers which began development by Be Inc. in 1991. It was first written to run on BeBox hardware. BeOS was optimized for digital media work and was written to take advantage of modern hardware facilities such as symmetric multiprocessing by utilizing...

 R3. However, both SkyOS and BeOS eventually moved to ELF
Executable and Linkable Format
In computing, the Executable and Linkable Format is a common standard file format for executables, object code, shared libraries, and core dumps. First published in the System V Application Binary Interface specification, and later in the Tool Interface Standard, it was quickly accepted among...

.

As the Mono development platform
Mono (software)
Mono, pronounced , is a free and open source project led by Xamarin to create an Ecma standard compliant .NET-compatible set of tools including, among others, a C# compiler and a Common Language Runtime....

 intends to be binary compatible with Microsoft .NET, it uses the same PE format as the Microsoft implementation.

On x86, Unix-like
Unix-like
A Unix-like operating system is one that behaves in a manner similar to a Unix system, while not necessarily conforming to or being certified to any version of the Single UNIX Specification....

 operating systems, some Windows binaries (in PE format) can be executed with Wine
Wine (software)
Wine is a free software application that aims to allow computer programs written for Microsoft Windows to run on Unix-like operating systems. Wine also provides a software library, known as Winelib, against which developers can compile Windows applications to help port them to Unix-like...

. The HX DOS Extender
HX DOS Extender
The HX DOS Extender is a free DOS extender with built-in Win32 PE file format support. Usually the purpose of a DOS extender is to make protected mode features, especially large memory and 32-bit addressing, available for DOS applications. HX fully supports this goal, but goes some steps further...

 also uses the PE format for native DOS 32-bit binaries, plus it can to some degree execute existing Windows binaries in DOS, thus acting like a Wine
Wine (software)
Wine is a free software application that aims to allow computer programs written for Microsoft Windows to run on Unix-like operating systems. Wine also provides a software library, known as Winelib, against which developers can compile Windows applications to help port them to Unix-like...

 for DOS.

Mac OS X 10.5 has the ability to load and parse PE files, but is not binary compatible with Windows.

See also

  • EXE
    EXE
    EXE is the common filename extension denoting an executable file in the DOS, OpenVMS, Microsoft Windows, Symbian, and OS/2 operating systems....

  • Executable and Linkable Format
    Executable and Linkable Format
    In computing, the Executable and Linkable Format is a common standard file format for executables, object code, shared libraries, and core dumps. First published in the System V Application Binary Interface specification, and later in the Tool Interface Standard, it was quickly accepted among...

  • a.out
  • Comparison of executable file formats
    Comparison of executable file formats
    This is a comparison of executable file formats.Among the above formats, the ones in most common use are PE , ELF and Mach-O .- References :...

  • Executable compression
    Executable compression
    Executable compression is any means of compressing an executable file and combining the compressed data with decompression code into a single executable. When this compressed executable is executed, the decompression code recreates the original code from the compressed code before executing it...

  • Application virtualization
    Application Virtualization
    Application virtualization is an umbrella term that describes software technologies that improve portability, manageability and compatibility of applications by encapsulating them from the underlying operating system on which they are executed. A fully virtualized application is not installed in...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK