COM file
Encyclopedia
In many computer operating systems, a COM file is a type of executable file
Executable
In computing, an executable file causes a computer "to perform indicated tasks according to encoded instructions," as opposed to a data file that must be parsed by a program to be meaningful. These instructions are traditionally machine code instructions for a physical CPU...

; the name is derived from the file name extension .COM. Originally, the term stood for "Command file", a text file containing commands to be issued to the operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

 (similar to a DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

 batch file
Batch file
In DOS, OS/2, and Microsoft Windows, batch file is the name given to a type of script file, a text file containing a series of commands to be executed by the command interpreter....

), on many of the Digital Equipment Corporation
Digital Equipment Corporation
Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...

 mini
Minicomputer
A minicomputer is a class of multi-user computers that lies in the middle range of the computing spectrum, in between the largest multi-user systems and the smallest single-user systems...

 and mainframe
Mainframe computer
Mainframes are powerful computers used primarily by corporate and governmental organizations for critical applications, bulk data processing such as census, industry and consumer statistics, enterprise resource planning, and financial transaction processing.The term originally referred to the...

 operating systems going back to the 1970s.

With the introduction of microcomputer
Microcomputer
A microcomputer is a computer with a microprocessor as its central processing unit. They are physically small compared to mainframe and minicomputers...

s, the type of files commonly associated with the extension .com changed; in 8-bit
8-bit
The first widely adopted 8-bit microprocessor was the Intel 8080, being used in many hobbyist computers of the late 1970s and early 1980s, often running the CP/M operating system. The Zilog Z80 and the Motorola 6800 were also used in similar computers...

 CP/M
CP/M
CP/M was a mass-market operating system created for Intel 8080/85 based microcomputers by Gary Kildall of Digital Research, Inc...

, and later in MS-DOS
MS-DOS
MS-DOS is an operating system for x86-based personal computers. It was the most commonly used member of the DOS family of operating systems, and was the main operating system for IBM PC compatible personal computers during the 1980s to the mid 1990s, until it was gradually superseded by operating...

 and compatible DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

es, they are binary
Binary file
A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text...

 executable files by convention. Executables in the COM file format
File format
A file format is a particular way that information is encoded for storage in a computer file.Since a disk drive, or indeed any computer storage, can store only bits, the computer must have some way of converting information to 0s and 1s and vice-versa. There are different kinds of formats for...

 do not necessarily need to have the file name extension .COM in any but CP/M and very early versions of MS-DOS.

The .COM file name extension has no relation to the .com
.com
The domain name com is a generic top-level domain in the Domain Name System of the Internet. Its name is derived from commercial, indicating its original intended purpose for domains registered by commercial organizations...

 (for "commercial") top-level Internet domain name. However, this similarity in name has been exploited by malicious computer virus
Computer virus
A computer virus is a computer program that can replicate itself and spread from one computer to another. The term "virus" is also commonly but erroneously used to refer to other types of malware, including but not limited to adware and spyware programs that do not have the reproductive ability...

 writers.

MS-DOS binary format

The COM format is the original binary executable format used in CP/M
CP/M
CP/M was a mass-market operating system created for Intel 8080/85 based microcomputers by Gary Kildall of Digital Research, Inc...

 and MS-DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

. It is very simple; it has no header (with the exception of CP/M 3 files), and contains no metadata
Metadata
The term metadata is an ambiguous term which is used for two fundamentally different concepts . Although the expression "data about data" is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at...

, only code and data. This simplicity exacts a price: the binary
Object code
Object code, or sometimes object module, is what a computer compiler produces. In a general sense object code is a sequence of statements in a computer language, usually a machine code language....

 has a maximum size of 65,280 (FF00h
Hexadecimal
In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...

) bytes (256 bytes short of 64 KiB) and stores all its code
Code segment
In computing, a code segment, also known as a text segment or simply as text, is one of the sections of a program in an object file or in memory, which contains executable instructions....

 and data
Data segment
A data segment is a portion of virtual address space of a program, which contains the global variables and static variables that are initialized by the programmer...

 in one segment.

Since it lacks relocation
Relocation (computer science)
"Relocation is the process of assigning load addresses to various parts of [a] program and adjusting the code and data in the program to reflect the assigned addresses."...

 information, it is loaded
Loader (computing)
In computing, a loader is the part of an operating system that is responsible for loading programs. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution...

 by the operating system at a pre-set address, at offset 0100h immediately following the PSP
Program Segment Prefix
The Program Segment Prefix is a data structure used in DOS systems to store the state of a program. It resembles the Zero Page in the CP/M operating system...

, where it is executed (hence the limitation of the executable's size). This was not an issue on early 8-bit
8-bit
The first widely adopted 8-bit microprocessor was the Intel 8080, being used in many hobbyist computers of the late 1970s and early 1980s, often running the CP/M operating system. The Zilog Z80 and the Motorola 6800 were also used in similar computers...

 machines because of how the segmentation model works, but it is the main reason why the format fell into disuse soon after the introduction of 16-
16-bit
-16-bit architecture:The HP BPC, introduced in 1975, was the world's first 16-bit microprocessor. Prominent 16-bit processors include the PDP-11, Intel 8086, Intel 80286 and the WDC 65C816. The Intel 8088 was program-compatible with the Intel 8086, and was 16-bit in that its registers were 16...

 and then 32-bit
32-bit
The range of integer values that can be stored in 32 bits is 0 through 4,294,967,295. Hence, a processor with 32-bit memory addresses can directly access 4 GB of byte-addressable memory....

 processors with their much larger, segmented
Memory segment
x86 memory segmentation refers to the implementation of memory segmentation on the x86 architecture. Memory is divided into portions that may be addressed by a single index register without changing a 16-bit segment selector. In real mode or V86 mode, a segment is always 64 kilobytes in size . In...

 memory.

In the Intel 8080
Intel 8080
The Intel 8080 was the second 8-bit microprocessor designed and manufactured by Intel and was released in April 1974. It was an extended and enhanced variant of the earlier 8008 design, although without binary compatibility...

 CPU architecture, only 65,536 bytes of memory could be addressed (address range 0000h to FFFFh). Under CP/M, the first 256 bytes of this memory, from 0000h to 00FFh were reserved for system use by the zero page
Zero page
The zero page is the series of memory addresses at the absolute beginning of a computer's address space; that is, the page whose starting address is zero. The size of a "page" depends on the context, and the significance of zero-page memory versus higher addressed memory is highly dependent on...

, and any user program had to be loaded at exactly 0100h to be executed. COM files fit this model perfectly. Note that before the introduction of MP/M
MP/M
MP/M was a multi-user version of the CP/M operating system, created by Digital Research developer Tom Rolander in 1979. It allowed multiple users to connect to a single computer, each using a separate terminal....

 and Concurrent CP/M there was no possibility of running more than one program or command at a time: the program loaded at 0100h was run, and no other.

Although the file format is the same in MS-DOS and CP/M, .COM files for the two operating systems are not compatible; MS-DOS COM files contain x86 instructions and possibly MS-DOS system call
System call
In computing, a system call is how a program requests a service from an operating system's kernel. This may include hardware related services , creating and executing new processes, and communicating with integral kernel services...

s, while CP/M COM files contain 8080
Intel 8080
The Intel 8080 was the second 8-bit microprocessor designed and manufactured by Intel and was released in April 1974. It was an extended and enhanced variant of the earlier 8008 design, although without binary compatibility...

 instructions (programs restricted to certain machines could also contain additional instructions for 8085
Intel 8085
The Intel 8085 is an 8-bit microprocessor introduced by Intel in 1977. It was binary-compatible with the more-famous Intel 8080 but required less supporting hardware, thus allowing simpler and less expensive microcomputer systems to be built....

 or Z80
Zilog Z80
The Zilog Z80 is an 8-bit microprocessor designed by Zilog and sold from July 1976 onwards. It was widely used both in desktop and embedded computer designs as well as for military purposes...

) and CP/M system calls.

It is possible to make a .COM file which will run under both operating systems, a "fat binary". There is no true compatibility at the instruction level; the instructions at the entry point are chosen to be meaningful but different in both operating systems, and make program execution jump to the section for the operating system in use. It is basically two different programs with the same functionality in a single file, preceded by code selecting the one to use.

Under CP/M 3, if the first byte of a COM file is C9h there is a 256-byte header; since C9h corresponds to the 8080
Intel 8080
The Intel 8080 was the second 8-bit microprocessor designed and manufactured by Intel and was released in April 1974. It was an extended and enhanced variant of the earlier 8008 design, although without binary compatibility...

 instruction RET, this means that the COM file will immediately terminate if run on an earlier version of CP/M that does not support this extension. (Because the instruction sets of the 8085 and Z80 are supersets of the 8080 instruction set, this works on all three processors.) C9h is an invalid opcode on the 8088/8086, and it will cause an INT 6 exception in v86 mode since the 386
386
Year 386 was a common year starting on Thursday of the Julian calendar. At the time, it was known as the Year of the Consulship of Honorius and Euodius...

. Since C9h is the opcode for LEAVE since the 80188/80186 and therefore not used as the first instruction in a valid program, the executable loader in some versions of DOS rejects COM files that start with C9h, avoiding a crash.

Files may have names ending in .COM, but not be in the simple format described above; this is indicated by a magic number
Magic number (programming)
In computer programming, the term magic number has multiple meanings. It could refer to one or more of the following:* A constant numerical or text value used to identify a file format or protocol; for files, see List of file signatures...

 at the start of the file. For example, the COMMAND.COM
COMMAND.COM
COMMAND.COM is the filename of the default operating system shell for DOS operating systems and the default command line interpreter on Windows 95, Windows 98 and Windows Me...

 file in DR DOS 6.0 is actually in DOS executable
DOS executable
The DOS MZ executable format is the executable file format used for .EXE files in DOS.The file can be identified by the ASCII string "MZ" or the hexadecimal 4D 5A at the beginning of the file . "MZ" are the initials of Mark Zbikowski, one of the developers of MS-DOS...

 format, indicated by the first two bytes being MZ (4Dh 5Ah), the initials of Mark Zbikowski
Mark Zbikowski
Mark "Zibo" Joseph Zbikowski is a former Microsoft Architect and an early computer hacker. He started working at the company only a few years after its inception, leading efforts in MS-DOS, OS/2, Cairo and Windows NT. In 2006 he was honored for 25 years of service with the company, the third...

.

Large programs

In MS-DOS
MS-DOS
MS-DOS is an operating system for x86-based personal computers. It was the most commonly used member of the DOS family of operating systems, and was the main operating system for IBM PC compatible personal computers during the 1980s to the mid 1990s, until it was gradually superseded by operating...

 and compatible DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

es, there is no memory management
Memory management
Memory management is the act of managing computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and freeing it for reuse when no longer needed. This is critical to the computer system.Several...

 provided for COM files by the loader
Loader (computing)
In computing, a loader is the part of an operating system that is responsible for loading programs. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution...

 or execution environment. All memory is simply available to the COM file. After execution, the operating system command shell, COMMAND.COM
COMMAND.COM
COMMAND.COM is the filename of the default operating system shell for DOS operating systems and the default command line interpreter on Windows 95, Windows 98 and Windows Me...

, is reloaded. This leaves the possibilities that the COM file can either be very simple, using a single segment, or arbitrarily complex, providing its own memory management system. An example of a complex program is COMMAND.COM, the MS-DOS shell, which provided a loader to load other COM or EXE
EXE
EXE is the common filename extension denoting an executable file in the DOS, OpenVMS, Microsoft Windows, Symbian, and OS/2 operating systems....

 programs. In the .COM system, larger programs (up to the available memory size) can be loaded and run, but the system loader assumes that all code and data is in the first segment, and it is up to the .COM program to provide any further organization. Programs larger than available memory, or large data segment
Data segment
A data segment is a portion of virtual address space of a program, which contains the global variables and static variables that are initialized by the programmer...

s, can be handled by dynamic linking, if the necessary code is included in the .COM program. The advantage of using the .COM rather than .EXE format is that the binary image is usually smaller and easier to program using an assembler. Once compiler
Compiler
A compiler is a computer program that transforms source code written in a programming language into another computer language...

s and linkers of sufficient power became available it was no longer advantageous to use the .COM format for complex programs.

Execution preference

In MS-DOS, if a directory contains both a COM file and an EXE
EXE
EXE is the common filename extension denoting an executable file in the DOS, OpenVMS, Microsoft Windows, Symbian, and OS/2 operating systems....

 file with same name (not including extension), the COM file is preferentially selected for execution. For example, if a directory in the system path
Path (variable)
PATH is an environment variable on Unix-like operating systems, DOS, OS/2, and Microsoft Windows, specifying a set of directories where executable programs are located...

 contains two files named foo.com and foo.exe, the following would execute foo.com:

C:\>foo

A user wishing to run foo.exe can explicitly use the complete filename:

C:\>foo.exe

Taking advantage of this default behaviour, virus
Computer virus
A computer virus is a computer program that can replicate itself and spread from one computer to another. The term "virus" is also commonly but erroneously used to refer to other types of malware, including but not limited to adware and spyware programs that do not have the reproductive ability...

 writers and other malicious programmers have used names like notepad.com for their creations, hoping that if it is placed in the same directory as the corresponding EXE file, a command or batch file may accidentally trigger their program instead of the text editor notepad.exe.

On Windows NT
Windows NT
Windows NT is a family of operating systems produced by Microsoft, the first version of which was released in July 1993. It was a powerful high-level-language-based, processor-independent, multiprocessing, multiuser operating system with features comparable to Unix. It was intended to complement...

 and derivatives (Windows 2000
Windows 2000
Windows 2000 is a line of operating systems produced by Microsoft for use on personal computers, business desktops, laptops, and servers. Windows 2000 was released to manufacturing on 15 December 1999 and launched to retail on 17 February 2000. It is the successor to Windows NT 4.0, and is the...

, Windows XP
Windows XP
Windows XP is an operating system produced by Microsoft for use on personal computers, including home and business desktops, laptops and media centers. First released to computer manufacturers on August 24, 2001, it is the second most popular version of Windows, based on installed user base...

, Windows Vista
Windows Vista
Windows Vista is an operating system released in several variations developed by Microsoft for use on personal computers, including home and business desktops, laptops, tablet PCs, and media center PCs...

, and Windows 7), the PATHEXT variable is used to override the order of preference (and acceptable extensions) for calling files without specifying the extension from the command line. The default value still places .com files before .exe files. This closely resembles a feature previously found in JP Software's line of extended command line processors 4DOS
4DOS
4DOS is a command line interpreter by JP Software, designed to replace the default command interpreter COMMAND.COM in DOS and Windows 95/98/Me. The 4DOS family of programs are meant to replace the default command processor. 4OS2 and 4NT replace CMD.EXE in OS/2 and Windows NT respectively...

, 4OS2, and 4NT
4NT
Take Command Console , formerly known as 4DOS for Windows NT and 4NT, is a command line interpreter by JP Software, designed as a substitute for the default command interpreter in Microsoft Windows...

.

Platform support

The format is still executable
Executable
In computing, an executable file causes a computer "to perform indicated tasks according to encoded instructions," as opposed to a data file that must be parsed by a program to be meaningful. These instructions are traditionally machine code instructions for a physical CPU...

 on many modern Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

-based platforms
Platform (computing)
A computing platform includes some sort of hardware architecture and a software framework , where the combination allows software, particularly application software, to run...

, but it is run in an MS-DOS-emulating subsystem, NTVDM
Virtual DOS machine
Virtual DOS machine is Microsoft's technology that allows running legacy DOS and 16-bit Windows programs on Intel 80386 or higher computers when there is already another operating system running and controlling the hardware.-Overview:...

, which is not present in 64-bit
64-bit
64-bit is a word size that defines certain classes of computer architecture, buses, memory and CPUs, and by extension the software that runs on them. 64-bit CPUs have existed in supercomputers since the 1970s and in RISC-based workstations and servers since the early 1990s...

 variants. COM files can be executed also on DOS emulators such as DOSBox
DOSBox
DOSBox is emulator software that emulates an IBM PC compatible computer running MS-DOS. It is intended especially for use with old PC games. DOSBox is free software....

, on any platform supported by these emulators.

Malicious usage of the .com extension

Some computer virus writers have hoped to take advantage of modern computer users' likely lack of knowledge of the .com file extension and associated binary format, along with their more likely familiarity with the .com
.com
The domain name com is a generic top-level domain in the Domain Name System of the Internet. Its name is derived from commercial, indicating its original intended purpose for domains registered by commercial organizations...

 Internet domain name. E-mail has been sent with attachment names similar to "www.example.com". Unwary Microsoft Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

 users clicking on such an attachment would expect to begin browsing a site named http://www.example.com/, but instead would run the attached binary command file named www.example, giving it full permission to do to their machine whatever its author had in mind.

Note that there is nothing malicious about the COM file format itself; this is an exploitation of the coincidental name collision between .com command files and .com commercial web sites.

See also

  • MS-DOS API
    MS-DOS API
    The MS-DOS API is an API used originally in MS-DOS/PC-DOS, and later by other DOS systems. Most calls to the DOS API invoke software interrupt 21h . By calling INT 21h with a subfunction number in the AH processor register and other parameters in other registers, one invokes various DOS services...

  • CMD file (CP/M)
    CMD file (CP/M)
    In CP/M-86, CMD is the filename extension used by executable programs. It corresponds to COM in CP/M-80 and EXE in DOS. The same extension is used by Microsoft Windows for unrelated batch files.-Binary format:...

     Replacement used by CP/M-86
    CP/M-86
    CP/M-86 was a version of the CP/M operating system that Digital Research made for the Intel 8086 and Intel 8088. The commands are those of CP/M-80. Executable files used the relocatable .CMD file format...

  • Comparison of executable file formats
    Comparison of executable file formats
    This is a comparison of executable file formats.Among the above formats, the ones in most common use are PE , ELF and Mach-O .- References :...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK