Joel McCormack
Joel McCormack is the designer of the NCR Corporation version of the p-code machine, which is a kind of stack machine popular in the 1970s as the preferred way to implement new computing architectures and languages such as Pascal and BCPL. The NCR design shares no common architecture with the Pascal MicroEngine designed by Western Digital, but both were meant to execute the UCSD p-System.[1,2]
P-machine Theory
Urs Ammann, a student of Niklaus Wirth, originally presented p-code in his PhD thesis (see Urs Ammann, On Code Generation in a Pascal Compiler, Software: Practice and Experience, Vol. 7, No. 3, 1977, pp. 391-423). The central idea is that a complex software system is coded for a non-existent, fictitious, minimal computer or virtual machine, and that computer is realized on specific real hardware with an interpreting computer program that is typically small, simple, and quickly developed. A Pascal implementation had to be rewritten for every new computer being acquired, so Ammann proposed writing the system once for a virtual architecture. The successful academic implementation of Pascal was the UCSD p-System developed by Kenneth Bowles, a professor at UCSD, who began the project of developing a universal Pascal programming environment using the P-machine architecture for the multitude of different computing platforms in use at that time. McCormack was part of a team of undergraduates working on the project.[3] He took this familiarity and experience with him to NCR.
P-machine Design
In 1979 McCormack was hired by NCR right out of college. NCR had already developed a bit-slice implementation of the p-code machine using the Am2900 chip set. This CPU had a myriad of timing and performance problems, so McCormack proposed a total redesign of the processor using a microsequencer based on programmable logic devices. McCormack left NCR to start a company called Volition Systems but continued the work on the CPU as a contractor.
The new CPU used an 80-bit wide microword, so parallelism in the microcode was radically enhanced. There were several loops in the microcode that were a single instruction long, and many of the simpler p-code ops took 1 or 2 microcode instructions. With the wide microword, the carefully arranged buses, and the incrementing memory address registers, the CPU could execute operations inside the ALU while transferring a memory word directly to the onboard stack, or feed one source into the ALU while sending a previously computed register to the destination bus, all in a single microcycle.
The CPU ran at three different clock speeds (using delay lines for a selectable clock); two bits in the microword selected the cycle time for that instruction. The cycle times were around 130, 150, and 175 nanoseconds. Newer parts from AMD would have allowed a faster 98-nanosecond cycle for the fastest instructions, but AMD didn't come out with a correspondingly faster branch control unit.
There was a separate prefetch/instruction formatting unit (again using stoppable delay-line clocks for synchronization; asynchronous logic allows for skewed timings). It had a 32-bit buffer and could deliver the next datum as a signed byte, unsigned byte, 16-bit word, or "big" operand (the one-or-two-byte format in which 0..127 was encoded as one byte and 128..32767 as two bytes).
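The four operand formats delivered by the prefetch unit can be illustrated with a small software model. This is only a sketch in Python, not the hardware design: the class OperandStream and the helper encode_big are hypothetical names, the low-byte-first word order is an assumption, and the "big" encoding shown (high bit of the first byte marks the two-byte form, with its low 7 bits holding the high part of the value) is assumed rather than taken from the text above, which gives only the value ranges.

```python
from dataclasses import dataclass


@dataclass
class OperandStream:
    """Byte cursor standing in for the hardware prefetch buffer (hypothetical)."""
    data: bytes
    pos: int = 0

    def _take(self) -> int:
        value = self.data[self.pos]
        self.pos += 1
        return value

    def ubyte(self) -> int:
        """Next byte as an unsigned value, 0..255."""
        return self._take()

    def sbyte(self) -> int:
        """Next byte as a signed value, -128..127."""
        value = self._take()
        return value - 256 if value >= 128 else value

    def word(self) -> int:
        """Next 16-bit word; low byte first is an assumption."""
        low = self._take()
        high = self._take()
        return (high << 8) | low

    def big(self) -> int:
        """One-or-two-byte operand: 0..127 in one byte, 128..32767 in two.

        Assumed encoding: a set high bit on the first byte marks the
        two-byte form, and its low 7 bits are the high part of the value.
        """
        first = self._take()
        if first < 128:
            return first
        return ((first & 0x7F) << 8) | self._take()


def encode_big(value: int) -> bytes:
    """Inverse of OperandStream.big under the same assumed encoding."""
    if not 0 <= value <= 32767:
        raise ValueError("big operand out of range")
    if value < 128:
        return bytes([value])
    return bytes([0x80 | (value >> 8), value & 0xFF])
```

Under these assumptions, small offsets and constants (the common case) cost a single byte of code, while anything up to 32767 still fits in two.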
There was an onboard stack of 1024 16-bit words, so that both scalars and sets could be operated on there. The top of the stack was actually kept in one of the AMD 2901's registers, so that simple operations like integer addition took a single cycle.
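Why caching the top word helps can be seen in a small software model. The following Python sketch is illustrative only: the class CachedTopStack and its depth counter are hypothetical (the hardware had no such counter), and the downward-growing stack with pre-decrement on write and post-increment on read is modeled as a plain list index.

```python
STACK_WORDS = 1024      # size of the onboard stack, from the text
WORD_MASK = 0xFFFF      # 16-bit words


class CachedTopStack:
    """Evaluation stack whose top word lives in a register (tos)."""

    def __init__(self) -> None:
        self.mem = [0] * STACK_WORDS
        self.sp = STACK_WORDS   # grows downward; pre-decrement on write
        self.tos = 0            # cached top-of-stack word
        self.depth = 0          # live words including tos (bookkeeping only)

    def push(self, value: int) -> None:
        if self.depth > 0:
            if self.sp == 0:
                raise OverflowError("onboard stack is full")
            # Spill the old top into stack memory before replacing it.
            self.sp -= 1
            self.mem[self.sp] = self.tos
        self.tos = value & WORD_MASK
        self.depth += 1

    def add(self) -> None:
        """Integer add: one stack-memory read combined with the register."""
        if self.depth < 2:
            raise IndexError("add needs two operands")
        self.tos = (self.tos + self.mem[self.sp]) & WORD_MASK
        self.sp += 1            # post-increment after the read
        self.depth -= 1
```

In this model an add touches stack memory once (a read) and leaves the result in the register, whereas a stack kept entirely in memory would need two reads and a write for the same operation.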
P-machine Architecture
Since next-address control and the next microcode location were in each wide microword, there was no penalty for executing the microcode in any order. The microcode defined a table of 256 labels, and the microcode compiler moved the first instruction at each of those labels to the first 256 locations of microcode memory. The only restriction this placed upon the microcode was that if a p-code required more than one microinstruction, then the first microinstruction couldn't have any flow control specified (as it would be filled in with a "goto").
The CPU used the technique of keeping the top word of the stack in one of the AMD 2901 registers. This often resulted in one fewer microinstruction. For example, here are a few p-codes the way they ended up. tos is a register, and q is a register. "|" means parallel activities in a single cycle. (The stack doesn't quite operate as written: it decrements before data is written to it, and increments after data is read.)
fetch   % Fetch and save in an AMD register the next byte opcode from
        % the prefetch unit, and go to that location in the microcode.
        q := ubyte | goto ubyte

SLDCI   % Short load constant integer (push opcode byte)
        % Push top-of-stack AMD register onto real stack, load
        % the top-of-stack register with the fetched opcode that got us here
        dec(sp) | stack := tos | tos := q | goto fetch

LDCI    % Load constant integer (push opcode word)
        % A lot like SLDCI, except fetch 2-byte word and "push" on stack
        dec(sp) | stack := tos | tos := word | goto fetch

SLDL1   % Short load local variable at offset 1
        % mpd0 is a pointer to local data at offset 0. Write appropriate
        % data address into the byte-addressed memory-address-register
        mar := mpd0+2
        % Push tos, load new tos from memory
SLDX    dec(sp) | stack := tos | tos := memword | goto fetch

LDL     % Load local variable at offset specified by "big" operand
        r0 := big
        mar := mpd0 + r0 | goto sldx

INCR    % Increment top-of-stack by big operand
        tos := tos + big | goto fetch

ADI     % Add two words on top of stack
        tos := tos + stack | inc(sp) | goto fetch

EQUI    % Top two words of stack equal?
        test tos - stack | inc(sp)
        tos := 0 | if ~zero goto fetch
        tos := 1 | goto fetch
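The placement rule for the 256 labels can be sketched in software. The Python below is a hypothetical model, not the actual NCR microcode compiler: MicroInstr, lay_out, the textual "goto N" form, and the opcode numbers used in the usage note are all assumptions made for illustration. It places the first microinstruction of each routine at the address equal to its opcode, puts the remainder above address 255, and fills the first instruction's flow field with a jump to that remainder, which is why a multi-instruction routine may not specify flow control in its first microinstruction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DISPATCH_SLOTS = 256  # one entry slot per possible opcode byte


@dataclass
class MicroInstr:
    ops: str                    # parallel activities, e.g. "r0 := big"
    flow: Optional[str] = None  # next-address control, e.g. "goto fetch"


def lay_out(routines: Dict[int, List[MicroInstr]]) -> Dict[int, MicroInstr]:
    """Return a microcode image mapping address -> microinstruction."""
    image: Dict[int, MicroInstr] = {}
    next_free = DISPATCH_SLOTS
    for opcode, body in sorted(routines.items()):
        if not 0 <= opcode < DISPATCH_SLOTS:
            raise ValueError(f"opcode {opcode} does not fit in one byte")
        if not body:
            raise ValueError(f"opcode {opcode} has no microinstructions")
        first, rest = body[0], body[1:]
        if not rest:
            # Single-instruction routine: keeps its own flow control.
            image[opcode] = first
            continue
        if first.flow is not None:
            raise ValueError(
                f"opcode {opcode}: first microinstruction of a longer "
                "routine must leave flow control free for the goto"
            )
        # The entry slot jumps to where the remainder was placed.
        image[opcode] = MicroInstr(first.ops, f"goto {next_free}")
        for instr in rest:
            # Remaining instructions are stored consecutively as written.
            image[next_free] = instr
            next_free += 1
    return image
```

For instance, with a made-up opcode number 20 for LDL, lay_out({20: [MicroInstr("r0 := big"), MicroInstr("mar := mpd0 + r0", "goto sldx")]}) stores "r0 := big" with flow "goto 256" at address 20 and the second microinstruction at address 256.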
This architecture should be compared to the original P-code machine specification as proposed by Niklaus Wirth.
P-machine Performance
The end result was a 9"x11" CPU board that ran the UCSD p-System faster than anything else, by a wide margin: as much as 35-50 times faster than the LSI-11 interpreter, and 7-9 times faster than the Western Digital Pascal MicroEngine, which replaced the LSI-11 microcode with p-code microcode. It also ran faster than Niklaus Wirth's Lilith machine, though it lacked Lilith's bit-mapped graphics capabilities, and ran at around the same speed as a VAX-11/750 running native code. (But the VAX was hampered by the poor code coming out of the Berkeley Pascal compiler, and was also a 32-bit machine.)
Education
- University of California, San Diego: BA, 1978
- University of California, San Diego: MS, 1979
Later Employment
- Digital Equipment Corporation
- Compaq Computer Corporation
- Hewlett-Packard
Publications
- Joel McCormack, Robert McNamara. Efficient and Tiled Polygon Traversal Using Half-Plane Edge Functions, to appear as Research Report 2000/4, Compaq Western Research Laboratory, August 2000. [Superset of Workshop paper listed immediately below.]
- Joel McCormack, Robert McNamara. Tiled Polygon Traversal Using Half-Plane Edge Functions, Proceedings of the 2000 EUROGRAPHICS/SIGGRAPH Workshop on Graphics Hardware, ACM Press, New York, August 2000, pp. 15-21.
- Robert McNamara, Joel McCormack, Norman P. Jouppi. Prefiltered Antialiased Lines Using Half-Plane Distance Functions, Research Report 98/2, Compaq Western Research Laboratory, August 2000. [Superset of Workshop paper listed immediately below.]
- Robert McNamara, Joel McCormack, Norman P. Jouppi. Prefiltered Antialiased Lines Using Half-Plane Distance Functions, Proceedings of the 2000 EUROGRAPHICS/SIGGRAPH Workshop on Graphics Hardware, ACM Press, New York, August 2000, pp. 77-85.
- Joel McCormack, Keith I. Farkas, Ronald Perry, Norman P. Jouppi. Simple and Table Feline: Fast Elliptical Lines for Anisotropic Texture Mapping, Research Report 99/1, Compaq Western Research Laboratory, October 1999. [Superset of SIGGRAPH paper listed immediately below.]
- Joel McCormack, Ronald Perry, Keith I. Farkas, Norman P. Jouppi. Feline: Fast Elliptical Lines for Anisotropic Texture Mapping, SIGGRAPH 99 Conference Proceedings, ACM Press, New York, August 1999, pp. 243-250.
- Joel McCormack, Robert McNamara, Christopher Gianos, Larry Seiler, Norman P. Jouppi, Ken Correll, Todd Dutton, John Zurawski. Neon: A (Big) (Fast) Single-Chip 3D Workstation Graphics Accelerator, Research Report 98/1, Compaq Western Research Laboratory, Revised July 1999. [Superset of Workshop and IEEE Neon papers listed immediately below.]
- Joel McCormack, Robert McNamara, Christopher Gianos, Larry Seiler, Norman P. Jouppi, Ken Correll, Todd Dutton, John Zurawski. Implementing Neon: A 256-bit Graphics Accelerator, IEEE Micro, Vol. 19, No. 2, March/April 1999, pp. 58-69.
- Joel McCormack, Robert McNamara, Christopher Gianos, Larry Seiler, Norman P. Jouppi, Ken Correll. Neon: A Single-Chip 3D Workstation Graphics Accelerator, Proceedings of the 1998 EUROGRAPHICS/SIGGRAPH Workshop on Graphics Hardware, ACM Press, New York, August 1998, pp. 123-132. [Voted Best Paper/Presentation.]
- Joel McCormack, Robert McNamara. A Smart Frame Buffer, Research Report 93/1, Digital Equipment Corporation, Western Research Laboratory, January 1993. [Superset of USENIX paper listed immediately below.]
- Joel McCormack, Robert McNamara. A Sketch of the Smart Frame Buffer, Proceedings of the 1993 Winter USENIX Conference, USENIX Association, Berkeley, January 1993, pp. 169-179.
- Joel McCormack. Writing Fast X Servers for Dumb Color Frame Buffers, Research Report 91/1, Digital Equipment Corporation, Western Research Laboratory, February 1991. [Superset of the Software: Practice and Experience paper listed immediately below.]
- Joel McCormack. Writing Fast X Servers for Dumb Color Frame Buffers, Software - Practice and Experience, Vol 20(S2), John Wiley & Sons, Ltd., West Sussex, England, October 1990, pp. 83-108. [Translated and reprinted in the Japanese edition of UNIX Magazine, ASCII Corp., October 1991, pp. 76-96.]
- Hania Gajewska, Mark S. Manasse, Joel McCormack. Why X is Not Our Ideal Window System, Software - Practice and Experience, Vol 20(S2), John Wiley & Sons, Ltd., West Sussex, England, October 1990, pp. 137-171.
- Paul J. Asente and Ralph R. Swick, with Joel McCormack. X Window System Toolkit: The Complete Programmer's Guide and Specification, X Version 11, Release 4, Digital Press, Maynard, Massachusetts, 1990.
- Joel McCormack, Paul Asente. An Overview of the X Toolkit, Proceedings of the ACM SIGGRAPH Symposium on User Interface Software, ACM Press, New York, October 1988, pp. 46-55.
- Joel McCormack, Paul Asente. Using the X Toolkit, or, How to Write a Widget. Proceedings of the Summer 1988 USENIX Conference, USENIX Association, Berkeley, June 1988, pp. 1-14.
- Joel McCormack. The Right Language for the Job. UNIX Review, REVIEW Publications Co., Renton, Washington, Vol. 3, No. 9, September 1985, pp. 22-32.
- Joel McCormack, Richard Gleaves. Modula-2: A Worthy Successor to Pascal, BYTE, Byte Publications, Peterborough, New Hampshire, Vol. 8, No. 4, April 1983, pp. 385-395.
See also
- UCSD p-System
- p-code machine
- Pascal MicroEngine