Common Intermediate Language
Common Intermediate Language (CIL, pronounced either "sil" or "kil"), formerly called Microsoft Intermediate Language (MSIL), is the lowest-level human-readable programming language defined by the Common Language Infrastructure (CLI) specification and is used by the .NET Framework and Mono. Languages which target a CLI-compatible runtime environment compile to CIL, which is assembled into an object code that has a bytecode-style format. CIL is an object-oriented assembly language, and is entirely stack-based. Its bytecode is translated into native code or executed by a virtual machine.

CIL was originally known as Microsoft Intermediate Language (MSIL) during the beta releases of the .NET languages. Due to standardization of C# and the Common Language Infrastructure, the bytecode is now officially known as CIL.

General information

During compilation of .NET programming languages, the source code is translated into CIL code rather than platform- or processor-specific object code. CIL is a CPU- and platform-independent instruction set that can be executed in any environment supporting the Common Language Infrastructure, such as the .NET runtime on Windows, or the cross-platform Mono runtime. In theory, this eliminates the need to distribute different executable files for different platforms and CPU types. CIL code is verified for safety at run time, providing better security and reliability than natively compiled executable files.

The execution process looks like this:
  1. Source code is converted to Common Intermediate Language, the CLI's equivalent of assembly language for a CPU.
  2. CIL is then assembled into a form of so-called bytecode and a .NET assembly is created.
  3. Upon execution of a .NET assembly, its code is passed through the runtime's JIT compiler to generate native code. Ahead-of-time compilation may also be used, which eliminates this step, but at the cost of executable file portability.
  4. The native code is executed by the computer's processor.

Instructions

CIL bytecode has instructions for the following groups of tasks:
  • Load and store
  • Arithmetic
  • Type conversion
  • Object creation and manipulation
  • Operand stack management (push / pop)
  • Control transfer (branching)
  • Method invocation and return
  • Throwing exceptions
  • Monitor-based concurrency


Computational model

The Common Intermediate Language is object-oriented and stack-based. This means that instructions take their operands from an evaluation stack and push their results back onto it, rather than reading and writing registers as in most CPU architectures.

In x86 assembly, adding the contents of two registers might look like this:


add eax, edx


The corresponding code in IL can be rendered as this:


ldloc.0
ldloc.1
add
stloc.0 // a = a + b or a += b;


Here the two locals are pushed onto the stack. When the add instruction executes, both operands are popped and the result is pushed. That value is then popped and stored in the first local.
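
The stack discipline described above can be traced with a toy evaluator. The following C# sketch is an illustration only: the `StackSketch` class and its `Run` method are hypothetical names, it understands just the four instructions from the example, and it says nothing about how a real CLI runtime is implemented.

```csharp
using System;
using System.Collections.Generic;

// Toy evaluator for four CIL-like instructions. It illustrates the
// push/pop discipline only; it is not how a real runtime executes CIL.
public static class StackSketch
{
    public static int[] Run(string[] program, int[] locals)
    {
        var stack = new Stack<int>();       // the evaluation stack
        var slots = (int[])locals.Clone();  // local variable slots

        foreach (string instruction in program)
        {
            switch (instruction)
            {
                case "ldloc.0": stack.Push(slots[0]); break;
                case "ldloc.1": stack.Push(slots[1]); break;
                case "add":
                    int right = stack.Pop();   // operands are popped...
                    int left = stack.Pop();
                    stack.Push(left + right);  // ...and the result is pushed
                    break;
                case "stloc.0": slots[0] = stack.Pop(); break;
                default:
                    throw new ArgumentException("unsupported: " + instruction);
            }
        }
        return slots;
    }

    public static void Main()
    {
        int[] result = Run(
            new[] { "ldloc.0", "ldloc.1", "add", "stloc.0" },
            new[] { 2, 3 });
        Console.WriteLine(result[0]); // prints 5
    }
}
```

With locals 2 and 3, the stack holds 2, then 2 and 3, then 5, and is empty again once `stloc.0` has stored the sum.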

Object-oriented concepts

This extends to object-oriented concepts as well. You may create objects, call methods and use other types of members such as fields.

CIL is designed to be object-oriented, and every method needs (with some exceptions) to reside in a class. The following static method is an example:


.class public Foo
{
    .method public static int32 Add(int32, int32) cil managed
    {
        .maxstack 2
        ldarg.0 // load the first argument
        ldarg.1 // load the second argument
        add     // add them
        ret     // return the result
    }
}


This method does not require any instance of Foo to be declared because it is static. That means it belongs to the class and it may then be used like this in C#:


int r = Foo.Add(2, 3); // 5


In CIL:


ldc.i4.2
ldc.i4.3
call int32 Foo::Add(int32, int32)
stloc.0
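
The body of Foo::Add can also be emitted from C# at run time, which is one way to experiment with individual opcodes. The sketch below uses `DynamicMethod` and `ILGenerator` from `System.Reflection.Emit`; the class name `EmitAddSketch` and the method name "AddSketch" are arbitrary choices made for this illustration.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitAddSketch
{
    // Builds a delegate whose body is the same four-instruction
    // sequence as Foo::Add: ldarg.0, ldarg.1, add, ret.
    public static Func<int, int, int> BuildAdd()
    {
        var method = new DynamicMethod(
            "AddSketch", typeof(int), new[] { typeof(int), typeof(int) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // load the first argument
        il.Emit(OpCodes.Ldarg_1); // load the second argument
        il.Emit(OpCodes.Add);     // pop both, push their sum
        il.Emit(OpCodes.Ret);     // return the value on top of the stack

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> add = BuildAdd();
        Console.WriteLine(add(2, 3)); // prints 5
    }
}
```

Calling the delegate plays the role of the `call int32 Foo::Add(int32, int32)` instruction: the two arguments are supplied by the caller and the sum is what `ret` leaves on the stack.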


Instance classes

An instance class contains at least one constructor and some instance members. The following class has a set of methods representing the actions of a Car object.


.class public Car
{
    .method public specialname rtspecialname instance void .ctor(int32, int32) cil managed
    {
        /* Constructor */
    }

    .method public instance void Move(int32) cil managed
    {
        /* Omitting implementation */
    }

    .method public instance void TurnRight() cil managed
    {
        /* Omitting implementation */
    }

    .method public instance void TurnLeft() cil managed
    {
        /* Omitting implementation */
    }

    .method public instance void Brake() cil managed
    {
        /* Omitting implementation */
    }
}


Creating objects

In C#, class instances are created like this:


Car myCar = new Car(1, 4);
Car yourCar = new Car(1, 3);


These statements correspond roughly to the following instructions:


ldc.i4.1
ldc.i4.4
newobj instance void Car::.ctor(int32, int32)
stloc.0 // myCar = new Car(1, 4);
ldc.i4.1
ldc.i4.3
newobj instance void Car::.ctor(int32, int32)
stloc.1 // yourCar = new Car(1, 3);


Invoking instance methods

In C#, an instance method is invoked like this:


myCar.Move(3);


In CIL:


ldloc.0 // Load the object "myCar" on the stack
ldc.i4.3
call instance void Car::Move(int32)
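
As a rough illustration, the `newobj` and `call` sequences from this section can be emitted and executed from C#. The `Car` class below is a hypothetical stand-in: the article does not say what the constructor arguments mean, so storing the first one in a `Position` property is an assumption made only to have something observable.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical stand-in for the Car class used in this section.
public class Car
{
    public int Position { get; private set; }
    public Car(int x, int y) { Position = x; }   // meaning of x and y is assumed
    public void Move(int distance) { Position += distance; }
}

public static class EmitCarSketch
{
    public static Func<Car> BuildFactory()
    {
        ConstructorInfo ctor = typeof(Car).GetConstructor(new[] { typeof(int), typeof(int) });
        MethodInfo move = typeof(Car).GetMethod("Move");

        var method = new DynamicMethod("MakeAndMove", typeof(Car), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Ldc_I4_4);
        il.Emit(OpCodes.Newobj, ctor); // new Car(1, 4): pops 1 and 4, pushes the reference
        il.Emit(OpCodes.Dup);          // keep a second reference to return later
        il.Emit(OpCodes.Ldc_I4_3);
        il.Emit(OpCodes.Call, move);   // Move(3): pops the reference and the argument
        il.Emit(OpCodes.Ret);          // return the remaining reference

        return (Func<Car>)method.CreateDelegate(typeof(Func<Car>));
    }

    public static void Main()
    {
        Car car = BuildFactory()();
        Console.WriteLine(car.Position); // prints 4
    }
}
```

The object reference is pushed before the method arguments, which is why `ldloc.0` precedes `ldc.i4.3` in the CIL listing.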

Metadata

.NET records information about compiled classes as metadata. Like the type library in the Component Object Model, this enables applications to support and discover the interfaces, classes, types, methods, and fields in the assembly. The process of reading such metadata is called reflection.

Metadata can also take the form of attributes. Custom attributes are defined by deriving from the Attribute class, which lets code attach declarative information to types and members and read it back through reflection.
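
A minimal C# sketch of both ideas follows. `AuthorAttribute`, `Widget`, and `MetadataSketch` are hypothetical names invented for this illustration; only the `System.Reflection` calls are part of the framework.

```csharp
using System;
using System.Reflection;

// Hypothetical custom attribute; the name and its property are illustrative.
[AttributeUsage(AttributeTargets.Class)]
public sealed class AuthorAttribute : Attribute
{
    public string Name { get; }
    public AuthorAttribute(string name) { Name = name; }
}

[Author("example")]
public class Widget
{
    public int Size;
    public void Resize(int newSize) { Size = newSize; }
}

public static class MetadataSketch
{
    // Reads the attribute back from the type's metadata, or returns null.
    public static string AuthorOf(Type type)
    {
        var attribute = (AuthorAttribute)Attribute.GetCustomAttribute(type, typeof(AuthorAttribute));
        return attribute == null ? null : attribute.Name;
    }

    public static void Main()
    {
        Type type = typeof(Widget);
        Console.WriteLine(AuthorOf(type)); // prints example

        // Discover the methods and fields declared by the type itself.
        BindingFlags declared = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MethodInfo m in type.GetMethods(declared))
            Console.WriteLine(m.Name); // prints Resize
        foreach (FieldInfo f in type.GetFields(declared))
            Console.WriteLine(f.Name + " : " + f.FieldType.Name); // prints Size : Int32
    }
}
```

Nothing in `Main` refers to `Resize` or `Size` by name; both are discovered from the metadata stored in the assembly.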

Example

Below is a basic Hello, World program written in CIL. It will display the string "Hello, world!".


.assembly Hello {}
.assembly extern mscorlib {}
.method static void Main()
{
    .entrypoint
    .maxstack 1
    ldstr "Hello, world!"
    call void [mscorlib]System.Console::WriteLine(string)
    call string [mscorlib]System.Console::ReadLine()
    pop
    ret
}


The following code is more complex in its number of opcodes. It can also be compared with the corresponding code in the article about Java bytecode.



static void Main(string[] args)
{
    for (int i = 2; i < 1000; i++)
    {
        for (int j = 2; j < i; j++)
        {
            if (i % j == 0)
                goto outer;
        }
        Console.WriteLine(i);
        outer: ;
    }
}


In CIL syntax it looks like this:


.method private hidebysig static void Main(string[] args) cil managed
{
    .entrypoint
    .maxstack 2
    .locals init (int32 V_0,
                  int32 V_1)

    IL_0000: ldc.i4.2
             stloc.0
             br.s IL_001f
    IL_0004: ldc.i4.2
             stloc.1
             br.s IL_0011
    IL_0008: ldloc.0
             ldloc.1
             rem
             brfalse.s IL_001b
             ldloc.1
             ldc.i4.1
             add
             stloc.1
    IL_0011: ldloc.1
             ldloc.0
             blt.s IL_0008
             ldloc.0
             call void [mscorlib]System.Console::WriteLine(int32)
    IL_001b: ldloc.0
             ldc.i4.1
             add
             stloc.0
    IL_001f: ldloc.0
             ldc.i4 0x3e8
             blt.s IL_0004
             ret
}


This is just a representation of how CIL looks near the VM level. When compiled, the methods are stored in tables and the instructions are stored as bytes inside the assembly, which is a Portable Executable (PE) file.
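
The stored bytes can be inspected from within a running program. The following C# sketch reads the IL of one of its own methods through reflection; `IlBytesSketch` and `IlOf` are hypothetical names, and the exact bytes depend on the compiler and its optimization settings, so no particular output is claimed.

```csharp
using System;
using System.Reflection;

public static class IlBytesSketch
{
    public static int Add(int a, int b) { return a + b; }

    // Returns the raw IL bytes of a public method on this class.
    public static byte[] IlOf(string methodName)
    {
        MethodInfo method = typeof(IlBytesSketch).GetMethod(methodName);
        MethodBody body = method.GetMethodBody();
        return body.GetILAsByteArray();
    }

    public static void Main()
    {
        // Prints the bytes as hyphen-separated hexadecimal pairs.
        Console.WriteLine(BitConverter.ToString(IlOf("Add")));
    }
}
```

Each short opcode such as `ldarg.0`, `add`, or `ret` occupies a single byte in this array, while instructions with operands, such as `call`, are followed by their operand bytes.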

Generation

CIL assemblies and instructions are generated by either a compiler or a utility called the IL Assembler (ILASM) that is shipped with the execution environment.

Assembled IL can also be disassembled into code again using the IL Disassembler (ILDASM). There are other tools such as .NET Reflector that can decompile IL into a high-level language (e.g. C# or Visual Basic). This makes IL a very easy target for reverse engineering. This trait is shared with Java bytecode. However, there are tools that can obfuscate the code so that it is not easily readable but is still runnable.

Just-in-time compilation

Just-in-time compilation involves turning the bytecode into code immediately executable by the CPU. The conversion is performed gradually during the program's execution. JIT compilation provides environment-specific optimization, runtime type safety, and assembly verification. To accomplish this, the JIT compiler examines the assembly metadata for any illegal accesses and handles violations appropriately.

Ahead-of-time compilation

CLI-compatible execution environments also come with the option to do an ahead-of-time compilation (AOT) of an assembly to make it execute faster by removing the JIT process at runtime.

In the .NET Framework there is a special tool called the Native Image Generator (NGEN) that performs the AOT. In Mono there is also an option to do an AOT.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.