Name mangling
In compiler construction, name mangling (also called name decoration) is a technique used to solve various problems caused by the need to resolve unique names for programming entities in many modern programming languages.
It provides a way of encoding additional information in the name of a function, structure, class or another datatype in order to pass more semantic information from compilers to linkers.
The need arises where the language allows different entities to be named with the same identifier as long as they occupy a different namespace (where a namespace is typically defined by a module, class, or explicit namespace directive).
Any object code produced by compilers is usually linked with other pieces of object code (produced by the same or another compiler) by a type of program called a linker. The linker needs a great deal of information on each program entity. For example, to correctly link a function it needs its name, the number of arguments and their types, and so on.
C name decoration in Microsoft Windows
Although name mangling is not generally required or used by languages that do not support function overloading (such as C and classic Pascal), compilers for such languages still use it in some cases to provide additional information about a function. For example, compilers targeted at Microsoft Windows platforms support a variety of calling conventions, which determine the manner in which parameters are sent to subroutines and results returned. Because the different calling conventions are not compatible with one another, compilers mangle symbols with codes detailing which convention should be used.
The mangling scheme was established by Microsoft, and has been informally followed by other compilers including Digital Mars, Borland, and GNU GCC, when compiling code for the Windows platforms. The scheme even applies to other languages, such as Pascal, D, Delphi, Fortran, and C#. This allows subroutines written in those languages to call, or be called by, existing Windows libraries using a calling convention different from their default.
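When compiling three C functions f, g and h, declared with the cdecl, stdcall and fastcall calling conventions respectively (g and h each taking 4 bytes of arguments), 32-bit compilers emit, respectively:
_f
_g@4
@h@4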
In the stdcall and fastcall mangling schemes, the function is encoded as _name@X and @name@X respectively, where X is the number of bytes, in decimal, of the argument(s) in the parameter list (including those passed in registers, for fastcall). In the case of cdecl, the function name is merely prefixed by an underscore.
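The decoration rules above are mechanical enough to express as a small function. The sketch below is a minimal illustration that assumes 32-bit targets only; the names decorate_win32 and CallConv are hypothetical, and real toolchains have further cases (for example variadic functions) that it does not attempt to cover.

```cpp
#include <cassert>
#include <string>

// Calling conventions covered by the 32-bit decoration rules described above.
enum class CallConv { Cdecl, Stdcall, Fastcall };

// Sketch of the 32-bit Windows C decoration rules:
//   cdecl    -> _name
//   stdcall  -> _name@X
//   fastcall -> @name@X
// where X is the total size, in bytes and written in decimal, of the
// parameter list (register-passed fastcall arguments included).
std::string decorate_win32(const std::string& name, CallConv conv, int arg_bytes) {
  assert(arg_bytes >= 0);
  switch (conv) {
    case CallConv::Cdecl:
      return "_" + name;
    case CallConv::Stdcall:
      return "_" + name + "@" + std::to_string(arg_bytes);
    case CallConv::Fastcall:
      return "@" + name + "@" + std::to_string(arg_bytes);
  }
  return name;  // not reached for valid enum values
}
```

Under these rules a stdcall function taking three 4-byte arguments would be decorated as _name@12, since the byte count covers the whole parameter list.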
Note that the 64-bit convention on Windows (Microsoft C) uses no leading underscore. This difference may in some rare cases lead to unresolved externals when porting such code to 64 bits. For example, Fortran code can use an 'alias' attribute to link against a C function by its decorated name, such as '_f'.
This will compile and link fine under 32 bits, but generate an unresolved external '_f' under 64 bits. One workaround is to not use 'alias' at all (in which case the function names typically need to be capitalized in C and Fortran), or to use the BIND(C) option introduced in Fortran 2003.
Name mangling in C++
C++ compilers are the most widespread and yet least standard users of name mangling. The first C++ compilers were implemented as translators to C source code, which would then be compiled by a C compiler to object code; because of this, symbol names had to conform to C identifier rules. Even later, with the emergence of compilers that produced machine code or assembly directly, the system's linker generally did not support C++ symbols, and mangling was still required.
The C++ language does not define a standard decoration scheme, so each compiler uses its own. C++ also has complex language features, such as classes, templates, namespaces, and operator overloading, that alter the meaning of specific symbols based on context or usage. Metadata about these features can be encoded, and so disambiguated, by mangling (decorating) the name of a symbol. Because the name-mangling systems for such features are not standardized across compilers, few linkers can link object code that was produced by different compilers.
Simple example
Consider a C++ program containing two definitions of a function f that differ only in their parameter lists, together with a function g that is not overloaded. The two f functions are distinct, with no relation to each other apart from the name. If they were translated directly into C with no changes, the result would be an error: C does not permit two functions with the same name. The C++ compiler therefore encodes the type information in the symbol name, giving each f its own distinct symbol.
Notice that g is mangled as well, even though there is no conflict; name mangling applies to all symbols.
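The following C++ sketch makes the point concrete. The two f overloads compile side by side in C++, while toy_mangle is a hypothetical scheme, not any real compiler's, that appends one letter per parameter type so that each overload receives a distinct, C-compatible name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Two overloads of f and a non-overloaded g: legal C++, but a direct C
// translation could not give both f functions the same name.
int f() { return 1; }
int f(int) { return 0; }
void g() {
  int i = f(), j = f(0);
  (void)i;
  (void)j;
}

// Hypothetical toy scheme (not any real compiler's): prefix "__", then the
// name, "_", and one letter per parameter type, or "v" for an empty list.
std::string toy_mangle(const std::string& name, const std::vector<char>& params) {
  assert(!name.empty());
  std::string out = "__" + name + "_";
  if (params.empty()) return out + "v";
  for (char p : params) out += p;  // e.g. 'i' for int, 'c' for char
  return out;
}
```

The only thing the toy scheme needs to guarantee is that the two f overloads end up with different symbols (here __f_v and __f_i), while g still receives a decorated name (__g_v).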
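Complex example
For a more complex example, consider a real-world name mangling implementation, the one used by GNU GCC 3.x, and how it mangles members of a class article declared inside namespace wikipedia: a member function format that takes no parameters, and a member function print_to that takes a reference to a std::ostream.
All mangled symbols begin with _Z (note that an identifier beginning with an underscore followed by a capital letter is a reserved identifier in C and C++, so conflict with user identifiers is avoided); for nested names (including both namespaces and classes), this is followed by N, then a series of <length, id> pairs (the length being the length of the next identifier), and finally E. For example, wikipedia::article::format becomes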
_ZN·9wikipedia·7article·6format·E
For functions, this is then followed by the type information. As format takes no parameters, its parameter list is encoded simply as v (the return type of an ordinary, non-template function is not encoded); hence:
_ZN·9wikipedia·7article·6format·E·v
For print_to, the standard type std::ostream (more properly std::basic_ostream<char, std::char_traits<char> >) is used, which has the special abbreviation So; a reference to this type is therefore RSo, and the complete name for the function is:
_ZN·9wikipedia·7article·8print_to·E·RSo
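The rules just described can be captured in a few lines. This is a minimal sketch that assumes parameter types are supplied already encoded (such as i, c or RSo); it leaves out much of the real Itanium C++ ABI scheme, including substitutions for repeated components, templates and cv-qualifiers, and itanium_mangle is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal sketch of the GCC 3.x (Itanium C++ ABI) rules described above.
// Covers only: the _Z prefix, N...E around nested names, <length><id>
// pairs, and parameter types passed in already encoded (e.g. "i", "RSo").
std::string itanium_mangle(const std::vector<std::string>& scope_and_name,
                           const std::vector<std::string>& param_codes) {
  assert(!scope_and_name.empty());
  std::string out = "_Z";
  const bool nested = scope_and_name.size() > 1;
  if (nested) out += "N";
  for (const std::string& id : scope_and_name)
    out += std::to_string(id.size()) + id;  // one <length, id> pair
  if (nested) out += "E";
  if (param_codes.empty()) {
    out += "v";  // an empty parameter list is encoded as v
  } else {
    for (const std::string& code : param_codes) out += code;
  }
  return out;
}
```

With a single, non-nested name the N...E wrapper is omitted, so under this sketch void h(int) comes out as _Z1hi, which matches the GCC 3.x symbol for that function.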
How different compilers mangle the same functions
There isn't a standard scheme by which even trivial C++ identifiers are mangled, and consequently different compiler vendors (or even different versions of the same compiler, or the same compiler on different platforms) mangle public symbols in radically different (and thus totally incompatible) ways. Consider how different C++ compilers mangle the same functions:

Compiler                      void h(int)          void h(int, char)     void h(void)
Intel C++ 8.0 for Linux       _Z1hi                _Z1hic                _Z1hv
HP aC++ A.05.55 IA-64         _Z1hi                _Z1hic                _Z1hv
GCC 3.x and 4.x               _Z1hi                _Z1hic                _Z1hv
GCC 2.9x                      h__Fi                h__Fic                h__Fv
HP aC++ A.03.45 PA-RISC       h__Fi                h__Fic                h__Fv
Microsoft VC++ v6/v7          ?h@@YAXH@Z           ?h@@YAXHD@Z           ?h@@YAXXZ
Digital Mars C++              ?h@@YAXH@Z           ?h@@YAXHD@Z           ?h@@YAXXZ
Borland C++ v3.1              @h$qi                @h$qizc               @h$qv
OpenVMS C++ V6.5 (ARM mode)   H__XI                H__XIC                H__XV
OpenVMS C++ V6.5 (ANSI mode)  CXX$__7H__FI0ARG51T  CXX$__7H__FIC26CDH77  CXX$__7H__FV2CB06E8
OpenVMS C++ X7.1 IA-64        CXX$_Z1HI2DSQ26A     CXX$_Z1HIC2NP3LI4     CXX$_Z1HV0BCA19V
SunPro CC                     __1cBh6Fi_v_         __1cBh6Fic_v_         __1cBh6F_v_
Tru64 C++ V6.5 (ARM mode)     h__Xi                h__Xic                h__Xv
Tru64 C++ V6.5 (ANSI mode)    __7h__Fi             __7h__Fic             __7h__Fv
Watcom C++ 10.6               W?h$n(i)v            W?h$n(ia)v            W?h$nv
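Notes:
- The Compaq C++ compiler on OpenVMS VAX and Alpha (but not IA-64) and Tru64 has two name mangling schemes. The original, pre-standard scheme is known as the ARM model, and is based on the name mangling described in the Annotated C++ Reference Manual (ARM). With the advent of new features in standard C++, particularly templates, the ARM scheme became more and more unsuitable: it could not encode certain function types, or produced identical mangled names for different functions. It was therefore replaced by the newer "ANSI" model, which supported all ANSI template features, but was not backwards compatible.
- On IA-64, a standard ABI exists (the Itanium C++ ABI), which defines (among other things) a standard name-mangling scheme, and which is used by all the IA-64 compilers. GNU GCC 3.x, in addition, has adopted the name mangling scheme defined in this standard for use on other, non-Intel platforms.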
Handling of C symbols when linking from C++
The job of the common C++ idiom of wrapping declarations in an extern "C" { ... } block is to ensure that the symbols inside it are "unmangled": that the compiler emits a binary file with their names undecorated, as a C compiler would do. As C language definitions are unmangled, the C++ compiler needs to avoid mangling references to these identifiers.
For example, the standard strings library, <string.h>, usually wraps its declarations, such as those of strcmp and memset, in such a block (guarded so that only C++ compilers see it).
Thus, C++ code that calls these functions uses the correct, unmangled strcmp and memset. If the extern "C" had not been used, the (SunPro) C++ compiler would produce code that refers to mangled versions of these names.
Since those symbols do not exist in the C runtime library (e.g. libc), link errors would result.
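A minimal sketch of the idiom in C++, assuming a hypothetical function c_checksum that is meant to be shared with C code. The #ifdef __cplusplus guard lets the same header be read by a C compiler, which does not understand extern "C".

```cpp
#include <cassert>
#include <cstring>

// Typical shape of a header shared between C and C++ translation units.
// c_checksum is a hypothetical function used only for illustration.
#ifdef __cplusplus
extern "C" {
#endif
int c_checksum(const char* s);
#ifdef __cplusplus
}
#endif

// This definition inherits C linkage from the declaration above, so its
// symbol is emitted without C++ mangling and C code can link against it.
int c_checksum(const char* s) {
  assert(s != nullptr);
  int sum = 0;
  for (; *s != '\0'; ++s) sum += static_cast<unsigned char>(*s);
  return sum;
}

// Calls into the C library. On typical implementations strcmp is declared
// with C linkage, so this refers to the plain, unmangled strcmp symbol.
bool same_text(const char* a, const char* b) { return std::strcmp(a, b) == 0; }
```

Whether a C standard library function has C or C++ language linkage is formally implementation-defined in C++, which is why the comment above hedges with "typical implementations".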
Standardised name mangling in C++
While it is a relatively common belief that standardised name mangling in the C++ language would lead to greater interoperability between implementations, this is not really the case. Name mangling is only one of several application binary interface (ABI) issues in a C++ implementation. Other ABI issues like exception handling, virtual table layout, structure padding, etc. cause differing C++ implementations to be incompatible. Further, requiring a particular form of mangling would cause issues for systems where implementation limits (e.g. length of symbols) dictate a particular mangling scheme. A standardised requirement for name mangling would also prevent an implementation where mangling was not required at all, for example, a linker which understood the C++ language.
The C++ standard therefore does not attempt to standardise name mangling. On the contrary, the Annotated C++ Reference Manual (also known as ARM, ISBN 0-201-51459-1, section 7.2.1c) actively encourages the use of different mangling schemes to prevent linking when other aspects of the ABI, such as exception handling and virtual table layout, are incompatible.
Real-world effects of C++ name mangling
Because C++ symbols are routinely exported from DLL and shared object files, the name mangling scheme is not merely a compiler-internal matter. Different compilers (or different versions of the same compiler, in many cases) produce such binaries under different name decoration schemes, meaning that symbols are frequently unresolved if the compilers used to create the library and the program using it employed different schemes. For example, if a system with multiple C++ compilers installed (e.g. GNU GCC and the OS vendor's compiler) wished to install the Boost C++ Libraries, it would have to compile them twice: once for the vendor compiler and once for GCC.
It is good for safety purposes that compilers producing incompatible object code (code based on different ABIs, regarding e.g. classes and exceptions) use different name mangling schemes. This way, such incompatibilities are detected at the linking phase rather than when the software runs, where they could lead to obscure bugs and serious stability issues.
For this reason, name decoration is an important aspect of any C++-related ABI.
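Name mangling in Java
The language, compiler, and .class file format were all designed together (and had object orientation in mind from the start), so the primary problem solved by name mangling doesn't exist in implementations of the Java runtime. There are, however, cases where an analogous transformation and qualification of names is necessary.
Creating unique names for inner and anonymous classes
The scope of an inner class is confined to its parent class, so the compiler must produce a "qualified" public name for the inner class, to avoid conflict where other classes (inner or not) exist in the same namespace. Similarly, anonymous classes must have "fake" public names generated for them (as the concept of anonymous classes exists only in the compiler, not the runtime). So, compiling a Java program whose class foo contains a named inner class bar and, inside its method zark, an anonymous class, will produce three .class files:
- foo.class, containing the main (outer) class foo
- foo$bar.class, containing the named inner class foo.bar
- foo$1.class, containing the anonymous inner class (local to method foo.zark)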
All of these class names are valid (as $ symbols are permitted in the JVM specification), and these names are generally "safe" for the compiler to generate, as the Java Language Specification reserves the $ character for mechanically generated code and advises against using it in ordinary Java class names.
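The naming pattern in the list above can be sketched as follows. These helpers are hypothetical and only model the two cases shown (a named member class and numbered anonymous classes); javac has further rules, for example for local classes, that are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a named member class Inner of Outer is stored under
// the binary name Outer$Inner (and thus in the file Outer$Inner.class).
std::string member_class_name(const std::string& outer, const std::string& inner) {
  assert(!outer.empty() && !inner.empty());
  return outer + "$" + inner;
}

// Hypothetical helper: anonymous classes have no source name, so they are
// numbered within their outer class: Outer$1, Outer$2, ...
std::string anonymous_class_name(const std::string& outer, int index) {
  assert(!outer.empty() && index >= 1);
  return outer + "$" + std::to_string(index);
}
```

Appending ".class" to these names gives the file names listed above, such as foo$bar.class and foo$1.class.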
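Name resolution in Java is further complicated at runtime, as fully qualified class names are unique only inside a specific classloader instance. Classloaders are ordered hierarchically and each Thread in the JVM has a so-called context class loader, so in cases where two different classloader instances contain classes with the same name, the system first tries to load the class using the root (or system) classloader and then goes down the hierarchy to the context class loader.
Java Native Interface
Java's native method support allows Java language programs to call out to programs written in another language (generally either C or C++). There are two name-resolution concerns here, neither of which is implemented in a particularly standard manner:
- Java to native name translation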
Name mangling in Python
In Python, a programmer can explicitly designate that the name of an attribute within a class body should be mangled by using a name with two leading underscores and not more than one trailing underscore. For example, a class Test might define one method named __mangled_name and another named normal_name.
On encountering name-mangled attributes, Python transforms these names by prefixing them with a single underscore and the name of the enclosing class, so listing the attributes of Test (for example with dir(Test)) will output something like:
['_Test__mangled_name', '__doc__', '__module__', 'normal_name']
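The transformation can be sketched in C++ as below. This approximates the rule as stated above plus one detail from Python's documentation, namely that leading underscores are stripped from the class name first; python_mangle is a hypothetical name, and cases such as dotted names are not modelled.

```cpp
#include <cassert>
#include <string>

// Sketch of Python's private-name rule: inside class `cls`, an identifier
// with at least two leading underscores and at most one trailing underscore
// becomes "_" + cls + identifier. Leading underscores are removed from the
// class name first; a class name consisting only of underscores disables
// the rewrite.
std::string python_mangle(const std::string& cls, const std::string& name) {
  assert(!name.empty());
  const std::string::size_type n = name.size();
  const bool two_leading = n >= 2 && name[0] == '_' && name[1] == '_';
  const bool two_trailing = n >= 2 && name[n - 1] == '_' && name[n - 2] == '_';
  if (!two_leading || two_trailing) return name;  // e.g. normal_name, __doc__
  const std::string::size_type first = cls.find_first_not_of('_');
  if (first == std::string::npos) return name;
  return "_" + cls.substr(first) + name;
}
```

Names such as __doc__ end in two underscores and are therefore left untouched, which is why they appear unmangled in the listing above.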
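Name mangling in Objective-C
Objective-C has two forms of method: class methods and instance methods. A method declaration takes one of the following forms:
+ method name: argument name1:parameter1 ...
- method name: argument name1:parameter1 ...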
Class methods are signified by +, instance methods use -. A typical class method declaration may then look like:
+ (id) initWithX: (int) number andY: (int) number;
+ (id) new;
with instance methods looking like:
- (id) value;
- (id) setValue: (id) new_value;
Each of these method declarations has a specific internal representation. When compiled, each method is named according to the following scheme for class methods:
_c_Class_methodname_name1_name2_ ...
and this for instance methods:
_i_Class_methodname_name1_name2_ ...
The colons in the Objective-C syntax are translated to underscores. So, the Objective-C class method + (id) initWithX: (int) number andY: (int) number;, if it belongs to the Point class, would translate as _c_Point_initWithX_andY_, and the instance method - (id) value; of the same class would translate to _i_Point_value.
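A minimal sketch of the naming scheme just described. objc_method_symbol is a hypothetical helper, and it assumes the selector is given in its usual colon-separated form (for example initWithX:andY:).

```cpp
#include <cassert>
#include <string>

// Sketch of the method-naming scheme described above: "_c_" for class
// methods or "_i_" for instance methods, then the class name, an
// underscore, and the selector with every ':' turned into '_'.
std::string objc_method_symbol(bool is_class_method, const std::string& cls,
                               std::string selector) {
  assert(!cls.empty() && !selector.empty());
  for (char& ch : selector)
    if (ch == ':') ch = '_';
  return std::string(is_class_method ? "_c_" : "_i_") + cls + "_" + selector;
}
```

Each colon in the selector becomes one underscore, which is why the two-argument class method ends with a trailing underscore while the argument-less value does not.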
Each of the methods of a class is labeled in this way. However, looking up a method that a class may respond to would be tedious if all methods were represented in this fashion. Instead, each method is assigned a unique symbol (such as an integer). Such a symbol is known as a selector. In Objective-C, one can manage selectors directly; they have a specific type in Objective-C, SEL.
During compilation, a table is built that maps the textual representation (such as _i_Point_value) to selectors (which are given a type SEL). Managing selectors is more efficient than manipulating the textual representation of a method. Note that a selector only matches a method's name, not the class it belongs to: different classes can have different implementations of a method with the same name. Because of this, implementations of a method are given a specific identifier too. These are known as implementation pointers, and are also given a type, IMP.
Message sends are encoded by the compiler as calls to the id objc_msgSend (id receiver, SEL selector, ...) function, or one of its cousins, where receiver is the receiver of the message, and selector determines the method to call. Each class has its own table that maps selectors to their implementations; the implementation pointer specifies where in memory the actual implementation of the method resides. There are separate tables for class and instance methods. Apart from being stored in the SEL to IMP lookup tables, the functions are essentially anonymous.
The SEL value for a selector does not vary between classes. This enables polymorphism.
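The selector and implementation-pointer mechanism can be modelled with ordinary maps. The following is a toy model, not the Objective-C runtime: Sel and Imp stand in for SEL and IMP, and sel_for, ToyClass, toy_send, point_value and size_value are hypothetical names. It shows a selector being interned once and shared by all classes, while each class maps it to its own implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of selectors and implementation pointers. Sel stands in for SEL
// and Imp for IMP; this is not the real Objective-C runtime API.
using Sel = int;
using Imp = std::string (*)();

// Interns selector names: the same name always maps to the same Sel value,
// independent of any class.
Sel sel_for(const std::string& name) {
  assert(!name.empty());
  static std::map<std::string, Sel> table;
  auto it = table.find(name);
  if (it != table.end()) return it->second;
  const Sel fresh = static_cast<Sel>(table.size()) + 1;
  table.emplace(name, fresh);
  return fresh;
}

// Each (toy) class owns its own Sel -> Imp table.
struct ToyClass {
  std::map<Sel, Imp> methods;
};

// Simplified dispatch in the spirit of objc_msgSend: look the selector up in
// the receiver's class table and call the implementation found there.
// std::map::at throws std::out_of_range if the class lacks the selector.
std::string toy_send(const ToyClass& cls, Sel sel) { return cls.methods.at(sel)(); }

// Two unrelated implementations that can sit behind the same selector.
std::string point_value() { return "Point value"; }
std::string size_value() { return "Size value"; }
```

A real runtime additionally caches lookups and walks the superclass chain when a class's own table has no entry, both of which this sketch leaves out.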
The Objective-C runtime maintains information about the argument and return types of methods. However, this information is not part of the name of the method, and can vary from class to class.
Since Objective-C does not support namespaces, there is no need for mangling of class names (which do appear as symbols in generated binaries).
Name mangling in Fortran
Name mangling is also necessary in Fortran compilers, originally because the language is case insensitive. Further mangling requirements were imposed later in the evolution of the language because of the addition of modules and other features in the Fortran 90 standard. The case mangling, especially, is a common issue that must be dealt with in order to call Fortran libraries (such as LAPACK) from other languages (such as C).
Because of the case insensitivity, the name of a subroutine or function "FOO" must be converted to a canonical case and format by the Fortran compiler so that it will be linked in the same way regardless of case. Different compilers have implemented this in various ways, and no standardization has occurred. The AIX and HP-UX Fortran compilers convert all identifiers to lower case ("foo"), while the Cray Unicos Fortran compilers converted identifiers to all upper case ("FOO"). The GNU g77 compiler converts identifiers to lower case plus an underscore ("foo_"), except that identifiers already containing an underscore ("FOO_BAR") have two underscores appended ("foo_bar__"), following a convention established by f2c. Many other compilers, including SGI's IRIX compilers, gfortran, and Intel's Fortran compiler, convert all identifiers to lower case plus an underscore ("foo_" and "foo_bar_").
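The conventions listed above can be summarized in a small function. The style labels and fortran_external_name are hypothetical, and the sketch covers only plain subroutine and function names, not the extra decoration needed for Fortran 90 module procedures.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical labels for the conventions listed above.
enum class FortranStyle {
  Lower,            // AIX, HP-UX: "foo"
  Upper,            // Cray Unicos: "FOO"
  LowerUnderscore,  // gfortran, Intel, SGI IRIX: "foo_", "foo_bar_"
  F2cG77            // g77 / f2c: "foo_", but "foo_bar__"
};

std::string fortran_external_name(const std::string& name, FortranStyle style) {
  assert(!name.empty());
  std::string out = name;
  // Canonicalize the case first, since Fortran names are case insensitive.
  for (char& ch : out) {
    const unsigned char u = static_cast<unsigned char>(ch);
    ch = static_cast<char>(style == FortranStyle::Upper ? std::toupper(u)
                                                       : std::tolower(u));
  }
  switch (style) {
    case FortranStyle::Lower:
    case FortranStyle::Upper:
      return out;
    case FortranStyle::LowerUnderscore:
      return out + "_";
    case FortranStyle::F2cG77:
      // f2c convention: names already containing '_' get two underscores.
      return out + (name.find('_') != std::string::npos ? "__" : "_");
  }
  return out;  // not reached for valid enum values
}
```

A C caller linking against such a library has to use whichever of these spellings the Fortran compiler produced, for example foo_ for gfortran.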
Identifiers in Fortran 90 modules must be further mangled, because the same subroutine name may apply to different routines in different modules.
Name mangling in Java
The language, the compiler and the .class file format were all designed together, with object orientation in mind from the start. As a result, the primary problem that name mangling solves does not arise in implementations of the Java runtime. There are, however, cases where an analogous transformation and qualification of names is necessary.
Creating unique names for inner and anonymous classes
The scope of inner classes is confined to their parent class, so the compiler must produce a "qualified" public name for each inner class to avoid conflicts with other classes (inner or not) in the same namespace. Similarly, anonymous classes must have "fake" public names generated for them, because the concept of an anonymous class exists only in the compiler, not the runtime. For example, compiling a Java program in which class foo contains a named inner class bar and an anonymous class created inside its method zark will produce three .class files:
- foo.class, containing the main (outer) class foo
- foo$bar.class, containing the named inner class foo.bar
- foo$1.class, containing the anonymous inner class (local to method foo.zark)
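All of these class names are valid, since $ symbols are permitted by the JVM specification. These names are also reasonably "safe" for the compiler to generate. Although $ is legal in Java identifiers, the Java Language Specification says it should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems, so hand-written class names seldom contain it.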
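The naming rule can be sketched in Python. The sketch follows the binary-name rules of the Java Language Specification (section 13.1). A member class takes the binary name of its immediately enclosing class, followed by $ and its simple name. An anonymous class takes the enclosing class's binary name, followed by $ and a non-empty sequence of digits. The helper names are hypothetical. Numbering anonymous classes from 1 in order of appearance matches javac's usual output, but the specification does not require it.

```python
def member_class_name(enclosing: str, simple_name: str) -> str:
    """Binary name of a named (member) inner class, e.g. foo$bar."""
    return f"{enclosing}${simple_name}"


def anonymous_class_name(enclosing: str, index: int) -> str:
    """Binary name of the index-th anonymous class in `enclosing`, e.g. foo$1.

    The specification only requires '$' followed by digits; numbering
    from 1 is an assumption that matches javac's usual output.
    """
    return f"{enclosing}${index}"


def class_file(binary_name: str) -> str:
    """File name the compiler writes for a class with this binary name."""
    return binary_name + ".class"


outer = "foo"
for name in (outer,
             member_class_name(outer, "bar"),
             anonymous_class_name(outer, 1)):
    print(class_file(name))
```

Because each name is built from its enclosing class's binary name, deeper nesting composes naturally. A class baz inside foo.bar becomes foo$bar$baz.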
Name resolution in Java is further complicated at runtime, because fully qualified class names are unique only within a specific classloader instance. Classloaders are ordered hierarchically, and each thread in the JVM has a so-called context class loader. Two different classloader instances may both be able to supply a class with the same name. Under the standard parent-first delegation model, the system then first tries to load the class using the root (bootstrap) classloader and works down the hierarchy towards the context class loader.
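The parent-first lookup can be modeled with a toy Python class. This is a simplified, hypothetical model, not the JVM's `ClassLoader` API. Each loader knows its parent and the class names it can define itself, and it asks its parent before trying on its own. A loaded class is identified here by the pair (defining loader, name), which reflects the point that a class name alone is unique only within one loader.

```python
class ToyClassLoader:
    """Hypothetical model of parent-first class loading (not a JVM API)."""

    def __init__(self, name, parent=None, definable=()):
        self.name = name
        self.parent = parent
        self.definable = set(definable)  # class names this loader can define
        self.loaded = {}                 # cache of classes defined here

    def load_class(self, class_name):
        if class_name in self.loaded:
            return self.loaded[class_name]
        # Delegate to the parent first, all the way up to the root loader.
        if self.parent is not None:
            try:
                return self.parent.load_class(class_name)
            except LookupError:
                pass  # the parent cannot supply it; fall back to this loader
        if class_name not in self.definable:
            raise LookupError(f"{self.name} cannot load {class_name}")
        # Identity is (defining loader, name), not the name alone.
        defined = (self.name, class_name)
        self.loaded[class_name] = defined
        return defined


root = ToyClassLoader("root", definable={"java.lang.String"})
app = ToyClassLoader("app", parent=root, definable={"com.example.Util"})
plugin = ToyClassLoader("plugin", parent=app, definable={"com.example.Util"})

print(plugin.load_class("com.example.Util"))  # ('app', 'com.example.Util')
print(plugin.load_class("java.lang.String"))  # ('root', 'java.lang.String')

# Two sibling loaders defining the same name produce two distinct classes.
a = ToyClassLoader("pluginA", parent=app, definable={"com.example.Plugin"})
b = ToyClassLoader("pluginB", parent=app, definable={"com.example.Plugin"})
print(a.load_class("com.example.Plugin") == b.load_class("com.example.Plugin"))  # False
```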
Java Native Interface
Java's native method support allows Java programs to call out to programs written in another language (generally C or C++). This raises two name-resolution concerns:
- Java-to-native name translation, whose rules are set out in the JNI specification
- normal C++ name mangling, which differs between compilers
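The Java-to-native translation can be sketched in Python, following the escaping rules in the JNI specification's section on resolving native method names. The native symbol is built from four parts: the prefix Java_, the fully qualified class name with package separators turned into underscores, a separating underscore, and the method name. Overloaded native methods additionally append two underscores and the mangled argument signature. Characters that could be ambiguous are escaped: _ becomes _1, ; becomes _2, [ becomes _3, and any other non-alphanumeric character becomes _0 followed by four lowercase hexadecimal digits. The function names are hypothetical, and the sketch assumes characters in the Basic Multilingual Plane.

```python
from typing import Optional


def jni_escape(text: str) -> str:
    """Escape a class name, method name or signature per the JNI rules."""
    out = []
    for ch in text:
        if ch in "./":
            out.append("_")                 # package / path separator
        elif ch == "_":
            out.append("_1")
        elif ch == ";":
            out.append("_2")
        elif ch == "[":
            out.append("_3")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)                  # ASCII letters and digits pass through
        else:
            out.append(f"_0{ord(ch):04x}")  # e.g. '$' -> _00024 (BMP only)
    return "".join(out)


def jni_native_name(class_name: str, method: str,
                    arg_signature: Optional[str] = None) -> str:
    """Native symbol for `method` of `class_name` (hypothetical helper)."""
    symbol = f"Java_{jni_escape(class_name)}_{jni_escape(method)}"
    if arg_signature is not None:  # only needed for overloaded native methods
        symbol += "__" + jni_escape(arg_signature)
    return symbol


print(jni_native_name("pkg.Cls", "f", "ILjava/lang/String;"))
# Java_pkg_Cls_f__ILjava_lang_String_2
print(jni_native_name("foo$bar", "zark"))
# Java_foo_00024bar_zark
```

The second call shows how the two Java topics meet. An inner class's compiler-generated $ name is escaped again when its native methods are bound.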
Name mangling in Python
A Python programmer can explicitly designate that the name of an attribute within a class body should be mangled by using a name with at least two leading underscores and no more than one trailing underscore. For example,
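__thing will be mangled, as will ___thing and __thing_, but __thing__ and __thing___ will not. On encountering name-mangled attributes, Python transforms these names by prepending a single underscore and the name of the enclosing class. For example, consider a class Test that defines a method __mangled_name and a method normal_name. Listing its attributes (in Python 2, with dir() on a classic class) will output: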
['_Test__mangled_name',
'__doc__',
'__module__',
'normal_name']
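The rule can be checked in a current Python 3 interpreter. The sketch below defines a class Test with one mangled and one ordinary method, plus a hypothetical helper, `mangle`, that reproduces the transformation. Like CPython, the helper also strips leading underscores from the class name and skips mangling when the class name consists only of underscores. It uses vars() instead of dir() because in Python 3, dir() also lists many inherited special attributes.

```python
def mangle(class_name: str, name: str) -> str:
    """Sketch of private-name mangling as applied inside a class body."""
    # Mangled only with >= 2 leading underscores and < 2 trailing ones.
    if not name.startswith("__") or name.endswith("__"):
        return name
    stripped = class_name.lstrip("_")  # leading underscores are dropped
    if not stripped:                   # e.g. a class named "__": no mangling
        return name
    return "_" + stripped + name


for n in ("__thing", "___thing", "__thing_", "__thing__", "__thing___"):
    print(n, "->", mangle("Test", n))


class Test:
    def __mangled_name(self):
        return "called"

    def normal_name(self):
        # Inside the class body the compiler rewrites this reference
        # to self._Test__mangled_name, so the call still works.
        return self.__mangled_name()


print([n for n in vars(Test) if not n.startswith("__")])
# ['_Test__mangled_name', 'normal_name']
print(Test().normal_name())  # called
```

Mangling happens when the class body is compiled, not on attribute lookup. Code outside the class must therefore spell out the mangled form, for example Test()._Test__mangled_name().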
Name mangling in Borland's Turbo Pascal / Delphi range
To avoid name mangling in Pascal, export routines through an exports clause that gives each one an explicit external name (for example, exports myFunc name 'myFunc';).
Name mangling in Free Pascal
Free Pascal supports function and operator overloading, so it also uses name mangling to support these features. Free Pascal can also call symbols defined in external modules written in another language, and it can export its own symbols so that code in another language can call them. For further information, consult Chapter 6.2 and Chapter 7.1 of the Free Pascal Programmer's Guide.
Name mangling in Objective-C
Essentially two forms of method exist in Objective-C: the class ("static") method and the instance method. A method declaration in Objective-C takes one of the following forms:
+ (return-type) methodname:argument name1:parameter1 ...
- (return-type) methodname:argument name1:parameter1 ...
Class methods are signified by + and instance methods by -. A typical class method declaration may then look like:
+ (id) initWithX: (int) number andY: (int) number;
+ (id) new;
with instance methods looking like:
- (id) value;
- (id) setValue: (id) new_value;
Each of these method declarations has a specific internal representation. When compiled, each method is named according to the following scheme for class methods:
_c_Class_methodname_name1_name2_ ...
and this for instance methods:
_i_Class_methodname_name1_name2_ ...
The colons in the Objective-C syntax are translated to underscores. So, if the Objective-C class method + (id) initWithX: (int) number andY: (int) number; belongs to the Point class, it would translate as _c_Point_initWithX_andY_, and the instance method - (id) value; of the same class would translate to _i_Point_value.
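The scheme can be expressed as a short Python sketch. It assumes well-formed declarations like the ones shown and reads the selector from the keyword parts, meaning the identifiers followed by colons. It implements only the _c_/_i_ convention described here. The function names are hypothetical, and real compilers and runtimes differ in detail.

```python
import re


def selector_from_declaration(decl: str) -> str:
    """Extract the selector, e.g. 'initWithX:andY:', from a method declaration."""
    body = decl.strip().rstrip(";").strip()
    body = body[1:].strip()                   # drop the leading '+' or '-'
    body = re.sub(r"^\([^)]*\)\s*", "", body)  # drop the return type, e.g. '(id)'
    keywords = re.findall(r"(\w+)\s*:", body)  # identifiers followed by ':'
    if keywords:
        return "".join(k + ":" for k in keywords)
    return body                               # unary selector such as 'value'


def objc_symbol(class_name: str, decl: str) -> str:
    """Internal name under the _c_ (class) / _i_ (instance) scheme."""
    prefix = "_c_" if decl.lstrip().startswith("+") else "_i_"
    selector = selector_from_declaration(decl)
    # Colons in the selector become underscores.
    return prefix + class_name + "_" + selector.replace(":", "_")


print(objc_symbol("Point", "+ (id) initWithX: (int) number andY: (int) number;"))
# _c_Point_initWithX_andY_
print(objc_symbol("Point", "- (id) value;"))
# _i_Point_value
```

Note that the parameter names (number, new_value) never reach the symbol. Only the keyword parts of the selector are encoded.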
Each of the methods of a class is labeled in this way. However, looking up a method that a class may respond to would be tedious if all methods were represented in this fashion. Instead, each method name is assigned a unique symbol (such as an integer), known as a selector. In Objective-C, one can manage selectors directly, since they have a specific type of their own, SEL.
During compilation, a table is built that maps the textual representation of a method name (such as value or initWithX:andY:) to a selector of type SEL. Managing selectors is more efficient than manipulating the textual representation of a method. Note that a selector matches only a method's name, not the class it belongs to, because different classes can have different implementations of a method with the same name. For this reason, implementations of a method are given a specific identifier too. These are known as implementation pointers and have their own type, IMP.
Message sends are encoded by the compiler as calls to the id objc_msgSend (id receiver, SEL selector, ...) function, or one of its variants. Here receiver is the receiver of the message and selector determines the method to call. Each class has its own table that maps selectors to their implementations, and the implementation pointer specifies where in memory the actual implementation of the method resides. There are separate tables for class and instance methods. Apart from being stored in the SEL-to-IMP lookup tables, the functions are essentially anonymous.
The SEL value for a selector does not vary between classes. This enables polymorphism.
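The relationship between selectors, per-class tables and message sends can be modeled in a few lines of Python. This is a hypothetical toy model, not the Objective-C runtime API. `register_selector` interns a method name into a SEL (here an integer), each class keeps its own SEL-to-IMP dictionary, and `msg_send` plays the role of objc_msgSend. For brevity, the model omits inheritance and the separate table for class methods.

```python
from typing import Callable, Dict

_selectors: Dict[str, int] = {}


def register_selector(name: str) -> int:
    """Return the SEL for a method name; equal names always share one SEL."""
    return _selectors.setdefault(name, len(_selectors) + 1)


class ToyClass:
    def __init__(self, name: str):
        self.name = name
        self.methods: Dict[int, Callable] = {}  # SEL -> IMP

    def add_method(self, selector_name: str, imp: Callable) -> None:
        self.methods[register_selector(selector_name)] = imp


class ToyObject:
    def __init__(self, cls: ToyClass):
        self.isa = cls  # every object points at its class


def msg_send(receiver: ToyObject, sel: int, *args):
    """Look up the IMP for `sel` in the receiver's class and call it."""
    imp = receiver.isa.methods.get(sel)
    if imp is None:
        raise LookupError(f"{receiver.isa.name} does not respond to SEL {sel}")
    # As in Objective-C, the IMP receives self and the selector (_cmd).
    return imp(receiver, sel, *args)


point = ToyClass("Point")
circle = ToyClass("Circle")
point.add_method("value", lambda self, _cmd: "Point's value")
circle.add_method("value", lambda self, _cmd: "Circle's value")

value = register_selector("value")  # one SEL shared by both classes
print(msg_send(ToyObject(point), value))   # Point's value
print(msg_send(ToyObject(circle), value))  # Circle's value
```

Because the SEL is the same for both classes while each class table holds a different IMP, the same message send reaches different code. This is the polymorphism the text describes.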
The Objective-C runtime maintains information about the argument and return types of methods. However, this information is not part of the name of the method, and can vary from class to class.
Since Objective-C does not support namespaces, class names (which do appear as symbols in generated binaries) do not need to be mangled.
Name mangling in Fortran
Name mangling is also necessary in Fortran compilers, originally because the language is case insensitive. Further mangling requirements were imposed later in the evolution of the language because of the addition of modules and other features in the Fortran 90 standard. The case mangling, especially, is a common issue that must be dealt with in order to call Fortran libraries (such as LAPACK) from other languages (such as C).
Because of the case insensitivity, the name of a subroutine or function "FOO" must be converted to a canonical case and format by the Fortran compiler so that it will be linked in the same way regardless of case. Different compilers have implemented this in various ways, and no standardization has occurred. The AIX and HP-UX Fortran compilers convert all identifiers to lower case ("foo"), while the Cray Unicos Fortran compilers converted identifiers to all upper case ("FOO"). The GNU g77 compiler converts identifiers to lower case plus an underscore ("foo_"). Identifiers that already contain an underscore ("FOO_BAR") instead have two underscores appended ("foo_bar__"), following a convention established by f2c. Many other compilers, including SGI's IRIX compilers, gfortran, and Intel's Fortran compiler, convert all identifiers to lower case plus a single underscore ("foo_" and "foo_bar_").
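These conventions can be summarized in a small Python function, which is useful when declaring a Fortran routine from C. The convention labels ("lower", "upper", "f2c", "single") are invented for this sketch. The mapping reflects only the default behavior described above. Compilers typically offer options that change it, such as gfortran's -fno-underscoring.

```python
def fortran_symbol(name: str, convention: str) -> str:
    """External symbol for a Fortran procedure under several conventions.

    'lower'  : lower case, no suffix (AIX, HP-UX)
    'upper'  : upper case (Cray Unicos)
    'f2c'    : lower case + '_', or '__' if the name has an '_' (f2c, g77)
    'single' : lower case + '_' (gfortran, Intel, SGI IRIX)
    """
    if convention == "lower":
        return name.lower()
    if convention == "upper":
        return name.upper()
    if convention == "f2c":
        return name.lower() + ("__" if "_" in name else "_")
    if convention == "single":
        return name.lower() + "_"
    raise ValueError(f"unknown convention: {convention}")


for name in ("FOO", "FOO_BAR"):
    print(name, [fortran_symbol(name, c) for c in ("lower", "upper", "f2c", "single")])
# FOO ['foo', 'FOO', 'foo_', 'foo_']
# FOO_BAR ['foo_bar', 'FOO_BAR', 'foo_bar__', 'foo_bar_']
```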
Identifiers in Fortran 90 modules must be further mangled, because the same subroutine name may apply to different routines in different modules.
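Module procedures fold the module name into the symbol. The sketch below encodes two layouts as they commonly appear in nm output on Linux: gfortran's __modname_MOD_procname and Intel Fortran's modname_mp_procname_. Treat both as observations about particular compilers, not anything the Fortran standard specifies, and check them against your own toolchain. The compiler labels "gfortran" and "ifort" are keys chosen for this sketch.

```python
def module_symbol(module: str, proc: str, compiler: str) -> str:
    """Symbol for procedure `proc` in Fortran module `module` (sketch)."""
    module, proc = module.lower(), proc.lower()  # Fortran is case insensitive
    if compiler == "gfortran":
        return f"__{module}_MOD_{proc}"
    if compiler == "ifort":
        return f"{module}_mp_{proc}_"
    raise ValueError(f"unknown compiler: {compiler}")


# The same procedure name in two modules yields two distinct symbols.
print(module_symbol("geometry", "area", "gfortran"))  # __geometry_MOD_area
print(module_symbol("finance", "area", "gfortran"))   # __finance_MOD_area
print(module_symbol("geometry", "area", "ifort"))     # geometry_mp_area_
```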
See also
- Language binding
- Foreign function interface
- Calling convention
- Application programming interface (API)
- Application binary interface (ABI)
- Comparison of application virtual machines
- Java Native Interface
- SWIG, an open-source interface bindings generator from many languages to many languages
- Microsoft Visual C++ Name Mangling
External links
- Linux Itanium ABI for C++, including name mangling scheme.
- c++filt: filter to demangle encoded C++ symbols
- undname: MSVC tool to demangle names
- The Objective-C Runtime System, from Apple's The Objective-C Programming Language 1.0
- C++ Name Mangling/Demangling: a detailed explanation of the Visual C++ compiler name mangling scheme
- PHP UnDecorateSymbolName: a PHP script that demangles Microsoft Visual C++ function names
- Calling conventions for different C++ compilers contains detailed description of name mangling schemes for various x86 C++ compilers
- Macintosh C/C++ ABI Standard Specification
- Mixing C and C++ Code
- Symbol management – 'Linkers and Loaders' by John R. Levine