Globally Unique Identifier
A globally unique identifier (GUID) is a unique reference number used as an identifier in computer software. The term GUID is also used for Microsoft's implementation of the Universally Unique Identifier (UUID) standard.

The value of a GUID is represented as a string of 32 hexadecimal digits, such as {21EC2020-3AEA-1069-A2DD-08002B30309D}, and is usually stored as a 128-bit integer. The total number of unique keys is 2^128, or about 3.4×10^38. This number is so large that the probability of the same number being generated randomly twice is negligible.
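
The size of this key space can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of any GUID library: the function name collision_probability is hypothetical, and the birthday-bound approximation it uses assumes that all 128 bits are chosen uniformly at random, which real GUID generators do not do because some bits are fixed.

```python
# Size of the 128-bit GUID space and a rough collision estimate.
TOTAL_KEYS = 2 ** 128


def collision_probability(n: int, space: int = TOTAL_KEYS) -> float:
    """Birthday-bound approximation p ~ n*(n-1) / (2*space), valid while p is small.

    Assumption: every value in `space` is equally likely.
    """
    return n * (n - 1) / (2 * space)


if __name__ == "__main__":
    print(TOTAL_KEYS)                      # 340282366920938463463374607431768211456
    print(format(TOTAL_KEYS, ".1e"))       # 3.4e+38
    print(collision_probability(10 ** 9))  # chance of any duplicate among a billion values
```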

Still, certain techniques have been developed to help ensure that GUID numbers are not duplicated (see Algorithm below).

Common uses

  • Microsoft Windows uses GUIDs internally to identify the classes and interfaces of COM objects. A script can activate a specific class or object without having to know the name or location of the dynamic-link library that contains it.
  • Intel's GUID Partition Table, a system for partitioning hard drives.
  • ActiveX, a system for downloading and installing controls in a web browser, uses GUIDs to uniquely identify each control.
  • Second Life uses GUIDs for identification of all assets in its world.

Basic structure

The GUID is a 16-byte (128-bit) number. The most commonly used structure of the data type is:
Bits  Bytes  Description  Endianness
32    4      Data1        Native
16    2      Data2        Native
16    2      Data3        Native
64    8      Data4        Big

Data4 stores the bytes in the same order as displayed in the GUID text encoding (see below), but the other three fields are reversed on little-endian systems (for example Intel CPUs).

One to three of the most significant bits of the first byte in Data4 define the type variant of the GUID:
Pattern  Description
0xx      Network Computing System backward compatibility
10x      Standard
110      Microsoft Component Object Model backward compatibility; this includes the GUIDs for important interfaces like IUnknown and IDispatch
111      Reserved for future use

The most significant four bits of Data3 define the version number, and the algorithm used.
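
The variant and version bits can be read directly from the 16 bytes. The following is a minimal Python sketch using the standard uuid module only to parse the text form; the helper name variant_and_version is hypothetical. It works on the big-endian byte order in which the digits are displayed, so the variant bits are taken from the byte that opens the fourth text group and the version from the top four bits of Data3.

```python
import uuid


def variant_and_version(text: str) -> tuple[str, int]:
    """Return (variant bit pattern, version nibble) read straight from the bits."""
    raw = uuid.UUID(text).bytes           # 16 bytes in the order the digits are displayed
    variant_byte = raw[8]                 # byte that opens the fourth text group
    if (variant_byte & 0x80) == 0x00:
        variant = "0xx"                   # Network Computing System compatibility
    elif (variant_byte & 0xC0) == 0x80:
        variant = "10x"                   # standard
    elif (variant_byte & 0xE0) == 0xC0:
        variant = "110"                   # Microsoft COM compatibility
    else:
        variant = "111"                   # reserved
    version = raw[6] >> 4                 # most significant four bits of Data3
    return variant, version


if __name__ == "__main__":
    print(variant_and_version("{21EC2020-3AEA-1069-A2DD-08002B30309D}"))  # ('10x', 1)
    print(variant_and_version("{00000000-0000-0000-C000-000000000046}"))  # ('110', 0)
```

The second call uses the IUnknown identifier, which falls under the 110 pattern listed in the table.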

Text encoding

A GUID is most commonly written in text as a sequence of hexadecimal digits separated into five groups, such as:
3F2504E0-4F89-11D3-9A0C-0305E82C3301


This text notation contains the following fields, separated by hyphens:
Hex digits  Description
8           Data1
4           Data2
4           Data3
4           Initial two bytes from Data4
12          Remaining six bytes from Data4

For the first three fields, the most significant digit is on the left. The last two fields are treated as eight separate bytes, each having its most significant digit on the left, and they follow each other from left to right. Note that the digit order of the fourth field may be unexpected: although it is displayed like the 16-bit fields before it, it holds the first two bytes of Data4 rather than a separate integer, so it is not byte-swapped on little-endian systems.
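
The difference between the text order and the in-memory structure on a little-endian machine can be seen with Python's standard uuid module, whose bytes attribute follows the displayed digit order and whose bytes_le attribute stores the first three fields byte-reversed. This is a small illustrative sketch; the variable names are arbitrary.

```python
import uuid

g = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")

# Text order: every field with its most significant byte first.
text_order = g.bytes

# Structure order on a little-endian machine: Data1, Data2 and Data3 are
# byte-reversed, while the eight Data4 bytes keep their displayed order.
struct_order_le = g.bytes_le

print(text_order.hex())        # 3f2504e04f8911d39a0c0305e82c3301
print(struct_order_le.hex())   # e004253f894fd3119a0c0305e82c3301
```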

Braces are often added to enclose the above format, like so:
{3F2504E0-4F89-11D3-9A0C-0305E82C3301}


When printing fewer characters is desired, GUIDs are sometimes encoded into a base64 or Ascii85 string. A base64-encoded GUID consists of 22 to 24 characters (depending on padding), for instance:
7QDBkvCA1+B9K/U0vrQx1A
7QDBkvCA1+B9K/U0vrQx1A==


Ascii85 encoding gives 20 characters, for example:
5:$Hj:Pf\4RLB9%kU\Lj
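
Both compact encodings are available in Python's standard base64 module. The sketch below is illustrative only: it encodes the 16 bytes in displayed digit order, which is an assumption, since the text does not say which byte order the examples above were produced from. Note also that Python's a85encode abbreviates an all-zero four-byte group to a single "z", so some GUIDs encode to fewer than 20 characters.

```python
import base64
import uuid

g = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")

b64_padded = base64.b64encode(g.bytes).decode("ascii")  # 24 characters, ends in "=="
b64_short = b64_padded.rstrip("=")                       # 22 characters without padding
a85 = base64.a85encode(g.bytes).decode("ascii")          # 20 characters for this value

# Decoding reverses the process; base64 needs its padding restored first.
round_trip_ok = (
    uuid.UUID(bytes=base64.b64decode(b64_short + "==")) == g
    and uuid.UUID(bytes=base64.a85decode(a85)) == g
)

print(len(b64_padded), len(b64_short), len(a85), round_trip_ok)
```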


In Uniform Resource Names (URNs), GUIDs use the namespace identifier "uuid", e.g.:
urn:uuid:3F2504E0-4F89-11D3-9A0C-0305E82C3301

Algorithm
In the OSF-specified algorithm for generating new (V1) GUIDs, the MAC address of the user's network card is used as a base for the last group of GUID digits, which means, for example, that a document can be tracked back to the computer that created it. This privacy hole was used when locating the creator of the Melissa virus. Most of the other digits are based on the time at which the GUID was generated.

V1 GUIDs, which contain a MAC address and time, can be identified by the digit "1" in the first position of the third group of digits, for example {2f1e4fc0-81fd-11da-9156-00036a0f876a}.
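
How much a V1 GUID reveals can be shown by reading the MAC address and timestamp back out of it. The following is a minimal Python sketch using the standard uuid module; the helper name decode_v1 is hypothetical. It relies on the fact that version 1 timestamps count 100-nanosecond intervals since 15 October 1582 (UTC).

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100-nanosecond intervals since this date (UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def decode_v1(text: str) -> tuple[str, datetime]:
    """Return (MAC address, creation time) embedded in a version 1 GUID."""
    g = uuid.UUID(text)
    if g.version != 1:
        raise ValueError("not a version 1 GUID")
    # The 48-bit node field is the last group of digits: the MAC address.
    mac = ":".join(f"{(g.node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    # g.time is the 60-bit timestamp; divide by 10 to get microseconds.
    created = GREGORIAN_EPOCH + timedelta(microseconds=g.time // 10)
    return mac, created


if __name__ == "__main__":
    mac, created = decode_v1("{2f1e4fc0-81fd-11da-9156-00036a0f876a}")
    print(mac)                  # 00:03:6a:0f:87:6a
    print(created.isoformat())  # creation time in UTC
```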

V4 GUIDs use the later algorithm, which produces a pseudo-random number. These have a "4" in the same position, for example {38a52be4-9352-453e-af97-5c3b448652f0}. More specifically, the Data3 bit pattern would be 0001xxxxxxxxxxxx in the first case, and 0100xxxxxxxxxxxx in the second. Cryptanalysis of the WinAPI GUID generator shows that, since the sequence of V4 GUIDs is pseudo-random, it is possible to predict previous and subsequent values given full knowledge of the internal state.
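
The bit patterns above suggest how a V4 value can be assembled: take 16 random bytes and overwrite the version and variant bits. The following Python sketch is illustrative only. The function name make_v4 is hypothetical, and it draws its bytes from os.urandom, the operating system's random source, which is an assumption of this example and not the WinAPI generator discussed above.

```python
import os
import uuid


def make_v4() -> uuid.UUID:
    """Build a version 4 GUID from 16 random bytes (illustrative sketch)."""
    raw = bytearray(os.urandom(16))      # assumption: OS-provided random bytes
    raw[6] = (raw[6] & 0x0F) | 0x40      # top four bits of Data3 -> 0100 (version 4)
    raw[8] = (raw[8] & 0x3F) | 0x80      # variant bits -> 10x (standard)
    return uuid.UUID(bytes=bytes(raw))


if __name__ == "__main__":
    g = make_v4()
    print("{%s}" % g)                    # third group always starts with "4"
```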
Sequential algorithms
GUIDs are commonly used as the primary key of database tables, and with that, often the table has a clustered index on that attribute. This presents a performance issue when inserting records because a fully random GUID means the record may need to be inserted anywhere within the table rather than merely appended near the end of it.

As a way of mitigating this issue while still providing enough randomness to effectively prevent duplicate number collisions, several algorithms have been used to generate sequential GUIDs.

The first technique, described by Jimmy Nilsson in August 2002 and referred to as a "COMB" ("combined guid/timestamp"), replaces the last 6 bytes of Data4 with the least-significant 6 bytes of the current system date/time. While this can result in GUIDs that are generated out of order within the same fraction of a second, his tests showed this had little real-world impact on insertion. One side effect of this approach is that the date and time of insertion can be easily extracted from the value later, if desired.
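
The COMB idea can be sketched in Python as follows. This is a hedged illustration, not Nilsson's implementation: the function names make_comb and comb_timestamp_ms are hypothetical, and the timestamp here is Unix time in milliseconds, whereas the original technique used the database system's own date/time value.

```python
import time
import uuid
from typing import Optional


def make_comb(now_ms: Optional[int] = None) -> uuid.UUID:
    """COMB-style GUID: a random GUID whose last 6 bytes are a timestamp."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000               # assumption: Unix time in ms
    stamp = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")   # least-significant 6 bytes
    random_part = uuid.uuid4().bytes[:10]                  # keep the first 10 random bytes
    return uuid.UUID(bytes=random_part + stamp)


def comb_timestamp_ms(g: uuid.UUID) -> int:
    """Recover the embedded timestamp, the side effect noted above."""
    return int.from_bytes(g.bytes[10:], "big")


if __name__ == "__main__":
    g = make_comb()
    print(g, comb_timestamp_ms(g))
```

Two values created in the same millisecond share their last six bytes and differ only in the random part, which is the out-of-order case mentioned above.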

Starting with Microsoft SQL Server version 2005, Microsoft added a function called NEWSEQUENTIALID to the Transact-SQL language. It generates GUIDs that increase in value for as long as the server keeps running, but the sequence may start from a lower number (still guaranteed unique) when the server restarts. This reduces the number of database table pages where insertions can occur, but does not guarantee that the values will always increase. The values returned by this function can be easily predicted, so this algorithm is not well-suited for generating obscure numbers for security or hashing purposes.

In 2006, a programmer found that the SYS_GUID function provided by Oracle was returning sequential GUIDs on some platforms, but this appears to be a bug rather than a feature.
Uses
In the Windows registry, under the key "My Computer\HKEY_Classes_Root\CLSID", the DAO database management system identifies the particular version and type of DAO accessing module to be used by a group of about a dozen GUIDs. Each begins with five zeros followed by a three-digit identifier for that particular version and type, and each ends in the same value, 0000-0010-8000-00AA006D2EA4. The set of GUIDs used by this database system therefore runs from {00000010-0000-0010-8000-00AA006D2EA4} through {00000109-0000-0010-8000-00AA006D2EA4}, although not all GUIDs in that range are used.

In the Microsoft Component Object Model (COM), GUIDs are used to uniquely distinguish different software component interfaces. This means that two (possibly incompatible) versions of a component can have exactly the same name but still be distinguishable by their GUIDs. For example, all components created for Microsoft Windows using COM must implement the IUnknown interface, which allows client code to find all other interfaces and features of that component. The interface is located through a GUID rather than through a named entry point called "IUnknown": IUnknown is defined as the GUID {00000000-0000-0000-C000-000000000046}. Every component that provides an IUnknown entry point exposes that same GUID, and every program that looks for IUnknown in a component uses that GUID to find the entry point, knowing that any component exposing it must implement IUnknown in the same way.

GUIDs are also inserted into documents from Microsoft Office programs. Even audio or video streams in the Advanced Systems Format (ASF) are identified by their GUIDs.

The binary representation of a GUID can be little-endian or big-endian, so all APIs need to ensure that the correct data structure is used.
Subtypes
There are several flavors of GUIDs used in COM:
  • IID – interface identifier (those registered on a system are stored in the Windows Registry at the key HKEY_CLASSES_ROOT\Interface)
  • CLSID – class identifier (stored in the registry at HKEY_CLASSES_ROOT\CLSID)
  • LIBID – type library identifier
  • CATID – category identifier (its presence on a class identifies it as belonging to certain class categories)


DCOM introduces many additional GUID subtypes:
  • AppID – application identifier
  • MID – machine identifier
  • IPID – interface pointer identifier (applicable to an interface engaged in RPC)
  • CID – causality identifier (applicable to an RPC session)
  • OID – object identifier (applicable to an object instance)
  • OXID – object exporter identifier (applicable to an instance of the system object that performs RPC)
  • SETID – ping set identifier (applicable to a group of objects)


These GUID subspaces may overlap, as the context of GUID usage defines its subtype. For example, there might be a class using the same GUID for its CLSID as another class is using for its IID — all without a problem. On the other hand, two classes using the same CLSID could not co-exist.
XML syndication formats
There is also a guid element in some versions of the RSS specification, and a mandatory id element in Atom, which should contain a unique identifier for each individual article or weblog post. In RSS the contents of the guid element can be any text, and in practice it is typically a copy of the article URL. Atom's IDs need to be valid URIs (usually URLs pointing to the entry, or URNs containing any other unique identifier).
See also

  • Security Identifier (SID)
  • Universally unique identifier (UUID)
  • Object identifier (OID)
  • Device fingerprint

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 