Underscore
The underscore [ _ ] (also called understrike, low line, or low dash) is a character that originally appeared on the typewriter and was primarily used to underline words. To produce an underlined word, the word was typed, the typewriter carriage was moved back to the beginning of the word, and the word was overtyped with the underscore character.
This character is sometimes used to create visual spacing within a sequence of characters where a whitespace character is not permitted, e.g., in computer filenames, e-mail addresses, and World Wide Web URLs. Some computer applications will automatically underline text surrounded by underscores: _underlined_ will render as underlined text. The underscore is often used in ASCII-only media (e-mail, IRC, instant messaging) for this purpose. When the underscore is used for emphasis in this fashion, it is usually interpreted as indicating that the enclosed text is underlined or italicized (as opposed to bold, which is indicated by *asterisks*).
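The emphasis convention described above can be sketched in a few lines of Python. This is only an illustration: the function name `render_emphasis` and the choice of `<u>` and `<b>` tags as output are assumptions, not the behavior of any particular application.

```python
import re

def render_emphasis(text: str) -> str:
    """Turn _underscore_ and *asterisk* emphasis into HTML-like tags (hypothetical helper)."""
    # The markers must not touch a word character on the outside, so that
    # names such as snake_case_name are left untouched.
    text = re.sub(r"(?<!\w)_(\S(?:.*?\S)?)_(?!\w)", r"<u>\1</u>", text)
    text = re.sub(r"(?<!\w)\*(\S(?:.*?\S)?)\*(?!\w)", r"<b>\1</b>", text)
    return text

print(render_emphasis("this is _very_ important"))
print(render_emphasis("*bold* and _under_"))
print(render_emphasis("snake_case_name"))
```

The lookarounds are a design choice for this sketch; real applications differ in how they decide whether an underscore is markup or part of a word.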
The underscore is not the same character as the dash character, although one convention for text news wires is to use an underscore when an em-dash or en-dash is desired, or when other non-standard characters such as bullets would be appropriate. A series of underscores (like [ _________ ]) may be used to create a blank to be filled in on a form. It is also sometimes used to create a horizontal line, if no other method is available; hyphens and dashes are often used for a similar purpose.
The ASCII value of this character is 95. On the standard US or UK 101/102 computer keyboard it shares a key with the hyphen on the top row, to the right of the 0 key.
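The code value stated above can be checked with a short Python snippet. The variable names are illustrative only.

```python
underscore = "_"
code_point = ord(underscore)        # integer code of the character

print(code_point)                   # 95
print(hex(code_point))              # 0x5f
print(underscore.encode("ascii"))   # the character fits in a single ASCII byte
```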
Diacritic
The underscore is used as a diacritic mark, "combining low line", in some African languages (some languages using the Orthography of Gabon languages or Rapidolangue in Gabon, Izere in Nigeria) and Native American languages (Shoshoni).
It is not to be confused with the combining macron below.
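The distinction between these marks can be inspected with Python's standard `unicodedata` module. The code points used here (U+005F, U+0332, U+0331) come from the Unicode standard rather than from the text above, and the constant names are illustrative.

```python
import unicodedata

LOW_LINE = "\u005F"                # the ordinary underscore
COMBINING_LOW_LINE = "\u0332"      # the diacritic described above
COMBINING_MACRON_BELOW = "\u0331"  # the similar mark it should not be confused with

for ch in (LOW_LINE, COMBINING_LOW_LINE, COMBINING_MACRON_BELOW):
    # A non-zero combining class means the character attaches to the preceding letter.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# A letter followed by the combining mark is two code points rendered as one glyph.
underlined_a = "a" + COMBINING_LOW_LINE
print(len(underlined_a))  # 2
```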
Origins in identifiers
In programs of any significant size, there is a need for descriptive (hence multi-word) identifiers, like "previous balance" or "end of file". However, spaces are not typically permitted inside identifiers, as they are treated as delimiters between tokens. Writing the words together as in "endoffile" is not satisfactory because the names often become unreadable. Therefore, the programming language COBOL allowed a hyphen ("-") to be used between words of compound identifiers, as in "END-OF-FILE". LISP also allowed the hyphen in names, treating the subtraction operator itself as an identifier.
Most programming languages, however, interpret the hyphen as a subtraction operator and do not allow the character in identifier names. The common punched card character sets of the early 1960s had no lower-case letters and no special character that would be adequate as a word separator in identifiers. IBM's EBCDIC character coding system, introduced in 1964 at the same time as the IBM System/360 computer series, uses 8 bits per byte. A modest increase in the character set size over earlier character sets added a few punctuation characters, including the underscore, which IBM referred to as the break character, but not lower case (later editions of EBCDIC added lower case). IBM's report on NPL (the early name of what is now called PL/I) leaves the character set undefined, but specifically mentions the break character, and gives RATE_OF_PAY as an example identifier. By 1967, the underscore had spread to ASCII. C, developed at Bell Labs in the early 1970s, allowed the underscore as an alphabetic character.
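The identifier rules discussed in this section can be observed in a modern language. The sketch below uses Python purely as an example of a language that accepts the underscore in names and reads the hyphen as subtraction; the variable names `end`, `of`, and `file` are made up for the demonstration.

```python
candidates = ["end_of_file", "endoffile", "end of file", "end-of-file", "RATE_OF_PAY"]

for name in candidates:
    # str.isidentifier() reports whether the string is a syntactically valid name.
    print(f"{name!r:16} valid identifier: {name.isidentifier()}")

# With a hyphen, the text is parsed as three names joined by subtraction.
end, of, file = 10, 3, 2
print(end-of-file)  # evaluates 10 - 3 - 2, which is 5
```

The names with a space or a hyphen are rejected, while the underscore forms are accepted.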
Use in other languages
Ruby and Perl use $_ as a special variable described as the “default input and pattern matching space”; output defaults to that variable, so its name may be omitted. In Perl, @_ is a special array variable that holds the arguments to a function.
In some languages with pattern matching, such as Standard ML, OCaml, and Haskell, the pattern _ matches any value, but does not perform binding.