Unicode subscripts and superscripts
Unicode has subscripted and superscripted versions of a number of characters, including a full set of Arabic numerals. These characters allow any polynomial, chemical, and certain other equations to be represented in plain text without using any form of markup like HTML or TeX.
The World Wide Web Consortium and the Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters: "When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts...However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription."
Uses
Most fonts that include these characters design them for mathematical numerator and denominator glyphs, which are smaller than normal characters but are aligned with the cap line and the baseline, respectively. When used with the solidus, these glyphs are useful for making arbitrary diagonal fractions (similar to the ½ glyph).
This was not the intended use of these characters when Unicode was designed. The intended use was to allow chemical and algebraic formulas to be written without markup. Proper appearance of such formulas requires true superscripts and subscripts, however: H2O probably looks better with subscript markup than with these characters, which appear in your browser as H₂O.
Another Unicode character, the fraction slash U+2044, is visually similar to the solidus, but when used with the ordinary digits (not the superscripts and subscripts) it was intended to tell a layout system that a fraction, such as ¹¹⁄₁₂, is preferred. Most font layout systems do not actually produce this; your browser, for instance, produces 11⁄12.
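The two approaches described in this section can be sketched in a few lines of Python. This is only an illustration: the function names `diagonal_fraction` and `plain_fraction` are made up for this example, and the first function joins the digits with the fraction slash U+2044 rather than the ordinary solidus, which is an assumption based on the ¹¹⁄₁₂ example in the text.

```python
# Translation tables from ASCII digits to their superscript and subscript forms.
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH


def diagonal_fraction(numerator: int, denominator: int) -> str:
    """Build a fraction from superscript and subscript digit characters."""
    if numerator < 0 or denominator <= 0:
        raise ValueError("expected a non-negative numerator and a positive denominator")
    top = str(numerator).translate(SUPERSCRIPT_DIGITS)
    bottom = str(denominator).translate(SUBSCRIPT_DIGITS)
    return top + FRACTION_SLASH + bottom


def plain_fraction(numerator: int, denominator: int) -> str:
    """Ordinary digits around the fraction slash; rendering is left to the layout system."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


if __name__ == "__main__":
    print(diagonal_fraction(11, 12))  # built from superscript/subscript characters
    print(plain_fraction(11, 12))     # relies on the font or layout engine
```

The first form looks like a fraction in any font that has the glyphs, while the second depends entirely on whether the layout system honors the fraction slash.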
Superscripts and subscripts block
The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into those positions in the Latin-1 range of Unicode. The rest were placed in a dedicated section of Unicode at U+2070 to U+209F. The two tables below show these characters. Each superscript or subscript character is preceded by a normal x to show the subscripting/superscripting. The table on the left contains the actual Unicode characters; the one on the right contains the equivalents using HTML markup for the subscript/superscript. Gray cells are reserved for future use, white cells are other characters from Latin-1.
Other superscript and subscript characters
Unicode also includes subscript and superscript characters that are intended for semantic usage, in the following blocks:
- the Latin-1 Supplement block contains the feminine and masculine ordinal indicators ª and º.
- the Combining Diacritical Marks block contains medieval superscript letter diacritics. These letters are written directly above other letters appearing in medieval Germanic manuscripts, and so these glyphs do not include spacing, for example uͤ. They are shown here over a long string of periods: ....ͣ...ͤ...ͥ...ͦ...ͧ...ͨ...ͩ...ͪ...ͫ...ͬ...ͭ...ͮ...ͯ..
- the Spacing Modifier Letters block has superscripted letters and symbols used for phonetic transcription: ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ˀ ˁ ˠ ˡ ˢ ˣ ˤ
- the Phonetic Extensions block has several sub- and super-scripted letters and symbols: ᴬ ᴭ ᴮ ᴯ ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴻ ᴼ ᴽ ᴾ ᴿ ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵎ ᵏ ᵐ ᵑ ᵒ ᵓ ᵔ ᵕ ᵖ ᵗ ᵘ ᵙ ᵚ ᵛ ᵜ ᵝ ᵞ ᵟ ᵠ ᵡ ᵢ ᵣ ᵤ ᵥ ᵦ ᵧ ᵨ ᵩ ᵪ ᵸ
- the Phonetic Extensions Supplement block has a few more: ᶛ ᶜ ᶝ ᶞ ᶟ ᶠ ᶡ ᶢ ᶣ ᶤ ᶥ ᶦ ᶧ ᶨ ᶩ ᶪ ᶫ ᶬ ᶭ ᶮ ᶯ ᶰ ᶱ ᶲ ᶳ ᶴ ᶵ ᶶ ᶷ ᶸ ᶹ ᶺ ᶻ ᶼ ᶽ ᶾ ᶿ
Consolidated for cut-and-pasting purposes, the Unicode standard defines complete sub- and super-scripts for numbers and common mathematical symbols ( ⁰ ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁺ ⁻ ⁼ ⁽ ⁾ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ₊ ₋ ₌ ₍ ₎ ), a full superscript Latin lowercase alphabet except q ( ᵃ ᵇ ᶜ ᵈ ᵉ ᶠ ᵍ ʰ ⁱ ʲ ᵏ ˡ ᵐ ⁿ ᵒ ᵖ ʳ ˢ ᵗ ᵘ ᵛ ʷ ˣ ʸ ᶻ ), a limited uppercase Latin alphabet ( ᴬ ᴮ ᴰ ᴱ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴼ ᴾ ᴿ ᵀ ᵁ ⱽ ᵂ ), a few subscripted lowercase letters ( ₐ ₑ ₕ ᵢ ₖ ₗ ₘ ₙ ₒ ₚ ᵣ ₛ ₜ ᵤ ᵥ ₓ ), and some Greek letters ( ᵅ ᵝ ᵞ ᵟ ᵋ ᶿ ᶥ ᶲ ᵠ ᵡ ᵦ ᵧ ᵨ ᵩ ᵪ ). Note that since these glyphs come from different ranges, they may not be of the same size and position, depending on the typeface.
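As a rough illustration of how the digit and symbol characters listed above might be used, the following Python sketch converts ordinary digits and signs to their superscript or subscript forms and prints the code points of a few results. The helper names `sup` and `sub` are invented for this example; only the standard `unicodedata` module is used.

```python
import unicodedata

# Map ASCII digits and a few math symbols to their superscript/subscript forms.
SUPERSCRIPTS = str.maketrans("0123456789+-=()", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾")
SUBSCRIPTS = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")


def sup(text: str) -> str:
    """Return text with digits and + - = ( ) replaced by superscript characters."""
    return text.translate(SUPERSCRIPTS)


def sub(text: str) -> str:
    """Return text with digits and + - = ( ) replaced by subscript characters."""
    return text.translate(SUBSCRIPTS)


if __name__ == "__main__":
    print("H" + sub("2") + "O")            # a chemical formula in plain text
    print("x" + sup("2") + " + 3x + 2")    # a polynomial in plain text

    # The superscript digits do not sit in one contiguous range.
    for ch in sup("1234"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The loop at the end shows why sizes and positions can differ between glyphs: superscript one, two, and three come from the Latin-1 range, while superscript four comes from the dedicated block.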
Composite characters
Primarily for compatibility with earlier character sets, Unicode contains a number of characters that combine superscripts and subscripts with other symbols. In most fonts these render much better than attempting to construct these symbols from the above characters or by using markup.
- the Latin-1 Supplement block contains the precomposed diagonal fractions ½, ¼, and ¾. The copyright © and registered trademark signs ® are also in this block.
- the General Punctuation block contains the permille sign ‰ and the per-ten-thousand sign ‱.
- the Number Forms block contains several precomposed diagonal fractions: ⅐ ⅑ ⅒ ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞ ⅟ ↉
- the Letterlike Symbols block contains a few symbols composed of subscript and superscript characters: ℀ ℁ ℅ ℆ № ℠ ™ ⅍
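One way to see how some of these composite characters relate to their parts is Unicode compatibility normalization (NFKC), which maps many of them back to ordinary characters. The sketch below uses Python's standard `unicodedata.normalize`; the wrapper name `fold_compat` is an invention for this example, and not every character in the lists above has such a mapping.

```python
import unicodedata


def fold_compat(text: str) -> str:
    """Apply NFKC normalization, which replaces compatibility characters."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for ch in ["½", "™", "№", "℅", "²", "₂"]:
        folded = fold_compat(ch)
        # Show each resulting character as a code point to make the mapping explicit.
        points = " ".join(f"U+{ord(c):04X}" for c in folded)
        print(f"{ch} -> {folded} ({points})")
```

Note that ½ folds to the digits 1 and 2 joined by the fraction slash U+2044, not by the ordinary solidus.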