HTML decimal character rendering
Encyclopedia
A numeric character reference in HTML refers to a character by its Universal Character Set
Universal Character Set
The Universal Character Set , defined by the International Standard ISO/IEC 10646, Information technology — Universal multiple-octet coded character set , is a standard set of characters upon which many character encodings are based...

/Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 code point, and uses the format
&#nnnn;

or
&#xhhhh;


where nnnn is the code point in decimal
Decimal
The decimal numeral system has ten as its base. It is the numerical base most widely used by modern civilizations....

 form, and hhhh is the code point in hexadecimal
Hexadecimal
In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...

 form. The x must be lowercase in XML documents. The nnnn or hhhh may be any number of digits and may include leading zeros. The hhhh may mix uppercase and lowercase, though uppercase is the usual style.

Not all web browser
Web browser
A web browser is a software application for retrieving, presenting, and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier and may be a web page, image, video, or other piece of content...

s or email clients used by receivers of HTML documents, or text editor
Text editor
A text editor is a type of program used for editing plain text files.Text editors are often provided with operating systems or software development packages, and can be used to change configuration files and programming language source code....

s used by authors of HTML documents, will be able to render all HTML characters. Most modern software is able to display most or all of the characters for the user's language, and will draw a box or other clear indicator for characters they cannot render.

For codes from 0 to 127, the original 7-bit ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 standard set, most of these characters can be used without a character reference. Codes from 160 to 255 can all be created using character entity names. Only a few higher-numbered codes can be created using entity names, but all can be created by decimal number character reference.

Illegal characters

HTML forbids the use of the characters with Universal Character Set
Universal Character Set
The Universal Character Set , defined by the International Standard ISO/IEC 10646, Information technology — Universal multiple-octet coded character set , is a standard set of characters upon which many character encodings are based...

/Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 code points
  • 0 to 31, except 9, 10, and 13 (C0 control characters)
  • 127 (DEL character)
  • 128 to 159 (C1 control characters)
  • 55296 to 57343 (xD800–xDFFF, the UTF-16 surrogate halves)


These characters are not even allowed by reference. That is, you should not even write them as numeric character reference
Numeric character reference
A numeric character reference is a common markup construct used in SGML and other SGML-related markup languages such as HTML and XML. It consists of a short sequence of characters that, in turn, represent a single character from the Universal Character Set of Unicode...

s. However, references to characters 128–159 are commonly interpreted by lenient web browsers as if they were references to the characters assigned to bytes 128–159 (decimal) in the Windows-1252
Windows-1252
Windows-1252 or CP-1252 is a character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages. It is one version within the group of Windows code pages...

 character encoding. This is in violation of HTML and SGML standards, and the characters are already assigned to higher code points, so HTML document authors should always use the higher code points. For example, for the trademark sign (™), use ™, not ™.

The characters 9 (tab), 10 (linefeed), and 13 (carriage return) are allowed in HTML documents, but, along with 32 (space) are all considered "whitespace
Whitespace (computer science)
In computer science, whitespace is any single character or series of characters that represents horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visual mark, but typically does occupy an area on a page...

". The "form feed" control character, which would be at 12, is not allowed in HTML documents, but is also mentioned as being one of the "white space" characters — perhaps an oversight in the specifications. In HTML, most consecutive occurrences of white space characters, except in a <pre> block, are interpreted as comprising a single "word separator" for rendering purposes. A word separator is typically rendered a single en-width space in European languages, but not in others.

See also

  • Character encodings in HTML
    Character encodings in HTML
    HTML has been in use since 1991, but HTML 4.0 was the first standardized version where international characters were given reasonably complete treatment...

  • Character entity reference
    Character entity reference
    In the markup languages SGML, HTML, XHTML and XML, a character entity reference is a reference to a particular kind of named entity that has been predefined or explicitly declared in a Document Type Definition . The "replacement text" of the entity consists of a single character from the Universal...

  • List of XML and HTML character entity references
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK