Document Type Declaration
A Document Type Declaration, or DOCTYPE, is an instruction that associates a particular SGML or XML
document (for example, a webpage) with a Document Type Definition (DTD) (for example, the formal definition of a particular version of HTML). In the serialized form of the document, it manifests as a short string of markup that conforms to a particular syntax.
The HTML layout engines in modern web browsers perform DOCTYPE "sniffing" or "switching", wherein the DOCTYPE in a document served as text/html determines a layout mode, such as "quirks mode" or "standards mode". The text/html serialization of HTML5, which is not SGML-based, uses the DOCTYPE only for mode selection. Since web browsers are implemented with special-purpose HTML parsers, rather than general-purpose DTD-based parsers, they don't use DTDs and will never access them even if a URL is provided. The DOCTYPE is retained in HTML5 as a "mostly useless, but required" header only to trigger "standards mode" in common browsers.
Syntax
The general syntax for a document type declaration is:
<!DOCTYPE root-element PUBLIC "FPI" ["URI"] [
<!-- internal subset declarations -->
]>
or
<!DOCTYPE root-element SYSTEM "URI" [
<!-- internal subset declarations -->
]>
In XML, the root element of the document is the first element in the document. For example, in XHTML, the root element is <html>, being the first element opened (after the doctype declaration) and the last closed. The keywords SYSTEM and PUBLIC suggest what kind of DTD it is (one that is on a private system or one that is open to the public). If the PUBLIC keyword is chosen, it is followed by a restricted form of "public identifier" called a Formal Public Identifier (FPI), enclosed in double quote marks. After that, necessarily, a "system identifier", also enclosed in double quote marks, is provided. For example, the FPI for XHTML 1.1 is "-//W3C//DTD XHTML 1.1//EN", and there are 3 possible system identifiers available for XHTML 1.1 depending on the needs; one of them is the URI reference "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd". If, instead, the SYSTEM keyword is chosen, only a system identifier must be given. It means that the XML parser must locate the DTD in a system-specific fashion, in this case by means of a URI reference of the DTD enclosed in double quote marks. The last part, surrounded by literal square brackets ([]), is called an internal subset, which can be used to add/edit entities or add/edit PUBLIC keyword behaviours. The internal subset is always optional (and sometimes even forbidden within simple SGML profiles, notably those for basic HTML parsers that don't implement a full SGML parser).
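The parts of a declaration described above can be pulled apart mechanically. The following is a minimal sketch in Python, not a conforming SGML or XML parser: the pattern DOCTYPE_RE and the helper parse_doctype are hypothetical names introduced here, the keywords are matched case-sensitively, and only double-quoted identifiers are handled.

```python
import re

# Hypothetical, simplified pattern for the two general forms of the declaration.
DOCTYPE_RE = re.compile(
    r"""<!DOCTYPE\s+(?P<root>[^\s>\[]+)          # root element name
        (?:\s+PUBLIC\s+"(?P<fpi>[^"]*)"          # PUBLIC keyword and FPI
            (?:\s+"(?P<public_uri>[^"]*)")?      # system identifier after the FPI
          |\s+SYSTEM\s+"(?P<system_uri>[^"]*)"   # SYSTEM keyword and system identifier
        )?
        \s*(?:\[(?P<subset>.*?)\])?              # optional internal subset
        \s*>""",
    re.VERBOSE | re.DOTALL,
)


def parse_doctype(text):
    """Split a document type declaration into its parts, or return None."""
    match = DOCTYPE_RE.fullmatch(text.strip())
    if match is None:
        return None
    return {
        "root": match.group("root"),
        "public_id": match.group("fpi"),
        # The system identifier can follow either keyword.
        "system_id": match.group("public_uri") or match.group("system_uri"),
        "internal_subset": match.group("subset"),
    }


if __name__ == "__main__":
    print(parse_doctype(
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
        '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
    ))
```

Parts that are absent from the declaration (for example, the public identifier of a SYSTEM declaration, or the internal subset) come back as None.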
On the other hand, document type declarations are slightly different in SGML-based documents such as HTML, where the public identifier may be associated with the system identifier. This association might be performed, e.g., by means of a catalog file resolving the FPI to a system identifier.
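The idea of resolving an FPI through a catalog can be illustrated with a small lookup table. This is only a sketch: the dictionary CATALOG and the function resolve are hypothetical, real catalog files have their own syntax and matching rules, and the choice to prefer the catalog entry over a system identifier given in the declaration is an assumption made for the example.

```python
# Hypothetical in-memory catalog mapping an FPI to a system identifier.
CATALOG = {
    "-//W3C//DTD HTML 4.01//EN": "http://www.w3.org/TR/html4/strict.dtd",
    "-//W3C//DTD XHTML 1.1//EN": "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
}


def resolve(public_id=None, system_id=None, catalog=CATALOG):
    """Return the system identifier to use for a DTD reference.

    Assumption: a catalog entry for the FPI wins; otherwise the system
    identifier from the declaration (possibly None) is used unchanged.
    """
    if public_id is not None and public_id in catalog:
        return catalog[public_id]
    return system_id


if __name__ == "__main__":
    # An SGML-style declaration that carries only the FPI.
    print(resolve(public_id="-//W3C//DTD HTML 4.01//EN"))
```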
Example
The first line of many World Wide Web pages reads as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
This Document Type Declaration for XHTML includes by reference a DTD, whose public identifier is -//W3C//DTD XHTML 1.0 Transitional//EN and whose system identifier is http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd. An entity resolver may use either identifier for locating the referenced external entity. No internal subset has been indicated in this example or the next ones. The root element is declared to be html and, therefore, it is the first tag to be opened after the end of the doctype declaration in this example and the next ones, too. The html tag itself is not part of the doctype declaration.
HTML 4.01 DTDs
The Strict DTD does not allow presentational markup, on the argument that Cascading Style Sheets should be used for that instead. A declaration referencing the Strict DTD looks like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
The Transitional DTD allows some older elements and attributes that have been deprecated:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
If frames are used, the Frameset DTD must be used instead, like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
XHTML 1.0 DTDs
XHTML's DTDs are also Strict, Transitional and Frameset.
XHTML Strict DTD. No deprecated tags are supported and the markup must be well-formed.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML Transitional DTD is like the XHTML Strict DTD, but deprecated tags are allowed.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
XHTML Frameset DTD is the only XHTML DTD that supports framesets.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
XHTML 1.1 DTD
XHTML 1.1 is the most current finalized revision of XHTML, introducing support for XHTML Modularization. XHTML 1.1 has the stringency of XHTML 1.0 Strict.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
XHTML Basic DTDs
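XHTML Basic 1.0
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
XHTML Basic 1.1
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">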
XHTML Mobile Profile DTDs
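XHTML Mobile Profile 1.0
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN"
"http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
XHTML Mobile Profile 1.1
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.1//EN"
"http://www.openmobilealliance.org/tech/DTD/xhtml-mobile11.dtd">
XHTML Mobile Profile 1.2
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.2//EN"
"http://www.openmobilealliance.org/tech/DTD/xhtml-mobile12.dtd">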
XHTML + RDFa DTD
XHTML+RDFa 1.0
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
HTML5 DTD-less DOCTYPE
HTML5 uses a DOCTYPE declaration which is very short, due to its lack of references to a Document Type Definition in the form of a URL and/or FPI. All it contains is the tag name of the root element of the document, HTML. In the words of the specification draft itself: "In other words, <!DOCTYPE HTML>, case-insensitively."
With the exception of the lack of a URI or the FPI string (the FPI string is treated case-sensitively by validators), this format (a case-insensitive match of the string <!DOCTYPE HTML) is the same as found in the syntax of the SGML-based HTML 4.01 DOCTYPE. Both in HTML4 and in HTML5, the formal syntax is defined in upper case letters, even though lower case and mixed case are also treated as valid.
In XHTML5 the DOCTYPE has to be a case-sensitive match of the string "<!DOCTYPE html>". This is because in XHTML syntax all HTML element names are required to be in lower case, including the root element referenced inside the HTML5 DOCTYPE. As well, XHTML only accepts the keyword DOCTYPE in upper case inside the DOCTYPE string. These rules are not defined by the HTML5 specification itself but by XML and the syntax rules for XHTML DTDs. In the XHTML5 syntax, Document Type Definitions are permitted as well.
The DOCTYPE is optional in XHTML5 and may simply be omitted, though many layout engines render such documents in quirks mode. This would be a problem whenever the document is supposed to be consumed by text/html parsers as well as by XHTML (application/xhtml+xml) parsers. Given, however, that the HTML5 specification forbids XML-serialized HTML5 (XHTML5) from being served with any MIME type other than application/xhtml+xml, this situation is unlikely to be encountered in the real world. Unlike previous versions of XHTML, an XHTML5 document (that is, HTML5 serialized as XML) is not to be served as text/html; in a standards-compliant system it is served as application/xhtml+xml and parsed as XML.
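The mode selection that layout engines perform on text/html documents can be sketched as a function of the DOCTYPE's parts. This is a heavily simplified illustration that only covers the declarations listed in this article: the function name layout_mode and the mode labels are assumptions, and real engines consult a much longer list of legacy public identifiers that is not reproduced here.

```python
def layout_mode(name=None, public_id=None, system_id=None):
    """Pick a layout mode from DOCTYPE parts (simplified, illustrative subset)."""
    if name is None or name.lower() != "html":
        return "quirks"  # no DOCTYPE at all, or an unexpected name
    if public_id is None:
        return "standards"  # e.g. the short HTML5 DOCTYPE
    pid = public_id.lower()
    html4_loose = (
        "-//w3c//dtd html 4.01 transitional//",
        "-//w3c//dtd html 4.01 frameset//",
    )
    if pid.startswith(html4_loose):
        # Without a system identifier these declarations select quirks mode.
        return "limited-quirks" if system_id else "quirks"
    xhtml1_loose = (
        "-//w3c//dtd xhtml 1.0 transitional//",
        "-//w3c//dtd xhtml 1.0 frameset//",
    )
    if pid.startswith(xhtml1_loose):
        return "limited-quirks"
    # Assumption: everything else in this sketch is treated as standards mode;
    # engines additionally check many older identifiers at this point.
    return "standards"


if __name__ == "__main__":
    print(layout_mode())        # document without a DOCTYPE
    print(layout_mode("html"))  # <!DOCTYPE html>
```

The label "limited-quirks" stands for the intermediate mode that some browsers call "almost standards mode".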
See also
- Document Type Definition (contains an example)
- RDFa
- XML schema
- Cascading Style Sheets
External links
- Recommended DTDs to use in your Web document - an informative (not normative) W3C Quality Assurance publication
- DOCTYPE grid - another overview table
- Quirks mode and transitional mode
- Box model tweaking