SGML entity
In the Standard Generalized Markup Language (SGML), an entity is a primitive data type, which associates a string with either a unique alias (such as a user-specified name) or an SGML reserved word (such as #DEFAULT). Entities are foundational to the organizational structure and definition of SGML documents. The SGML specification defines numerous entity types, which are distinguished by keyword qualifiers and context. An entity string value may variously consist of plaintext, SGML tags, and/or references to previously defined entities. Certain entity types may also invoke external documents. Entities are called by reference.

Types of entities

Entities are classified as general or parameter:
  • A general entity can only be referenced within the document content.
  • A parameter entity can only be referenced within the DTD.


Entities are also further classified as parsed or unparsed:
  • A parsed entity contains text, which will be incorporated into the document and parsed if the entity is referenced. A parameter entity can only be a parsed entity.
  • An unparsed entity contains any kind of data, and a reference to it will result in the application's merely being notified of the entity's presence; the content of the entity will not be parsed, even if it is text. An unparsed entity can only be external.
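The two constraints stated above (a parameter entity must be parsed, and an unparsed entity must be external) can be checked mechanically. The following is a minimal sketch only: the `EntityKind` class, its field names, and the `is_valid` helper are hypothetical and are not part of SGML or of any SGML toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityKind:
    """Hypothetical model of the classification axes described above."""
    parameter: bool  # True: parameter entity; False: general entity
    parsed: bool     # True: parsed entity; False: unparsed entity
    external: bool   # True: external entity; False: internal entity

    def __post_init__(self):
        # A parameter entity can only be a parsed entity.
        if self.parameter and not self.parsed:
            raise ValueError("a parameter entity must be parsed")
        # An unparsed entity can only be external.
        if not self.parsed and not self.external:
            raise ValueError("an unparsed entity must be external")


def is_valid(parameter, parsed, external):
    """Return True if the combination satisfies both constraints."""
    try:
        EntityKind(parameter, parsed, external)
        return True
    except ValueError:
        return False


# Enumerate every combination of the three axes and keep the permitted ones.
flags = (False, True)
valid = [(p, q, e) for p in flags for q in flags for e in flags if is_valid(p, q, e)]
print(len(valid), "of 8 combinations are permitted")
```

Under these two rules alone, an unparsed entity is always a general, external entity.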

Internal and external entities

An internal entity has a value that is either a literal string, or a parsed string comprising markup and entities defined in the same document (such as a Document Type Declaration or subdocument). In contrast, an external entity has a declaration that invokes an external document, thereby necessitating the intervention of an entity manager to resolve the external document reference.
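The difference between the two cases can be illustrated with a toy resolver. This is a sketch under stated assumptions, not how any particular SGML system works: the `STORAGE` dictionary stands in for whatever file system or network an entity manager would consult, and the dictionary keys `literal` and `system` are invented for the illustration.

```python
# Hypothetical in-memory storage standing in for the file system or network
# that a real entity manager would consult.
STORAGE = {"file:///hello.txt": "Salutations"}


def resolve(entity, storage=STORAGE):
    """Return the replacement text for an entity described as a small dict."""
    if "literal" in entity:
        # Internal entity: the value is carried by the declaration itself.
        return entity["literal"]
    # External entity: the declaration only names a resource, so the
    # entity manager has to look it up.
    system_id = entity["system"]
    if system_id not in storage:
        raise LookupError(f"cannot resolve external entity: {system_id}")
    return storage[system_id]


print(resolve({"literal": "Hello world"}))
print(resolve({"system": "file:///hello.txt"}))
```

Keeping the lookup behind a single function mirrors the idea that the parser itself does not need to know where external documents are stored.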

System entities

A system entity invokes the optional SYSTEM parameter, which instructs SGML parsers to process an entity's string referent as a resource identifier.

SGML document entity

When an external entity references a complete SGML document, it is known in the calling document as an SGML document entity. An SGML document is a text document with SGML markup defined in an SGML prologue (i.e., the DTD and subdocuments). A complete SGML document comprises not only the document instance itself, but also the prologue and, optionally, the SGML declaration (which defines the document's markup syntax and declares the character encoding).

Syntax

An entity is defined via an entity declaration in a document's DTD. For example:

<!ENTITY greeting1 "Hello world">
<!ENTITY greeting2 SYSTEM "file:///hello.txt">
<!ENTITY % greeting3 "¡Hola!">
<!ENTITY greeting4 "%greeting3; means Hello!">

This DTD markup declares the following:
  • An internal general entity named "greeting1" exists and consists of the string "Hello world".
  • An external general entity named "greeting2" exists and consists of the text found in the resource identified by the URI "file:///hello.txt".
  • An internal parameter entity named "greeting3" exists and consists of the string "¡Hola!".
  • An internal general entity named "greeting4" exists and consists of the string "¡Hola! means Hello!".


Names for entities must follow the rules for SGML names, and there are limitations on where entities can be referenced.

Parameter entities are referenced by placing the entity name between "%" and ";". Parsed general entities are referenced by placing the entity name between "&" and ";". Unparsed entities are referenced by placing the entity name in the value of an attribute declared as type ENTITY.

The general entities from the example above might be referenced in a document as follows:

<content>
<info>'&greeting1;' is a common test string.</info>
<info>The content of hello.txt is: &greeting2;</info>
<info>In Spanish, &greeting4;</info>
</content>

When parsed, this document would be reported to the downstream application the same as if it had been written as follows, assuming the hello.txt file contains the text "Salutations":

<content>
<info>'Hello world' is a common test string.</info>
<info>The content of hello.txt is: Salutations</info>
<info>In Spanish, ¡Hola! means Hello!</info>
</content>
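The substitution shown above can be imitated with a few lines of Python. This is a rough sketch, not an SGML parser: the `expand` helper, the two dictionaries, and the simplified name pattern are assumptions made for the illustration, and the content of hello.txt is assumed to be "Salutations" as in the text.

```python
import re

# Simplified name pattern (an assumption): a letter followed by letters,
# digits, periods, or hyphens.
NAME = r"([A-Za-z][A-Za-z0-9.\-]*)"


def expand(text, table, opener):
    """Replace every reference of the form <opener>name; using `table`."""
    pattern = re.compile(re.escape(opener) + NAME + ";")
    return pattern.sub(lambda match: table[match.group(1)], text)


# Parameter entities are referenced with "%" inside the DTD.
parameter_entities = {"greeting3": "¡Hola!"}

# General entities; greeting4's literal contains a parameter entity
# reference, which is expanded when the declaration is processed.
general_entities = {
    "greeting1": "Hello world",
    "greeting2": "Salutations",  # assumed content of hello.txt
    "greeting4": expand("%greeting3; means Hello!", parameter_entities, "%"),
}

# General entities are referenced with "&" in the document content.
document = "<info>In Spanish, &greeting4;</info>"
expanded = expand(document, general_entities, "&")
print(expanded)
```

The sketch expands in two stages on purpose: parameter entity references are resolved while the declarations are read, and general entity references are resolved while the document content is read.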

A reference to an undeclared entity is an error unless a default entity has been defined. For example:

<!ENTITY #DEFAULT "This entity is not defined">
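The fallback behaviour can be sketched as a lookup that consults the default entity before reporting an error. The `expand_general` function and the use of the key `#DEFAULT` in a Python dictionary are illustrative assumptions, not the interface of any real SGML processor.

```python
import re

# Simplified pattern for a general entity reference (an assumption).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")


def expand_general(text, entities):
    """Expand general entity references, falling back to the default entity."""
    def replace(match):
        name = match.group(1)
        if name in entities:
            return entities[name]
        if "#DEFAULT" in entities:
            # Undeclared name, but a default entity has been defined.
            return entities["#DEFAULT"]
        raise KeyError(f"reference to undeclared entity: {name}")

    return ENTITY_REF.sub(replace, text)


with_default = {
    "greeting1": "Hello world",
    "#DEFAULT": "This entity is not defined",
}
print(expand_general("&greeting1; / &unknown;", with_default))
```

Without the `#DEFAULT` entry, the same call raises `KeyError`, which corresponds to the error case described above.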

Additional markup constructs and processor options may affect whether and how entities are processed. For example, a processor may optionally ignore external entities.

Character entities

Standard entity sets for SGML and some of its derivatives have been developed as mnemonic devices, to ease document authoring when there is a need to use characters that are not easily typed or that are not widely supported by legacy character encodings. Each such entity consists of just one character from the Universal Character Set. Although any character can be referenced using a numeric character reference, a character entity reference allows characters to be referenced by name instead of code point.

HTML 4, for example, has 252 built-in character entities that don't have to be explicitly declared. XML has five. XHTML has the same five as XML, but if its DTDs are explicitly used, then it has 253 (&apos; being the extra entity beyond those in HTML 4).
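As an illustration of referencing the same character by name or by code point, the sketch below uses Python's standard `html` module, which handles HTML character references rather than general SGML entities. The choice of the entity `eacute` is arbitrary.

```python
import html
from html.entities import name2codepoint

# The same character, referenced by name and by code point.
named = html.unescape("&eacute;")           # character entity reference
numeric_decimal = html.unescape("&#233;")   # numeric reference, decimal
numeric_hex = html.unescape("&#xE9;")       # numeric reference, hexadecimal

# name2codepoint maps HTML entity names to Unicode code points.
code_point = name2codepoint["eacute"]

print(named, numeric_decimal, numeric_hex, code_point)
```

All three references resolve to the same character; the named form only saves the author from having to remember the code point.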

See also

  • Declarative programming
  • Object (computer science)
  • List of XML and HTML character entity references
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 