DocBook
DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation.

As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content. That content can then be published in a variety of formats, including HTML, XHTML, EPUB, PDF, man pages, Web help and HTML Help, without requiring users to make any changes to the source.

Overview

DocBook is an XML language. In its current version (5.0), DocBook's language is formally defined by a RELAX NG schema with integrated Schematron rules. (There are also W3C XML Schema+Schematron and Document Type Definition (DTD) versions of the schema available, but these are considered non-standard.)

Because DocBook is a semantic language, DocBook documents do not describe what their contents "look like," but rather the meaning of those contents. For example, rather than explaining how the abstract for an article might be visually formatted, DocBook simply says that a particular section is an abstract. It is up to an external processing tool or application to decide where on a page the abstract should go, what it should look like, and even whether it should be included in the final output at all.
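The following minimal Python sketch makes this division of labour concrete. It uses the standard library's `xml.etree.ElementTree`. The article, the function name `render_text` and the formatting choices (an upper-case title and an "Abstract:" label) are hypothetical. The point is that the same markup can be rendered with or without its abstract, purely at the tool's discretion.

```python
import xml.etree.ElementTree as ET

NS = {"db": "http://docbook.org/ns/docbook"}  # DocBook 5 namespace

ARTICLE = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <info>
    <title>An example article</title>
    <abstract><para>This article says very little.</para></abstract>
  </info>
  <para>Body text.</para>
</article>"""


def render_text(xml_source: str, include_abstract: bool = True) -> str:
    """Render an article as plain text. The markup only says *that* something
    is an abstract; this function alone decides where it goes and how it looks."""
    root = ET.fromstring(xml_source)
    lines = []
    title = root.find("db:info/db:title", NS)
    if title is not None:
        lines.append("".join(title.itertext()).upper())  # presentation choice: capitals
    abstract = root.find("db:info/db:abstract", NS)
    if include_abstract and abstract is not None:
        # Presentation choice: a labelled line directly under the title.
        text = " ".join("".join(p.itertext()) for p in abstract.findall("db:para", NS))
        lines.append("Abstract: " + text)
    # Body paragraphs are the para elements directly inside the article.
    for para in root.findall("db:para", NS):
        lines.append("".join(para.itertext()))
    return "\n".join(lines)


print(render_text(ARTICLE))
print(render_text(ARTICLE, include_abstract=False))
```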

DocBook provides a vast number of semantic element tags. They are divided into three broad categories: structural, block-level, and inline.

Structural tags specify broad characteristics of their contents. The book element, for example, specifies that its child elements represent the parts of a book. This includes a title, chapters, glossaries, appendices, and so on. DocBook's structural tags include, but are not limited to:
  • set: a titled collection of one or more books. Sets can be nested within other sets.
  • book: a titled collection of chapters, articles, and/or parts, with optional glossaries, appendices, and so forth.
  • part: a titled collection of one or more chapters. Parts can be nested within other parts. A part may also have special introductory text.
  • article: a titled, unnumbered collection of block-level elements.
  • chapter: a titled, numbered collection of block-level elements. DocBook does not actually require that chapters be explicitly given numbers; it is understood by the semantics that the number of a chapter is the number of previous chapter elements in the XML document plus 1.
  • appendix: the contained text represents an appendix.
  • dedication: the text represents the dedication of the contained structural element.
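The implied numbering rule for chapters can be sketched in a few lines of Python with `xml.etree.ElementTree`. This is an illustrative reading of the rule as stated: it counts every earlier chapter element in document order, including chapters inside parts. It is not code taken from any DocBook processor. The sample book and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"  # Clark-notation prefix for DocBook 5 names


def implied_chapter_numbers(xml_source: str) -> list[tuple[int, str]]:
    """Return (number, title) for every chapter, where number is the count of
    chapter elements before it in document order, plus 1."""
    root = ET.fromstring(xml_source)
    result = []
    # iter() visits elements in document order, including chapters inside parts.
    for number, chapter in enumerate(root.iter(DB + "chapter"), start=1):
        title = chapter.find(DB + "title")
        result.append((number, title.text if title is not None else ""))
    return result


BOOK = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Numbering demo</title>
  <part><title>Part one</title>
    <chapter><title>Alpha</title><para>...</para></chapter>
    <chapter><title>Beta</title><para>...</para></chapter>
  </part>
  <part><title>Part two</title>
    <chapter><title>Gamma</title><para>...</para></chapter>
  </part>
</book>"""

print(implied_chapter_numbers(BOOK))  # [(1, 'Alpha'), (2, 'Beta'), (3, 'Gamma')]
```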


Structural elements can contain other structural elements. Structural elements are the only permitted top-level elements in a DocBook document.
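The top-level rule can be checked with a short Python sketch. The set of structural element names below contains only the partial list given above, so the check only approximates what the real DocBook schema accepts. `root_is_structural` and `split_tag` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
# Only the structural elements named in the list above; the real schema has more.
STRUCTURAL = {"set", "book", "part", "article", "chapter", "appendix", "dedication"}


def split_tag(tag: str) -> tuple[str, str]:
    """Split ElementTree's '{namespace}local' notation into its two halves."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def root_is_structural(xml_source: str) -> bool:
    """True if the document's root is a DocBook structural element."""
    namespace, local = split_tag(ET.fromstring(xml_source).tag)
    return namespace == DB_NS and local in STRUCTURAL


print(root_is_structural('<book xmlns="http://docbook.org/ns/docbook" version="5.0"/>'))  # True
print(root_is_structural('<para xmlns="http://docbook.org/ns/docbook">Hi</para>'))  # False
```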

Block-level tags are elements like paragraphs, lists, and so forth. Not all of these elements can contain text directly. Sequential block-level elements are expected to be rendered one "after" another, but what "after" means depends on the language. In most Western languages, "after" means below: text paragraphs are printed down the page. Other writing systems can have different directionality. For example, in Japanese, text is often printed in vertical columns, with paragraphs running from right to left, so "after" in that case would be to the left. DocBook semantics are entirely neutral to these kinds of language-based concepts.

Inline-level tags are elements like emphasis, hyperlinks, and so forth. They wrap text within a block-level element. These elements do not cause the text to break when rendered in a paragraph format. Typically, though, they cause the document processor to apply some distinct typographical treatment to the enclosed text, such as changing the font, size, or similar attributes. (The DocBook specification says that it expects different typographical treatment, but it does not specify what that treatment should be.) That is, a DocBook processor is not required to render an emphasis tag in italics. A speech-based DocBook processor could increase the volume of the words, and a text-based processor could use bold instead of italics.
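The following Python sketch shows one emphasis element receiving different treatments from different processors. The `render_para` function and the three styles are illustrative assumptions and are not part of DocBook. The emphasized text stays on the same line as its surroundings, as inline elements require.

```python
import xml.etree.ElementTree as ET
from typing import Callable

DB = "{http://docbook.org/ns/docbook}"


def render_para(para: ET.Element, emphasis_style: Callable[[str], str]) -> str:
    """Flatten a para to a single line, passing the text of each emphasis
    element through emphasis_style. Inline markup never breaks the line."""
    parts = [para.text or ""]
    for child in para:
        inner = "".join(child.itertext())
        parts.append(emphasis_style(inner) if child.tag == DB + "emphasis" else inner)
        parts.append(child.tail or "")  # text that follows the inline element
    return "".join(parts)


PARA = ET.fromstring(
    '<para xmlns="http://docbook.org/ns/docbook">'
    "I hope that your day is proceeding <emphasis>splendidly</emphasis>!</para>"
)

# Three processors, three treatments of the same emphasis element.
# (HTML escaping is omitted to keep the sketch short.)
print(render_para(PARA, lambda s: f"<i>{s}</i>"))  # italics in HTML
print(render_para(PARA, lambda s: f"<b>{s}</b>"))  # bold instead of italics
print(render_para(PARA, str.upper))                # plain text: capitals
```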

Sample document




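    <?xml version="1.0" encoding="UTF-8"?>
    <book xmlns="http://docbook.org/ns/docbook" version="5.0">
      <title>Very simple book</title>
      <chapter>
        <title>Chapter 1</title>
        <para>Hello world!</para>
        <para>I hope that your day is proceeding <emphasis>splendidly</emphasis>!</para>
      </chapter>
      <chapter>
        <title>Chapter 2</title>
        <para>Hello again, world!</para>
      </chapter>
    </book>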




Semantically, this document is a "book," with a "title," that contains two "chapters" each with their own "titles." Those "chapters" contain "paragraphs" that have text in them. The markup is fairly readable in English.

In more detail, the root element of the document is book. All DocBook elements are in an XML Namespace, so the root element has an xmlns attribute to set the current namespace. The root element of a DocBook document must also have a version attribute that specifies the version of the format that the document is built on.

(XML documents can include elements from multiple namespaces at once. For simplicity, the example does not illustrate this.)

A book element must contain a title, or an info element containing a title. This must come before any child structural elements. Following the title are the structural children, in this case two chapter elements. Each of these must have a title. They contain para block elements, which can contain free text and inline elements such as the emphasis in the second paragraph of the first chapter.
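The sample can also be inspected programmatically. This Python sketch uses only the standard library; `describe` and `title_of` are hypothetical names. It checks that the root is a `book` in the DocBook namespace and then reads back the `version` attribute, the titles and the emphasized text. It accepts a title either directly under the element or inside `info`, matching the rule above.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
NS = {"db": DB_NS}

SAMPLE = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Very simple book</title>
  <chapter>
    <title>Chapter 1</title>
    <para>Hello world!</para>
    <para>I hope that your day is proceeding <emphasis>splendidly</emphasis>!</para>
  </chapter>
  <chapter>
    <title>Chapter 2</title>
    <para>Hello again, world!</para>
  </chapter>
</book>"""


def title_of(elem: ET.Element):
    """Return the text of a direct title child, or of info/title, or None."""
    title = elem.find("db:title", NS)
    if title is None:
        title = elem.find("db:info/db:title", NS)
    return None if title is None else "".join(title.itertext())


def describe(xml_source: str) -> dict:
    root = ET.fromstring(xml_source)
    # ElementTree reports namespaced names as '{namespace}localname'.
    if root.tag != f"{{{DB_NS}}}book":
        raise ValueError(f"not a DocBook 5 book: {root.tag}")
    return {
        "version": root.get("version"),  # required on the DocBook 5 root
        "title": title_of(root),
        "chapters": [title_of(c) for c in root.findall("db:chapter", NS)],
        "emphasis": ["".join(e.itertext()) for e in root.iter(f"{{{DB_NS}}}emphasis")],
    }


print(describe(SAMPLE))
```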

Schemas and validation

Rules such as the ones mentioned in the preceding paragraph ("a book element must contain a title, or an info element containing a title," etc.) are formally defined in the DocBook schema. Validation tools can check an XML document (DocBook or otherwise) against its corresponding schema to determine whether, and where, the document fails to conform to that schema. XML editing tools can also use schema information to avoid creating non-conforming documents in the first place.
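Real validation is done by a schema validator against the official DocBook schema. Purely to illustrate what such a check reports, here is a hand-written Python sketch that enforces two of the rules mentioned above and says where the document fails. The first rule is that a book needs a title, or an info element containing a title, before its structural children. The second is that each chapter needs a title. `check_book` is a hypothetical stand-in and is not a substitute for the RELAX NG schema.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"
# Structural children considered here (a subset, for illustration only).
STRUCTURAL = {DB + name for name in ("part", "chapter", "article", "appendix", "dedication")}


def is_title(elem: ET.Element) -> bool:
    """A title element, or an info element that contains a title."""
    return elem.tag == DB + "title" or (
        elem.tag == DB + "info" and elem.find(DB + "title") is not None
    )


def check_book(xml_source: str) -> list[str]:
    """Return rule violations as readable messages; an empty list means none found."""
    root = ET.fromstring(xml_source)
    if root.tag != DB + "book":
        return [f"root element is {root.tag}, expected a DocBook book"]
    problems = []
    children = list(root)
    title_at = next((i for i, c in enumerate(children) if is_title(c)), None)
    first_structural = next((i for i, c in enumerate(children) if c.tag in STRUCTURAL), None)
    if title_at is None:
        problems.append("book: missing title (or info containing a title)")
    elif first_structural is not None and first_structural < title_at:
        problems.append("book: title must come before structural children")
    # Every chapter, wherever it is nested, needs its own title.
    for n, chapter in enumerate(root.iter(DB + "chapter"), start=1):
        if not any(is_title(c) for c in chapter):
            problems.append(f"chapter {n}: missing title")
    return problems


BAD = ('<book xmlns="http://docbook.org/ns/docbook" version="5.0">'
       "<chapter><para>x</para></chapter><title>T</title></book>")
print(check_book(BAD))
```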

DocBook authoring

Because DocBook is XML, documents can be created and edited with any text editor. A dedicated XML editor is likewise a functional DocBook editor. DocBook provides schema files for popular XML schema languages, so any XML editor that can provide content completion based on a schema can do so for DocBook. Many graphical or WYSIWYG XML editors can also edit DocBook in the manner of a word processor.

DocBook processing

Because DocBook is an XML format, conforming to a well-defined schema, documents can be validated and processed using any tool or programming language which includes XML support.
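As an example of processing DocBook with a general-purpose XML library, the following Python sketch uses only the standard library. It converts the titles, chapters, sections and paragraphs of a DocBook 5 book into an HTML fragment. The element-to-HTML mapping is a deliberately tiny assumption; the DocBook XSL stylesheets handle far more elements and options.

```python
import html
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"
INLINE_TAGS = {DB + "emphasis": "em"}  # hypothetical, deliberately tiny mapping


def inline_html(elem: ET.Element) -> str:
    """Render mixed content (text plus inline children) as escaped HTML."""
    out = [html.escape(elem.text or "")]
    for child in elem:
        inner = inline_html(child)
        tag = INLINE_TAGS.get(child.tag)
        out.append(f"<{tag}>{inner}</{tag}>" if tag else inner)
        out.append(html.escape(child.tail or ""))
    return "".join(out)


def to_html(xml_source: str) -> str:
    """Convert a book's titles, chapters, sections and paras to an HTML fragment."""
    out = []

    def walk(elem: ET.Element, depth: int) -> None:
        for child in elem:
            if child.tag == DB + "title":
                level = min(depth, 6)  # HTML headings stop at h6
                out.append(f"<h{level}>{inline_html(child)}</h{level}>")
            elif child.tag == DB + "para":
                out.append(f"<p>{inline_html(child)}</p>")
            elif child.tag in (DB + "chapter", DB + "section"):
                walk(child, depth + 1)
            # Anything else (info, lists, tables, ...) is skipped in this sketch.

    walk(ET.fromstring(xml_source), 1)
    return "\n".join(out)


SAMPLE = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Very simple book</title>
  <chapter><title>Chapter 1</title>
    <para>Hello world!</para>
    <para>I hope that your day is proceeding <emphasis>splendidly</emphasis>!</para>
  </chapter>
  <chapter><title>Chapter 2</title><para>Hello again, world!</para></chapter>
</book>"""

print(to_html(SAMPLE))
```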

DocBook files are used to prepare output files in a wide variety of formats. This is nearly always done with the DocBook XSL stylesheets. These are XSLT stylesheets that transform DocBook documents into a number of formats (HTML, XSL-FO for later conversion into PDF, etc.). The stylesheets are sophisticated enough to generate tables of contents, glossaries, and indexes. They can also select particular designated portions of a master document to produce different versions of the same document, such as a "tutorial" or a "quick-reference guide," each consisting of a subset of the material.
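The DocBook XSL stylesheets do this work in XSLT. The same two ideas, a generated table of contents and selection of designated portions, can be sketched in Python. Here the portions are marked with `condition`, one of DocBook's effectivity attributes. The filtering logic is a simplified assumption, not the stylesheets' exact profiling behaviour; this includes splitting multiple values on semicolons. The book content is made up.

```python
import copy
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"


def profile(root: ET.Element, wanted: set[str]) -> ET.Element:
    """Return a pruned copy of root: an element carrying a condition attribute
    survives only if one of its ';'-separated values is in wanted."""
    result = copy.deepcopy(root)  # leave the master document untouched

    def prune(elem: ET.Element) -> None:
        for child in list(elem):  # iterate over a copy: we remove while looping
            condition = child.get("condition")
            if condition is not None and not set(condition.split(";")) & wanted:
                elem.remove(child)
            else:
                prune(child)

    prune(result)
    return result


def toc(root: ET.Element) -> list[tuple[int, str]]:
    """A one-level table of contents: chapters numbered in document order."""
    entries = []
    for number, chapter in enumerate(root.iter(DB + "chapter"), start=1):
        title = chapter.find(DB + "title")
        entries.append((number, title.text if title is not None else ""))
    return entries


MASTER = ET.fromstring("""<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Product guide</title>
  <chapter><title>Getting started</title><para>...</para></chapter>
  <chapter condition="tutorial"><title>Walkthrough</title><para>...</para></chapter>
  <chapter condition="reference"><title>Command summary</title><para>...</para></chapter>
  <chapter condition="tutorial;reference"><title>Troubleshooting</title><para>...</para></chapter>
</book>""")

print(toc(profile(MASTER, {"tutorial"})))   # the "tutorial" version
print(toc(profile(MASTER, {"reference"})))  # the "quick-reference" version
```

Because the chapters are renumbered after filtering, each version gets a consistent table of contents of its own.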

Because the standard DocBook XSL stylesheets are well-formed XSL stylesheets, and DocBook is well-formed XML, users can write their own customized stylesheets or even a full-fledged program to process the DocBook into an appropriate output format as their needs dictate.

Web help

Webhelp is a chunked HTML output format in the DocBook XSLT stylesheets that was introduced in version 1.76.1. The documentation for web help, which is part of the DocBook XSL distribution, also serves as an example of web help output. Its major features include CSS-based page layout without framesets, multilingual full-content search, a table of contents (TOC) pane with a collapsible TOC tree, and automatic synchronization of the content pane and the TOC. This web help format was originally implemented by Kasun Gajasinghe and David Cramer as part of the Google Summer of Code 2010 program.

History

DocBook began in 1991 in discussion groups on Usenet and became a joint project of HAL Computer Systems and O'Reilly & Associates. It later spawned its own maintenance organization (the Davenport Group) before moving in 1998 to the SGML Open consortium, which subsequently became OASIS. DocBook is currently maintained by the DocBook Technical Committee at OASIS.

DocBook is available in both SGML and XML forms, as a DTD. RELAX NG and W3C XML Schema forms of the XML version are also available. Starting with DocBook 5, the RELAX NG version is the "normative" form from which the other formats are generated.

DocBook started out as an SGML application, but an equivalent XML application was developed and has now replaced the SGML one for most uses. (Starting with version 4 of the SGML DTD, the XML DTD continued with this version numbering scheme.) Initially, a key group of software companies used DocBook because their representatives were involved in its initial design. Eventually, however, DocBook was adopted by the open source community, where it has become a standard for creating documentation for many projects. These include FreeBSD, the KDE and GNOME desktop documentation, the GTK+ API references, the Linux kernel documentation, and the work of the Linux Documentation Project.

Norman Walsh and the DocBook Project development team maintain the key application for producing output from DocBook source documents. This is a set of XSL stylesheets (as well as a legacy set of DSSSL stylesheets) that can generate high-quality HTML and print (FO/PDF) output. The stylesheets can also produce output in other formats, including RTF, man pages and HTML Help.

Walsh is also the principal author of the book DocBook: The Definitive Guide, the official documentation of DocBook. This book is available online under the GFDL, and also as a print publication.

Before DocBook 5.0

The current version of DocBook, 5.0, is fairly recent. Prior versions have been and still are in widespread use, so this section gives an overview of how the older 4.x formats differ.

Until DocBook 5, DocBook was defined normatively by a Document Type Definition (DTD). Since DocBook was built originally as an application of SGML, the DTD was the only available schema language. DocBook 4.x formats can be SGML or XML, but the XML version does not have its own namespace.

Because DocBook 4.x formats were defined by a DTD, they had to live within the restrictions of that schema language. The most significant restriction for the language is that an element name uniquely defines its possible contents. That is, an element named info must contain the same information no matter where it appears in the DocBook file. As a result, DocBook 4.x has many kinds of info elements: bookinfo, chapterinfo, etc. Each has a slightly different content model, though they share parts of their content models. These names also repeat context information. A human reader can tell that an info element holds the book's information simply because it is a direct child of the book, so it does not need a special name. However, because the format was defined by a DTD, it had to be named bookinfo.

The root element does not have or need a version attribute, because the version is built into the document type declaration (DOCTYPE) at the top of a pre-DocBook 5 document.

DocBook 4.x documents are not compatible with DocBook 5, but they can be converted into DocBook 5 documents through the use of an XSLT stylesheet. One is provided as part of the distribution of the DocBook 5 schema and specification package.
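The official conversion is an XSLT stylesheet. The Python sketch below is not that stylesheet. It only illustrates the differences described above:

* the context-specific `*info` elements collapse into a single `info` element;
* every element moves into the DocBook 5 namespace;
* the root gains a `version` attribute in place of the version carried by the document type declaration.

It also renames `id` to `xml:id`, which DocBook 5 uses for identifiers. The list of `*info` names is partial.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # the reserved xml: prefix
# A partial list of DocBook 4.x context-specific info elements.
OLD_INFO_NAMES = {"setinfo", "bookinfo", "partinfo", "chapterinfo",
                  "appendixinfo", "articleinfo", "prefaceinfo", "sectioninfo"}

DB4 = """<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<book>
  <bookinfo><title>Old book</title></bookinfo>
  <chapter id="intro">
    <chapterinfo><releaseinfo>draft</releaseinfo></chapterinfo>
    <title>Introduction</title>
    <para>Text.</para>
  </chapter>
</book>"""


def upgrade(db4_source: str) -> str:
    """Apply a few DocBook 4.x -> 5.0 changes and return the new XML text."""
    root = ET.fromstring(db4_source)  # the external DTD is not fetched
    for elem in root.iter():
        name = "info" if elem.tag in OLD_INFO_NAMES else elem.tag
        elem.tag = f"{{{DB_NS}}}{name}"  # 4.x XML elements have no namespace
        if "id" in elem.attrib:
            elem.set(f"{{{XML_NS}}}id", elem.attrib.pop("id"))
    root.set("version", "5.0")  # the version now lives on the root element
    ET.register_namespace("", DB_NS)  # global setting: serialize without a prefix
    return ET.tostring(root, encoding="unicode")


print(upgrade(DB4))
```

Note that `releaseinfo` is left alone: it ends in "info" but is an ordinary element, which is why the sketch uses an explicit list rather than matching on the suffix.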

Simplified DocBook

DocBook offers a large number of features that may be overwhelming to a new user. Simplified DocBook was designed for those who want the convenience of DocBook without a steep learning curve. It is a small subset of DocBook designed for single documents such as articles or white papers (i.e., "books" are not supported). The Simplified DocBook DTD is currently at version 1.1.

See also

  • List of document markup languages
  • Comparison of document markup languages
  • DocBook XSL, a group of XSLT stylesheets for transforming DocBook into various viewable formats
  • Darwin Information Typing Architecture (DITA), a competing XML vocabulary for technical documents
  • LinuxDoc
  • LaTeX

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 