XSL Formatting Objects - AbsoluteAstronomy.com

XSL Formatting Objects, or XSL-FO, is a markup language for XML document formatting that is most often used to generate PDFs. XSL-FO is part of XSL (Extensible Stylesheet Language), a set of W3C technologies designed for the transformation and formatting of XML data. The other parts of XSL are XSLT and XPath. As of December 12, 2006, the current version of XSL-FO is v1.1.

XSL-FO basics

Unlike the combination of HTML and CSS, XSL-FO is a unified presentational language. It has no semantic markup in the way it is meant in HTML. And, unlike CSS, which modifies the default presentation of an external XML or HTML document, it stores all of the document's data within itself.

The general idea behind XSL-FO's use is that the user writes a document, not in FO, but in an XML language. XHTML, DocBook, and TEI are all possible examples. Then, the user obtains an XSLT transform, either by writing one themselves or by finding one for the document type in question. This XSLT transform converts the XML into XSL-FO.
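
As a rough illustration of what this transformation step produces, the sketch below uses Python's standard library in place of an XSLT processor. The source vocabulary (article and para elements), the master name "page" and the function name article_to_fo are all made up for the example; a real workflow would use an XSLT stylesheet written for the document type.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    """Return the namespace-qualified name of an FO element."""
    return f"{{{FO_NS}}}{tag}"


def article_to_fo(source_xml: str) -> str:
    """Turn <article><para>...</para></article> (hypothetical input) into XSL-FO."""
    article = ET.fromstring(source_xml)
    root = ET.Element(fo("root"))

    # Page layouts: a single A4 page master named "page".
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page",
        "page-width": "210mm",
        "page-height": "297mm",
        "margin": "20mm",
    })
    ET.SubElement(page, fo("region-body"))

    # Document data: each source paragraph becomes one fo:block.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for para in article.iter("para"):
        block = ET.SubElement(flow, fo("block"), {"space-after": "6pt"})
        block.text = para.text

    return ET.tostring(root, encoding="unicode")
```

Calling article_to_fo on a two-paragraph article yields an fo:root with one page master and two fo:block elements, which is the kind of document an FO processor expects as input.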

Once the XSL-FO document is generated, it is then passed to an application called an FO processor. FO processors convert the XSL-FO document into something that is readable, printable or both. The most common output of XSL-FO is a PDF file or PostScript (PS), but some FO processors can output to other formats like RTF files or even just a window in the user's GUI displaying the sequence of pages and their contents.
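
As one possible way to drive this step, the sketch below builds a command line for Apache FOP, one of the free FO processors. It assumes FOP is installed and that its launcher is available on the PATH as "fop"; the function names and file names are placeholders.

```python
import subprocess

# Output flags of the Apache FOP command-line launcher used in this sketch.
FOP_OUTPUT_FLAGS = {"pdf": "-pdf", "ps": "-ps", "rtf": "-rtf"}


def fop_command(fo_path: str, out_path: str, fmt: str = "pdf") -> list[str]:
    """Build the argument list that asks FOP to render an FO file."""
    if fmt not in FOP_OUTPUT_FLAGS:
        raise ValueError(f"unsupported output format: {fmt}")
    return ["fop", "-fo", fo_path, FOP_OUTPUT_FLAGS[fmt], out_path]


def render(fo_path: str, out_path: str, fmt: str = "pdf") -> None:
    """Run FOP; raises CalledProcessError if the processor reports failure."""
    subprocess.run(fop_command(fo_path, out_path, fmt), check=True)
```

For example, render("manual.fo", "manual.pdf") would run FOP on a file named manual.fo. Other FO processors have their own command-line or programming interfaces.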

The XSLT language itself was originally conceived only for this purpose; it is now in widespread use for more general XML transformations. This transformation step is taken so much for granted in XSL-FO that it is not uncommon for people to call the XSLT that turns XML into XSL-FO the actual XSL-FO document itself. Even tutorials on XSL-FO tend to be written with XSLT commands wrapped around the FO markup.

The XSLT transformation step is exceptionally powerful. It allows for the automatic generation of a table of contents, linked references, an index, and various other possibilities.

An XSL-FO document is not like a PDF or a PostScript document. It does not definitively describe the layout of the text on various pages. Instead, it describes what the pages look like and where the various contents go. From there, an FO processor determines how to position the text within the boundaries described by the FO document. The XSL-FO specification even allows different FO processors to have varying responses with regard to the resultant generated pages.

For example, some FO processors can hyphenate words to minimize space when breaking a line, while others choose not to. Different processors may even use different hyphenation algorithms, ranging from very simple to more complex hyphenation algorithms that take into account whether the previous or next line also is hyphenated. These will change, in some borderline cases quite substantially, the layout of the various pages. There are other cases where the XSL-FO specification explicitly allows FO processors some degree of choice with regard to layout.

These differences between FO processors, which produce inconsistent results from one processor to another, are often not a concern. This is because the general purpose behind XSL-FO is to generate paged, printed media. XSL-FO documents themselves are usually used as intermediaries, mostly to generate either PDF files or a printed document as the final form to be distributed. This is in contrast to HTML, which is generated and distributed as a final form directly to the user. Distributing the final PDF rather than the formatting language input (whether HTML/CSS or XSL-FO) means, on the one hand, that recipients are not affected by the unpredictability resulting from differences among formatting language interpreters and, on the other hand, that the document cannot easily adapt to different recipient needs, such as a different page size or preferred font size, or tailoring for on-screen versus on-paper versus audio presentation.

XSL-FO language concepts

The XSL-FO language was designed for paged media; as such, the concept of pages is an integral part of XSL-FO's structure.

FO works best for what could be called "content-driven" design. This is the standard method of layout for books, articles, legal documents, and so forth. It involves a single flowing span of fairly contiguous text, with various repeating information built into the margins of a page. This is as opposed to "layout-driven" design, which is used in newspapers or magazines. If content in those documents does not fit in the required space, some of it is trimmed away until it does fit. XSL-FO does not easily handle the tight restrictions of magazine layout; indeed, in many cases, it lacks the ability to express some forms of said layout.

Despite the basic nature of the language's design, it is capable of a great deal of expressiveness. Tables, lists, side floats, and a variety of other features are available. These features are comparable to CSS's layout features, though some of those features are expected to be built by the XSLT.

XSL-FO document structure

XSL-FO documents are XML documents, but they do not have to conform to any DTD or schema. Instead, they conform to a syntax defined in the XSL-FO specification.

XSL-FO documents contain two required sections. The first section details a list of named page layouts. The second section is a list of document data, with markup, that uses the various page layouts to determine how the content fills the various pages.

Page layouts define the properties of the page. They can define the directions for the flow of text, so as to match the conventions for the language in question. They define the size of a page as well as the margins of that page. More importantly, they can define sequences of pages that allow for effects where the odd and even pages look different. For example, one can define a page layout sequence that gives extra space to the inner margins for printing purposes; this allows more space to be given to the margin where the book will be bound.
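
The odd/even arrangement described above might look like the following sketch, which assumes a left-to-right book so that the binding edge is on the left of odd pages. The master names and the helper book_layouts are illustrative only.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"


def book_layouts(inner: str = "30mm", outer: str = "20mm") -> str:
    """Return a layout-master-set whose odd and even pages mirror their margins."""
    return f"""\
<fo:layout-master-set xmlns:fo="{FO_NS}">
  <!-- Odd (right-hand) pages: the binding edge is on the left -->
  <fo:simple-page-master master-name="odd-page"
      page-width="210mm" page-height="297mm"
      margin-top="20mm" margin-bottom="20mm"
      margin-left="{inner}" margin-right="{outer}">
    <fo:region-body/>
  </fo:simple-page-master>
  <!-- Even (left-hand) pages: the binding edge is on the right -->
  <fo:simple-page-master master-name="even-page"
      page-width="210mm" page-height="297mm"
      margin-top="20mm" margin-bottom="20mm"
      margin-left="{outer}" margin-right="{inner}">
    <fo:region-body/>
  </fo:simple-page-master>
  <!-- A page-sequence selects this with master-reference="book" -->
  <fo:page-sequence-master master-name="book">
    <fo:repeatable-page-master-alternatives>
      <fo:conditional-page-master-reference master-reference="odd-page" odd-or-even="odd"/>
      <fo:conditional-page-master-reference master-reference="even-page" odd-or-even="even"/>
    </fo:repeatable-page-master-alternatives>
  </fo:page-sequence-master>
</fo:layout-master-set>
"""


# Parsing the result confirms the generated markup is well-formed XML.
masters = ET.fromstring(book_layouts())
```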

The document data portion is broken up into a sequence of flows, where each flow is attached to a page layout. The flows contain a list of blocks which, in turn, each contain a list of text data, inline markup elements, or a combination of the two. Content may also be added to the margins of the document, for page numbers, chapter headings and the like.
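
A minimal sketch of the two sections, held in a Python string so it can be checked for well-formedness, might look like this. The master name "body-page", the sample text and the helper check_master_references are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

FO_DOCUMENT = """\
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Section 1: named page layouts -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="body-page"
        page-width="210mm" page-height="297mm" margin="20mm">
      <fo:region-body margin-bottom="15mm"/>
      <fo:region-after extent="10mm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Section 2: document data, attached to a layout by name -->
  <fo:page-sequence master-reference="body-page">
    <!-- Repeating margin content: a page number in the footer region -->
    <fo:static-content flow-name="xsl-region-after">
      <fo:block text-align="center"><fo:page-number/></fo:block>
    </fo:static-content>
    <!-- The flow: blocks holding text and inline markup -->
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="14pt" font-weight="bold">Chapter title</fo:block>
      <fo:block>Body text with <fo:inline font-style="italic">inline</fo:inline> markup.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
"""


def check_master_references(fo_text: str) -> bool:
    """Check that every page-sequence names a layout that is actually defined."""
    root = ET.fromstring(fo_text)
    defined = {m.get("master-name") for m in root.find(f"{FO}layout-master-set")}
    used = {s.get("master-reference") for s in root.findall(f"{FO}page-sequence")}
    return used <= defined
```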

Blocks and inline elements function in much the same way as in CSS, though some of the rules for padding and margins differ between FO and CSS. The direction, relative to the page orientation, for the progression of blocks and inlines can be fully specified, thus allowing FO documents to work with languages that are read in a different direction from English. The language of the FO specification, unlike that of CSS 2.1, uses direction-neutral terms like start and end rather than left and right when describing these directions.

XSL-FO's basic content markup is derived from CSS and its cascading rules. As such, many attributes in XSL-FO propagate into the child elements unless explicitly overridden.

Capabilities of XSL-FO v1.0

XSL-FO is capable of a great deal of textual layout functionality. In addition to the information as specified above, XSL-FO's language allows for the specification of the following.

Multiple columns

A page can be defined to have multiple columns. When this is the case, blocks flow from one column into the next by default. Individual blocks can be set to span all columns, creating a textual break in the page. The columns above this break will flow into each other, as will the columns below the break. But no text is allowed to flow from the above section to the below section.

Because of the nature of XSL-FO's page specification, multiple pages may actually have different numbers and widths of columns. As such, text can flow from a 3 column page to a 5 column page to a 1 column page quite easily.

All FO features work within the restrictions of a multi-column page.
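
A two-column page with one spanning block could be written roughly as follows. The master name, the column gap and the helper spanning_blocks are illustrative choices, not requirements.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

FO_TWO_COLUMNS = """\
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="two-col"
        page-width="210mm" page-height="297mm" margin="20mm">
      <!-- The column count belongs to the page layout, not to the content -->
      <fo:region-body column-count="2" column-gap="8mm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="two-col">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Text above the break, flowing from the first column into the second.</fo:block>
      <!-- span="all" makes this block cross every column -->
      <fo:block span="all" font-weight="bold">A heading spanning all columns</fo:block>
      <fo:block>Text below the break, again flowing from column to column.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
"""


def spanning_blocks(fo_text: str) -> list[str]:
    """Return the text of every block that spans all columns."""
    root = ET.fromstring(fo_text)
    return [b.text for b in root.iter(f"{FO}block") if b.get("span") == "all"]
```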

Lists

An XSL-FO list is, essentially, two sets of blocks stacked side by side. An entry consists of a block on the "left", or start inline direction, and a block sequence on the "right", or end inline direction. The block on the left is conceptually what would be the number or bullet in a list. However, it could just as easily be a string of text, as one might see in a glossary entry. The block on the right works as expected. Both of these blocks can be block containers, or have multiple blocks in a single list entry.

Numbering of XSL-FO lists, when they are numbered, is expected to be done by the XSLT, or whatever other process, that generated the XSL-FO document. As such, numbered lists must be explicitly numbered in XSL-FO.
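
The sketch below shows the generating process writing the numbers itself, with Python standing in for the XSLT. The function name numbered_list and the indent distances are assumptions for the example.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    """Return the namespace-qualified name of an FO element."""
    return f"{{{FO_NS}}}{tag}"


def numbered_list(items):
    """Build an fo:list-block whose labels carry explicit numbers."""
    list_block = ET.Element(fo("list-block"), {
        "provisional-distance-between-starts": "8mm",
        "provisional-label-separation": "2mm",
    })
    for number, text in enumerate(items, start=1):
        item = ET.SubElement(list_block, fo("list-item"))
        # Start side: the label, here a number written out by the generator.
        label = ET.SubElement(item, fo("list-item-label"), {"end-indent": "label-end()"})
        ET.SubElement(label, fo("block")).text = f"{number}."
        # End side: the body of the entry.
        body = ET.SubElement(item, fo("list-item-body"), {"start-indent": "body-start()"})
        ET.SubElement(body, fo("block")).text = text
    return list_block
```

Replacing the generated number with a bullet character or a glossary term changes only the label block; the structure of the list stays the same.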

Pagination controls

The user can specify widow and orphan limits for blocks or for the flow itself, and allow the attributes to cascade into child blocks. Additionally, blocks can be specified to be kept together on a single page. For example, an image block and the description of that image can be set to never be separated. The FO processor will do its best to adhere to these commands, even if it requires creating a great deal of empty space on a page.
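
These controls are ordinary attributes. A hedged sketch, with placeholder text and a placeholder image name, might read:

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

FO_KEEPS = """\
<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format"
    flow-name="xsl-region-body" widows="3" orphans="3">
  <!-- widows and orphans set on the flow are inherited by the blocks below -->
  <fo:block>A long paragraph that may break across pages.</fo:block>
  <!-- Keep the figure and its caption together on one page -->
  <fo:block keep-together.within-page="always">
    <fo:block><fo:external-graphic src="url(figure.png)"/></fo:block>
    <fo:block font-style="italic">Figure 1. Caption text.</fo:block>
  </fo:block>
</fo:flow>
"""


def kept_together(fo_text: str) -> int:
    """Count the blocks that must not be split across pages."""
    root = ET.fromstring(fo_text)
    return sum(
        1 for b in root.iter(f"{FO}block")
        if b.get("keep-together.within-page") == "always"
    )
```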

Footnotes

The user can create footnotes that appear at the bottom of a page. The footnote is written, in the FO document, in the regular flow of text at the point where it is referenced. The reference is represented as an inline definition, though it is not required. The body is one or more blocks that are placed by the FO processor at the bottom of the page. The FO processor guarantees that wherever the reference is, the footnote cited by that reference will begin on the same page. This will be so even if it means creating extra empty space on a page.
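
In markup, a footnote of this kind might be sketched as follows. The note text, the superscript styling and the helper footnote_parts are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

FO_FOOTNOTE = """\
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  The claim is stated here<fo:footnote>
    <!-- The reference, left inline at the point of citation -->
    <fo:inline baseline-shift="super" font-size="70%">1</fo:inline>
    <!-- The body, which the FO processor moves to the bottom of the page -->
    <fo:footnote-body>
      <fo:block font-size="8pt">1. The supporting note.</fo:block>
    </fo:footnote-body>
  </fo:footnote> and the sentence continues.
</fo:block>
"""


def footnote_parts(fo_text: str):
    """Return (reference text, body text) for each footnote in the fragment."""
    root = ET.fromstring(fo_text)
    return [
        (note.find(f"{FO}inline").text,
         "".join(note.find(f"{FO}footnote-body").itertext()).strip())
        for note in root.iter(f"{FO}footnote")
    ]
```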

Tables

An FO table functions much like an HTML/CSS table. The user specifies rows of data for each individual cell. The user can also specify some styling information for each column, such as background color. Additionally, the user can specify the first row as a table header row, with its own separate styling information.

The FO processor can be told exactly how much space to give each column, or it can be told to auto-fit the text in the table.
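
A table with explicit column widths and a header row could be generated along these lines. The function names, the cell padding and the fixed layout are choices made for this sketch.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    """Return the namespace-qualified name of an FO element."""
    return f"{{{FO_NS}}}{tag}"


def add_row(parent, cells):
    """Append one fo:table-row with a cell (and block) per value."""
    row = ET.SubElement(parent, fo("table-row"))
    for text in cells:
        cell = ET.SubElement(row, fo("table-cell"), {"padding": "2pt"})
        ET.SubElement(cell, fo("block")).text = text


def fo_table(header, rows, column_widths):
    """Build an fo:table with fixed column widths and a styled header row."""
    table = ET.Element(fo("table"), {"table-layout": "fixed", "width": "100%"})
    # Column definitions come first and tell the processor how wide each column is.
    for width in column_widths:
        ET.SubElement(table, fo("table-column"), {"column-width": width})
    # The header carries its own styling, separate from the body rows.
    head = ET.SubElement(table, fo("table-header"), {"font-weight": "bold"})
    add_row(head, header)
    body = ET.SubElement(table, fo("table-body"))
    for row in rows:
        add_row(body, row)
    return table
```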

Text orientation controls

FO has extensive controls for orienting blocks of text. One can, in the middle of a page, designate a block of text to be set in a different orientation. These reoriented blocks can be used for languages written in a different orientation from the rest of the document, or simply where the layout calls for rotated text. These blocks can contain virtually any kind of content, from tables to lists or even other blocks of reoriented text.
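
One way to express a reoriented block is a block-container with a reference-orientation, as in this sketch (the sample text and the helper orientations are placeholders):

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

FO_ROTATED = """\
<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format" flow-name="xsl-region-body">
  <fo:block>Upright text in the normal orientation.</fo:block>
  <!-- Rotated a quarter turn counter-clockwise relative to its parent -->
  <fo:block-container reference-orientation="90">
    <fo:block>Rotated text.</fo:block>
    <!-- Reoriented blocks can nest; rotations accumulate -->
    <fo:block-container reference-orientation="90">
      <fo:block>Text rotated a further quarter turn.</fo:block>
    </fo:block-container>
  </fo:block-container>
</fo:flow>
"""


def orientations(fo_text: str) -> list[str]:
    """List the reference-orientation of each block-container, in document order."""
    root = ET.fromstring(fo_text)
    return [c.get("reference-orientation") for c in root.iter(f"{FO}block-container")]
```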

Miscellaneous

Page number citations. A page that contains a special tag can be cited in text, and the FO processor will fill in the actual page number where this tag appears.
Block borders, in a number of styles.
Background colors and images.
Font controls and weighting, as in CSS.
Side floats.
Miscellaneous Inline Elements.
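
As an illustration of page number citations, the sketch below cites a block by its id and checks that every citation has a target. The id "install" and the helper unresolved_citations are made up for the example.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

FO_CITATION = """\
<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format" flow-name="xsl-region-body">
  <!-- The processor replaces the citation with the page where id="install" lands -->
  <fo:block>See the installation chapter on page
    <fo:page-number-citation ref-id="install"/>.</fo:block>
  <fo:block id="install" break-before="page">Installation</fo:block>
</fo:flow>
"""


def unresolved_citations(fo_text: str) -> list[str]:
    """Return every ref-id that does not match an id in the fragment."""
    root = ET.fromstring(fo_text)
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    return [
        c.get("ref-id")
        for c in root.iter(f"{FO}page-number-citation")
        if c.get("ref-id") not in ids
    ]
```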

Multiple flows and flow mapping

XSL-FO 1.0 was fairly restrictive about what text was allowed to go in what areas of a page. Version 1.1 loosens these restrictions significantly, allowing flowing text to be mapped into multiple explicit regions on a page. This allows for more newspaper-like typesetting.

Bookmarks

Many output formats for XSL-FO processors, specifically PDF, have bookmarking features. These allow the format to specify a string of text in a separate window that can be selected by the user. When selected, the document window scrolls immediately to a specific region of the document.

XSL-FO v1.1 now provides the ability to create named bookmarks in XSL-FO, thus allowing the processor to pass this on to an output format that supports it.
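
A small bookmark tree, written against XSL-FO 1.1, might look like the following sketch. The ids, titles and the helper dangling_bookmarks are illustrative.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

FO_BOOKMARKS = """\
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-width="210mm" page-height="297mm" margin="20mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- Bookmarks sit between the layouts and the page sequences -->
  <fo:bookmark-tree>
    <fo:bookmark internal-destination="ch1">
      <fo:bookmark-title>Chapter 1</fo:bookmark-title>
      <!-- Nested bookmarks form the outline hierarchy -->
      <fo:bookmark internal-destination="ch1-s1">
        <fo:bookmark-title>Section 1.1</fo:bookmark-title>
      </fo:bookmark>
    </fo:bookmark>
  </fo:bookmark-tree>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block id="ch1">Chapter 1</fo:block>
      <fo:block id="ch1-s1">Section 1.1</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
"""


def dangling_bookmarks(fo_text: str) -> list[str]:
    """Return bookmark destinations that do not match any id in the document."""
    root = ET.fromstring(fo_text)
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    return [
        b.get("internal-destination")
        for b in root.iter(f"{FO}bookmark")
        if b.get("internal-destination") not in ids
    ]
```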

Indexing

XSL-FO 1.1 has features that support the generation of an index that might be found at the back of a book. This is done through referencing of properly marked-up elements in the FO document.

Last page citation

A citation of the last page can be generated without providing an explicit in-document reference to a specific anchor in the FO document. The definition of "last page" can be restricted to within a specific set of pages or to cover the entire document. This allows the user to specify something like "Page 2 out of 15", where 15 is the page number of the last page as so defined.
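
A footer of the "Page 2 out of 15" kind could be sketched in XSL-FO 1.1 as below, citing the last page of a page sequence by its id. The id "doc", the master name and the wording of the footer are assumptions for the example.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

FO_LAST_PAGE = """\
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-width="210mm" page-height="297mm" margin="20mm">
      <fo:region-body margin-bottom="15mm"/>
      <fo:region-after extent="10mm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- The id on the page sequence is what the citation refers to -->
  <fo:page-sequence id="doc" master-reference="page">
    <fo:static-content flow-name="xsl-region-after">
      <fo:block text-align="center">Page <fo:page-number/> out of
        <fo:page-number-citation-last ref-id="doc"/></fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Document content.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
"""


def last_page_target(fo_text: str) -> str:
    """Return the tag name of the element the last-page citation points at."""
    root = ET.fromstring(fo_text)
    ref = root.find(f".//{FO}page-number-citation-last").get("ref-id")
    target = next(el for el in root.iter() if el.get("id") == ref)
    return target.tag.replace(FO, "fo:")
```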

Table markers

Table markers allow the user to create dynamic content within table headers and footers, such as running totals at the bottom of each page of a table or "table continued" indicators.

Inside/outside floats

XSL-FO 1.1 adds the keywords "inside" and "outside" for side floats, which makes it possible to achieve page layouts with marginalia positioned on the outside or inside edges of pages. Inside refers to the side of the page towards the book binding, and outside refers to the side of a page away from the book binding.
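
A marginal note placed on the outside edge might be written as in this sketch; the note text and the helper float_sides are placeholders.

```python
import xml.etree.ElementTree as ET

FO = "{http://www.w3.org/1999/XSL/Format}"

FO_OUTSIDE_FLOAT = """\
<fo:flow xmlns:fo="http://www.w3.org/1999/XSL/Format" flow-name="xsl-region-body">
  <fo:block>
    <!-- "outside" resolves to the edge away from the binding on each page -->
    <fo:float float="outside">
      <fo:block font-size="8pt">A marginal note.</fo:block>
    </fo:float>
    Body text that the note accompanies.
  </fo:block>
</fo:flow>
"""


def float_sides(fo_text: str) -> list[str]:
    """List the float value of every fo:float in the fragment."""
    root = ET.fromstring(fo_text)
    return [f.get("float") for f in root.iter(f"{FO}float")]
```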

Refined graphic sizing

XSL-FO 1.1 refines the functionality for sizing of graphics to fit, with the ability to shrink to fit (but not grow to fit), as well as the ability to define specific scaling steps. In addition, the resulting scaling factor can be referenced for display (for example, to say in a figure caption, "image shown is 50% actual size").

Advantages of XSL-FO

XML language - Because it is an XML language, only an XSLT transform (and an XSLT processor) is required to generate XSL-FO code from any XML language. One can easily write a document in TEI or DocBook, and transform it into HTML for web viewing or PDF (through an FO processor) for printing. In fact, there are many pre-existing TEI and DocBook XSLTs for both of these purposes.

Ease of use - Another advantage of XSL-FO is the relative ease of use. Much of the functionality of the language is based on work from CSS, so a CSS user will be familiar with the basics of the markup attributes. Understanding what a specific section of an FO document will look like is usually quite easy.

Low cost - Compared with commercial typesetting and page layout products, XSL-FO can offer a much lower cost solution when it otherwise meets the typographic and layout requirements (see below). The initial cost of ownership is low (zero if the free implementations, such as Apache FOP and xmlroff, meet your requirements), especially compared to the cost of commercial composition tools. The skills required (primarily XSLT programming) are widely available. There are a number of good books on XSL-FO as well as online resources and an active user community.

Multi-lingual - XSL-FO has been designed to work for all written human languages and the implementations have largely achieved that goal. This makes XSL-FO particularly well suited for composing documents localized into a large number of national languages where the requirement is to have a single tool set that can compose all the language versions of documents. This is especially valuable for technical documentation for things like consumer electronics, where Asian and Middle Eastern languages are important because those parts of the world represent huge markets for things like mobile phones and computer peripherals.

Mature standard - With the publication of XSL-FO 1.1, XSL-FO is proving to be a mature standard with a number of solid commercial and non-commercial implementations. There is no other comparable standard for page composition.

Drawbacks of XSL-FO

Limited capabilities - XSL-FO was specifically designed to meet the requirements of "lightly designed" documents typified by technical manuals, business documents, invoices, and so on. While it can be and is used for more sophisticated designs, it is inherently limited in what it can do from a layout and typographic perspective. In particular, XSL-FO does not provide a direct way to get formatting effects that depend on knowing the page position relationship of two formatting objects. For example, there is no direct way to say "if this thing is on the same page as that thing, then do X, otherwise do Y". This is an explicit design decision reflecting the two-stage, transform-based abstract processing model used by XSL-FO. This limitation can be addressed by implementing a multi-pass process. Unfortunately, there is currently no standard for how the result of the first pass would be communicated back to the second pass. Most, if not all, implementations provide some form of processable intermediate result format that can be used for this, but any such process implemented today would, by necessity, be implementation specific.

By the same token, there are important layout features that are simply not in XSL-FO, either because they were not of high enough priority or because designing them was too difficult to allow inclusion in version 1.1, or because there were insufficient implementations to allow their inclusion in the final specification per W3C rules.

In addition to these architectural limitations, the current XSL-FO implementations, both commercial and open source, do not provide the same level of typographic sophistication provided by high-end layout tools like QuarkXPress or InDesign, or by programmable typesetting systems like LaTeX. For example, no current implementation provides features for ensuring that text lines on facing pages are lined up vertically. There is nothing in the XSL-FO specification that prevents it, but nothing that requires it either. For most documents for which a completely automated composition solution is sufficient, that level of typographic sophistication is not needed. However, for high-end publications and mass-market books, it usually is; in some cases this need can be met by using XSLT to generate a LaTeX document instead.

Extension dependency - When considering the applicability of XSL-FO to a particular document or document design, one must consider proprietary extensions provided by the different XSL-FO implementations. These extensions add features that are not part of the core specification. For example, one product adds support for Japanese typographic conventions that the XSL-FO specification does not address. However, use of these features binds such an XSL-FO system somewhat more closely to a specific implementation (though not as completely as a totally proprietary composition system would).

Impractical manual editing - It is generally impractical to edit XSL-FO instances by hand (XSL-FO was designed for clarity and completeness, not ease of editing). Visual editing tools such as XFDesigner can alleviate the task, although not all XSL-FO tags are accessible (most notably markers and footnotes).

When trying to decide whether or not XSL-FO will work for a given document, the following typographic and layout requirements usually indicate that XSL-FO will not work (although some of these may be satisfied by proprietary extensions):

Need to restart footnote numbers or symbol sequence on each new page (however, some implementations provide extensions to support automatic footnote numbering.)
Need to run text around both sides of a floated object (XSL-FO can run text around one side and the top and/or bottom, but not both sides; however, some implementations provide support for such complex layouts via proprietary extensions.)
Need to have variable numbers of columns on a single page (however, at least two commercial implementations provide extensions for creating multi-column blocks within a page.)
Need to have column-wide footnotes (several implementations provide column footnote extensions.)
Need to have marginalia that is dynamically placed relative to other marginalia (for example, marginal notes that are evenly spaced vertically on the page). XSL-FO only provides features for placing marginalia so that it is vertically aligned with its anchor.
Need to create content that spreads across two pages as a float or "out of line" object in an otherwise homogeneous sequence of repeating page masters (this can be done in XSL-FO 1.1 using multiple body regions and flow maps, but it requires being able to control the page masters used for those pages.)
Need both bottom-floated content and footnotes on the same page.
Need to be able to run text against an arbitrary curve (though some implementations support SVG, which can be used to get around this limitation).
Need to be able to constrain lines to specific baseline grids (for example, to achieve exact registration of lines on facing pages.)
Anything that requires page-aware layout, such as ensuring that a figure always occurs on the page facing its anchor point.


The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
