Complex Text Layout
Complex text layout (abbreviated CTL) or complex text rendering refers to the typesetting of writing systems which require complex transformations between text input and text display for proper rendering on the screen or the printed page (also known as complex scripts). In other words, for these scripts the way text is stored is not mapped to the way it is displayed in a straightforward fashion. The term is used in the field of software internationalization.
Examples of writing systems requiring CTL are the Arabic alphabet and scripts of the Brahmic family such as Devanagari or the Thai alphabet.
CTL is a generalization of the concept of ligature: for the Latin alphabet, ligatures are usually considered a marginal aesthetic concern, but there is no fundamental difference between the ligatures required for acceptable typesetting of the Arabic script and those needed to typeset a Latin cursive. Conversely, most characters of the Chinese script are compositional and could be considered ligatures, but they are usually encoded as individual characters, so many of them that typesetting requires an enormous typeface rather than sophisticated layout. An example of a contextual variant that is not considered a ligature is Greek final sigma ς, the word-final contextual variant of the usual σ shape. Unicode encodes both variants separately, at U+03C2 and U+03C3 respectively. However, for collation and comparison purposes, software should likely consider the string "δῖος Ἀχιλλεύς." equivalent to "δῖοσ Ἀχιλλεύσ." (Unicode does not direct conforming software to treat ς and σ as canonically or compatibility equivalent).
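The difference between a compatibility ligature and a contextual variant such as final sigma can be checked with Python's standard unicodedata module. The following is a minimal sketch: the helper names fold_final_sigma and comparison_key are assumptions made for illustration, and a real collation library would handle far more than this single letter.

```python
import unicodedata

FINAL_SIGMA = "\u03C2"  # ς, word-final form
SIGMA = "\u03C3"        # σ, regular form

def fold_final_sigma(text):
    """Hypothetical helper: replace every final sigma with the regular sigma."""
    return text.replace(FINAL_SIGMA, SIGMA)

def comparison_key(text):
    """Hypothetical helper: NFC-normalize, then fold final sigma, for comparison only."""
    return fold_final_sigma(unicodedata.normalize("NFC", text))

# The Latin "fi" ligature (U+FB01) is compatibility-equivalent to "f" + "i",
# so NFKC normalization replaces it with the two plain letters.
ligature_folds = unicodedata.normalize("NFKC", "\uFB01") == "fi"

# Final sigma has no such equivalence: NFKC leaves it unchanged,
# so the folding has to be done explicitly by the application.
sigma_folds = unicodedata.normalize("NFKC", FINAL_SIGMA) == SIGMA

a = "δῖος Ἀχιλλεύς."
b = "δῖοσ Ἀχιλλεύσ."
equal_after_fold = comparison_key(a) == comparison_key(b)
```

Here ligature_folds is true and sigma_folds is false, which is why the two example strings only compare equal after the explicit fold.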
The main characteristics of CTL language complexity are:
- Bi-directional text, where characters may be written either right-to-left or left-to-right.
- Context-sensitive shaping (ligatures), where a character may change its shape depending on its location and/or the surrounding characters. For example, a character in Arabic script can have as many as four different shape-forms, depending on context.
- Ordering, where the displayed order of the characters is not the same as the logical order. For example, in Devanagari, which is written from left to right, the grapheme for "short i" appears to the left of ("before") the preceding consonant: in ki, the -i should render on the left, with its bow reaching over the k- to its right.
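The three characteristics above can be observed in the Unicode character data that ships with Python. This is only a sketch under stated assumptions: the function names are hypothetical, and visual_order is a deliberate simplification that moves the Devanagari short i (U+093F) in front of a single preceding character, whereas a real shaping engine reorders around whole consonant clusters.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

def arabic_forms(letter):
    """Collect the contextual presentation forms encoded for one Arabic letter."""
    wanted = {"<isolated>", "<final>", "<initial>", "<medial>"}
    forms = {}
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B block
        parts = unicodedata.decomposition(chr(cp)).split()
        # Entries look like "<initial> 0628"; skip unassigned or multi-character ones.
        if len(parts) == 2 and parts[0] in wanted and int(parts[1], 16) == ord(letter):
            forms[parts[0].strip("<>")] = chr(cp)
    return forms

SHORT_I = "\u093F"  # DEVANAGARI VOWEL SIGN I

def visual_order(logical):
    """Simplified reordering: place short i before the character it follows."""
    out = []
    for ch in logical:
        if ch == SHORT_I and out:
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)

# Latin "a", Hebrew alef, Arabic beh: left-to-right, right-to-left, Arabic letter.
classes = bidi_classes("a\u05D0\u0628")

# Arabic beh (U+0628) has four encoded shape-forms.
beh_forms = arabic_forms("\u0628")

# "ki" is stored as KA + short i, but displayed with the short i first.
ki_visual = visual_order("\u0915\u093F")
```

In practice applications do not emit presentation-form code points or reorder strings themselves; the layout engine selects glyphs and positions at render time, and the stored text stays in logical order.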
Implementations
Some CTL implementations do not encapsulate information about specific scripts. In these implementations, the script-specific CTL information resides within the font files. Therefore, they are able to render any script:
- Apple Advanced Typography
- Graphite
Other CTL implementations encapsulate information about specific scripts. In these implementations, the script-specific CTL information is provided by the CTL implementation. Therefore, they are only able to render the scripts that have already been implemented:
- International Components for Unicode (ICU)
- Pango, which provides text services to GTK+
- HarfBuzz, the new OpenType layout engine for Pango and Qt
- Uniscribe and its successor, DirectWrite
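The architectural difference between the two groups can be pictured with a toy model. The sketch below is purely illustrative: the class names, the rule format (pairs of source string and replacement glyph) and the script labels are all hypothetical and do not correspond to the API of any of the libraries listed above.

```python
class FontDrivenEngine:
    """Toy engine with no built-in script knowledge: the rules travel with the font."""

    def shape(self, text, font_rules):
        # font_rules stands in for the shaping tables stored in a font file.
        for source, glyph in font_rules:
            text = text.replace(source, glyph)
        return text


class ScriptAwareEngine:
    """Toy engine whose rules are built in, keyed by script name."""

    def __init__(self):
        # Only scripts listed here can be rendered by this engine.
        self._rules = {"latin": [("fi", "\uFB01")]}

    def shape(self, text, script):
        if script not in self._rules:
            raise ValueError(f"script not implemented: {script}")
        for source, glyph in self._rules[script]:
            text = text.replace(source, glyph)
        return text


# The same rule gives the same result, but it is supplied from different places.
from_font = FontDrivenEngine().shape("fin", [("fi", "\uFB01")])
from_engine = ScriptAwareEngine().shape("fin", "latin")
```

In this model, supporting a new script in the first engine only requires a font carrying new rules, while the second engine must itself be extended.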
See also
- Typography
- Unicode
- Writing systems which require complex text layout:
  - Arabic alphabet (technically an abjad)
  - Most of the Brahmic family of scripts
  - N'Ko script
  - Tengwar (diacritics and numbers)
External links
- Examples of complex rendering: SIL International's examples of complex writing systems around the world
- Complex Text Layout: The Open Group's Desktop Technologies
- Supporting Indic Scripts in Mozilla: also other CTL scripts
- Project SILA: Graphite and Mozilla integration project
- CTL Architecture in Solaris: Solaris Globalization Whitepapers
- Complex Scripts: Microsoft Global Development and Computing Portal
- Theppitak's Homepage: information about Thai language processing
- HarfBuzz's page at Freedesktop.org
- D-Type Unicode Text Module: portable software library for complex text