Bi-directional text
Bi-directional text is text that contains both text directionalities, right-to-left (RTL) and left-to-right (LTR). It generally involves text containing different types of alphabets, but may also refer to boustrophedon, in which the text direction changes on each row.

Some writing systems of the world, notably the Arabic and Hebrew scripts, are written in a form known as right-to-left (RTL), in which writing begins at the right-hand side of a page and concludes at the left-hand side. This is different from the left-to-right (LTR) direction used by most languages in the world. When LTR text is mixed with RTL in the same paragraph, each type of text is written in its own direction, which is known as bi-directional text. This can get rather complex when multiple levels of quotation are used.

Many computer programs fail to display bi-directional text correctly.
For example, the Hebrew name Sarah (שרה) is spelled shin (ש) resh (ר) heh (ה) from right to left.
Some Web browsers may display the Hebrew text in this article in the opposite direction.
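The Hebrew example above can be inspected in code. The following is a minimal Python sketch, assuming the standard `unicodedata` module; the helper name `logical_order` is hypothetical. It lists the letters of the name in the order in which they are stored, which is the order in which they are written and read, whatever direction a program later uses to draw them.

```python
import unicodedata

def logical_order(text):
    """Return (code point, Unicode name) for each character, in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The Hebrew name Sarah: shin, resh, heh.
sarah = "\u05e9\u05e8\u05d4"

for code_point, name in logical_order(sarah):
    print(code_point, name)
```

The first stored character is shin, the letter a reader sees at the right-hand end of the word when it is displayed correctly.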

Unicode support

Bidirectional script support is the capability of a computer system to correctly display bi-directional text. The term is often shortened to the jargon term BiDi or bidi.

Early computer installations were designed to support only a single writing system, typically left-to-right scripts based on the Latin alphabet. Adding new character sets and character encodings enabled a number of other left-to-right scripts to be supported, but did not easily support right-to-left scripts such as Arabic or Hebrew, and mixing the two was not practical. Right-to-left scripts were introduced through encodings like ISO/IEC 8859-6 and ISO/IEC 8859-8, storing the letters (usually) in writing and reading order. It is possible to simply flip the left-to-right display order to a right-to-left display order, but doing this sacrifices the ability to correctly display left-to-right scripts. With bidirectional script support, it is possible to mix characters from different scripts on the same page, regardless of writing direction.

In particular, the Unicode standard provides foundations for complete BiDi support, with detailed rules as to how mixtures of left-to-right and right-to-left scripts are to be encoded and displayed.

In Unicode encoding, all non-punctuation characters are stored in writing order. The writing direction of such a character is a property of the character itself, and a character that carries a direction in this way is called "strong". Punctuation characters, however, can appear in both LTR and RTL scripts. They are called "weak" characters because they do not carry any directional information, so it is up to the software to decide in which direction they are placed. In mixed-direction text this sometimes leads to display errors: the BiDi algorithm runs through the text, identifies the LTR and RTL strong characters, and assigns a direction to each weak character according to its rules, and the result is not always the one the author intended.
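The directional property of a character can be looked up programmatically. The sketch below uses `unicodedata.bidirectional` from the Python standard library; the helper name `bidi_classes` is hypothetical. Note that Unicode's own classification is finer-grained than the strong/weak split used in this article: Latin letters report `L`, Hebrew letters `R`, and most punctuation reports `ON` ("other neutral").

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidirectional class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# A Latin letter, a Hebrew letter (shin) and a punctuation mark.
for ch, bidi_class in bidi_classes("a\u05e9!"):
    print(f"U+{ord(ch):04X} {bidi_class}")
```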

In the algorithm, each sequence of concatenated strong characters is called a "run". A weak character located between two strong characters with the same orientation inherits their orientation. A weak character located between two strong characters with different writing directions inherits the main context's writing direction (in an LTR document the character becomes LTR; in an RTL document it becomes RTL). If a weak character is followed by another weak character, the algorithm looks at the first neighbouring strong character. Sometimes this leads to unintentional display errors. These errors are corrected or prevented with "pseudo-strong" characters. Such Unicode control characters are called marks. A mark, either the left-to-right mark (LRM, U+200E) or the right-to-left mark (RLM, U+200F), is inserted at a location to make an enclosed weak character inherit its writing direction.
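The rules in this paragraph can be sketched in a few lines of Python. This is a deliberately simplified illustration of the description above, not the full Unicode bidirectional algorithm, which also handles numbers, embedding levels and reordering. The function names are hypothetical, and treating the classes `L`, `R` and `AL` as the only strong characters is an assumption of the sketch.

```python
import unicodedata

def strong_direction(ch):
    """Return 'LTR' or 'RTL' for a strong character, or None otherwise."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "LTR"
    if bidi_class in ("R", "AL"):
        return "RTL"
    return None

def resolve_directions(text, base="LTR"):
    """Assign a direction to every character using the simplified rules."""
    strong = [strong_direction(ch) for ch in text]
    resolved = []
    for i, direction in enumerate(strong):
        if direction is not None:
            resolved.append(direction)  # strong characters keep their direction
            continue
        # Look past neighbouring weak characters to the nearest strong ones.
        before = next((d for d in reversed(strong[:i]) if d), None)
        after = next((d for d in strong[i + 1:] if d), None)
        if before is not None and before == after:
            resolved.append(before)     # enclosed by one direction: inherit it
        else:
            resolved.append(base)       # otherwise use the main context
    return resolved

# "!" between a Latin and a Hebrew letter in an RTL document becomes RTL.
print(resolve_directions("a!\u05e9", base="RTL"))
# With an LRM (U+200E) after it, "!" is enclosed by LTR and becomes LTR.
print(resolve_directions("a!\u200e\u05e9", base="RTL"))
```

Because the marks are themselves strong characters with no visible glyph, inserting one changes the neighbours that a weak character sees without changing the visible text.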

For example, to correctly display the trademark symbol (™) for an English brand name (LTR) in an Arabic (RTL) passage, an LRM mark is inserted after the trademark symbol if the symbol is not followed by LTR text. If the LRM mark is not added, the weak character ™ is neighbored by a strong LTR character and a strong RTL character. Hence, in an RTL context, it is treated as RTL and displayed in the wrong position.
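The trademark case can be automated along the following lines. This is a minimal sketch under stated assumptions: the function names `next_strong_class` and `protect_trademarks` are hypothetical, and only the trademark sign (U+2122) is handled. An LRM is appended after each trademark sign unless the next strong character is already left-to-right.

```python
import unicodedata

LRM = "\u200e"        # LEFT-TO-RIGHT MARK
TRADEMARK = "\u2122"  # TRADE MARK SIGN

def next_strong_class(text, start):
    """Return the bidi class of the first strong character at or after start."""
    for ch in text[start:]:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("L", "R", "AL"):
            return bidi_class
    return None

def protect_trademarks(text):
    """Insert an LRM after each trademark sign not followed by LTR text."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch == TRADEMARK and next_strong_class(text, i + 1) != "L":
            out.append(LRM)
    return "".join(out)

# A brand name followed by a Hebrew letter: an LRM is inserted after the sign.
fixed = protect_trademarks("Acme\u2122 \u05e9")
print([f"U+{ord(ch):04X}" for ch in fixed])
```

Since an LRM is itself a strong LTR character, running the function twice on the same text does not insert a second mark.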

Possible BiDi-types of a character, to be used by the BiDi algorithm, are:

Scripts using bi-directional text

There are very few scripts that can be written in either direction.

Writing boustrophedon requires every second line to use mirrored glyphs.

Egyptian hieroglyphs could also be written in either direction; the signs had a distinct "head" that faced the beginning of a line and a "tail" that faced the end.

Chinese characters can also be written in either direction as well as vertically (top to bottom then right to left), especially in signs (such as plaques), but the orientation of the individual characters is never changed. This can often be seen on tour buses in China, where the company name customarily runs from the front of the vehicle to its rear - that is, from right to left on the right side of the bus, and from left to right on the left side of the bus.
Another variety of writing style, called boustrophedon, was used in some ancient Greek inscriptions, Tuareg, and Hungarian runes. This method of writing alternates direction, and usually reverses the individual characters, on each successive line.
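The alternating layout can be imitated in plain text. The sketch below is only an approximation: the function name `boustrophedon` is hypothetical, and it reverses the order of the characters on every second line without mirroring the glyphs themselves, which plain text cannot express.

```python
def boustrophedon(lines):
    """Reverse the character order of every second line (1st line unchanged)."""
    return [line if i % 2 == 0 else line[::-1] for i, line in enumerate(lines)]

for row in boustrophedon(["first line", "second line", "third line"]):
    print(row)
```

Simple slicing reverses code points, so text containing combining marks would need grapheme-aware handling that this sketch does not attempt.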

See also

  • Internationalization and localization
  • Horizontal and vertical writing in East Asian scripts
  • Writing system#Directionality (section on directionality)
  • Combining Cyrillic Millions
  • Transformation of text
  • Boustrophedon

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 