Left-to-right mark
The left-to-right mark (LRM) is a control character, or non-printing character, used in the computerized typesetting of bi-directional text containing mixed left-to-right scripts (such as English and Russian) and right-to-left scripts (such as Arabic and Hebrew). It is used to change the way adjacent characters are grouped with respect to text direction.
In Unicode, LRM is encoded at U+200E. Its UTF-8 encoding is E2 80 8E. Usage is prescribed in the Unicode Bidi (bidirectional) algorithm.
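The encoding can be checked with a few lines of Python. This is only a minimal sketch using the standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of any library.

```python
import unicodedata

LRM = "\u200e"  # the left-to-right mark

def describe(ch: str) -> dict:
    """Collect basic Unicode facts about a single character (illustrative helper)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # UTF-8 bytes written as space-separated uppercase hex
        "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        # General category; "Cf" means a format (invisible) character
        "category": unicodedata.category(ch),
        # Bidirectional class; "L" means strong left-to-right
        "bidi_class": unicodedata.bidirectional(ch),
    }

info = describe(LRM)
print(info["code_point"], info["name"], info["utf8"])
```

The bidirectional class "L" is what makes the mark useful: it behaves like an invisible left-to-right letter.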
Example of use in HTML
Suppose the writer wishes to insert a run of English (i.e. left-to-right) text into an Arabic or Hebrew paragraph, with non-alphabetic characters at the end of the English text (on the right). The sentence "The language C++ is a programming language used..." in Arabic, with "C++" in English, renders as follows:
لغة C++ هي لغة برمجة تستخدم...
With an LRM entered in the HTML after the ++, it renders as follows:
لغة C++ هي لغة برمجة تستخدم...
Standards-compliant browsers will render the ++ on the left in the first example, and on the right in the second.
This happens because the browser recognizes that the paragraph is in an RTL script (Arabic) and positions the punctuation, which is neutral as to its direction, according to the more prominent (paragraph-level) direction, since the text on either side of it differs in direction. The LRM causes the punctuation to be adjacent only to LTR text (the "C" and the LRM itself), so it is positioned as if it were in left-to-right text, i.e., to the right of the preceding text.
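The reasoning above can be sketched in Python. This is a deliberately simplified illustration, not the full Unicode Bidi algorithm: it treats every character without a strong direction as neutral, ignores embeddings and numbers, and assumes an RTL paragraph. The function names `strong_class` and `resolve_run_direction` are hypothetical.

```python
import unicodedata

LRM = "\u200e"  # left-to-right mark

def strong_class(ch):
    """Return 'L' or 'R' for strongly directional characters, else None."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "L"
    if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
        return "R"
    return None  # treated as neutral in this sketch

def resolve_run_direction(text, start, end, paragraph_dir="R"):
    """Direction assigned to the neutral run text[start:end].

    If the nearest strong characters on both sides agree, the run takes
    their direction; otherwise it falls back to the paragraph direction.
    """
    before = next((strong_class(c) for c in reversed(text[:start])
                   if strong_class(c)), paragraph_dir)
    after = next((strong_class(c) for c in text[end:]
                  if strong_class(c)), paragraph_dir)
    return before if before == after else paragraph_dir

plain = "لغة C++ هي لغة برمجة"
marked = plain.replace("C++", "C++" + LRM)  # LRM placed right after the ++

i = plain.index("++")
j = marked.index("++")
plain_dir = resolve_run_direction(plain, i, i + 2)     # "C" on one side, Arabic on the other
marked_dir = resolve_run_direction(marked, j, j + 2)   # "C" on one side, LRM on the other
print(plain_dir, marked_dir)
```

In HTML source the same mark can be written as the character itself or as the named character reference `&lrm;`.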