ISCII
Encyclopedia
Indian Standard Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India
India
India , officially the Republic of India , is a country in South Asia. It is the seventh-largest country by geographical area, the second-most populous country with over 1.2 billion people, and the most populous democracy in the world...

. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Assamese
Assamese script
The Assamese script is a variant of the Eastern Nagari script also used for Bengali and Bishnupriya Manipuri. The Eastern Nagari script belongs to the Brahmic family of scripts and has a continuous history of development from Nagari script, a precursor of Devanagari...

, Bengali (Bengla)
Bengali script
The Bengali alphabet is the writing system for the Bengali language. The script with variations is used for Assamese and is basis for Meitei, Bishnupriya Manipuri, Kokborok, Garo and Mundari alphabets. All these languages are spoken in the eastern region of South Asia. Historically, the script has...

, Devanagari
Devanagari
Devanagari |deva]]" and "nāgarī" ), also called Nagari , is an abugida alphabet of India and Nepal...

, Gujarati
Gujarati script
The Gujarati script , which like all Nāgarī writing systems is strictly speaking an abugida rather than an alphabet, is used to write the Gujarati and Kutchi languages...

, Gurmukhi, Kannada
Kannada script
The Kannada script is an alphasyllabary of the Brahmic family, used primarily to write the Kannada language, one of the Dravidian languages of southern India and also Sanskrit in the past. The Telugu script is derived from Old Kannada, and resembles Kannada script...

, Malayalam
Malayalam script
The Malayalam script is a Brahmic script used commonly to write the Malayalam language—which is the principal language of the Indian state of Kerala, spoken by 36 million people in the world. Like many other Indic scripts, it is an abugida, or a writing system that is partially “alphabetic” and...

, Oriya (Odia)
Oriya script
The Oriya script or Utkala Lipi or Utkalakshara is used to write the Oriya language, and can be used for several other Indian languages, for example, Sanskrit.- History :...

, Tamil
Tamil script
The Tamil script is a script that is used to write the Tamil language as well as other minority languages such as Badaga, Irulas, and Paniya...

, and Telugu
Telugu script
Telugu script, an abugida from the Brahmic family of scripts, is used to write the Telugu language, a language found in the South-Central Indian state of Andhra Pradesh as well as several other neighboring states. The Telugu script is derived from the Bhattiprolu script...

. ISCII does not encode the writing systems of India based on Arabic, but its writing system switching codes nonetheless provide for Kashmiri
Kashmiri language
Kashmiri is a language from the Dardic sub-group and it is spoken primarily in the Kashmir Valley, in Jammu and Kashmir. There are approximately 5,554,496 speakers in Jammu and Kashmir, according to the Census of 2001. Most of the 105,000 speakers or so in Pakistan are émigrés from the Kashmir...

, Sindhi
Sindhi language
Sindhi is the language of the Sindh region of Pakistan that is spoken by the Sindhi people. In India, it is among 22 constitutionally recognized languages, where Sindhis are a sizeable minority. It is spoken by 53,410,910 people in Pakistan, according to the national government's Statistics Division...

, Urdu
Urdu
Urdu is a register of the Hindustani language that is identified with Muslims in South Asia. It belongs to the Indo-European family. Urdu is the national language and lingua franca of Pakistan. It is also widely spoken in some regions of India, where it is one of the 22 scheduled languages and an...

, Persian
Persian language
Persian is an Iranian language within the Indo-Iranian branch of the Indo-European languages. It is primarily spoken in Iran, Afghanistan, Tajikistan and countries which historically came under Persian influence...

, Pashto and Arabic. The Arabic-based writing systems were subsequently encoded in the PASCII
Perso-Arabic Script Code for Information Interchange
Perso-Arabic Script Code for Information Interchange is one of the Indian government standards for encoding languages using writing systems based on that of Arabic, in particular Kashmiri, Persian, Sindhi, and Urdu...

 encoding.

The Brahmi-derived writing systems are mostly rather similar in structure, but have different letter shapes. So ISCII encodes letters with the same phonetic value at the same codepoint, overlaying the various scripts. For example, the ISCII codes 0xB3 0xDB represent [ki]. This will be rendered as कि in Devanagari, as ਕਿ in Gurmukhi, and as கி in Tamil. The writing system can be selected in rich text by markup or in plain text by means of the ATR code described below.

One motivation for the use of a single encoding is the idea that it will allow easy transliteration
Transliteration
Transliteration is a subset of the science of hermeneutics. It is a form of translation, and is the practice of converting a text from one script into another...

 from one writing system to another. However, there are enough incompatibilities that this is not really a practical idea. See About ISCII.

ISCII is a stateful 8-bit encoding. The lower 128 codepoints are plain ASCII, the upper 128 codepoints are ISCII-specific. In addition to the codepoints representing characters, ISCII makes use of a codepoint with mnemonic ATR that indicates that the following byte contains one of two kinds of information. One set of values changes the writing system until the next writing system indicator or end-of-line. Another set of values select display modes, such as bold and italic. ISCII does not provide a means of indicating the default writing system.

ISCII has not been widely used outside of certain government institutions and has now been rendered largely obsolete by Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

. Unicode uses a separate block for each Indic writing system, and largely preserves the ISCII layout within each block.

Codepage layout

The following table shows the character set for Devanagari
Devanagari
Devanagari |deva]]" and "nāgarī" ), also called Nagari , is an abugida alphabet of India and Nepal...

. The code sets for Assamese, Bengali, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu are similar, with each Devanagari form replaced by the equivalent form in each writing system
Brahmic family of scripts
The Brahmic or Indic scripts are a family of abugida writing systems. They are used throughout South Asia , Southeast Asia, and parts of Central and East Asia, and are descended from the Brāhmī script of the ancient Indian subcontinent...

. Each character is shown with its decimal code and its Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 equivalent.
Windows-1251
—0 —1 —2 —3 —4 —5 —6 —7 —8 —9 —A —B —C —D —E —F
|
|
|

Special code points

INV character—code point D9 (217): The INV character is used as a pseudo-consonant to display combining elements in isolation. For example, क (ka) + ् (halant) + INV = क्‍ (half ka). The Unicode equivalent is no break space 00A0 or dotted circle ◌ 25CC.
ATR character—code point EF (239): The ATR character followed by a byte code is used to switch to a different font attribute (such as bold) or language (such as Bengali), up to the next ATR sequence or the end of the line. This has no direct Unicode equivalent, as font attributes are not part of Unicode, and each script has a distinct set of code points.
EXT character—code point F0 (240): The EXT character followed by a byte code indicates a Vedic accent. This has no direct Unicode equivalent, as Vedic accents are assigned to distinct code points.
Halant character ़—code point E8 (232): The halant character removes the implicit vowel from a consonant and is used between consonants to represent conjunct consonants. For example, क (ka) + ् (halant) + त (ta) = क्त (kta). The sequence ् (halant) + ् (halant) displays a conjunct with an explicit halant, for example क (ka) + ् (halant) + ् (halant) + त (ta) = क्‌त. The sequence ् (halant) + ़ (nukta) displays a conjunct with half consonants, if available, for example क (ka) + ् (halant) + ़ (nukta) + त (ta) = क्‍त.
ISCII | Unicode
single halant E8 halant 094D
halant + halant E8 E8 halant + ZWNJ
Zero-width non-joiner
The zero-width non-joiner is a non-printing character used in the computerization of writing systems that make use of ligatures. When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively...

 
094D 200C
halant + nukta E8 E9 halant + ZWJ
Zero-width joiner
The zero-width joiner is a non-printing character used in the computerized typesetting of some complex scripts, such as the Arabic script or any of the Indic scripts. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected...

 
094D 200D

Nukta character ़—code point E9 (233): The nukta
Nukta
Nukta is a generic term for the diacritic mark in several Brahmic scripts, like Devanagari that is used to represent sounds from other languages by being applied to an existing character. The word nukta, originates from the Arabic word nuqta ....

 character after another ISCII character is used for a number of rarer characters which don't exist in the main ISCII set. For example क (ka) + ़ (nukta) = क़ (qa). These characters have precomposed forms in Unicode, as shown in the following table.
ISCII
code point
Original
character
Character
with nukta
Unicode
code point
A1 (161) 0950
A6 (166) 090C
A7 (167) 0961
AA (176) 0960
B3 (179) क़ 0958
B4 (180) ख़ 0959
B5 (181) ग़ 095A
BA (186) ज़ 095B
BF (191) ड़ 095C
C0 (192) ढ़ 095D
C9 (201) फ़ 095E
DB (219) ि 0962
DC (220) 0963
DF (223) 0944
EA (234) 0964

Code points for all languages

Each alphabet is listed in the order of its ISCII code point. Code points with asterisks (*) indicate the code point followed by nukta, e.g. क (ka) + ़ = क़ (qa); इ (i) + ़ = ऌ (ḷ). Each character is listed along with its Unicode code point.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK