Half-width kana
Encyclopedia
are katakana
Katakana
is a Japanese syllabary, one component of the Japanese writing system along with hiragana, kanji, and in some cases the Latin alphabet . The word katakana means "fragmentary kana", as the katakana scripts are derived from components of more complex kanji. Each kana represents one mora...

 characters displayed at half their normal width (a 2:1 aspect ratio
Aspect ratio
The aspect ratio of a shape is the ratio of its longer dimension to its shorter dimension. It may be applied to two characteristic dimensions of a three-dimensional shape, such as the ratio of the longest and shortest axis, or for symmetrical objects that are described by just two measurements,...

), instead of the usual square (1:1) aspect ratio. For example, the usual (full-width) form of the katakana ka is カ while the half-width form is カ. There are no half-width hiragana
Hiragana
is a Japanese syllabary, one basic component of the Japanese writing system, along with katakana, kanji, and the Latin alphabet . Hiragana and katakana are both kana systems, in which each character represents one mora...

 or kanji
Kanji
Kanji are the adopted logographic Chinese characters hanzi that are used in the modern Japanese writing system along with hiragana , katakana , Indo Arabic numerals, and the occasional use of the Latin alphabet...

.

Half-width kana were used in the early days of Japanese computing, to allow Japanese characters to be displayed on the same grid as monospaced fonts of Latin characters. Half-width hiragana or kanji were not used. Half-width kana characters are not generally used today, but find some use in specific settings, such as cash register displays, on shop receipts, and Japanese digital television and DVD subtitles. They also find occasional use online (in email or webpages) – for example, (normally written in hiragana or sometimes kanji 貴方) might be written as アナタ.

The term "half-width kana", which strictly refers only to how kana are displayed, not how they are stored – is also used loosely to refer to the A0–DF (hexadecimal) block where katakana are stored in some character encoding
Character encoding
A character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a sequence of natural numbers, octets or electrical pulses, in order to facilitate the transmission of data through telecommunication networks or storage of text in...

s, such as JIS X 0201
JIS X 0201
JIS X 0201, a Japanese Industrial Standard developed in 1969 , was the first Japanese character set to become widely used. It is either 7-bit encoding or 8-bit encoding, although 8-bit encoding is dominant for modern use...

 (1969) – see encodings, below. This is formally incorrect, however – this JIS standard simply specifies that katakana be stored in these locations, without specifying how they should be displayed; the confusion is because in early computing, the characters stored here were in fact displayed as half-width kana – see confusion, below.

History

ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 is defined as a 7-bit character set and has room for 128 characters. However, since this standard was designed for the United States
United States
The United States of America is a federal constitutional republic comprising fifty states and a federal district...

, it does not contain characters and symbols, such as the yen (¥) symbol needed to represent Japanese currency, nor did it include space for characters from other alphabets, such as kana or kanji – thus Japanese characters could not be encoded. Further, Japanese characters, both kana and kanji, are drawn on a square grid, while Latin characters are generally written more narrowly – thus Japanese characters could not be displayed either.

JIS X 0201
JIS X 0201
JIS X 0201, a Japanese Industrial Standard developed in 1969 , was the first Japanese character set to become widely used. It is either 7-bit encoding or 8-bit encoding, although 8-bit encoding is dominant for modern use...

 was developed in 1969, a time when computers were generally incapable, both by software design and hardware resources, of representing the thousands of Chinese-based kanji
Kanji
Kanji are the adopted logographic Chinese characters hanzi that are used in the modern Japanese writing system along with hiragana , katakana , Indo Arabic numerals, and the occasional use of the Latin alphabet...

 characters used in Japanese. As a compromise, this standard encoded katakana (only – not hiragana or kanji) as a small set of characters, assigned in the upper byte value range of 0x80–0xFF. This allowed 8-bit processors to encode and process Japanese text phonetically (as katakana), though without being able to process hiragana or kanji. These katakana characters were in turn displayed as "half-width kana" – a new, unorthodox, narrower form factor to fit the same width as the monospaced Latin alphabets machines were capable of printing and displaying. Encoding-wise, JIS X 0201 is a variant extension of ASCII – it includes additional characters, and does not exactly agree with ASCII on the overlapping part (the Latin character section).

Half-width kana were developed as "...the first Japanese characters encoded on computers because they are used for Japanese telegrams."

To make katakana fit into the narrower cell area allowed, some compromises were made. For example, the diacritical marks dakuten
Dakuten
, colloquially ten-ten , is a diacritic sign most often used in the Japanese kana syllabaries to indicate that the consonant of a syllable should be pronounced voiced. Handakuten , colloquially maru , is a diacritic used with the kana for syllables starting with h to indicate that they should...

and handakuten are treated as separate characters instead of being part of the preceding character. This compromise led many to consider "half-width kana" visually unattractive, and causes problems for many computer programs today.

Encoding

In the JIS X 0201
JIS X 0201
JIS X 0201, a Japanese Industrial Standard developed in 1969 , was the first Japanese character set to become widely used. It is either 7-bit encoding or 8-bit encoding, although 8-bit encoding is dominant for modern use...

 specification (1969), katakana are encoded in A0–DF (hexadecimal) block – how they are displayed is not specified, and there is no separate encoding of full-width and half-width kana. In JIS X 0208
JIS X 0208
JIS X 0208 is a 2-byte character set specified as a Japanese Industrial Standard, containing 6879 graphic characters suitable for writing text, place names, personal names, and so forth in the Japanese language. The official title of the current standard is...

, katakana, hiragana, and kanji are all encoded (and displayed as full-width characters; there are no half-width characters), though the ordering of the kana is different – see JIS X 0208#Hiragana and katakana.

In Shift JIS, which combines JIS X 0201 and JIS X 0208, these encodings (both of which can encode Latin characters and katakana) are stored separately, with JIS X 0201 all being displayed as half-width (thus the JIS X 0201 katakana are displayed as half-width kana), while JIS X 0208 are all displayed as full-width (thus the JIS X 0208 Latin characters are all displayed as full-width Latin characters). Thus in Shift JIS, Latin characters and katakana have two encodings with two separate display forms, both half-width and full-width.

In Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

, katakana and hiragana are primarily used as normal, full-width characters (the Katakana and Hiragana blocks are displayed as full-width characters); a separate block, the Halfwidth and Fullwidth Forms
Halfwidth and Fullwidth Forms
In CJK computing, graphic characters are traditionally classed into fullwidth and halfwidth characters...

 block is used to store variant characters, including half-width kana and full-width Latin characters.

Thus, the katakana in JIS X 0201 and the corresponding part of derived encodings (the JIS X 0201 part of Shift JIS) are displayed as half-width, while in Unicode half-width forms are specified separately.

Half-width table

"J" indicates the first four bits in JIS X 0201
JIS X 0201
JIS X 0201, a Japanese Industrial Standard developed in 1969 , was the first Japanese character set to become widely used. It is either 7-bit encoding or 8-bit encoding, although 8-bit encoding is dominant for modern use...

 (though see below, these do not necessarily indicate half-width) and in other sets such as Shift JIS, "U" indicates the row in Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 in the Halfwidth and Fullwidth Forms block.
J U 0 1 2 3 4 5 6 7 8 9 A B C D E F
A FF6  
B FF7 ソ
C FF8
D FF9

E-mail

Since the SMTP and NNTP protocols (used to deliver e-mail and Usenet
Usenet
Usenet is a worldwide distributed Internet discussion system. It developed from the general purpose UUCP architecture of the same name.Duke University graduate students Tom Truscott and Jim Ellis conceived the idea in 1979 and it was established in 1980...

, respectively) were formerly only able to transmit 7-bits, it was then the convention to use ISO-2022-JP for sending e-mail in Japanese.

Since half-width kana is not contained in ISO-2022-JP (it includes the Roman set of JIS X 0201, and all of JIS X 0208, but not the katakana set of JIS X 0201, which is used for half-width kana in Shift JIS, for instance), half-width kana could not be included in a message; if half-width kana were accidentally included in a message, it could become garbled during transmission (see mojibake
Mojibake
, from the Japanese 文字 "character" + 化け "change", is the occurrence of incorrect, unreadable characters shown when computer software fails to render text correctly according to its associated character encoding.-Causes:...

).

This is no longer such a problem since most e-mail servers today use ESMTP, and hence 8-bit characters are acceptable. Alternatively, an encoding system such as Base64 can be used and specified in the message using MIME
MIME
Multipurpose Internet Mail Extensions is an Internet standard that extends the format of email to support:* Text in character sets other than ASCII* Non-text attachments* Message bodies with multiple parts...

.

Web pages

The problem that exists in e-mail does not exist with Web pages since HTTP accepts 8-bit characters.

A problem that does exist is that computer programs have difficulties whether to treat a character as Shift JIS, EUC-JP, or UTF-8
UTF-8
UTF-8 is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set. Unlike them, it is backward-compatible with ASCII and avoids the complications of endianness and byte order marks...

 – hence character code information should be specified with a HTTP response header or a Meta tag.

Confusion

Strictly speaking, JIS X 0201 encoding as "half-width katakana" is incorrect, as the standard does not define character widths – it defines only the code representation of katakana characters. In the JIS X 0201 standard, katakana characters are printed in normal (full) width, not half-width.

Half-width characters were only used for display during the period when characters were displayed at half-width (and single-byte encodings were used), before full-width character displays (and associated double-byte encodings such as JIS X 0208) became widespread. However, in the Shift JIS standard, which combines the JIS X 0201 standard (whose characters – Latin and katakana – were displayed as half-width) and the JIS X 0208 standard (whose characters – katakana, hiragana, kanji, and Latin – were displayed as full-width), katakana and Latin characters are encoded twice, both in JIS X 0201 and JIS 0208, but displayed as half-width or full-width according to which section they are in (0201 or 0208) – thus the 0201 katakana block can be thought of as corresponding to "half-width kana", and the misunderstanding that the 0201 standard defines "half-width" characters is widespread.

Further, though JIS X 0201 is a single-byte encoding (and displayed at half-width) and JIS X 0208 is a double-byte encoding (and displayed at full-width), there is no connection between number of bytes and width (other than these corresponding in Shift JIS, as above) – for example, Unicode can be encoded is four bytes (UTF-32) to display both full-width and single-width characters.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK