ZX Spectrum character set
The ZX Spectrum character set is the variant of ASCII used in the British Sinclair ZX Spectrum computers. It is based on ASCII-1967 (the version of ASCII on which most later character sets are based), but with one character from ASCII-1963 (the first version of ASCII), two non-standard printable characters (£ and ©), an idiosyncratic use of the control code area, and use of the 128 high-bit codes beyond the ASCII range.
Printable characters
The printable part of the Spectrum character set, 0x20–0x7F, is almost standard ASCII, except that 0x60 holds the pound sign (£) instead of the grave accent (`) and 0x7F holds the copyright sign (©) instead of the control code DEL. The pound sign was placed at 0x60 rather than at 0x23, where the British variant of ASCII (ISO 646-GB) puts it, so that both the pound sign and the number sign (#) are available. Code 0x5E holds an up-arrow (↑), as in ASCII-1963, instead of the ASCII-1967 caret (^); however, 0x5F holds an underscore rather than the ASCII-1963 left-arrow.
Beyond 0x7F, the Spectrum character set uses the high-bit range, 0x80–0xFF, for special purposes. 0x80–0x8F contain block graphics. 0x90–0xA4 contain the User Defined Graphics (UDGs), which the user can customise with a few lines of BASIC. 0xA5–0xFF contain tokens (BASIC keywords represented as single characters): for example, pressing P at the beginning of a line generates the code 0xF5, which displays the BASIC keyword PRINT on the screen. Codes 0xC7–0xC9 are the relational operators <= (less than or equal), >= (greater than or equal) and <> (not equal) respectively; unlike the relational operators of most other systems, these are characters in their own right and cannot be produced by typing the two constituent symbols one after the other.
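To illustrate the token range, the Python sketch below expands tokenised bytes into keyword text. The keyword order is the 48K token table from 0xA5 (RND) to 0xFF (COPY), as laid out in the codepage table; `TOKENS`, `keyword` and `detokenise` are hypothetical names, and the sketch uses a simplified spacing rule rather than reproducing exactly how the ROM spaces keywords when listing.

```python
# 48K Sinclair BASIC keyword tokens, in code order from 0xA5 (RND) to 0xFF (COPY).
TOKENS = [
    "RND", "INKEY$", "PI", "FN", "POINT", "SCREEN$", "ATTR", "AT", "TAB",
    "VAL$", "CODE", "VAL", "LEN", "SIN", "COS", "TAN", "ASN", "ACS", "ATN",
    "LN", "EXP", "INT", "SQR", "SGN", "ABS", "PEEK", "IN", "USR", "STR$",
    "CHR$", "NOT", "BIN", "OR", "AND", "<=", ">=", "<>", "LINE", "THEN",
    "TO", "STEP", "DEF FN", "CAT", "FORMAT", "MOVE", "ERASE", "OPEN #",
    "CLOSE #", "MERGE", "VERIFY", "BEEP", "CIRCLE", "INK", "PAPER", "FLASH",
    "BRIGHT", "INVERSE", "OVER", "OUT", "LPRINT", "LLIST", "STOP", "READ",
    "DATA", "RESTORE", "NEW", "BORDER", "CONTINUE", "DIM", "REM", "FOR",
    "GO TO", "GO SUB", "INPUT", "LOAD", "LIST", "LET", "PAUSE", "NEXT",
    "POKE", "PRINT", "PLOT", "RUN", "SAVE", "RANDOMIZE", "IF", "CLS", "DRAW",
    "CLEAR", "RETURN", "COPY",
]
FIRST_TOKEN = 0xA5
assert len(TOKENS) == 0x100 - FIRST_TOKEN  # 91 keywords

# Printable codes that differ from ASCII-1967.
SPECIAL = {0x5E: "\u2191", 0x60: "\u00A3", 0x7F: "\u00A9"}  # up-arrow, pound, copyright


def keyword(code: int) -> str:
    """Return the keyword for a single token code (0xA5-0xFF)."""
    if not FIRST_TOKEN <= code <= 0xFF:
        raise ValueError(f"0x{code:02X} is not a keyword token")
    return TOKENS[code - FIRST_TOKEN]


def detokenise(data: bytes) -> str:
    """Expand keyword tokens; control codes, block graphics and UDGs become {0xNN}."""
    out = []
    for b in data:
        if b >= FIRST_TOKEN:
            out.append(keyword(b) + " ")  # simplified spacing: one space after each keyword
        elif 0x20 <= b <= 0x7F:
            out.append(SPECIAL.get(b, chr(b)))
        else:
            out.append(f"{{0x{b:02X}}}")
    return "".join(out)


print(detokenise(bytes([0xF5]) + b'"HI"'))  # PRINT "HI"
```

A single list indexed by `code - 0xA5` mirrors the contiguous layout of the token range, so no per-code lookup table is needed.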
Mapping the printable Spectrum character set to Unicode is possible, but fonts containing some of the block graphics characters are still not commonplace.
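A minimal Python sketch of such a mapping is shown below, using characters from the Unicode Block Elements range for the 16 block graphics. The bit-to-quadrant layout stated in the comments is an assumption based on how the 48K ROM draws these characters; `spectrum_to_unicode` and `lit_quadrants` are hypothetical helper names, and UDGs and tokens are deliberately left unmapped because they have no fixed glyph.

```python
# Block graphics 0x80-0x8F, indexed by code - 0x80. Assumed bit layout (as drawn
# by the 48K ROM): bit 0 = top right, bit 1 = top left, bit 2 = bottom right,
# bit 3 = bottom left; a set bit means that quadrant is filled.
BLOCK_GRAPHICS = " ▝▘▀▗▐▚▜▖▞▌▛▄▟▙█"
QUADRANTS = ("top right", "top left", "bottom right", "bottom left")

# The printable codes where the Spectrum differs from ASCII-1967.
SPECIAL = {0x5E: "\u2191", 0x60: "\u00A3", 0x7F: "\u00A9"}  # up-arrow, pound, copyright


def spectrum_to_unicode(code: int) -> str:
    """Map a printable Spectrum code (0x20-0x8F) to a Unicode character."""
    if code in SPECIAL:
        return SPECIAL[code]
    if 0x20 <= code <= 0x7E:
        return chr(code)
    if 0x80 <= code <= 0x8F:
        return BLOCK_GRAPHICS[code - 0x80]
    raise ValueError(f"0x{code:02X} has no fixed glyph (control code, UDG or token)")


def lit_quadrants(code: int) -> list:
    """List the filled quadrants of a block graphic, following the bit layout above."""
    if not 0x80 <= code <= 0x8F:
        raise ValueError("not a block graphics code")
    bits = code - 0x80
    return [name for i, name in enumerate(QUADRANTS) if bits >> i & 1]


print("".join(spectrum_to_unicode(b) for b in b"\x60\x35 \x83\x8c"))  # £5 ▀▄
```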
The default printable characters, 32 (space) to 127 (copyright), are stored at the end of the Spectrum's ROM, at memory addresses 15616 (0x3D00) to 16383 (0x3FFF), and are referenced by the system variable CHARS at memory addresses 23606–23607. Each character is an 8 × 8 bitmap occupying 8 bytes. The value in CHARS is 256 bytes (32 characters × 8 bytes) lower than the first byte of the space character, so that the bitmap of a printable character with code n starts at CHARS + 8 × n, without first subtracting 32. By default, CHARS therefore holds the address 15360 (0x3C00).
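The address arithmetic can be checked with a short Python sketch against a simulated memory image. It assumes only what the paragraph states (CHARS at 23606–23607, default value 15360) plus the Z80's low-byte-first storage of 16-bit values; `read_word` and `glyph_address` are hypothetical names.

```python
CHARS_SYSVAR = 23606    # CHARS occupies addresses 23606 and 23607
DEFAULT_CHARS = 0x3C00  # 15360: 256 bytes (32 characters x 8) below the space glyph at 0x3D00


def read_word(memory: bytes, address: int) -> int:
    """Read a 16-bit system variable, stored low byte first as on the Z80."""
    return memory[address] | memory[address + 1] << 8


def glyph_address(code: int, chars: int = DEFAULT_CHARS) -> int:
    """Address of the first of a character's 8 bitmap bytes (one per pixel row)."""
    if not 0x20 <= code <= 0x7F:
        raise ValueError("only codes 0x20-0x7F are drawn from the CHARS font")
    return chars + 8 * code


# Usage with a simulated 64 KB memory image holding the default CHARS value.
memory = bytearray(65536)
memory[CHARS_SYSVAR:CHARS_SYSVAR + 2] = DEFAULT_CHARS.to_bytes(2, "little")
chars = read_word(memory, CHARS_SYSVAR)
print(glyph_address(0x20, chars), glyph_address(0x7F, chars) + 7)  # 15616 16383
```

Because the offset is folded into CHARS, pointing CHARS at a different 768-byte font (minus 256) is enough to change every printable character.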
The UDG characters (Gr-A to Gr-U) are stored at the top of the Spectrum's RAM, at memory addresses 65368 (0xFF58) to 65535 (0xFFFF) on a 48K machine. A POKE to this address range therefore changes the UDG characters used by subsequent PRINT statements, but not any UDG characters already drawn on the screen. The USR function, when given a string containing a single letter (for example USR "a"), returns the address of the corresponding UDG, which makes it a quick way to reference these addresses from BASIC. As with the printable characters, the location of the UDG characters is stored in a system variable, named UDG.
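The same 8-bytes-per-character arithmetic applies to the UDGs: 21 bitmaps of 8 bytes fill the 168 bytes from 65368 to 65535. The Python sketch below mimics what USR "a" computes and packs pixel rows the way a BASIC program typically does with BIN values before POKEing them. It assumes the default 48K location; the UDG system variable address 23675 comes from the Spectrum's system-variables table, and all function names are hypothetical.

```python
UDG_SYSVAR = 23675        # address of the 2-byte UDG system variable
DEFAULT_UDG_48K = 0xFF58  # 65368 on a 48K machine
NUM_UDGS = 21             # Gr-A to Gr-U, character codes 0x90-0xA4


def udg_index(letter: str) -> int:
    """0 for A up to 20 for U (case-insensitive, as with USR "a")."""
    if len(letter) != 1:
        raise ValueError("expected a single letter")
    index = ord(letter.upper()) - ord("A")
    if not 0 <= index < NUM_UDGS:
        raise ValueError("UDGs run from A to U")
    return index


def udg_address(letter: str, udg: int = DEFAULT_UDG_48K) -> int:
    """Mimics USR "letter": the address of that UDG's first bitmap byte."""
    return udg + 8 * udg_index(letter)


def udg_code(letter: str) -> int:
    """Character code that prints the UDG, e.g. CHR$ 144 for Gr-A."""
    return 0x90 + udg_index(letter)


def rows_to_bytes(rows: list) -> bytes:
    """Pack 8 strings of '0'/'1' pixels (like BASIC's BIN 00111100) into bitmap bytes."""
    if len(rows) != 8 or any(len(r) != 8 for r in rows):
        raise ValueError("a UDG is 8 rows of 8 pixels")
    return bytes(int(r, 2) for r in rows)


print(udg_address("a"), udg_code("a"))  # 65368 144
```

In BASIC the eight values from `rows_to_bytes` would be POKEd to USR "a" through USR "a"+7.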
Control codes
In the control code area (the C0 range), the Spectrum uses its own proprietary controls, such as INK and PAPER to set the foreground and background colours. The only similarities to ASCII are cursor-left at 0x08 (ASCII Backspace) and ENTER at 0x0D (ASCII Carriage Return), which also performs an automatic line feed. Cursor-down, 0x0A (ASCII Line Feed), can be simulated by printing 32 spaces with OVER 1 (transparent overprint), and cursor-up, 0x0B (ASCII Vertical Tabulation), can be simulated with 32 backspaces. A fault in the system ROM prevents cursor-right, 0x09 (ASCII Horizontal Tabulation), from working.
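The colour and position controls are followed by parameter bytes in the print stream: one byte for INK, PAPER, FLASH, BRIGHT, INVERSE and OVER (0x10–0x15) and two bytes for AT and TAB (0x16–0x17). A minimal Python sketch that splits such a stream into text runs and controls might look like this; `split_print_stream` is a hypothetical name, and other control codes such as the cursor keys are rejected rather than interpreted.

```python
# Embedded print controls and how many parameter bytes follow each one.
CONTROL_PARAMS = {
    0x10: ("INK", 1), 0x11: ("PAPER", 1), 0x12: ("FLASH", 1),
    0x13: ("BRIGHT", 1), 0x14: ("INVERSE", 1), 0x15: ("OVER", 1),
    0x16: ("AT", 2),   # line, then column
    0x17: ("TAB", 2),  # two parameter bytes
}
ENTER = 0x0D
SPECIAL = {0x5E: "\u2191", 0x60: "\u00A3", 0x7F: "\u00A9"}  # up-arrow, pound, copyright


def split_print_stream(data: bytes) -> list:
    """Return ("text", str) and ("ctrl", name, params) items in stream order."""
    items, text, i = [], [], 0
    while i < len(data):
        b = data[i]
        if b in CONTROL_PARAMS:
            if text:  # flush any pending text before the control
                items.append(("text", "".join(text)))
                text = []
            name, count = CONTROL_PARAMS[b]
            params = tuple(data[i + 1 : i + 1 + count])
            if len(params) != count:
                raise ValueError(f"{name} is missing its parameter bytes")
            items.append(("ctrl", name, params))
            i += 1 + count
            continue
        if b == ENTER:
            text.append("\n")
        elif 0x20 <= b <= 0x7F:
            text.append(SPECIAL.get(b, chr(b)))
        else:
            raise ValueError(f"code 0x{b:02X} not handled in this sketch")
        i += 1
    if text:
        items.append(("text", "".join(text)))
    return items


print(split_print_stream(bytes([0x16, 5, 10]) + b"HI"))  # [('ctrl', 'AT', (5, 10)), ('text', 'HI')]
```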
Control code 0x0E indicates that a binary floating-point number follows, so that the interpreter does not have to convert the decimal text each time the line is run. In a Sinclair BASIC program, the decimal digits of each number are followed by a 0x0E byte and then a 5-byte representation of the number in binary floating-point format. When the listing is sent to a printer or the screen these 5 bytes are skipped, but when the program runs the 5-byte value is used and the text is ignored. Some Spectrum programs exploited this behaviour to hide the real numbers from the user. For example, a BASIC line could contain the GO TO token followed by the characters 10, a 0x0E byte and the floating-point representation of 100: anyone listing the program would see the number 10, but when executed the program would jump to line 100.
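The 5-byte format stores whole numbers from 0 to 65535 in a short form (a zero byte, a sign byte, the value low byte first, and a zero byte) and other values as a 1-byte exponent biased by 128, followed by a 4-byte mantissa whose always-set top bit is replaced by the sign. The Python sketch below encodes and decodes both forms and pairs the digits that LIST shows with the value used at run time, reproducing the GO TO 10 / 100 example. It assumes that mantissa bits are truncated (the ROM's own conversion may round the last bit differently) and that numbers typed into a program are non-negative, since a minus sign is a separate operator; the helper names are hypothetical.

```python
import math

NUMBER_MARKER = 0x0E
GO_TO = 0xEC  # keyword token for GO TO


def encode_number(x: float) -> bytes:
    """Encode x in the Spectrum's 5-byte format."""
    if x == int(x) and 0 <= x <= 65535:
        n = int(x)
        return bytes([0x00, 0x00, n & 0xFF, n >> 8, 0x00])  # short integer form
    mantissa, exponent = math.frexp(abs(x))  # abs(x) = mantissa * 2**exponent, 0.5 <= mantissa < 1
    if not -127 <= exponent <= 127:
        raise OverflowError("exponent out of range for this sketch")
    bits = int(mantissa * 2**32) & 0x7FFFFFFF  # top bit is always 1, so drop it...
    if x < 0:
        bits |= 0x80000000  # ...and store the sign there instead
    return bytes([exponent + 128]) + bits.to_bytes(4, "big")


def decode_number(five: bytes) -> float:
    """Decode a 5-byte value in either form."""
    if five[0] == 0:
        value = five[2] | five[3] << 8
        return value - 65536 if five[1] == 0xFF else value
    bits = int.from_bytes(five[1:5], "big")
    sign = -1 if bits & 0x80000000 else 1
    mantissa = (bits | 0x80000000) / 2**32  # restore the implicit top bit
    return sign * math.ldexp(mantissa, five[0] - 128)


def numbers_in_line(body: bytes) -> list:
    """Pair each number's listed digits with the hidden value used when running."""
    pairs, digits, i = [], bytearray(), 0
    while i < len(body):
        b = body[i]
        if b == NUMBER_MARKER:
            pairs.append((digits.decode("ascii"), decode_number(body[i + 1 : i + 6])))
            digits = bytearray()
            i += 6  # skip the marker and its 5 hidden bytes
            continue
        if 0x30 <= b <= 0x39 or b == 0x2E:  # digits and the decimal point
            digits.append(b)
        else:
            digits = bytearray()
        i += 1
    return pairs


# A line body that lists as GO TO 10 but jumps to line 100 when run.
body = bytes([GO_TO]) + b"10" + bytes([NUMBER_MARKER]) + encode_number(100)
print(numbers_in_line(body))  # [('10', 100)]
```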
Codepage layout
Spectrum Character Set
| | 0x | 1x | 2x | 3x | 4x | 5x | 6x | 7x | 8x | 9x | Ax | Bx | Cx | Dx | Ex | Fx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x0 | | INK | (space) | 0 | @ | P | £ | p | (blank) | (A) | (Q) | VAL | USR | FORMAT | LPRINT | LIST |
| x1 | | PAPER | ! | 1 | A | Q | a | q | ▝ | (B) | (R) | LEN | STR$ | MOVE | LLIST | LET |
| x2 | | FLASH | " | 2 | B | R | b | r | ▘ | (C) | (S) | SIN | CHR$ | ERASE | STOP | PAUSE |
| x3 | | BRIGHT | # | 3 | C | S | c | s | ▀ | (D) | (T)¹ | COS | NOT | OPEN # | READ | NEXT |
| x4 | | INVERSE | $ | 4 | D | T | d | t | ▗ | (E) | (U)² | TAN | BIN | CLOSE # | DATA | POKE |
| x5 | | OVER | % | 5 | E | U | e | u | ▐ | (F) | RND | ASN | OR | MERGE | RESTORE | PRINT |
| x6 | PRINT comma | AT | & | 6 | F | V | f | v | ▚ | (G) | INKEY$ | ACS | AND | VERIFY | NEW | PLOT |
| x7 | edit | TAB | ' | 7 | G | W | g | w | ▜ | (H) | PI | ATN | <= | BEEP | BORDER | RUN |
| x8 | left | | ( | 8 | H | X | h | x | ▖ | (I) | FN | LN | >= | CIRCLE | CONTINUE | SAVE |
| x9 | right | | ) | 9 | I | Y | i | y | ▞ | (J) | POINT | EXP | <> | INK | DIM | RANDOMIZE |
| xA | down | | * | : | J | Z | j | z | ▌ | (K) | SCREEN$ | INT | LINE | PAPER | REM | IF |
| xB | up | | + | ; | K | [ | k | { | ▛ | (L) | ATTR | SQR | THEN | FLASH | FOR | CLS |
| xC | delete | | , | < | L | \ | l | \| | ▄ | (M) | AT | SGN | TO | BRIGHT | GO TO | DRAW |
| xD | enter | | - | = | M | ] | m | } | ▟ | (N) | TAB | ABS | STEP | INVERSE | GO SUB | CLEAR |
| xE | number³ | | . | > | N | ↑ | n | ~ | ▙ | (O) | VAL$ | PEEK | DEF FN | OVER | INPUT | RETURN |
| xF | | | / | ? | O | _ | o | © | █ | (P) | CODE | IN | CAT | OUT | LOAD | COPY |
Characters shown as (X) are User Defined Graphics (UDGs).
¹ Under 128K BASIC, 0xA3 is the keyword SPECTRUM.
² Under 128K BASIC, 0xA4 is the keyword PLAY.
³ Marks an embedded floating-point number: 5 bytes follow, a 1-byte exponent and a 4-byte mantissa.
External links
- Sinclair Spectrum+ 48K Character Set From Michael Zaretski's website
- Mapping table from Sinclair Spectrum+ 48K Character Set to Unicode From the same site
- The floating point package