TeX font metric
Encyclopedia
TeX font metric is a font
file format
used by the TeX
typesetting
system. It is a font metric format, not an outline font format like TrueType
, because it provides only the information necessary to typeset the font such as each character's width, height and depth. The actual glyphs are stored elsewhere. This is not unique to TeX; Adobe's AFM files and Windows' PFM files use the same technique.
TFM files contain all of the information TeX needs to produce its device-independent (DVI
) output. The actual glyphs are then inserted by the eventual DVI output driver or previewer, using, for instance, TrueType
fonts, or fonts in METAFONT
's PK format. The format is designed to be extremely compact: in the original Computer Modern
distribution, every font's TFM file is smaller than 2kb.
of the program TFtoPL.
A TFM file is broken down into a series of four-byte words, which can contain data fields of various lengths. Any data fields that are more than one byte long are held in big endian order. (The exact same file will be generated, regardless of architecture of the computer generating it.) The six-word (24-byte) file header contains twelve unsigned 16-bit integers which describe the length of the file, the range of character codes contained in the font, and the size of each of the tables. A single TFM file describes between 0 and 256 characters, inclusive.
The body of the TFM file consists of a series of ten tables, each one except for the first laid out as an array of fixed-length fields. A 32-bit signed fixed-point number
with 12 bits to the left of the decimal point, referred to as a
with one set of fonts from being printed with a different set, as well as ASCII descriptions of the character coding scheme (e.g.,
The next table,
There then follow the four tables
Ligatures
and kerning
are represented using a simple programming language consisting of fixed-length four-byte operations in the
Extensible characters are specified in the
/ /
| |
| |
< |
| |
| |
\ \
Of course, the font would use specially designed characters for this, instead of reusing existing ones, but the principle is the same.
The final table,
and the amount of italic slant (to determine how far to shift accents). Certain coding schemes such as
equivalent to the TFM format called PL, for property list. There is an exact correspondence between a TFM file and a PL file: one can be freely converted to the other and back again with no loss of information using the
For example, this is the code for the upper-case letter Y in Computer Modern
Roman
, ten point:
(CHARACTER C Y
(CHARWD R 0.750002)
(CHARHT R 0.683332)
(CHARIC R 0.025)
(COMMENT
(KRN C e R -0.083334)
(KRN C o R -0.083334)
(KRN C r R -0.083334)
(KRN C a R -0.083334)
(KRN C A R -0.083334)
(KRN C u R -0.083334)
)
)
The kerning values seen here are copied from the another section of the PL file in order to make it easier to read, which in itself is redundant. Notice how the full numeric values of the kerning constants are written out each time they appear, instead of being stored once and referred to by a much smaller index.
Typeface
In typography, a typeface is the artistic representation or interpretation of characters; it is the way the type looks. Each type is designed and there are thousands of different typefaces in existence, with new ones being developed constantly....
file format
File format
A file format is a particular way that information is encoded for storage in a computer file.Since a disk drive, or indeed any computer storage, can store only bits, the computer must have some way of converting information to 0s and 1s and vice-versa. There are different kinds of formats for...
used by the TeX
TeX
TeX is a typesetting system designed and mostly written by Donald Knuth and released in 1978. Within the typesetting system, its name is formatted as ....
typesetting
Typesetting
Typesetting is the composition of text by means of types.Typesetting requires the prior process of designing a font and storing it in some manner...
system. It is a font metric format, not an outline font format like TrueType
TrueType
TrueType is an outline font standard originally developed by Apple Computer in the late 1980s as a competitor to Adobe's Type 1 fonts used in PostScript...
, because it provides only the information necessary to typeset the font such as each character's width, height and depth. The actual glyphs are stored elsewhere. This is not unique to TeX; Adobe's AFM files and Windows' PFM files use the same technique.
TFM files contain all of the information TeX needs to produce its device-independent (DVI
DVI (file format)
The Device independent file format is the output file format of the TeX typesetting program, designed by David R. Fuchs in 1979. Unlike the TeX markup files used to generate them, DVI files are not intended to be human-readable; they consist of binary data describing the visual layout of a...
) output. The actual glyphs are then inserted by the eventual DVI output driver or previewer, using, for instance, TrueType
TrueType
TrueType is an outline font standard originally developed by Apple Computer in the late 1980s as a competitor to Adobe's Type 1 fonts used in PostScript...
fonts, or fonts in METAFONT
METAFONT
Metafont is a programming language used to define vector fonts. It is also the name of the interpreter that executes Metafont code, generating the bitmap fonts that can be embedded into e.g. PostScript...
's PK format. The format is designed to be extremely compact: in the original Computer Modern
Computer Modern
Computer Modern is the family of typefaces used by default by the typesetting program TeX. It was created by Donald Knuth with his METAFONT program, and was most recently updated in 1992. However, the family font was superseded by CM-Super , the latest release dating 2008...
distribution, every font's TFM file is smaller than 2kb.
Specification
The canonical specification of the TFM format is embedded in the source codeSource code
In computer science, source code is text written using the format and syntax of the programming language that it is being written in. Such a language is specially designed to facilitate the work of computer programmers, who specify the actions to be performed by a computer mostly by writing source...
of the program TFtoPL.
A TFM file is broken down into a series of four-byte words, which can contain data fields of various lengths. Any data fields that are more than one byte long are held in big endian order. (The exact same file will be generated, regardless of architecture of the computer generating it.) The six-word (24-byte) file header contains twelve unsigned 16-bit integers which describe the length of the file, the range of character codes contained in the font, and the size of each of the tables. A single TFM file describes between 0 and 256 characters, inclusive.
The body of the TFM file consists of a series of ten tables, each one except for the first laid out as an array of fixed-length fields. A 32-bit signed fixed-point number
Fixed-point arithmetic
In computing, a fixed-point number representation is a real data type for a number that has a fixed number of digits after the radix point...
with 12 bits to the left of the decimal point, referred to as a
fix_word
, is used heavily. The first table, header
, contains a checksum designed to prevent a document compiled into a DVIDVI (file format)
The Device independent file format is the output file format of the TeX typesetting program, designed by David R. Fuchs in 1979. Unlike the TeX markup files used to generate them, DVI files are not intended to be human-readable; they consist of binary data describing the visual layout of a...
with one set of fonts from being printed with a different set, as well as ASCII descriptions of the character coding scheme (e.g.,
ASCII
or TeX text
) and the font family. It also contains the font's design size; all following fix_word
values are interpreted as multiplication factors for this.The next table,
char_info
, consists of one word per character, and contains indexes into the width, height, depth and italic correction tables. This is a device to save space, because width values, for instance, are frequently duplicated. Because height and depth values are duplicated more frequently, to fit all of these values into a single word, the indexes are limited to four bits. Because of this, there is a limit of sixteen different character heights and sixteen different character depths in any given TFM file. Also, there is a limit of sixty-four different italic corrections. There is also one more index which can point into the lig_kern
table, or to information about extensible characters, depending on a two-bit tag
value. Extensible characters use a series of repeated characters to construct a single large one of arbitrary size, usually large delimiters such as parentheses or brackets.There then follow the four tables
width
, height
, depth
and italic
, which contain values (in fix_word
format) referred to by indexes in char_info
.Ligatures
Ligature (typography)
In writing and typography, a ligature occurs where two or more graphemes are joined as a single glyph. Ligatures usually replace consecutive characters sharing common components and are part of a more general class of glyphs called "contextual forms", where the specific shape of a letter depends on...
and kerning
Kerning
In typography, kerning is the process of adjusting the spacing between characters in a proportional font, usually to achieve a visually pleasing result. Kerning is the adjustment of the space between individual letter forms vs. tracking which is the uniform adjustment of spacing applied over a...
are represented using a simple programming language consisting of fixed-length four-byte operations in the
lig_kern
table; it makes use of kerning values (specified as fix_word
s) in the kern
table, which follows it.Extensible characters are specified in the
exten
table, using a series of four-byte words specifying the top, middle, bottom and repeated sections of an extensible character. For instance, the character at left below would be obtained by setting (top
, mid
, bot
, rep
) to the character codes for (/, <, \, |). The first three character codes can be set to zero. For instance, if mid
were set to 0 in the previous example, the result would change from the brace drawn at left to the parenthesis drawn to its right./ /
| |
| |
< |
| |
| |
\ \
Of course, the font would use specially designed characters for this, instead of reusing existing ones, but the principle is the same.
The final table,
param
, contains a series of specifically defined fix_word
values, including the font's x-heightX-height
In typography, the x-height or corpus size refers to the distance between the baseline and the mean line in a typeface. Typically, this is the height of the letter x in the font , as well as the u, v, w, and z...
and the amount of italic slant (to determine how far to shift accents). Certain coding schemes such as
TeX math symbols
and TeX math extension
define extra parameters which appear after these.Property lists
There is a human-readableHuman-readable
A human-readable medium or human-readable format is a representation of data or information that can be naturally read by humans.In computing, human-readable data is often encoded as ASCII or Unicode text, rather than presented in a binary representation...
equivalent to the TFM format called PL, for property list. There is an exact correspondence between a TFM file and a PL file: one can be freely converted to the other and back again with no loss of information using the
tftopl
and pltotf
programs. The PL format, optimized for usability instead of space, does not make the same use of references that the TFM format does. For instance, many characters in a font may use the same character width, which would be represented only once in the TFM format, and this value would be referenced by each character, since the index would be significantly smaller than the full-precision numerical value. In the PL format, however, the full value is written out each time it appears.For example, this is the code for the upper-case letter Y in Computer Modern
Computer Modern
Computer Modern is the family of typefaces used by default by the typesetting program TeX. It was created by Donald Knuth with his METAFONT program, and was most recently updated in 1992. However, the family font was superseded by CM-Super , the latest release dating 2008...
Roman
Roman type
In typography, roman is one of the three main kinds of historical type, alongside blackletter and italic. Roman type was modelled from a European scribal manuscript style of the 1400s, based on the pairing of inscriptional capitals used in ancient Rome with Carolingian minuscules developed in the...
, ten point:
(CHARACTER C Y
(CHARWD R 0.750002)
(CHARHT R 0.683332)
(CHARIC R 0.025)
(COMMENT
(KRN C e R -0.083334)
(KRN C o R -0.083334)
(KRN C r R -0.083334)
(KRN C a R -0.083334)
(KRN C A R -0.083334)
(KRN C u R -0.083334)
)
)
The kerning values seen here are copied from the another section of the PL file in order to make it easier to read, which in itself is redundant. Notice how the full numeric values of the kerning constants are written out each time they appear, instead of being stored once and referred to by a much smaller index.