Hyphenation algorithm
A hyphenation algorithm is a set of rules (especially one codified for implementation in a computer program) that decides at which points a word can be broken over two lines with a hyphen. For example, a hyphenation algorithm might decide that impeachment can be broken as impeach-ment or im-peachment, but not, say, as impe-achment.

One reason the rules of word-breaking are complex is that different 'dialects' of English tend to differ on them: American English tends to work on sound, while British English tends to look to the origins of the word and then to sound. There are also a large number of exceptions, which further complicates matters.

Some rules of thumb can be found in the reference "On Hyphenation – Anarchy of Pedantry". Among algorithmic approaches to hyphenation, the one implemented in the TeX typesetting system is widely used. It is thoroughly documented in the first two volumes of Computers and Typesetting and in Frank Liang's dissertation. Contrary to the belief that TeX relies on a large dictionary of exceptions, the point of Liang's work was to get the algorithm as accurate as he practically could and keep any exception dictionary small. In TeX's original hyphenation patterns for US English, the exception list contains fourteen words.

Hyphenation in TeX

Ports of the TeX hyphenation algorithm are available as libraries for several programming languages, including Perl, Ruby, Python, and PostScript. TeX itself can be made to show the hyphenation points it finds in the log by using the \showhyphens command. Note, however, that TeX does not set out to find all hyphenation points of a word, and is therefore unsuitable for applications such as associating lyrics with musical notes.
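
As a minimal sketch, assuming a standard LaTeX installation, the following document asks TeX to report the hyphenation points it finds for two sample words (the words and the filler sentence are chosen only for illustration). The report is written to the log file as part of an underfull-box message; it is not typeset on the page.

```latex
\documentclass{article}
\begin{document}
% Ask TeX to log the break points it finds for each word.
% Nothing from this command appears in the typeset output.
\showhyphens{impeachment ergonomic}

Some ordinary text so that the document produces a page.
\end{document}
```

Because TeX only reports the break points its patterns find, the logged list may be shorter than the list a dictionary would give.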

In LaTeX, hyphenation corrections can be added by the user with:

\hyphenation{words}


The \hyphenation command declares allowed hyphenation points, where words is a list of words, separated by spaces, in which each hyphenation point is indicated by a - character. For example:

\hyphenation{fortran er-go-no-mic}


declares that in the current job "fortran" should not be hyphenated and that "ergonomic", if it must be hyphenated, may be broken only at the indicated points.
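
To show where such a declaration might sit in a complete file, here is a minimal sketch of a LaTeX document. Placing \hyphenation in the preamble is a common convention and is assumed here, not required; the body text is invented for illustration.

```latex
\documentclass{article}

% Declare hyphenation exceptions for this job:
%   fortran      -> no hyphens given, so it is never broken
%   er-go-no-mic -> may be broken only at the marked points
\hyphenation{fortran er-go-no-mic}

\begin{document}
A long paragraph that mentions fortran and an ergonomic keyboard
will now follow the declared break points whenever TeX needs to
split either word at the end of a line.
\end{document}
```

The declaration affects only words that match an entry exactly, so inflected forms such as plurals would need their own entries.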


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 