Word wrap
In text display, line wrap is the feature of continuing on a new line when a line is full, so that each line fits in the viewable window and the text can be read from top to bottom without any horizontal scrolling.

Word wrap is the additional feature of most text editors, word processors, and web browsers of breaking lines between words rather than within them, except when a single word is longer than a line.

A soft return is the break resulting from line wrap or word wrap, whereas a hard return is an intentional break, creating a new paragraph.

Similarly, a hard wrap inserts actual line breaks in the text at wrap points, whereas a soft wrap puts the text into separate lines without inserting line breaks.

Soft wrapping allows line lengths to adjust automatically when the width of the user's window or the margin settings change. Soft wrapping is a standard feature of nearly all modern text editors, word processors, and email clients.
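The distinction between hard and soft wrapping can be illustrated with a minimal Python sketch. It uses the standard library's textwrap module; the function names hard_wrap and soft_wrap_view are hypothetical and chosen only for this illustration. A hard wrap returns new text with line break characters stored in it, while a soft wrap leaves the stored text alone and only computes the lines to display.

```python
import textwrap


def hard_wrap(text, width):
    """Hard wrap: return new text with actual newline characters inserted."""
    return textwrap.fill(text, width=width)


def soft_wrap_view(text, width):
    """Soft wrap: return the lines to display; the stored text is not modified."""
    return textwrap.wrap(text, width=width)


stored = "The quick brown fox jumps over the lazy dog"

hard = hard_wrap(stored, 20)          # the text itself now contains line breaks
view = soft_wrap_view(stored, 20)     # only the on-screen layout changes
wider_view = soft_wrap_view(stored, 30)  # re-flowing needs no change to the text
```

Because the soft-wrapped text is never altered, it can be re-flowed to any width, whereas the hard-wrapped text keeps its breaks even when the window becomes wider.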

Word boundaries, hyphenation, and hard spaces

Soft returns are usually placed after the ends of complete words, or after the punctuation that follows complete words. However, word wrap may also occur following a hyphen inside a word. This is sometimes not desired, and can be blocked by using a non-breaking hyphen, or hard hyphen, instead of a regular hyphen.

A word without hyphens can be made wrappable by placing soft hyphens in it. When the word isn't wrapped (i.e., isn't broken across lines), the soft hyphen isn't visible. But if the word is wrapped across lines, the break occurs at a soft hyphen, which is then shown as a visible hyphen at the end of the upper line. (In the rare case of a word that is meant to be breakable across lines without a hyphen ever appearing, a zero-width space is put at the permitted breaking point(s) in the word.)

Sometimes word wrap is undesirable between adjacent words. In such cases, word wrap can usually be blocked by using a hard space or non-breaking space between the words, instead of a regular space.
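The roles of these special characters can be sketched in Python as follows. The Unicode code points used (U+00AD soft hyphen, U+200B zero-width space, U+00A0 no-break space, U+2011 non-breaking hyphen) are the usual encodings of these characters, but the function break_opportunities and its simplified rule set are assumptions made for illustration only; real line breakers apply many more rules.

```python
SOFT_HYPHEN = "\u00ad"          # invisible unless the word is broken here
ZERO_WIDTH_SPACE = "\u200b"     # permits a break without showing a hyphen
NO_BREAK_SPACE = "\u00a0"       # looks like a space but forbids a break
NON_BREAKING_HYPHEN = "\u2011"  # looks like a hyphen but forbids a break

# Characters after which this simplified sketch permits a line break.
BREAK_AFTER = {" ", "-", SOFT_HYPHEN, ZERO_WIDTH_SPACE}


def break_opportunities(text):
    """Return every index i such that a line break is allowed right after text[i].

    The last character is skipped, since a break after it would not split anything.
    """
    return [i for i, ch in enumerate(text[:-1]) if ch in BREAK_AFTER]
```

For example, break_opportunities("well-known") reports one break point after the hyphen, while the same word written with NON_BREAKING_HYPHEN reports none.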

Word wrapping in text containing Chinese, Japanese, and Korean

In Chinese, Japanese, and Korean, each Han character is normally considered a word, and therefore word wrapping can usually occur before and after any Han character.

Under certain circumstances, however, word wrapping is not desired. For instance,
  • word wrapping might not be desired within personal names, and
  • word wrapping might not be desired within compound words (when the text is flush left, and only in some styles).


Most existing word processors and typesetting software cannot handle either of the above scenarios.

CJK punctuation may or may not follow rules similar to the above-mentioned special circumstances; this depends on the CJK line breaking rules being applied.

A special case of line breaking rules in CJK, however, always applies: line wrap must never occur inside the CJK dash or ellipsis. Even though each of these punctuation marks must be represented by two characters due to a limitation of all existing character encodings, each is intrinsically a single punctuation mark that is two ems wide, not two one-em-wide punctuation marks.
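The rule for the dash and ellipsis can be sketched in Python as follows. The sketch assumes that the CJK dash is encoded as two consecutive U+2014 characters and the CJK ellipsis as two consecutive U+2026 characters, as is common in Chinese text; both the encoding choice and the function name cjk_break_points are assumptions for illustration. It deliberately ignores all other CJK line breaking rules.

```python
CJK_DASH = "\u2014\u2014"      # assumed encoding: two EM DASH characters
CJK_ELLIPSIS = "\u2026\u2026"  # assumed encoding: two HORIZONTAL ELLIPSIS characters


def cjk_break_points(text):
    """Return every position i where a break may fall between text[i-1] and text[i].

    A break is allowed between any two characters, except between the two
    halves of a doubled dash or a doubled ellipsis.
    """
    points = []
    for i in range(1, len(text)):
        pair = text[i - 1:i + 1]
        if pair in (CJK_DASH, CJK_ELLIPSIS):
            continue  # never split a two-character dash or ellipsis
        points.append(i)
    return points
```

In a six-character string whose third and fourth characters form a CJK dash, the function allows a break at every position except the one between those two characters.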

Algorithm

Word wrapping is an optimization problem. Depending on what needs to be optimized for, different algorithms are used.

Minimum length

A simple way to do word wrapping is to use a greedy algorithm that puts as many words on a line as possible, then moves on to the next line and does the same until there are no more words left to place. This method is used by many modern word processors, such as OpenOffice.org Writer and Microsoft Word. This algorithm is optimal in that it always puts the text on the minimum number of lines. The following pseudocode implements this algorithm:

SpaceLeft := LineWidth + SpaceWidth    // the first word needs no leading space
for each Word in Text
    if (Width(Word) + SpaceWidth) > SpaceLeft
        insert line break before Word in Text
        SpaceLeft := LineWidth - Width(Word)
    else
        SpaceLeft := SpaceLeft - (Width(Word) + SpaceWidth)

Where LineWidth is the width of a line, SpaceLeft is the remaining width of space on the line to fill, SpaceWidth is the width of a single space character, Text is the input text to iterate over and Word is a word in this text.
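The greedy method can also be written as a short runnable Python sketch. It assumes a fixed-width font in which every character, including the space, is one unit wide, and the function name greedy_wrap is hypothetical. A word longer than the line is simply placed on a line of its own.

```python
def greedy_wrap(text, line_width):
    """Greedy word wrap for a fixed-width font (each character is 1 unit wide)."""
    space_width = 1
    lines = []
    current = []             # words placed on the line being built
    space_left = line_width  # width still free on the current line

    for word in text.split():
        # The first word on a line needs no leading space.
        needed = len(word) if not current else len(word) + space_width
        if current and needed > space_left:
            # The word does not fit: close the line and start a new one.
            lines.append(" ".join(current))
            current = []
            space_left = line_width
            needed = len(word)
        current.append(word)
        space_left -= needed

    if current:
        lines.append(" ".join(current))
    return lines
```

With a line width of 6, greedy_wrap("aaa bb cc ddddd", 6) returns the lines "aaa bb", "cc" and "ddddd".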

Minimum raggedness

A different algorithm, used in TeX, minimizes the sum of the squares of the space left at the end of each line to produce a more aesthetically pleasing result. The algorithm above is not optimal with respect to this, as the following example demonstrates:

aaa bb cc ddddd

If the cost function of a line is defined by the remaining space squared, the greedy algorithm would yield a sub-optimal solution for the problem (for simplicity, consider a fixed-width font and line width 6):

------    Line width: 6
aaa bb    Remaining space: 0 (cost = 0 squared = 0)
cc        Remaining space: 4 (cost = 4 squared = 16)
ddddd     Remaining space: 1 (cost = 1 squared = 1)

These costs sum to a total of 17, while an optimal solution would look like this:

------    Line width: 6
aaa       Remaining space: 3 (cost = 3 squared = 9)
bb cc     Remaining space: 1 (cost = 1 squared = 1)
ddddd     Remaining space: 1 (cost = 1 squared = 1)

The difference here is that the first line is broken before bb instead of after it, yielding a better right margin and a lower total cost of 11.
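The two totals can be checked with a small Python sketch. It assumes a fixed-width font with one unit per character and counts every line, including the last one, as the example does; the function name raggedness_cost is hypothetical.

```python
def raggedness_cost(lines, line_width, power=2):
    """Sum of (remaining space) ** power over all lines of a given layout."""
    total = 0
    for line in lines:
        remaining = line_width - len(line)
        if remaining < 0:
            raise ValueError(f"line {line!r} is wider than {line_width}")
        total += remaining ** power
    return total


greedy_cost = raggedness_cost(["aaa bb", "cc", "ddddd"], 6)   # 0 + 16 + 1
optimal_cost = raggedness_cost(["aaa", "bb cc", "ddddd"], 6)  # 9 + 1 + 1
```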

To solve the problem we need to define a cost function $c(i, j)$ that computes the cost of a line consisting of the words $\mathrm{Word}[i]$ to $\mathrm{Word}[j]$ from the text:

$$c(i, j) = \left( \mathrm{LineWidth} - (j - i) \cdot \mathrm{SpaceWidth} - \sum_{k=i}^{j} \mathrm{Width}(\mathrm{Word}[k]) \right)^{P}$$

where $P$ is typically 2 or 3. There are some special cases to consider: if the quantity in parentheses is negative (that is, the sequence of words cannot fit on a line), the cost needs to reflect the cost of tracking or condensing the text to fit; if that is not possible, it needs to return $\infty$.

The cost of the optimal solution can be defined as a recurrence, where $f(j)$ is the minimum total cost of laying out the first $j$ words and $c(i, j)$ is the cost of a line consisting of words $i$ to $j$:

$$f(j) = \begin{cases} c(1, j) & \text{if } c(1, j) < \infty, \\ \min_{1 \le k < j} \left( f(k) + c(k + 1, j) \right) & \text{if } c(1, j) = \infty. \end{cases}$$

This can be efficiently implemented using dynamic programming, for a time and space complexity of $O(j^2)$. Faster but more complicated linear time algorithms are also known.
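A dynamic programming solution can be sketched in Python as follows. This is an illustrative sketch rather than the algorithm of any particular typesetting system. It assumes a fixed-width font with one unit per character, uses the squared remaining space as the line cost, and treats a line that does not fit as having infinite cost. It writes the recurrence in a slightly different but common form: the minimum is taken over every possible start of the last line, with a base case of zero cost for zero words. The names line_cost and wrap_min_raggedness are hypothetical.

```python
import math


def line_cost(words, i, j, line_width, power=2):
    """Cost of one line holding words[i..j] inclusive (0-based indices)."""
    used = sum(len(w) for w in words[i:j + 1]) + (j - i)  # words plus single spaces
    remaining = line_width - used
    return math.inf if remaining < 0 else remaining ** power


def wrap_min_raggedness(text, line_width):
    """Return (lines, total_cost) minimizing the summed line costs."""
    words = text.split()
    n = len(words)
    best = [0] + [math.inf] * n  # best[j]: minimum cost of laying out the first j words
    prev = [0] * (n + 1)         # prev[j]: number of words placed before the last line

    for j in range(1, n + 1):
        for k in range(j):
            # Put words k+1..j (1-based) on the last line, after an optimal
            # layout of the first k words.
            cost = best[k] + line_cost(words, k, j - 1, line_width)
            if cost < best[j]:
                best[j] = cost
                prev[j] = k

    if n and best[n] == math.inf:
        raise ValueError("some word is wider than the line")

    # Walk the prev links backwards to recover the chosen line breaks.
    lines = []
    j = n
    while j > 0:
        k = prev[j]
        lines.append(" ".join(words[k:j]))
        j = k
    lines.reverse()
    return lines, best[n]
```

For the example text and a line width of 6, the function returns the lines "aaa", "bb cc" and "ddddd" with a total cost of 11. As written, each call to line_cost re-sums the word widths; precomputing prefix sums of the widths would avoid that repeated work.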


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 