Space (punctuation)
In writing, a space ( ) is a blank area devoid of content, serving to separate words, letters, numbers, and punctuation. Conventions for interword and intersentence spaces vary among languages, and in some cases the spacing rules are quite complex.

In the classical period, Latin was written with interpuncts (centred dots) as word separators, but that practice was abandoned sometime around the year AD 200 in favour of scriptio continua, i.e., with the words running together without any word separators. Around AD 600–800, blank spaces started being inserted between words in Latin, and that practice carried over to all languages using the Latin alphabet (e.g. English). In typesetting, spaces have historically been of multiple lengths, with particular lengths used for specific typographic purposes, such as separating words, separating sentences, or separating punctuation from words. Following the invention of the typewriter and the subsequent overlap of designer style preferences and computer-technology limitations, much of this reader-centric variation has been lost in normal use.

In computer representation of text, spaces of various sizes, styles, or language characteristics (different space characters) are indicated with unique code points.

Spaces between words

Modern English uses a space to separate words, but not all languages follow this practice. Spaces were not used to separate words in Latin until roughly 600–800 CE. Ancient Hebrew and Arabic did use spaces, partly to compensate in clarity for the lack of vowels. Traditionally, all CJK languages have no spaces: modern Chinese and Japanese (except when written with little or no kanji) still do not, but modern Korean uses spaces.

Spaces between sentences

There have been various methods of sentence spacing in languages with a Latin-derived alphabet since the advent of movable type in the 15th century.
  • One space (French spacing). This is the current convention in countries that use the modern Latin alphabet for published and final written work, as well as digital (World Wide Web) media.

  • Double space (English Spacing). This convention stems from the use of the monospaced font on typewriters. This historical convention was carried on by tradition until it was replaced by the single space convention in published print and digital media today.

  • One widened space, typically one-and-a-third to slightly less than two times wider than a word space. This spacing has been seen in early typesetting practices (prior to the nineteenth century). It has also been used in other non-typewriter typesetting systems such as the Linotype machine and the TeX system. Modern computer-based digital fonts can adjust the spacing after terminal punctuation as well, creating a space slightly wider than a standard word space.

  • No space. According to Lynne Truss, "young people" today using digital media "are now accustomed to following a full stop with a lower-case letter and no space".

There has been some controversy regarding the proper amount of sentence spacing in typeset material. The Elements of Typographic Style states that only a single word space is required for sentence spacing since "Larger spaces...are themselves punctuation."

Spaces and unit symbols

According to The Canadian Style: A Guide to Writing and Editing:
  • When symbols are used, the prefix symbol and unit symbol are run together:
    5 cm
    7 hL
    4 dag
    13 kPa

  • When a symbol consists entirely of letters, leave a full space between the quantity and the symbol:
    45 kg, not 45kg

  • When the symbol includes a non-letter character as well as a letter, leave no space:
    32°C, not 32 °C

  • For the sake of clarity, a hyphen may be inserted between a numeral and a symbol used adjectivally:
    35-mm film
    60-W bulb
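
The spacing rules above can be sketched in a few lines of Python. This is only an illustration: the function name `format_quantity` and its `adjectival` parameter are hypothetical, and the letters-only test relies on Python's `str.isalpha`, which is an assumption about how to detect "a symbol consisting entirely of letters".

```python
def format_quantity(value, symbol, adjectival=False):
    """Join a number and a unit symbol following the rules listed above."""
    if adjectival:
        # Optional hyphen when the quantity modifies a noun, e.g. "35-mm film".
        return f"{value}-{symbol}"
    # Letters-only symbols (kg, cm, kPa) take a full space;
    # symbols containing a non-letter character (such as the degree sign) take none.
    separator = " " if symbol.isalpha() else ""
    return f"{value}{separator}{symbol}"


print(format_quantity(45, "kg"))
print(format_quantity(32, "\u00b0C"))
print(format_quantity(35, "mm", adjectival=True) + " film")
```

In running text the separator could be a non-breaking space instead, so that the number and its unit are never split across lines.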

Variable-width general-purpose space

In computer character encodings, there is a normal general-purpose space (Unicode character U+0020; 32 decimal) whose width will vary according to the design of the typeface. Typical values range from 1/5 em to 1/3 em (in digital typography an em is equal to the nominal size of the font, so for a 10-point font the space will probably be between 2 and 3.3 points). Sophisticated fonts may have differently sized spaces for bold, italic, and small-caps faces, and compositors will often manually adjust the width of the space depending on the size and prominence of the text.
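
The arithmetic in the paragraph above can be checked with a short Python sketch. The function name `space_width_range` is hypothetical, and the 1/5 em and 1/3 em bounds are simply the typical values quoted in the text, not properties of any particular font.

```python
def space_width_range(font_size_pt, narrow=1 / 5, wide=1 / 3):
    """Return the (narrowest, widest) typical word-space width in points.

    One em equals the nominal font size, so the width is a fraction of it.
    """
    return font_size_pt * narrow, font_size_pt * wide


lo, hi = space_width_range(10)
print(f"{lo:.1f} pt to {hi:.1f} pt")
```

For a 10-point font this reproduces the range of roughly 2 to 3.3 points given above.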

In addition to this general-purpose space, it is possible to encode a space of a specific width. See the table below for a complete list.

In monospaced proofreading copy, only em- and en-spaces are represented using this character (which is called an em-quad or an en-quad), while other types of spaces are represented with a number sign (#).

Breaking and non-breaking spaces

By default, computer programs usually assume that, in flowing text, a line break may be inserted at the position of a space where necessary. The non-breaking space, U+00A0 (160 decimal), is intended to render the same as a normal space but prevents line-wrapping at that position.
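
As an illustration of the difference, the following Python sketch wraps the same sentence twice with the standard `textwrap` module, which breaks lines only at ASCII whitespace and therefore never breaks at U+00A0. The sample sentence and the width of 18 columns are arbitrary choices made for this example.

```python
import textwrap

NBSP = "\u00a0"  # no-break space, 160 decimal

plain = "The distance is 10 km today"
glued = "The distance is 10" + NBSP + "km today"

# With an ordinary space, the wrapper is free to separate "10" from "km".
wrapped_plain = textwrap.wrap(plain, width=18)

# With a no-break space, "10" and "km" always stay on the same line.
wrapped_glued = textwrap.wrap(glued, width=18)

for line in wrapped_plain:
    print(repr(line))
for line in wrapped_glued:
    print(repr(line))
```

Other text-layout software may define break opportunities differently, so this shows the intended behaviour rather than a guarantee about every program.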

However, some programs do not follow this intent exactly. For example, the Mozilla Firefox 3.5 series, released in 2009, correctly suppresses line-wrapping when rendering the non-breaking space but incorrectly ignores the word-spacing CSS property for non-breaking spaces. This was corrected in Firefox version 3.6, released in 2010. Other programs may also suffer from the same flaw. The following simple HTML code demonstrates this flaw on affected browsers:



    <style type="text/css">
      p { word-spacing: 1em; }
    </style>

    <p>This paragraph shows extremely wide spaces between words,
    because of the 1em word-spacing CSS value.</p>

    <p>This&nbsp;paragraph&nbsp;contains&nbsp;non-breaking&nbsp;spaces&nbsp;and
    should&nbsp;show&nbsp;the&nbsp;same&nbsp;spaces&nbsp;as&nbsp;the&nbsp;first&nbsp;one.</p>

The generic Unicode space is often considered insignificant when appearing at the end of a line of text, or when part of a sequence of whitespace characters, so it may be omitted or "collapsed" in such circumstances. The non-breaking space is expressly non-collapsible and may be used to indent text, though best World Wide Web practice prescribes using CSS for this purpose.

Hair spaces around dashes

In American typography, both en dashes and em dashes are set continuous with the text (as illustrated by use in The Chicago Manual of Style, 6.80, 6.83–86). However, an em dash can optionally be surrounded with a so-called hair space, U+200A (8202 decimal), or thin space, U+2009 (8201 decimal). The thin space can be written in HTML using the named entity &thinsp;, and the hair space using the numeric character reference &#8202; or &#x200A;. The hair space should be much thinner than a normal space, and is seldom used on its own.
Normal space versus hair space (as rendered by your browser)
  Normal space: left right
  Normal space with em dash: left — right
  Thin space with em dash: left — right
  Hair space with em dash: left — right
  No space with em dash: left—right
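
The HTML references for the thin and hair spaces can be checked against their code points with Python's standard `html` module. This is a small sketch; the variable names and the sample string are arbitrary.

```python
import html

# Named entity for the thin space and numeric references for the hair space.
thin = html.unescape("&thinsp;")
hair_decimal = html.unescape("&#8202;")
hair_hex = html.unescape("&#x200A;")

print(f"U+{ord(thin):04X}", f"U+{ord(hair_decimal):04X}", f"U+{ord(hair_hex):04X}")

# An em dash set off by hair spaces, built from the decoded characters.
sample = "left" + hair_hex + "\u2014" + hair_hex + "right"
print(len(sample))
```

Both numeric forms decode to the same character, so the choice between decimal and hexadecimal notation is purely a matter of preference.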

Spaces in Unicode

Unicode defines several space characters with specific semantics and rendering characteristics, as shown in the table below. Depending on the browser and fonts used to view this table, not all spaces may display properly:
Space characters defined in Unicode
Each entry gives the code point, the decimal value, the name (with the HTML entity, where one exists), the Unicode block, and a display sample between square brackets; the description follows on the next line.

U+0020, 32, Space, Basic Latin, ] [
    Normal space, same as ASCII character 0x20.
U+00A0, 160, No-Break Space (HTML: &nbsp;), Latin-1 Supplement, ] [
    Identical to U+0020, but not a point at which a line may be broken.
U+1680, 5760, Ogham Space Mark, Ogham, ] [
    Used for interword separation in Ogham text. Normally a vertical line in vertical text or a horizontal line in horizontal text, but may also be a blank space in "stemless" fonts. Requires an Ogham font.
U+180E, 6158, Mongolian Vowel Separator (MVS), Mongolian, ]᠎[
    A narrow space character (not to be confused with "thin space", below) used in Mongolian to cause the final two characters of a word to take on different shapes.
U+2000, 8192, En Quad, General Punctuation, ] [
    Canonically equivalent to U+2002; U+2002 is preferred.
U+2001, 8193, Em Quad (mutton quad), General Punctuation, ] [
    Canonically equivalent to U+2003; U+2003 is preferred.
U+2002, 8194, En Space (nut; HTML: &ensp;), General Punctuation, ] [
    Width of one en (half of one em). U+2000 En Quad is canonically equivalent to this character (En Space is preferred).
U+2003, 8195, Em Space (mutton; HTML: &emsp;), General Punctuation, ] [
    Width of one em. U+2001 Em Quad is canonically equivalent to this character (Em Space is preferred).
U+2004, 8196, Three-Per-Em Space (thick space), General Punctuation, ] [
    One third of an em wide.
U+2005, 8197, Four-Per-Em Space (mid space), General Punctuation, ] [
    One fourth of an em wide.
U+2006, 8198, Six-Per-Em Space, General Punctuation, ] [
    One sixth of an em wide. In computer typography sometimes equated to U+2009.
U+2007, 8199, Figure Space, General Punctuation, ] [
    In fonts with monospaced digits, equal to the width of one digit.
U+2008, 8200, Punctuation Space, General Punctuation, ] [
    As wide as the narrow punctuation in a font, i.e. the advance width of the period or comma.
U+2009, 8201, Thin Space (HTML: &thinsp;), General Punctuation, ] [
    One fifth (sometimes one sixth) of an em wide. Recommended for use as a thousands separator for measures made with SI units. Unlike U+2002 to U+2008, its width may get adjusted in typesetting.
U+200A, 8202, Hair Space, General Punctuation, ] [
    Thinner than a thin space.
U+200B, 8203, Zero Width Space (ZWSP), General Punctuation, ]​[
    Used to indicate word boundaries to text processing systems when using scripts that do not use explicit spacing.
U+200C, 8204, Zero Width Non-Joiner (ZWNJ; HTML: &zwnj;), General Punctuation, ]‌[
    When placed between two characters that would otherwise be connected, a ZWNJ causes them to be printed in their final and initial forms, respectively.
U+200D, 8205, Zero Width Joiner (ZWJ; HTML: &zwj;), General Punctuation, ]‍[
    When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.
U+202F, 8239, Narrow No-Break Space, General Punctuation, ] [
    Similar in function to U+00A0 No-Break Space. Introduced in Unicode 3.0 for Mongolian, to separate a suffix from the word stem without indicating a word boundary. When used with Mongolian, its width is usually one third of the normal space; in other contexts, its width resembles that of the Thin Space (U+2009), at least with some fonts. It is also used in French before the characters ; ? ! » and after «.
U+205F, 8287, Medium Mathematical Space (MMSP), General Punctuation, ] [
    Used in mathematical formulae. Four-eighteenths of an em. In mathematical typography, the widths of spaces are usually given in integral multiples of an eighteenth of an em, and 4/18 em may be used in several situations, for example between the a and the + and between the + and the b in the expression a + b.
U+2060, 8288, Word Joiner (WJ), General Punctuation, ]⁠[
    Identical to U+200B, but not a point at which a line may be broken. Introduced in Unicode 3.2 to replace the deprecated "zero width no-break space" function of the U+FEFF character.
U+3000, 12288, Ideographic Space, CJK Symbols and Punctuation, ]　[
    As wide as a CJK character cell (fullwidth).
U+FEFF, 65279, Zero Width No-Break Space = Byte Order Mark (BOM), Arabic Presentation Forms-B, ]﻿[
    Used primarily as a Byte Order Mark. Use as an indication of non-breaking is deprecated as of Unicode 3.2; see U+2060 instead.
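
Several entries in the table can be inspected programmatically. The sketch below uses Python's standard `unicodedata` module to print the official name and general category of a few of the listed code points, and to confirm that the En Quad and Em Quad normalize to the En Space and Em Space. The selection of code points is arbitrary, and results for less stable characters may vary with the Unicode version bundled with the interpreter.

```python
import unicodedata

# A few of the space characters from the table above.
code_points = [0x0020, 0x00A0, 0x2002, 0x2003, 0x2009, 0x200A, 0x3000]

for cp in code_points:
    ch = chr(cp)
    # "Zs" is the general category "Separator, space".
    print(f"U+{cp:04X} {cp:>5} {unicodedata.name(ch)} {unicodedata.category(ch)}")

# Canonical equivalence: normalization maps the quads to the preferred spaces.
en_quad_normalized = unicodedata.normalize("NFC", "\u2000")
em_quad_normalized = unicodedata.normalize("NFC", "\u2001")
print(f"U+{ord(en_quad_normalized):04X}", f"U+{ord(em_quad_normalized):04X}")
```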

Unicode also provides some visible characters to stand in for space when necessary:
Visible stand-ins for space defined in Unicode
Each entry gives the code point, the decimal value, the name, the Unicode block, and the character itself; a description follows on the next line where one is given.

U+00B7, 183, Middle Dot (HTML: &middot;), Latin-1 Supplement, ·
    Interpunct, used in text processors.
U+237D, 9085, Shouldered Open Box, Miscellaneous Technical, ⍽
    Used for NBSP.
U+2420, 9248, Symbol for Space, Control Pictures, ␠
U+2422, 9250, Blank Symbol, Control Pictures, ␢
U+2423, 9251, Open Box, Control Pictures, ␣
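
A text tool can use these stand-ins to make invisible characters visible. The following Python sketch maps the ordinary space to the middle dot and the no-break space to the shouldered open box; the function name `show_spaces` and the particular pairing of characters are choices made for this example, not a standard.

```python
# Map each invisible character (by code point) to a visible stand-in.
VISIBLE = {
    0x0020: "\u00b7",  # space -> middle dot (interpunct)
    0x00A0: "\u237d",  # no-break space -> shouldered open box
}


def show_spaces(text):
    """Return text with spaces replaced by visible substitute characters."""
    return text.translate(VISIBLE)


print(show_spaces("10\u00a0km per hour"))
```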

Use of the space in computing

In programming language syntax, spaces are frequently used to explicitly separate tokens. Aside from this use, spaces and other whitespace characters are usually ignored by modern programming languages. Exceptions are Haskell, occam, ABC, and Python, which use the amount of whitespace in indentation to indicate the bounds of a block, and a whimsical language called Whitespace, where whitespace is the only meaningful syntactical element.
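
Python's own tokenizer makes the role of indentation visible. In the sketch below, the standard `tokenize` module is run over a three-line snippet (an arbitrary example); the leading spaces on the second line produce an INDENT token that opens a block, and the return to the left margin produces a DEDENT token that closes it.

```python
import io
import tokenize

source = "if x:\n    y = 1\nz = 2\n"

# Collect the symbolic name of every token the tokenizer emits.
token_names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

print(token_names)
```

Spaces between tokens on a line, by contrast, yield no tokens of their own; they only separate the neighbouring tokens.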

In commands processed by command processors, e.g. in scripts and typed in, the space character can cause problems as it has two possible functions: as part of a command or parameter, or as a parameter or name separator. Ambiguity can be prevented either by prohibiting embedded spaces, or by enclosing a name with embedded spaces between quote characters.
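
The ambiguity can be demonstrated with Python's standard `shlex` module, which splits a command line according to POSIX shell rules. The command `copy` and the file name `annual report.txt` are made up for this example.

```python
import shlex

# Unquoted: the embedded space is read as a separator, giving two name fragments.
unquoted = shlex.split("copy annual report.txt backup")

# Quoted: the embedded space is kept as part of a single parameter.
quoted = shlex.split('copy "annual report.txt" backup')

# shlex.quote adds the quoting needed to pass a name safely to a shell.
safe_name = shlex.quote("annual report.txt")

print(unquoted)
print(quoted)
print(safe_name)
```

Other command processors use different quoting conventions, so the exact syntax shown here applies only to POSIX-style shells.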

Text editors, word processors, and desktop publishing software differ in how they represent whitespace on the screen, and how they represent spaces at the ends of lines longer than the screen or column width. In some cases, spaces are shown simply as blank space; in other cases they may be represented by an interpunct or other symbols. Many different characters (described above) can be used to produce spaces, and non-character functions (such as margins and tab settings) can also affect whitespace.

Space characters in markup languages

Generalised markup languages, such as SGML, do not treat space characters differently from other characters.

However, special-purpose markup languages may. In particular, web markup languages such as XML and HTML treat whitespace characters, including space characters, specially for programmers' convenience. One or more space characters read by conforming display-time processors of those markup languages are collapsed to zero or one space, depending on their semantic context. For example, double (or more) spaces within text are collapsed to a single space, and spaces which appear on either side of the "=" that separates an attribute name from its value have no effect on the interpretation of the document. Element end tags can contain trailing spaces, and empty-element tags in XML can contain spaces before the "/>".

In XML attribute values, each whitespace character (tab, carriage return, or line feed) is normalized to a space when the document is read by a parser; for attributes declared with a type other than CDATA, sequences of spaces are further collapsed to a single space, and leading and trailing spaces are removed. Whitespace in XML element content is not changed in this way by the parser, but an application receiving information from the parser may choose to apply similar rules to element content. An XML document author can use the xml:space="preserve" attribute on an element to signal that downstream applications should not alter whitespace in that element's content.
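
The different treatment of attribute values and element content can be observed with Python's standard `xml.etree.ElementTree` parser. The element name `note` and attribute name `title` are invented for this sketch. Because no DTD is supplied, the attribute is handled as plain character data, so each whitespace character becomes a space but consecutive spaces are not merged.

```python
import xml.etree.ElementTree as ET

# The same text, containing a line feed and a tab, appears in an
# attribute value and in the element content.
document = '<note title="two\n\twords">two\n\twords</note>'

root = ET.fromstring(document)
attribute_value = root.get("title")
element_text = root.text

print(repr(attribute_value))
print(repr(element_text))
```

The parser normalizes the attribute value and passes the element content through unchanged, leaving any further whitespace handling to the application.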

In most HTML elements, a sequence of whitespace characters is treated as a single inter-word separator, which may manifest as a single space character when rendering text in a language that normally inserts such space between words. Conforming HTML renderers are required to apply a more literal treatment of whitespace within a few prescribed elements, such as the pre element and any element for which CSS has been used to apply pre-like whitespace processing. In such elements, space characters will not be "collapsed" into inter-word separators.

In both XML and HTML, the non-breaking space character, along with other non-"standard" spaces, is not treated as collapsible "whitespace", so it is not subject to the rules above.
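
The collapsing behaviour described above can be approximated with a regular expression. This Python sketch is a simplification, not a renderer: the function name `collapse_whitespace` is hypothetical, and it assumes that only the ASCII whitespace characters (space, tab, line feed, form feed, carriage return) are collapsible, which leaves the non-breaking space untouched.

```python
import re

# Runs of collapsible (ASCII) whitespace; U+00A0 is deliberately not included.
COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")


def collapse_whitespace(text):
    """Replace each run of collapsible whitespace with a single space."""
    return COLLAPSIBLE.sub(" ", text)


print(repr(collapse_whitespace("one   two\n\tthree")))
print(repr(collapse_whitespace("one\u00a0\u00a0two")))
```

A real HTML renderer also removes collapsed spaces at the start and end of lines and skips this processing inside pre-like elements, which the sketch does not attempt.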

See also

  • Hard space
  • Hyphenation
  • Internal field separator
  • Non-breaking space
  • Paren space
  • Sentence spacing
  • Zero-width joiner
  • Zero-width non-joiner
  • For the Chinese one-character space used as respect, see Tai tou


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 