Quantitative linguistics
Quantitative linguistics is a sub-discipline of general linguistics and, more specifically, of mathematical linguistics. Quantitative Linguistics (QL) deals with language learning, language change, and the application as well as the structure of natural languages. QL investigates languages using statistical methods; its most demanding objective is the formulation of language laws and, ultimately, of a general theory of language in the sense of a set of interrelated language laws. Synergetic linguistics was from its very beginning specifically designed for this purpose.
QL is empirically based on the results of language statistics, a field which can be interpreted as statistics of languages or as statistics of any linguistic object. This field is not necessarily connected to substantial theoretical ambitions. Corpus linguistics and computational linguistics are other fields which contribute important empirical evidence.
History
The earliest QL approaches date back to the ancient Greek and Indian world. One of the historical sources consists of applications of combinatorics to linguistic matters; another is based on elementary statistical studies, which can be found under the headings colometry and stichometry.
Language laws in quantitative linguistics
In QL, the concept of law is understood as the class of law hypotheses which have been deduced from theoretical assumptions, are mathematically formulated, are interrelated with other laws in the field, and have been sufficiently and successfully tested on empirical data, i.e. which could not be refuted in spite of much effort to do so. Köhler writes about QL laws: “Moreover, it can be shown that these properties of linguistic elements and of the relations among them abide by universal laws which can be formulated strictly mathematically in the same way as common in the natural sciences. One has to bear in mind in this context that these laws are of stochastic nature; they are not observed in every single case (this would be neither necessary nor possible); they rather determine the probabilities of the events or proportions under study. It is easy to find counterexamples to each of the above-mentioned examples; nevertheless, these cases do not violate the corresponding laws as variations around the statistical mean are not only admissible but even essential; they are themselves quantitatively exactly determined by the corresponding laws. This situation does not differ from that in the natural sciences, which have since long abandoned the old deterministic and causal views of the world and replaced them by statistical/probabilistic models.”
Some linguistic laws
There exist quite a number of proposed language laws, among them:
- Law of diversification: If linguistic categories such as parts of speech or inflectional endings appear in various forms, it can be shown that the frequencies of their occurrences in texts are controlled by laws.
- Length (or more generally, complexity) distributions. The investigation of text or dictionary frequencies of units of any kind with regard to their lengths regularly yields a number of distributions, depending on the kind of unit under study. So far, the following units have been studied:
- Law of the distribution of morph lengths;
- Law of the distribution of the lengths of rhythmical units;
- Law of the distribution of sentence lengths;
- Law of the distribution of syllable lengths;
- Law of the distribution of word lengths;
Other linguistic units which also abide by this law include, e.g., letters (characters) of different complexities, the lengths of the so-called hrebs and of speech acts. The same holds for the distributions of sounds (phones) of different durations.
- Martin's law: This law concerns lexical chains which are obtained by looking up the definition of a word in a dictionary, then looking up the definition of the definition just obtained, and so on. Finally, all these definitions form a hierarchy of increasingly general meanings, whereby the number of definitions decreases with increasing generality. Among the levels of this kind of hierarchy, there exist a number of lawful relations.
- Menzerath's law (also, in particular in linguistics, Menzerath-Altmann law): This law states that the sizes of the constituents of a construction decrease with increasing size of the construction under study. The longer, e.g., a sentence (measured in terms of the number of clauses), the shorter the clauses (measured in terms of the number of words); or: the longer a word (in syllables or morphs), the shorter the syllables or morphs (in sounds).
- Rank-frequency laws: Virtually every kind of linguistic unit abides by these relations. We will give here only a few illustrative examples:
- The words of a text are arranged according to their text frequency, and each is assigned a rank number and the corresponding frequency. Since George Kingsley Zipf (the well-known “Zipf’s Law”), a large number of mathematical models of the relation between rank and frequency have been proposed.
- A similar distribution between rank and frequency of sounds, phonemes, and letters can be observed.
- Word associations: Rank and frequency of the associations with which subjects respond to a (word) stimulus.
- Law of language change: Growth processes in language such as vocabulary growth, the dispersion of foreign or loan words, changes in the inflectional system etc. abide by a law known in QL as the Piotrowski law, which corresponds to growth models in other scientific disciplines. The Piotrowski law is a case of the so-called logistic model (cf. logistic equation). It has been shown that it also covers language acquisition processes (cf. language acquisition law).
- Text block law: Linguistic units (e.g. words, letters, syntactic functions and constructions) show a specific frequency distribution in equally large text blocks.
- Zipf's law: The frequency of words is inversely proportional to their rank in frequency lists.
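The rank-frequency, Menzerath and Piotrowski laws listed above are usually stated as curves with a small number of free parameters. The following Python sketch illustrates them; it is not a reference implementation. It assumes a crude lowercase regex tokenizer, estimates the Zipf exponent a in f(r) = C / r^a by ordinary least squares on a log-log scale (a rough method; maximum-likelihood fitting is usually preferred in practice), and uses the commonly cited forms y = a·x^b·e^(−cx) for the Menzerath-Altmann law and p(t) = c / (1 + a·e^(−bt)) for the Piotrowski law. All function names are hypothetical.

```python
# Illustrative sketch only; all function names are hypothetical.
import math
import re
from collections import Counter


def rank_frequency(text):
    """Rank word types by frequency: returns (rank, word, frequency), rank 1 first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() sorts by descending count; ties keep first-seen order.
    ranked = Counter(words).most_common()
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


def zipf_exponent(table):
    """Estimate a in f(r) = C / r**a by least squares on log f against log r."""
    if len(table) < 2:
        raise ValueError("need at least two ranks to estimate an exponent")
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(freq) for _, _, freq in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is -a; classical Zipf's law corresponds to a = 1.
    return -sxy / sxx


def menzerath_altmann(x, a, b, c):
    """Menzerath-Altmann law: mean constituent size y for a construct of size x."""
    # With c = 0 this reduces to the power law y = a * x**b (decreasing when b < 0).
    return a * x ** b * math.exp(-c * x)


def piotrowski(t, c, a, b):
    """Piotrowski law (logistic): share p(t) of a language change completed at time t."""
    # For b > 0, p(t) rises from c / (1 + a) at t = 0 towards the saturation level c.
    return c / (1 + a * math.exp(-b * t))


if __name__ == "__main__":
    toy = "the cat and the dog and the bird"
    for row in rank_frequency(toy):
        print(row)
    # A synthetic table that follows f(r) = 1000 / r exactly.
    ideal = [(r, f"w{r}", 1000 / r) for r in range(1, 11)]
    print(round(zipf_exponent(ideal), 6))  # prints 1.0
```

Run as a script, it prints the rank-frequency table of the toy sentence (with "the" at rank 1) and an estimated exponent of 1.0 for the synthetic table, which follows Zipf's law exactly by construction. Real texts rarely fit this well, and the parameters a, b and c of the other two functions would have to be estimated from data.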
Stylistics
The study of poetic and also non-poetic styles can be based on statistical methods; moreover, it is possible to conduct corresponding investigations on the basis of the specific forms (parameters) language laws take in texts of different styles. In such cases, QL supports research into stylistics: one overall aim is to obtain evidence that is as objective as possible, at least for part of the domain of stylistic phenomena, by referring to language laws. One of the central assumptions of QL is that some laws (e.g. the distribution of word lengths) require different models, or at least different parameter values of the laws (distributions or functions), depending on the text sort a text belongs to. If poetic texts are under study, QL methods form a sub-discipline of the Quantitative Study of Literature (stylometrics).
Important authors
- Gabriel Altmann (1931)
- Otto Behaghel (1854-1936); cf. Behaghel's laws
- Sergej Grigor'evič Čebanov (1897-1966)
- William Palin Elderton (1877-1962)
- Sheila Embleton, Toronto
- Ernst Wilhelm Förstemann (1822-1906)
- Wilhelm Fucks (1902-1990)
- Peter Grzybek
- Pierre Guiraud
- Gustav Herdan (1897-1968)
- Luděk Hřebíček (1934)
- Friedrich Wilhelm Kaeding (1843-1928)
- Reinhard Köhler
- Werner Lehfeldt (1943)
- Viktor Vasil'evič Levickij (b. 1938)
- Helmut Meier (1897-1973)
- Paul Menzerath (1883-1954), cf. Menzerath's law
- Sizuo Mizutani (1926)
- Augustus De Morgan (1806-1871)
- Charles Muller, Strasbourg
- Raijmund G. Piotrowski
- L.A. Sherman
- Juhan Tuldava (1922-2003)
- Andrew Wilson, Lancaster
- Albert Thumb (1865-1915)
- George Kingsley Zipf (1902-1950); cf. Zipf's law
- Eberhard Zwirner (1899-1984); cf. phonometry