Lexis (linguistics)
In linguistics, a lexis (from the Greek λέξις, "word") is the total word-stock or lexicon having items of lexical, rather than grammatical, meaning. This notion contrasts starkly with the Chomskian proposition of a “Universal Grammar” as the prime mover for language. Grammar still plays an integral role in lexis, but it is the result of accumulated lexis, not its generator.


In short, the lexicon is:
  • Formulaic: it relies on partially-fixed expressions and highly probable word combinations
  • Idiomatic: it follows conventions and patterns for usage
  • Metaphoric: concepts such as time and money, business and sex, systems and water all share a large portion of the same vocabulary
  • Grammatical: it uses rules based on sampling of the lexicon
  • Register-specific: it uses the same word differently and/or less frequently in different contexts

A major area of study in psycholinguistics and neurolinguistics involves the question of how words are retrieved from the mental lexicon in online language processing and production. For example, the cohort model seeks to describe lexical retrieval in terms of segment-by-segment activation of competing lexical entries.
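The segment-by-segment narrowing described above can be illustrated with a minimal sketch. It is only a toy: letters stand in for speech segments, the five-word lexicon is invented for illustration, and the function name cohort_trace is hypothetical rather than part of any published implementation of the cohort model.

```python
def cohort_trace(segments, lexicon):
    """Return the shrinking set of candidate words after each input segment."""
    cohort = sorted(lexicon)
    trace = []
    for i, segment in enumerate(segments):
        # Keep only the candidates that still match the input at position i.
        cohort = [w for w in cohort if len(w) > i and w[i] == segment]
        trace.append((segments[: i + 1], list(cohort)))
    return trace


# Hypothetical toy lexicon; letters are used as a stand-in for phonemes.
toy_lexicon = ["candle", "candy", "cannon", "captain", "dog"]

for heard, candidates in cohort_trace("candy", toy_lexicon):
    print(heard, candidates)
```

Each printed line shows the input heard so far and the entries still competing; the cohort shrinks until a single candidate remains.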

Formulaic language

In recent years, the compilation of language databases using real samples from speech and writing has enabled researchers to take a fresh look at the composition of languages. Among other things, statistical research methods offer reliable insight into the ways in which words interact. The most interesting findings have taken place in the dichotomy between language use (how language is used) and language usage (how language could be used).

Language use shows which occurrences of words and their partners are most probable. The major finding of this research is that language users rely to a very high extent on ready-made “lexical chunks”, which can be easily combined to form sentences. This eliminates the need for the speaker to analyse each sentence grammatically, yet deals with a situation effectively. Typical examples include “I see what you mean”, “Could you please hand me the…” and “Recent research shows that…”.

Language usage, on the other hand, is what takes place when the ready-made chunks do not fulfill the speaker’s immediate needs; in other words, a new sentence is about to be formed and must be analyzed for correctness. Grammar rules have been internalised by native speakers, allowing them to determine the viability of new sentences. Language usage might be defined as a fall-back position when all other options have been exhausted.

Context and co-text

When analyzing the structure of language statistically, a useful place to start is with high-frequency context words, the so-called Key Words in Context (KWICs). After millions of samples of spoken and written language have been stored in a database, these KWICs can be sorted and analyzed for their co-text, that is, the words which commonly co-occur with them. Valuable principles with which KWICs can be analyzed include:
  • Collocation: words and their co-occurrences (examples include “fulfill needs” and “fall-back position”)
  • Semantic prosody: the connotation words carry (“pay attention” can be neutral or remonstrative, as when a teacher says to a pupil: “Pay attention!”, with an implied “or else”)
  • Colligation: the grammar that words use (while “I hope that suits you” sounds natural, “I hope that you are suited by that” does not)
  • Register: the text style in which a word is used (“President vows to support allies” is most likely found in news headlines, whereas in speech the noun “vows” most likely refers to marriage and the verb “vow” is most likely expressed as “promise”)

Once data has been collected, it can be sorted to determine the probability of co-occurrences. One common and well-known way is with a concordance: the KWIC is centered and shown with dozens of examples of it in use, as with the example for “possibility” below.

Concordance for possibility

      About to be put on looks a real possibility. Now that Benn is no longer
      Hiett, says that remains a real possibility: As part of the PLO, the PLF
               Graham added. That's a possibility as well," Whitlock admitted.
             Severe pain was always a possibility. Early in the century, both
     that, when possible, every other possibility, including speeches by outside
       that we can, that we use every possibility, including every possibility of
     could be let separately. Another possibility is `constructive vandalism'
     a people reject violence and the possibility of violence can the possibility
    the French vote and now enjoy the possibility of winning two seats in the
          immediately investigate the possibility of criminal charges and that her
      Sri Lankan sources say that the possibility of negotiating with the Tamil
    Sheikhdoms too there might be the possibility of encouraging agitation.
      the twelve member states on the possibility of their threatening to
    Marie had already looked into the possibility of persuading the [f]
    a function of dependency, but the possibility of capitalist development,
         were almost defenceless. The possibility of an invasion had been apparent
      oddly and are worried about the possibility of drug use, say so. Tell them
    was first convened to discuss the possibility of a coup d'état to return the
           in the mi5 line and in the possibility of the state being used to smear
      reasons behind the move was the possibility of a new market. Cheap terminals
        be assessed individually. The possibility of genetic testing brings that
       given the privilege. The other possibility, of course, is that the jaunt
              All this undermines the possibility of economic reform and requires
       get. (Knowing that there is no possibility of attempting coitus takes the
     who was openly cynical about the possibility of achieving socialism 5
        so that they can perceive the possibility of being citizens engaged in
       poisoning and fire, facing the possibility of their own death just to be
           hearing yesterday that the possibility of using the agency to gather
     in 1903, and I don't foresee any possibility replacing that. The car we
     a genetic factor at work here, a possibility supported by at least a few
        refused even to entertain the possibility that any of the nations of the
     has a long history, there is the possibility that the recent upsurge in
         Police are investigating the possibility that she was seen a short time
     any doctors who think there is a possibility that they may have been infected
      are in a store, there is a good possibility that you are wearing moisturizer
             living must be made. The possibility that a young adult will be
    he'd completed his account of the possibility that there was a drug-smuggling
    has been devoted to exploring the possibility that so-called ancient peoples
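
A concordance of this kind can be produced mechanically. The following is a minimal sketch, assuming the corpus is available as one plain string; the function name kwic, the width parameter and the two-sentence sample are hypothetical choices made for illustration, not the tool used to build the listing above.

```python
import re


def kwic(text, keyword, width=30):
    """Return concordance lines with `keyword` aligned in a fixed column."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Up to `width` characters of context on either side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts at column `width`.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


# Hypothetical two-sentence sample built from lines in the listing above.
sample = ("Severe pain was always a possibility. "
          "Police are investigating the possibility that she was seen.")

for line in kwic(sample, "possibility"):
    print(line)
```

Sorting the resulting lines by the word that follows the keyword, as in the listing above, makes recurring patterns such as “possibility of” and “possibility that” easy to spot.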

Once such a concordance has been created, the co-occurrences of other words with the KWIC can be analyzed. One way to do this is with a t-score. Take, for example, the word “stranger” (which is both a comparative adjective and a noun). A t-score analysis starts from each word's frequency in the corpus: words such as “no” and “to” are, not surprisingly, very frequent; a word such as “controversy” is much less so. It then counts the occurrences of that word together with the KWIC (the “joint frequency”) to determine whether the combination is unusually common, in other words, whether it occurs significantly more often than the frequencies of the individual words alone would predict. If so, the collocation is considered strong and is worth paying closer attention to.

In this example, “no stranger to” is a very frequent collocation; so are words such as “mysterious”, “handsome”, and “dark”. This comes as no surprise. More interesting, however, is “no stranger to controversy”. Perhaps the most interesting example, though, is the idiomatic “perfect stranger”. Such a word combination could not be predicted on its own, as it does not mean “a stranger who is perfect” as we should expect. Its unusually high frequency shows that the two words collocate strongly and as an expression are highly idiomatic.
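
The calculation can be sketched as follows. This assumes the form of the t-score commonly used in corpus linguistics, in which the expected joint frequency under independence is subtracted from the observed joint frequency and the difference is divided by the square root of the observed frequency; some tools also scale the expected value by the size of the collocation window, which is omitted here. All counts in the example are invented for illustration and do not come from a real corpus.

```python
import math


def t_score(joint_freq, node_freq, collocate_freq, corpus_size):
    """Collocation t-score: (observed - expected) / sqrt(observed)."""
    if joint_freq <= 0:
        raise ValueError("joint_freq must be positive")
    # Expected co-occurrences if the two words were distributed independently.
    expected = node_freq * collocate_freq / corpus_size
    return (joint_freq - expected) / math.sqrt(joint_freq)


# Hypothetical counts for a one-million-word corpus (illustrative only).
corpus_size = 1_000_000
stranger = 100
print("no + stranger:", t_score(30, stranger, 5000, corpus_size))
print("controversy + stranger:", t_score(8, stranger, 50, corpus_size))
```

A score near zero means the pair co-occurs about as often as chance would predict; the higher the score, the stronger the evidence that the two words collocate.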

The study of corpus linguistics provides us with many insights into the real nature of language, as shown above. In essence, the lexicon seems to be built on the premise that language use is best approached as an assembly process, whereby the brain links together ready-made chunks. Intuitively this makes sense: it is a natural short-cut to alleviate the burden of having to “re-invent the wheel” every time we speak. Additionally, using well-known expressions conveys a great deal of information rapidly, as the listener does not need to break down an utterance into its constituent parts. In Words and Rules, Steven Pinker shows this process at work with regular and irregular verbs: we collect the former, which provide us with rules we can apply to unknown words (for example, the "‑ed" ending for past tense verbs allows us to inflect the neologism “to google” into “googled”). Other patterns, the irregular verbs, we store separately as unique items to be memorized.
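
The division of labour between stored items and rules can be sketched in a few lines. This is a deliberately simplified illustration, not Pinker's own formalism: the small irregular table and the function name past_tense are hypothetical, and spelling adjustments such as consonant doubling (“stop” to “stopped”) are ignored.

```python
# Irregular forms are stored as memorized items (a hypothetical, tiny sample).
IRREGULAR_PAST = {"go": "went", "sing": "sang", "bring": "brought"}


def past_tense(verb):
    """Look the verb up first; fall back on the regular rule if it is unknown."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    # Regular rule: add "-ed" (only "-d" after a final "e").
    # Other spelling changes are deliberately left out of this sketch.
    return verb + "d" if verb.endswith("e") else verb + "ed"


for verb in ["sing", "walk", "google"]:
    print(verb, "->", past_tense(verb))
```

Because the rule applies to any verb missing from the table, a neologism such as “google” is handled without ever having been stored.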

Metaphor as an organizational principle for lexis

Another means of efficient storage in the lexicon is the use of metaphor as an organizing principle. (“Storage” and “files” are good examples of how human memory and computer memory have been linked to the same vocabulary; this was not always the case.) Lakoff’s work is usually cited as the cornerstone of studies of metaphor in language. One example is quite common: “time is money”. We can save, spend and waste both time and money. Another interesting example comes from business and sex: businesses penetrate the market, attract customers, and discuss “relationship management”. Business is also war: launch an ad campaign, gain a foothold (already a climbing metaphor in military usage) in the market, suffer losses. Systems, on the other hand, are water: a flood of information, overflowing with people, flow of traffic. The NOA theory of lexicon acquisition argues that the metaphoric sorting filter helps to simplify language storage and avoid overload.


Computer research has revealed that grammar, in the sense of its ability to create entirely new language, is avoided as far as possible. Biber and his team, working at Northern Arizona University on the Longman Grammar of Spoken and Written English (LGSWE), noted an unusually high frequency of word bundles that, on their own, lack meaning. But a sample of one or two quickly suggests their function: they can be inserted as grammatical glue without any prior analysis of form. Even a cursory look at examples reveals how commonplace they are in all forms of language use, yet we are hardly aware of their existence. Research suggests that language is heavily peppered with such bundles in all registers; two examples are "do you want me to", commonly found in speech, and "there was no significant", found in academic registers. Put together in speech, they can create comprehensible sentences, such as "I'm not sure" + "if they're" + "they're going" to form "I'm not sure if they're going". Such a sentence eases the burden on the lexicon, as it requires no grammatical analysis whatsoever.
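
Recurring bundles of this kind can be found by simply counting word sequences. The sketch below assumes that a bundle is any sequence of n words that recurs at least min_count times; the function name, both thresholds and the four-utterance corpus are hypothetical and far smaller than anything used in real corpus work.

```python
from collections import Counter


def lexical_bundles(utterances, n=3, min_count=2):
    """Return (bundle, count) pairs for n-word sequences that recur."""
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        # Slide a window of n words across the utterance.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return [(bundle, c) for bundle, c in counts.most_common() if c >= min_count]


# Hypothetical toy corpus of spoken utterances.
toy_corpus = [
    "i'm not sure if they're going",
    "i'm not sure what you mean",
    "do you want me to call",
    "do you want me to go",
]

for bundle, count in lexical_bundles(toy_corpus, n=3):
    print(count, bundle)
```

Even in this tiny sample the sequences that recur are the structurally incomplete ones, such as “i'm not sure” and “want me to”, rather than complete phrases.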


British linguist M. A. K. Halliday proposes a useful dichotomy of spoken and written language which entails a shift in paradigm. Linguistic theory has tended to posit either the superiority of spoken language over written language (as the former is the origin, comes naturally, and thus precedes the written language) or the superiority of the written over the spoken (on the grounds that written language is the most developed form of what begins as rudimentary speech); Halliday instead states that they are two entirely different entities.

He claims that speech is grammatically complex while writing is lexically dense. In other words, a sentence such as “a cousin of mine, the one who I was talking about the other day — the one who lives in Houston, not the one in Dallas — called me up yesterday to tell me the very same story about Mary, who…” is most likely to be found in conversation, not as a newspaper headline. “Prime Minister vows conciliation”, on the other hand, would be a typical news headline. One is more communicative (spoken), the other is more a recording tool (written).
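
The contrast in lexical density can be made concrete with a rough proxy: the share of content words among all words in a text. This is a simplifying assumption and not necessarily the measure Halliday himself uses; the function-word list below is a small hypothetical sample, and the spoken example is a shortened version of the one quoted above.

```python
# A small, hypothetical list of function words; a real study would use a
# part-of-speech tagger or a much fuller list.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "who", "i", "me", "mine",
    "was", "is", "about", "up", "one", "other", "not", "and",
}


def lexical_density(text):
    """Share of tokens that are content words (0.0 for empty input)."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


headline = "Prime Minister vows conciliation"
spoken = ("a cousin of mine, the one who I was talking about "
          "the other day, called me up yesterday")

print("headline:", lexical_density(headline))
print("spoken:", lexical_density(spoken))
```

On this proxy the headline consists entirely of content words, while the conversational sentence is carried largely by function words.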

Halliday’s work suggests something radically different: language behaves in registers. Biber et al., working on the LGSWE, used four (these are not exhaustive, merely exemplary): conversation, literature, news, and academic writing. These four registers clearly highlight distinctions within language use which would not be clear through a “grammatical” approach. Not surprisingly, each register favors the use of different words and structures: whereas news headline stories, for example, are grammatically simple, conversational anecdotes are full of lexical repetition. The lexis of the news, however, can be quite dense, just as the grammar of speech can be incredibly complicated.

See also

  • Lexicon
  • Lexeme
  • Lexical unit
  • Lexicography
  • Lexicology

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.