Lemma (linguistics)
Encyclopedia
In morphology
Morphology (linguistics)
In linguistics, morphology is the identification, analysis and description, in a language, of the structure of morphemes and other linguistic units, such as words, affixes, parts of speech, intonation/stress, or implied context...

 and lexicography
Lexicography
Lexicography is divided into two related disciplines:*Practical lexicography is the art or craft of compiling, writing and editing dictionaries....

, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of word
Word
In language, a word is the smallest free form that may be uttered in isolation with semantic or pragmatic content . This contrasts with a morpheme, which is the smallest unit of meaning but will not necessarily stand on its own...

s (headword
Headword
A headword, head word, lemma, or sometimes catchword is the word under which a set of related dictionary or encyclopaedia entries appear. The headword is used to locate the entry, and dictates its alphabetical position...

). In English
English language
English is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...

, for example, run, runs, ran and running are forms of the same lexeme
Lexeme
A lexeme is an abstract unit of morphological analysis in linguistics, that roughly corresponds to a set of forms taken by a single word. For example, in the English language, run, runs, ran and running are forms of the same lexeme, conventionally written as RUN...

, with run as the lemma.
Lexeme, in this context, refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme. In lexicography, this unit is usually also the citation form or headword
Headword
A headword, head word, lemma, or sometimes catchword is the word under which a set of related dictionary or encyclopaedia entries appear. The headword is used to locate the entry, and dictates its alphabetical position...

 by which it is indexed. Lemmas have special significance in highly inflected languages
Inflection
In grammar, inflection or inflexion is the modification of a word to express different grammatical categories such as tense, grammatical mood, grammatical voice, aspect, person, number, gender and case...

 such as Czech
Czech language
Czech is a West Slavic language with about 12 million native speakers; it is the majority language in the Czech Republic and spoken by Czechs worldwide. The language was known as Bohemian in English until the late 19th century...

. The process of determining the lemma for a given word is called lemmatisation
Lemmatisation
Lemmatisation in linguistics, is the process of grouping together the different inflected forms of a word so they can be analysed as a single item....

.

Morphology

In English, the citation form of a noun
Noun
In linguistics, a noun is a member of a large, open lexical category whose members can occur as the main word in the subject of a clause, the object of a verb, or the object of a preposition .Lexical categories are defined in terms of how their members combine with other kinds of...

 is the singular
Grammatical number
In linguistics, grammatical number is a grammatical category of nouns, pronouns, and adjective and verb agreement that expresses count distinctions ....

: e.g., mouse rather than mice. For multi-word lexemes which contain possessive adjective
Possessive adjective
Possessive adjectives, also known as possessive determiners, are a part of speech that modifies a noun by attributing possession to someone or something...

s or reflexive pronoun
Reflexive pronoun
A reflexive pronoun is a pronoun that is preceded by the noun, adjective, adverb or pronoun to which it refers within the same clause. In generative grammar, a reflexive pronoun is an anaphor that must be bound by its antecedent...

s, the citation form uses a form of the indefinite pronoun
Indefinite pronoun
An indefinite pronoun is a pronoun that refers to one or more unspecified beings, objects, or places.-List of English indefinite pronouns:Note that many of these words can function as other parts of speech too, depending on context...

 one: e.g., do one's best, perjure oneself. In languages with grammatical gender
Grammatical gender
Grammatical gender is defined linguistically as a system of classes of nouns which trigger specific types of inflections in associated words, such as adjectives, verbs and others. For a system of noun classes to be a gender system, every noun must belong to one of the classes and there should be...

, the citation form of regular adjectives and nouns is usually the masculine singular. If the language additionally has cases
Grammatical case
In grammar, the case of a noun or pronoun is an inflectional form that indicates its grammatical function in a phrase, clause, or sentence. For example, a pronoun may play the role of subject , of direct object , or of possessor...

, the citation form is often the masculine singular nominative.

In many languages, the citation form of a verb
Verb
A verb, from the Latin verbum meaning word, is a word that in syntax conveys an action , or a state of being . In the usual description of English, the basic form, with or without the particle to, is the infinitive...

 is the infinitive
Infinitive
In grammar, infinitive is the name for certain verb forms that exist in many languages. In the usual description of English, the infinitive of a verb is its basic form with or without the particle to: therefore, do and to do, be and to be, and so on are infinitives...

: French
French language
French is a Romance language spoken as a first language in France, the Romandy region in Switzerland, Wallonia and Brussels in Belgium, Monaco, the regions of Quebec and Acadia in Canada, and by various communities elsewhere. Second-language speakers of French are distributed throughout many parts...

 aller, German
German language
German is a West Germanic language, related to and classified alongside English and Dutch. With an estimated 90 – 98 million native speakers, German is one of the world's major languages and is the most widely-spoken first language in the European Union....

 gehen. In English it usually is the full infinitive (to go); the present tense is used for some defective verb
Defective verb
In linguistics, a defective verb is a verb which is missing e.g. a past tense, or cannot be used in some other way that normal verbs come. Formally, it is a verb with an incomplete conjugation. Defective verbs cannot be conjugated in certain tenses, aspects, or moods.-Arabic:In Arabic, defective...

s (shall, can; and must has only the one form). In Latin, Ancient Greek
Ancient Greek
Ancient Greek is the stage of the Greek language in the periods spanning the times c. 9th–6th centuries BC, , c. 5th–4th centuries BC , and the c. 3rd century BC – 6th century AD of ancient Greece and the ancient world; being predated in the 2nd millennium BC by Mycenaean Greek...

, and Modern Greek
Modern Greek
Modern Greek refers to the varieties of the Greek language spoken in the modern era. The beginning of the "modern" period of the language is often symbolically assigned to the fall of the Byzantine Empire in 1453, even though that date marks no clear linguistic boundary and many characteristic...

 (which has no infinitive), however, the first person singular present tense
Present tense
The present tense is a grammatical tense that locates a situation or event in present time. This linguistic definition refers to a concept that indicates a feature of the meaning of a verb...

 is normally used, though occasionally the infinitive may also be seen. (For contracted verbs in Greek, an uncontracted first person singular present tense is used to reveal the contract vowel, e.g. philéō for philō "I love" [implying affection]; agapáō for agapō "I love" [implying regard]). In Japanese
Japanese language
is a language spoken by over 130 million people in Japan and in Japanese emigrant communities. It is a member of the Japonic language family, which has a number of proposed relationships with other languages, none of which has gained wide acceptance among historical linguists .Japanese is an...

, the non-past (present and future) tense is used.

In Arabic
Arabic language
Arabic is a name applied to the descendants of the Classical Arabic language of the 6th century AD, used most prominently in the Quran, the Islamic Holy Book...

, which has no infinitives, the third person singular masculine of the past tense is the least-marked form, and is used for entries in modern dictionaries. In older dictionaries, which are still commonly used today, the triliteral
Triliteral
The roots of verbs and most nouns in the Semitic languages are characterized as a sequence of consonants or "radicals"...

 of the word, either a verb or a noun, is used. Hebrew
Hebrew language
Hebrew is a Semitic language of the Afroasiatic language family. Culturally, is it considered by Jews and other religious groups as the language of the Jewish people, though other Jewish languages had originated among diaspora Jews, and the Hebrew language is also used by non-Jewish groups, such...

 often uses the 3rd person masculine qal
Qal (linguistics)
In Hebrew grammar, the qal is the simple paradigm of the verb.The Classical Hebrew verb conjugates according to person and number in two finite tenses, the perfect and the imperfect. Both of these can then be modified by means of prefixes and suffixes to create other "actions" of the verb...

perfect, e.g., ברא bara' create, כפר kaphar deny. Georgian
Georgian language
Georgian is the native language of the Georgians and the official language of Georgia, a country in the Caucasus.Georgian is the primary language of about 4 million people in Georgia itself, and of another 500,000 abroad...

 uses the verbal noun
Verbal noun
In linguistics, the verbal noun turns a verb into a noun and corresponds to the infinitive in English language usage. In English the infinitive form of the verb is formed when preceded by to, e.g...

. For Korean
Korean language
Korean is the official language of the country Korea, in both South and North. It is also one of the two official languages in the Yanbian Korean Autonomous Prefecture in People's Republic of China. There are about 78 million Korean speakers worldwide. In the 15th century, a national writing...

, -da is attached to the stem.

In the Irish language
Irish language
Irish , also known as Irish Gaelic, is a Goidelic language of the Indo-European language family, originating in Ireland and historically spoken by the Irish people. Irish is now spoken as a first language by a minority of Irish people, as well as being a second language of a larger proportion of...

 words are highly inflected depending on their case (genitive, nominative, dative, and vocative); they are also inflected on their place within a sentence due to the presence of initial mutations
Irish initial mutations
Irish, like all modern Celtic languages, is characterized by its initial consonant mutations. These mutations affect the initial consonant of a word under specific morphological and syntactic conditions...

. The noun cainteoir, the lemma for the noun meaning "speaker", has a variety of forms: chainteoir, gcainteoir, cainteora, chainteora, cainteoirí, chainteoirí and gcainteoirí.

Some phrases are cited in a sort of lemma, e.g., Carthago delenda est (literally, "Carthage must be destroyed") is a common way of citing Cato
Cato the Elder
Marcus Porcius Cato was a Roman statesman, commonly referred to as Censorius , Sapiens , Priscus , or Major, Cato the Elder, or Cato the Censor, to distinguish him from his great-grandson, Cato the Younger.He came of an ancient Plebeian family who all were noted for some...

, although what he said was more like, Ceterum censeo Carthaginem esse delendam ("As to the rest, I hold that Carthage must be destroyed").

Lexicography

In a dictionary, the lemma "go" represents the inflected
Inflection
In grammar, inflection or inflexion is the modification of a word to express different grammatical categories such as tense, grammatical mood, grammatical voice, aspect, person, number, gender and case...

 forms "go", "goes", "going", "went", and "gone". The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g., "went" < "go". The disadvantage of such simplifications is, of course, the inability to look up a declined or conjugated form of the word, although some dictionaries, like Webster's
Webster's Dictionary
Webster's Dictionary refers to the line of dictionaries first developed by Noah Webster in the early 19th century, and also to numerous unrelated dictionaries that added Webster's name just to share his prestige. The term is a genericized trademark in the U.S.A...

, will list "went". Multilingual dictionaries vary in how they deal with this issue: the Langenscheidt dictionary of German does not list ging (< gehen); the Cassell does.

The form that is chosen to be the lemma is usually the least marked
Markedness
Markedness is a specific kind of asymmetry relationship between elements of linguistic or conceptual structure. In a marked-unmarked relation, one term of an opposition is the broader, dominant one...

 form, though there are occasional exceptions; e.g., Finnish
Finnish language
Finnish is the language spoken by the majority of the population in Finland Primarily for use by restaurant menus and by ethnic Finns outside Finland. It is one of the two official languages of Finland and an official minority language in Sweden. In Sweden, both standard Finnish and Meänkieli, a...

 dictionaries list verbs not under the verb root, but under the first infinitive marked with -(t)a, -(t)ä.

Lemmas or word stem
Word stem
In linguistics, a stem is a part of a word. The term is used with slightly different meanings.In one usage, a stem is a form to which affixes can be attached. Thus, in this usage, the English word friendships contains the stem friend, to which the derivational suffix -ship is attached to form a new...

s are used often in corpus linguistics
Corpus linguistics
Corpus linguistics is the study of language as expressed in samples or "real world" text. This method represents a digestive approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally done by hand, corpora are now largely...

 for determining word frequency. In such usage the specific definition of "lemma" is flexible depending on the task it is being used for.

Difference between stem and lemma

In computational linguistics, a stem
Word stem
In linguistics, a stem is a part of a word. The term is used with slightly different meanings.In one usage, a stem is a form to which affixes can be attached. Thus, in this usage, the English word friendships contains the stem friend, to which the derivational suffix -ship is attached to form a new...

 is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the verb. For example, from "produced", the lemma is "produce", but the stem is "produc-." This is because there are words such as production. In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed. When phonology
Phonology
Phonology is, broadly speaking, the subdiscipline of linguistics concerned with the sounds of language. That is, it is the systematic use of sound to encode meaning in any spoken human language, or the field of linguistics studying this use...

 is taken into account, the definition of the unchangeable part of the word is not useful, as can be seen in the phonological forms of the words in the preceding example: "produced" (proʊˈdjuːst) vs. "production" (proʊˈdʌkʃən).

Some lexemes have several stems but one lemma. For instance "to go" (the lemma) has the stems "go" and "went". (The past tense is based on a different verb, "to wend". The "-t" suffix may be considered as equivalent to "-ed".)

See also

  • Principal parts
    Principal parts
    In language learning, the principal parts of a verb are those forms that a student must memorize in order to be able to conjugate the verb through all its forms.- English :...

  • Root (linguistics)
    Root (linguistics)
    The root word is the primary lexical unit of a word, and of a word family , which carries the most significant aspects of semantic content and cannot be reduced into smaller constituents....

  • Null morpheme
    Null morpheme
    In morpheme-based morphology, a null morpheme is a morpheme that is realized by a phonologically null affix . In simpler terms, a null morpheme is an "invisible" affix. It is also called a zero morpheme; the process of adding a null morpheme is called null affixation, null derivation or zero...

  • Uninflected word
    Uninflected word
    In the context of linguistic morphology, an uninflected word is a word that has no morphological markers such as affixes, ablaut, consonant gradation, etc., indicating declension or conjugation...

  • Lexical Markup Framework
    Lexical Markup Framework
    ISO 24613:2008, Language resource management - Lexical markup framework , is the ISO International Organization for Standardization ISO/TC37 standard for natural language processing and machine-readable dictionary lexicons...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK