Coreference
In linguistics, co-reference occurs when multiple expressions in a sentence or document refer to the same thing; in linguistic jargon, they have the same "referent."
For example, in the sentence "Mary said she would help me", "she" and "Mary" most likely refer to the same person, in which case they are coreferent. Similarly, in "I saw Scott yesterday. He was fishing by the lake," "Scott" and "he" are most likely coreferent.
The pattern of these examples is typical: when first introducing a person or other topic for discussion, an author or speaker will use a relatively long or detailed description, such as a definite description as defined by Saul Kripke. Later mentions, however, are briefer. Once reduced to mere pronouns, references are frequently ambiguous. In the "Mary said she would help me" example, although the most likely reading is that "she" refers to Mary, "she" could instead refer to someone else (most likely someone introduced earlier in a dialog).
In computational linguistics, coreference resolution is a well-studied problem in discourse. In order to derive the correct interpretation of text, or even to estimate the relative importance of various mentioned subjects, pronouns and other referring expressions need to be connected to the right individuals.
When the reader must look back to the previous context, coreference is called "anaphoric reference". When the reader must look forward, it is termed "cataphoric reference".
Algorithms intended to resolve co-references commonly look first for the nearest preceding individual that is compatible with the referring expression. For example, "she" might attach to a preceding expression such as "the woman" or "Anne", but not to "Bill". Pronouns such as "himself" have much stricter constraints. Algorithms for resolving co-reference tend to have accuracy in the 75% range (as with many linguistic tasks, there is a tradeoff between precision and recall).
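The nearest-compatible-antecedent heuristic described above can be sketched roughly as follows. This is a minimal illustration rather than any published system: the `Mention` class, the `PRONOUN_FEATURES` table, and the hand-assigned gender and number labels are all assumptions made for the example, and real resolvers draw on much richer syntactic and semantic evidence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    gender: str  # "f", "m", or "n"; hand-assigned for this illustration
    number: str  # "sg" or "pl"


# Hypothetical agreement features for a handful of pronouns.
# A gender of None means the pronoun places no gender constraint.
PRONOUN_FEATURES = {
    "she": ("f", "sg"),
    "her": ("f", "sg"),
    "he": ("m", "sg"),
    "him": ("m", "sg"),
    "it": ("n", "sg"),
    "they": (None, "pl"),
}


def resolve_pronoun(pronoun: str, preceding: List[Mention]) -> Optional[Mention]:
    """Return the nearest preceding mention that agrees with the pronoun, or None."""
    gender, number = PRONOUN_FEATURES[pronoun.lower()]
    # Walk backwards so the closest candidate is tried first.
    for mention in reversed(preceding):
        if mention.number != number:
            continue
        if gender is not None and mention.gender != gender:
            continue
        return mention
    return None


# Mentions in the order they appear in the text.
mentions = [
    Mention("Anne", "f", "sg"),
    Mention("the woman", "f", "sg"),
    Mention("Bill", "m", "sg"),
]

antecedent = resolve_pronoun("she", mentions)
print(antecedent.text if antecedent else None)  # the woman
```

Here "she" skips "Bill" because the gender does not match and attaches to "the woman", the nearest compatible mention. Reflexives such as "himself" would need additional constraints (for example, an antecedent within the same clause) that this sketch does not model.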
A classic problem for coreference resolution in English is the pronoun "it", which has many uses. "It" can refer much like "he" and "she", except that it refers to inanimate objects (the rules are actually more complex: animals may be any of "it", "he", or "she"; ships have traditionally been "she"; hurricanes are not usually referred to as "she" or "he" despite having gendered names). "It" can also refer to abstractions rather than beings: "He was paid minimum wage, but didn't seem to mind it." Finally, "it" also has pleonastic uses, which do not refer in anything like the same way as "he" and "she" do:
- It's raining.
- It's really a shame.
- It takes a lot of work to be a success.
- Sometimes it's those who are loudest who have the most influence.
Pleonastic uses are not considered referential, and so are not part of coreference. Li et al. (2009) have demonstrated very high accuracy in sorting out pleonastic "it", and this success promises to improve the accuracy of coreference resolution overall.
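As a rough illustration of what filtering out pleonastic "it" involves, the sketch below flags sentences matching a few hand-written surface patterns. It is not the web-based method of Li et al. (2009); the patterns and the `looks_pleonastic` helper are assumptions chosen only to cover the four example sentences above, and they will misfire on many real sentences.

```python
import re

# Hand-picked illustrative patterns; real systems use far richer evidence.
PLEONASTIC_PATTERNS = [
    # Weather verbs: "It's raining", "It is snowing"
    re.compile(r"\bit(?:['’]s| is| was)\s+(?:raining|snowing|hailing)\b", re.IGNORECASE),
    # Extraposition with "takes": "It takes a lot of work to ..."
    re.compile(r"\bit takes\b.*\bto\b", re.IGNORECASE),
    # Evaluative nominal: "It's a shame", "It's really a shame"
    re.compile(r"\bit(?:['’]s| is| was)\s+(?:\w+\s+)?a shame\b", re.IGNORECASE),
    # Clefts: "it's those who ...", "it was John that ..."
    re.compile(r"\bit(?:['’]s| is| was)\s+\w+\s+(?:who|that)\b", re.IGNORECASE),
]


def looks_pleonastic(sentence: str) -> bool:
    """Return True if any of the illustrative patterns matches the sentence."""
    return any(pattern.search(sentence) for pattern in PLEONASTIC_PATTERNS)


examples = [
    "It's raining.",
    "It's really a shame.",
    "It takes a lot of work to be a success.",
    "Sometimes it's those who are loudest who have the most influence.",
    "He was paid minimum wage, but didn't seem to mind it.",
]

for sentence in examples:
    print(looks_pleonastic(sentence), sentence)
```

The first four sentences are flagged, while the referential "it" in the minimum-wage sentence is not. A resolver could use such a filter to skip pleonastic occurrences before searching for antecedents.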
See also
- Anaphora (linguistics)
- Generic antecedents
- Disambiguation
- Nearest referent
- Logophoricity
- Obviative
- Switch reference
- Reflexive pronoun
External links
- Illinois Coreference Package: a coreference resolution package implemented in Java.
- Yifan Li, Petr Musilek, Marek Reformat, and Loren Wyard-Scott. "Identification of Pleonastic It Using the Web." In Journal of Artificial Intelligence Research 34 (2009): 339-389. http://www.jair.org/media/2622/live-2622-4362-jair.ps