Relevance (information retrieval)
In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user.

Types

Relevance most commonly refers to topical relevance or aboutness, i.e. to what extent the topic of a result matches the topic of the query or information need. Relevance can also be interpreted more broadly, referring to generally how "good" a retrieved result is with regard to the information need. The latter definition of relevance, sometimes referred to as user relevance, encompasses topical relevance and possibly other concerns of the user such as timeliness, authority or novelty of the result.

History

The concern with the problem of finding relevant information dates back at least to the first publication of scientific journals in the 17th century.

The formal study of relevance began in the 20th century with the study of what would later be called bibliometrics. In the 1930s and 1940s, S. C. Bradford used the term "relevant" to characterize articles relevant to a subject (cf. Bradford's law). In the 1950s, the first information retrieval systems emerged, and researchers noted the retrieval of irrelevant articles as a significant concern. In 1958, B. C. Vickery made the concept of relevance explicit in an address at the International Conference on Scientific Information.

Since 1958, information scientists have explored and debated definitions of relevance. A particular focus of the debate was the distinction between "relevance to a subject" or "topical relevance" and "user relevance".

Evaluation

The information retrieval community has emphasized the use of test collections and benchmark tasks to measure topical relevance, starting with the Cranfield experiments of the early 1960s and culminating in the TREC (Text REtrieval Conference) evaluations that continue to this day as the main evaluation framework for information retrieval research.

To evaluate how well an information retrieval system retrieves topically relevant results, the relevance of retrieved results must be quantified. In Cranfield-style evaluations, this typically involves assigning a relevance level to each retrieved result, a process known as relevance assessment. Relevance levels can be binary (indicating that a result is relevant or not relevant) or graded (indicating a varying degree of match between the topic of the result and the information need). Once relevance levels have been assigned to the retrieved results, information retrieval performance measures can be used to assess the quality of a retrieval system's output.
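As a minimal sketch of how assigned relevance levels feed into performance measures, the Python below computes precision and recall from binary judgments and nDCG from graded judgments. These three measures are chosen here as common examples; the text above does not name specific measures, and the function names, document ids, and judgment values are illustrative assumptions.

```python
import math


def precision_recall(ranked_ids, relevant_ids):
    """Binary relevance: precision and recall of a retrieved list.

    ranked_ids   -- ids returned by the system (hypothetical)
    relevant_ids -- set of ids judged relevant by an assessor
    """
    retrieved = set(ranked_ids)
    hits = len(retrieved & relevant_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_ids, graded):
    """Graded relevance: DCG of the ranking divided by the ideal DCG.

    graded -- dict mapping id -> relevance level (0 = not relevant);
              unjudged documents are treated as level 0 (an assumption).
    """
    gains = [graded.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(graded.values(), reverse=True)[:len(ranked_ids)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4"]                # system output, best first
    binary = {"d1", "d3", "d5"}                      # assessor: relevant ids
    graded = {"d1": 2, "d3": 1, "d5": 2}             # assessor: graded levels
    print(precision_recall(ranked, binary))
    print(ndcg(ranked, graded))
```

Precision and recall ignore rank order, while nDCG rewards placing highly graded results near the top, which is one reason graded judgments are often paired with rank-sensitive measures.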

In contrast to this focus solely on topical relevance, the information science community has emphasized user studies that consider user relevance. These studies often focus on aspects of human-computer interaction (see also human-computer information retrieval).

Clustering and relevance

The cluster hypothesis, proposed by C. J. van Rijsbergen in 1979, asserts that two documents that are similar to each other have a high likelihood of being relevant to the same information need. With respect to the embedding similarity space, the cluster hypothesis can be interpreted globally or locally. The global interpretation assumes that there exists some fixed set of underlying topics derived from inter-document similarity. These global clusters or their representatives can then be used to relate the relevance of two documents (e.g. two documents in the same cluster should both be relevant to the same request). Methods in this spirit include:
  • cluster-based information retrieval
  • cluster-based document expansion such as latent semantic analysis or its language modeling equivalents

Global methods require that clusters, either in isolation or in combination, successfully model the set of possible relevant documents.
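The global interpretation can be illustrated with a small Python sketch that blends each document's retrieval score with the mean score of its cluster. This is only one simple way to use cluster membership, offered under stated assumptions: the cluster assignments are taken as given, and the function name, the `weight` parameter, and the sample scores are hypothetical rather than drawn from any particular system.

```python
from collections import defaultdict


def cluster_smoothed_scores(doc_scores, doc_cluster, weight=0.5):
    """Blend each document's score with the mean score of its cluster.

    doc_scores  -- dict: document id -> initial retrieval score
    doc_cluster -- dict: document id -> cluster label (assumed precomputed)
    weight      -- share of the final score taken from the cluster mean
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc_id, score in doc_scores.items():
        label = doc_cluster[doc_id]
        totals[label] += score
        counts[label] += 1

    smoothed = {}
    for doc_id, score in doc_scores.items():
        label = doc_cluster[doc_id]
        cluster_mean = totals[label] / counts[label]
        smoothed[doc_id] = (1 - weight) * score + weight * cluster_mean
    return smoothed


if __name__ == "__main__":
    scores = {"a": 1.0, "b": 0.0, "c": 0.2}
    clusters = {"a": "topic-0", "b": "topic-0", "c": "topic-1"}
    # "b" shares a cluster with the high-scoring "a", so its score is pulled up.
    print(cluster_smoothed_scores(scores, clusters))
```

The sketch also shows the dependency noted above: if the clusters do not group documents that are relevant to the same requests, the smoothing promotes the wrong documents.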


A second interpretation, most notably advanced by Ellen Voorhees, focuses on the local relationships between documents. The local interpretation avoids having to model the number or size of clusters in the collection and allows relevance at multiple scales. Methods in this spirit include:
  • multiple cluster retrieval
  • spreading activation and relevance propagation methods
  • local document expansion
  • score regularization

Local methods require an accurate and appropriate document similarity measure.
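A rough Python sketch of the local interpretation follows: each document's score is blended with a similarity-weighted average of the scores of its nearest neighbours. It is a simplified neighbour-averaging scheme in the spirit of score regularization, not a specific published algorithm; cosine similarity over term-weight dictionaries, the parameters `k` and `alpha`, and all names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def regularize_scores(scores, vectors, k=2, alpha=0.5):
    """Blend each score with the weighted mean score of its k nearest neighbours.

    scores  -- dict: document id -> initial retrieval score
    vectors -- dict: document id -> sparse term-weight dict
    k       -- number of neighbours consulted per document
    alpha   -- share of the final score taken from the neighbours
    """
    regularized = {}
    for doc_id, score in scores.items():
        neighbours = sorted(
            ((cosine(vectors[doc_id], vectors[other]), other)
             for other in scores if other != doc_id),
            reverse=True,
        )[:k]
        total_sim = sum(sim for sim, _ in neighbours)
        if total_sim > 0:
            neighbour_mean = sum(sim * scores[o] for sim, o in neighbours) / total_sim
        else:
            neighbour_mean = score  # no similar documents: leave score unchanged
        regularized[doc_id] = (1 - alpha) * score + alpha * neighbour_mean
    return regularized


if __name__ == "__main__":
    vectors = {
        "d1": {"cat": 1.0, "dog": 1.0},
        "d2": {"cat": 1.0, "dog": 1.0},
        "d3": {"stock": 1.0},
    }
    scores = {"d1": 1.0, "d2": 0.0, "d3": 0.1}
    print(regularize_scores(scores, vectors, k=1))
```

No cluster count or size is chosen anywhere in the sketch, but every result depends on `cosine`, which is why the quality of the similarity measure matters so much for local methods.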

Epistemological issues

Are users best at evaluating the relevance of a given document, or is it better to use experts?
Most research about relevance in information retrieval in recent years has implicitly assumed that users' evaluation of the output of a given system should be used to improve the "relevance" of its output. An alternative strategy would be to use the journal impact factor to rank output and thus base relevance on expert evaluations. Other strategies, such as including diversity of the search results, may be used as well. The important thing to recognize, however, is that relevance is fundamentally a question of epistemology, not psychology. (People's psychology reflects certain epistemological influences.)

Additional reading

  • Hjørland, B. (2010). The foundation of the concept of relevance. Journal of the American Society for Information Science and Technology, 61(2), 217-237.

  • Sperber, D., & Wilson, D. (2001). Relevance: Communication and cognition (2nd ed.). Oxford; Cambridge, MA: Blackwell Publishers. ISBN 9780631198789.

  • Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: Nature and manifestations of relevance. Journal of the American Society for Information Science and Technology, 58(13), 1915-1933.

  • Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance. Journal of the American Society for Information Science and Technology, 58(13), 2126-2144.

  • Saracevic, T. (2007). Relevance in information science. Invited Annual Thomson Scientific Lazerow Memorial Lecture at School of Information Sciences, University of Tennessee. September 19, 2007.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.