Ranking function
Encyclopedia
In information retrieval
, a ranking function is a function used by search engine
s to rank matching documents according to their relevance
to a given search query.
Once a search engine has identified a set of potentially relevant documents, it faces the task of determining which articles are most relevant, so that they may be listed first. This is typically done by assigning a numerical score to each document based on a ranking function, which incorporates features of the document, the query, and the overall document collection.
Some very simple ranking functions include:
More sophisticated ranking functions include:
Ranking functions are evaluated by a variety of means; one of the simplest is determining the precision of the first k top-ranked results for some fixed k; for example, the proportion of the top 10 results that are relevant, on average over many queries.
Frequently, computation of ranking functions can be simplified by taking advantage of the observation that only the relative order of scores matters, not their absolute value; hence terms or factors that are independent of the document may be removed, and terms or factors that are independent of the query may be precomputed and stored with the document.
Information retrieval
Information retrieval is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching structured storage, relational databases, and the World Wide Web...
, a ranking function is a function used by search engine
Search engine
A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information...
s to rank matching documents according to their relevance
Relevance (information retrieval)
In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user.-Types:...
to a given search query.
Once a search engine has identified a set of potentially relevant documents, it faces the task of determining which articles are most relevant, so that they may be listed first. This is typically done by assigning a numerical score to each document based on a ranking function, which incorporates features of the document, the query, and the overall document collection.
Some very simple ranking functions include:
- The constant ranking function assigning the same score to all documents.
- The term frequency ranking function counting the number of times that each query term occurs in the document, then summing these.
- The tf-idf ranking function computing the product of the term frequency and inverse document frequency for each query term, then summing these.
More sophisticated ranking functions include:
- Okapi BM25Okapi BM25In information retrieval, Okapi BM25 is a ranking function used by search engines to rank matching documents according to their relevance to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and...
: a variant of tf-idf. , represents the state-of-the-art and is used in many practical applications. - Machine-learned ranking formulas, obtained automatically from training data by machine learningMachine learningMachine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...
methods.
Ranking functions are evaluated by a variety of means; one of the simplest is determining the precision of the first k top-ranked results for some fixed k; for example, the proportion of the top 10 results that are relevant, on average over many queries.
Frequently, computation of ranking functions can be simplified by taking advantage of the observation that only the relative order of scores matters, not their absolute value; hence terms or factors that are independent of the document may be removed, and terms or factors that are independent of the query may be precomputed and stored with the document.