Precision and recall
In pattern recognition and information retrieval, precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. When a program for recognizing the dogs in a scene correctly identifies four of the nine dogs but mistakes three cats for dogs, its precision is 4/7 while its recall is 4/9. When a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3.

In statistics, if the null hypothesis is that all and only the relevant items are retrieved, absence of type I and type II errors corresponds respectively to maximum precision (no false positives) and maximum recall (no false negatives). The above pattern recognition example contained 7 − 4 = 3 type I errors and 9 − 4 = 5 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity.
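
As a quick check of the arithmetic above, the following minimal Python sketch recomputes precision and recall for both examples; the counts come directly from the text, and the function names are illustrative only:

    def precision(tp, fp):
        # fraction of predicted positives that are correct
        return tp / (tp + fp)

    def recall(tp, fn):
        # fraction of actual positives that were found
        return tp / (tp + fn)

    # Dog-recognition example: 4 of 9 dogs found, 3 cats mistaken for dogs.
    tp, fp, fn = 4, 3, 9 - 4
    print(precision(tp, fp))  # 4/7 ≈ 0.571
    print(recall(tp, fn))     # 4/9 ≈ 0.444

    # Search-engine example: 30 pages returned, 20 relevant, 40 relevant pages missed.
    print(precision(20, 10))  # 20/30 ≈ 0.667
    print(recall(20, 40))     # 20/60 ≈ 0.333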

In even simpler terms, a high recall means you haven't missed anything but you may have a lot of useless results to sift through (which would imply low precision). High precision means that everything returned was a relevant result, but you might not have found all the relevant items (which would imply low recall).

Introduction

As an example, in an information retrieval scenario, the instances are documents and the task is to return a set of relevant documents given a search term; or equivalently, to assign each document to one of two categories, "relevant" and "not relevant". In this case, the "relevant" documents are simply those that belong to the "relevant" category. Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents, while precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.
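
Seen this way, both measures reduce to simple set operations on the retrieved and relevant document sets. A minimal Python sketch, with hypothetical document identifiers invented for illustration:

    # Hypothetical document IDs; both sets below are assumed example data.
    retrieved = {"d1", "d2", "d3", "d5", "d8"}         # what the search returned
    relevant  = {"d1", "d3", "d4", "d8", "d9", "d10"}  # ground-truth relevant documents

    hits = retrieved & relevant              # relevant documents that were retrieved

    precision = len(hits) / len(retrieved)   # 3/5 = 0.6
    recall    = len(hits) / len(relevant)    # 3/6 = 0.5
    print(precision, recall)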

In a classification task, the precision for a class is the number of true positives (i.e. the number of items correctly labeled as belonging to the positive class) divided by the total number of elements labeled as belonging to the positive class (i.e. the sum of true positives and false positives, which are items incorrectly labeled as belonging to the class). Recall in this context is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives, which are items that were not labeled as belonging to the positive class but should have been).

In information retrieval, a perfect precision score of 1.0 means that every result retrieved by a search was relevant (but says nothing about whether all relevant documents were retrieved) whereas a perfect recall score of 1.0 means that all relevant documents were retrieved by the search (but says nothing about how many irrelevant documents were also retrieved).

In a classification task, a precision score of 1.0 for a class C means that every item labeled as belonging to class C does indeed belong to class C (but says nothing about the number of items from class C that were not labeled correctly) whereas a recall of 1.0 means that every item from class C was labeled as belonging to class C (but says nothing about how many other items were incorrectly also labeled as belonging to class C).
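
To make the class-C reading concrete, here is a small sketch that counts true positives, false positives, and false negatives for one class from parallel lists of predicted and actual labels; the labels and the class name "C" are invented for illustration:

    def precision_recall_for_class(predicted, actual, cls):
        # Count outcomes for the single class of interest.
        tp = sum(1 for p, a in zip(predicted, actual) if p == cls and a == cls)
        fp = sum(1 for p, a in zip(predicted, actual) if p == cls and a != cls)
        fn = sum(1 for p, a in zip(predicted, actual) if p != cls and a == cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall    = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    # Toy labels (assumed data) over three classes C, D, E.
    predicted = ["C", "C", "D", "C", "E", "D"]
    actual    = ["C", "D", "D", "C", "C", "E"]
    print(precision_recall_for_class(predicted, actual, "C"))  # (2/3, 2/3)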

Often, there is an inverse relationship between precision and recall, where it is possible to increase one at the cost of reducing the other. For example, an information retrieval system (such as a search engine) can often increase its recall by retrieving more documents, at the cost of an increasing number of irrelevant documents retrieved (decreasing precision).
Similarly, a classification system for deciding whether or not, say, a fruit is an orange, can achieve high precision by only classifying fruits with the exact right shape and color as oranges, but at the cost of low recall due to the number of false negatives from oranges that did not quite match the specification.
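
The tradeoff can be made visible by sweeping a decision threshold over classifier scores: lowering the threshold retrieves more items, which raises recall while typically lowering precision. The scores and relevance flags below are assumed toy data, not from any real system:

    # Each item has a model score and a ground-truth relevance flag (toy data).
    scored = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
              (0.60, False), (0.50, False), (0.40, True), (0.20, False)]
    total_relevant = sum(1 for _, rel in scored if rel)

    for threshold in (0.85, 0.65, 0.45, 0.15):
        kept = [rel for score, rel in scored if score >= threshold]
        tp = sum(kept)                       # relevant items above the threshold
        precision = tp / len(kept)
        recall = tp / total_relevant
        print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")

With this toy data, recall rises from 0.5 to 1.0 as the threshold drops, while precision falls from 1.0 to 0.5.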

Usually, precision and recall scores are not discussed in isolation. Instead, either values for one measure are compared at a fixed level of the other measure (e.g. precision at a recall level of 0.75), or both are combined into a single measure, such as the F-measure, which is the weighted harmonic mean of precision and recall (see below), or the Matthews correlation coefficient.

Definition (information retrieval context)

In information retrieval contexts, precision and recall are defined in terms of a set of retrieved documents (e.g. the list of documents produced by a web search engine for a query) and a set of relevant documents (e.g. the list of all documents on the internet that are relevant for a certain topic), cf. relevance.

Precision

In the field of information retrieval, precision is the fraction of retrieved documents that are relevant to the search:

    precision = |{relevant documents} ∩ {retrieved documents}| / |{retrieved documents}|

Precision takes all retrieved documents into account, but it can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n or P@n.
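
A sketch of precision at a cut-off rank, assuming the system's output is a ranked list of relevance flags (the example ranking is invented):

    def precision_at_n(ranked_relevance, n):
        # ranked_relevance: booleans ordered by the system's ranking, True = relevant.
        top = ranked_relevance[:n]
        return sum(top) / len(top)

    # Hypothetical ranking of ten results.
    ranking = [True, True, False, True, False, False, True, False, False, False]
    print(precision_at_n(ranking, 5))   # P@5  = 3/5  = 0.6
    print(precision_at_n(ranking, 10))  # P@10 = 4/10 = 0.4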

For example, for a text search on a set of documents, precision is the number of correct results divided by the number of all returned results.

Precision is also used with recall, the percentage of all relevant documents that is returned by the search. The two measures are sometimes used together in the F1 score (or F-measure) to provide a single measurement for a system.

Note that the meaning and usage of "precision" in the field of information retrieval differs from the definition of accuracy and precision within other branches of science and technology.

Recall

Recall in information retrieval is the fraction of the documents that are relevant to the query that are successfully retrieved:

    recall = |{relevant documents} ∩ {retrieved documents}| / |{relevant documents}|

For example, for a text search on a set of documents, recall is the number of correct results divided by the number of results that should have been returned.

In binary classification, recall is called sensitivity, so it can be viewed as the probability that a relevant document is retrieved by the query.

It is trivial to achieve a recall of 100% by returning all documents in response to any query. Recall alone is therefore not enough; the number of non-relevant documents retrieved must also be measured, for example by computing the precision.

Definition (classification context)

For classification tasks, the terms true positives, true negatives, false positives, and false negatives (see also type I and type II errors) compare the results of the classifier under test with trusted external judgments. The terms positive and negative refer to the classifier's prediction (sometimes known as the observation), and the terms true and false refer to whether that prediction corresponds to the external judgment (sometimes known as the expectation). This is illustrated by the table below:
                                 actual class (expectation)
                                 positive                       negative
 predicted class   positive      tp (true positive)             fp (false positive)
 (observation)                   Correct result                 Unexpected result
                   negative      fn (false negative)            tn (true negative)
                                 Missing result                 Correct absence of result


Precision and recall are then defined as:

    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)

Recall in this context is also referred to as the true positive rate. Other related measures used in classification include the true negative rate and accuracy:

    true negative rate = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)

The true negative rate is also called specificity.
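
The following sketch computes these quantities directly from the four counts in the table above; the counts themselves are made up for illustration:

    # Assumed counts from a hypothetical confusion table.
    tp, fp, fn, tn = 20, 5, 10, 65

    precision          = tp / (tp + fp)                    # 0.80
    recall             = tp / (tp + fn)                    # true positive rate, ~0.67
    true_negative_rate = tn / (tn + fp)                    # specificity, ~0.93
    accuracy           = (tp + tn) / (tp + fp + fn + tn)   # 0.85
    print(precision, recall, true_negative_rate, accuracy)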


Probabilistic interpretation

It is possible to interpret precision and recall not as ratios but as probabilities:
  • Precision is the probability that a (randomly selected) retrieved document is relevant.

  • Recall is the probability that a (randomly selected) relevant document is retrieved in a search.


Note that "randomly selected" refers to a uniform distribution over the appropriate pool of documents; for example, "a randomly selected retrieved document" means a document drawn from the set of retrieved documents in such a way that every document in that set is equally likely to be selected.

Note that, in a typical classification system, the probability that a retrieved document is relevant depends on the document. The above interpretation extends to that scenario as well.

Another interpretation of precision and recall is as follows: precision is the average probability of relevant retrieval and recall is the average probability of complete retrieval, where the average is taken over multiple retrieval queries.
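
One way to realize this averaged reading is to compute per-query precision and recall and then take their means over a set of queries (often called macro-averaging). The query results below are invented for illustration:

    # Per-query (retrieved set, relevant set) pairs; document IDs are hypothetical.
    queries = [
        ({"d1", "d2", "d3"}, {"d1", "d3", "d7"}),
        ({"d4", "d5"},       {"d4"}),
        ({"d6", "d8", "d9"}, {"d6", "d8", "d9", "d10"}),
    ]

    precisions, recalls = [], []
    for retrieved, relevant in queries:
        hits = retrieved & relevant
        precisions.append(len(hits) / len(retrieved))
        recalls.append(len(hits) / len(relevant))

    print(sum(precisions) / len(precisions))  # average probability of relevant retrieval
    print(sum(recalls) / len(recalls))        # average probability of complete retrieval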

F-measure

A measure that combines precision and recall is the harmonic mean of precision and recall, the traditional F-measure or balanced F-score:

    F = 2 · (precision · recall) / (precision + recall)

This is also known as the F1 measure, because recall and precision are evenly weighted.

It is a special case of the general Fβ measure (for non-negative real values of β):

    Fβ = (1 + β²) · (precision · recall) / (β² · precision + recall)

Two other commonly used F measures are the F2 measure, which weights recall higher than precision, and the F0.5 measure, which puts more emphasis on precision than recall.

The F-measure was derived by van Rijsbergen (1979) so that Fβ "measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as precision". It is based on van Rijsbergen's effectiveness measure

    E = 1 − 1 / (α/P + (1 − α)/R)

Their relationship is Fβ = 1 − E, where α = 1 / (1 + β²).
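
A short sketch of the balanced and weighted F-measures as defined above; the precision and recall values are arbitrary example inputs:

    def f_beta(precision, recall, beta=1.0):
        # Weighted harmonic mean; beta > 1 favors recall, beta < 1 favors precision.
        if precision == 0 and recall == 0:
            return 0.0
        b2 = beta * beta
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    p, r = 0.6, 0.9                # assumed example values
    print(f_beta(p, r))            # F1   = 0.72
    print(f_beta(p, r, beta=2))    # F2   ≈ 0.82, weights recall more heavily
    print(f_beta(p, r, beta=0.5))  # F0.5 ≈ 0.64, weights precision more heavily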

Limitations as goals

There are other parameters and strategies for measuring the performance of an information retrieval system. In particular, for web document retrieval, if the user's objectives are not clear, precision and recall cannot be optimized. As summarized by D. Lopresti:
"Browsing is a comfortable and powerful paradigm (the serendipity effect
Serendipity
Serendipity means a "happy accident" or "pleasant surprise"; specifically, the accident of finding something good or useful without looking for it. The word has been voted as one of the ten English words hardest to translate in June 2004 by a British translation company. However, due to its...

).
  • Search results don't have to be very good.
  • Recall? Not important (as long as you get at least some good hits).
  • Precision? Not important (as long as at least some of the hits on the first page you return are good).
"

See also

  • Relevance
  • Information retrieval
  • Binary classification
  • Receiver operating characteristic


Sources



  • Baeza-Yates, R.; Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York: ACM Press, Addison-Wesley, pp. 75 ff. ISBN 0-201-39829-X.

  • Hjørland, Birger (2010). The foundation of the concept of relevance. Journal of the American Society for Information Science and Technology, 61(2), 217-237.


  • van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). London; Boston: Butterworth. ISBN 0-408-70929-4.
