Unsolved problems in statistics
There are many longstanding unsolved problems in mathematics for which a solution has not yet been found. The unsolved problems in statistics are generally of a different flavor; according to John Tukey, "difficulties in identifying problems have delayed statistics far more than difficulties in solving problems." A list of "one or two open problems" (in fact 22 of them) was given by David Cox.
Inference and testing
- How to detect and correct for systematic errors, especially in sciences where random errors are large (a situation Tukey termed uncomfortable science).
- The Graybill-Deal estimator is often used to estimate the common mean of two normal populations with unknown and possibly unequal variances. Though this estimator is generally unbiased, its admissibility remains to be shown.
- Meta-analysis: Though independent p-values can be combined using Fisher's method, techniques are still being developed to handle the case of dependent p-values.
- Behrens–Fisher problem: Yuri Linnik showed in 1966 that there is no uniformly most powerful test for the difference of two means when the variances are unknown and possibly unequal. That is, there is no exact test (one that, if the means are in fact equal, rejects the null hypothesis with probability exactly α) that is also the most powerful for all values of the variances (which are thus nuisance parameters). Though there are many approximate solutions (such as Welch's t-test), the problem continues to attract attention as one of the classic problems in statistics.
- Multiple comparisons: There are various ways to adjust p-values to compensate for the simultaneous or sequential testing of hypotheses. Of particular interest is how to simultaneously control the overall error rate, preserve statistical power, and incorporate the dependence between tests into the adjustment. These issues are especially relevant when the number of simultaneous tests can be very large, as is increasingly the case in the analysis of data from DNA microarrays.
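The independent case mentioned in the meta-analysis item can be made concrete with a short sketch. It assumes the standard form of Fisher's method: the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the joint null hypothesis, and for even degrees of freedom the tail probability has a closed form. The function name `fisher_combined_p` is illustrative only, and the sketch says nothing about the dependent case, which is the open part of the problem.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method (illustrative sketch)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("each p-value must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p_i); only X / 2 is needed below.
    half_x = -sum(math.log(p) for p in p_values)

    # Chi-square tail probability for 2k degrees of freedom (closed form):
    #   P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Three hypothetical p-values from independent studies.
    print(fisher_combined_p([0.08, 0.04, 0.20]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the formula. If the input p-values are correlated, the chi-square reference distribution no longer applies and the result should not be trusted.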
Experimental design
- As the theory of Latin squares is a cornerstone in the design of experiments, solving the problems in Latin squares could have immediate applicability to experimental design.
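For readers unfamiliar with the object, a Latin square of order n is an n × n array over n symbols in which each symbol occurs exactly once in every row and every column. The sketch below builds the simplest example (a cyclic square) and checks the defining property; the function names are illustrative, and nothing here touches the open problems themselves.

```python
def cyclic_latin_square(n):
    """Return the n x n cyclic Latin square with entries (row + col) mod n."""
    if n < 1:
        raise ValueError("order must be at least 1")
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """Check that every row and column contains each symbol exactly once."""
    n = len(square)
    if n == 0 or any(len(row) != n for row in square):
        return False
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


if __name__ == "__main__":
    for row in cyclic_latin_square(4):
        print(row)
```

In an experimental design, the rows and columns would typically stand for two blocking factors and the symbols for treatments, so that each treatment appears once at every level of both factors.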
Problems of a more philosophical nature
- Sunrise problem: What is the probability that the sun will rise tomorrow?
- Doomsday argument: How valid is the probabilistic argument that claims to predict the future lifetime of the human race given only an estimate of the total number of humans born so far?
- Exchange paradox: A paradox arising within the subjectivistic interpretation of probability theory, more specifically within Bayesian decision theory. This is still an open problem among subjectivists, as no consensus has been reached. Examples include:
  - The two envelopes problem
  - The necktie paradox
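One classical answer to the sunrise problem listed above is Laplace's rule of succession, which estimates the probability of success on the next trial as (s + 1) / (n + 2) after s successes in n trials. The sketch below assumes a uniform prior on the unknown success probability and independent trials; whether those assumptions are justified is precisely what makes the problem philosophical. The function name is illustrative.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's estimate of the probability that the next trial succeeds."""
    if trials < 0 or not (0 <= successes <= trials):
        raise ValueError("need 0 <= successes <= trials")
    # Posterior mean of the success probability under a uniform prior.
    return Fraction(successes + 1, trials + 2)


if __name__ == "__main__":
    # A hypothetical record of 10 sunrises observed on 10 days.
    print(rule_of_succession(10, 10))
```

With no observations at all the estimate is 1/2, and it approaches but never reaches 1 as the run of successes grows.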