Optimal design
Optimal designs are a class of experimental designs that are optimal with respect to some statistical criterion. In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated without bias and with minimum variance. A non-optimal design requires a greater number of experimental runs to estimate the parameters with the same precision as an optimal design. In practical terms, optimal experiments can reduce the costs of experimentation.
The optimality of a design depends on the statistical model and is assessed with respect to a statistical criterion, which is related to the variance-matrix of the estimator. Specifying an appropriate model and specifying a suitable criterion function both require understanding of statistical theory and practical knowledge of designing experiments.
Optimal designs are also called optimum designs.
Advantages of optimal designs

Optimal designs offer three advantages over suboptimal experimental designs:

- Optimal designs reduce the costs of experimentation by allowing statistical models to be estimated with fewer experimental runs.
- Optimal designs can accommodate multiple types of factors, such as process, mixture, and discrete factors.
- Designs can be optimized when the design-space is constrained, for example, when the mathematical process-space contains factor-settings that are practically infeasible (e.g. due to safety concerns).
Minimizing the variance of estimators

Experimental designs are evaluated using statistical criteria. It is known that the least-squares estimator minimizes the variance of mean-unbiased estimators (under the conditions of the Gauss–Markov theorem). In the estimation theory for statistical models with one real parameter, the reciprocal of the variance of an ("efficient") estimator is called the "Fisher information" for that estimator. Because of this reciprocity, minimizing the variance corresponds to maximizing the information.
When the statistical model has several parameters, however, the mean of the parameter-estimator is a vector and its variance is a matrix. The inverse of the variance-matrix is called the "information matrix". Because the variance of the estimator of a parameter vector is a matrix, the problem of "minimizing the variance" is complicated. Using statistical theory, statisticians compress the information-matrix using real-valued summary statistics; being real-valued functions, these "information criteria" can be maximized. The traditional optimality-criteria are invariants of the information matrix; algebraically, the traditional optimality-criteria are functionals of the eigenvalues of the information matrix.

- A-optimality ("average" or trace): seeks to minimize the trace of the inverse of the information matrix. This criterion minimizes the average variance of the estimates of the regression coefficients.
- C-optimality: seeks to minimize the variance of a best linear unbiased estimator of a predetermined linear combination of the model parameters.
- D-optimality (determinant): seeks to minimize the determinant of (X'X)^(-1), or equivalently to maximize the determinant of the information matrix X'X of the design. This criterion maximizes the differential Shannon information content of the parameter estimates.
- E-optimality (eigenvalue): maximizes the minimum eigenvalue of the information matrix. The E-optimality criterion need not be differentiable at every point; such E-optimal designs can be computed using methods of convex minimization that use subgradients rather than gradients at points of non-differentiability. Non-differentiability need not be a serious problem, however: E-optimality problems are special cases of semidefinite-programming problems, which have effective solution methods, especially bundle methods and interior-point methods.
- T-optimality: maximizes the trace of the information matrix.
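As a concrete illustration, these criteria can be computed directly from a design's model matrix. The following minimal sketch (the helper name and example design are chosen here for exposition, not taken from any software mentioned in this article) evaluates the A-, D-, E- and T-criteria with NumPy:

```python
import numpy as np

def criteria(X):
    """Evaluate the traditional optimality criteria for a design whose
    model matrix is X (rows = experimental runs, columns = model terms)."""
    M = X.T @ X                          # information matrix (up to the error variance)
    eigenvalues = np.linalg.eigvalsh(M)  # eigenvalues of the symmetric matrix M
    return {
        "A": np.trace(np.linalg.inv(M)),  # minimize: average coefficient variance
        "D": np.linalg.det(M),            # maximize: generalized information
        "E": eigenvalues.min(),           # maximize: worst-case eigenvalue
        "T": np.trace(M),                 # maximize: trace of the information matrix
    }

# Example: a 2^2 factorial design for the first-order model 1 + x1 + x2.
# Its columns are orthogonal, so M = 4I and the criteria are easy to verify.
X = np.array([[1, -1, -1],
              [1, -1,  1],
              [1,  1, -1],
              [1,  1,  1]], dtype=float)
print(criteria(X))  # A = 0.75, D = 64.0, E = 4.0, T = 12.0
```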
Other optimality-criteria are concerned with the variance of predictions:

- G-optimality: seeks to minimize the maximum entry in the diagonal of the hat matrix X(X'X)^(-1)X'. This has the effect of minimizing the maximum variance of the predicted values.
- I-optimality (integrated): seeks to minimize the average prediction variance over the design space.
- V-optimality (variance): seeks to minimize the average prediction variance over a set of m specific points.
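A sketch of the prediction-variance criteria under the same conventions (the function and the evaluation points below are illustrative assumptions, not a published interface):

```python
import numpy as np

def scaled_prediction_variance(X, points):
    """Return x'(X'X)^(-1)x for each row x of `points`,
    i.e. the diagonal of P (X'X)^(-1) P'."""
    M_inv = np.linalg.inv(X.T @ X)
    return np.einsum("ij,jk,ik->i", points, M_inv, points)

X = np.array([[1, -1, -1],
              [1, -1,  1],
              [1,  1, -1],
              [1,  1,  1]], dtype=float)

# G-criterion: the maximum diagonal entry of the hat matrix X (X'X)^(-1) X',
# evaluated here over the design's own runs.
g_value = scaled_prediction_variance(X, X).max()

# V-criterion: the average prediction variance over m specific points
# (here a single center point, an illustrative choice).
v_value = scaled_prediction_variance(X, np.array([[1.0, 0.0, 0.0]])).mean()
print(g_value, v_value)  # 0.75 and 0.25 for this design
```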
Contrasts

In many applications, the statistician is most concerned with a "parameter of interest" rather than with "nuisance parameters". More generally, statisticians consider linear combinations of parameters, which are estimated via linear combinations of treatment-means in the design of experiments and in the analysis of variance; such linear combinations are called contrasts. Statisticians can use appropriate optimality-criteria for such parameters of interest and, more generally, for contrasts.
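For a linear model fitted by least squares, the variance of an estimated contrast follows from standard linear-model algebra (stated here only to make the criterion concrete): if c is a vector of contrast coefficients with entries summing to zero, then

$$ \operatorname{Var}\!\left(c^{\top}\hat{\beta}\right) \;=\; \sigma^{2}\, c^{\top}(X^{\top}X)^{-1} c, $$

so a design that is optimal for this contrast minimizes c'(X'X)^(-1)c, which is exactly the C-optimality criterion listed above.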
Finding optimal designs

Catalogs of optimal designs occur in books and in software libraries. In addition, major statistical systems like SAS and R have procedures for optimizing a design according to a user's specification. The experimenter must specify a model for the design and an optimality-criterion before the method can compute an optimal design.
Practical considerations

Some advanced topics in optimal design require more statistical theory and practical knowledge in designing experiments.
Model dependence and robustness

Since the optimality criterion of most optimal designs is based on some function of the information matrix, the "optimality" of a given design is model-dependent: while an optimal design is best for that model, its performance may deteriorate on other models. On other models, an optimal design can be either better or worse than a non-optimal design. Therefore, it is important to benchmark the performance of designs under alternative models.
Choosing an optimality criterion and robustness

The choice of an appropriate optimality criterion requires some thought, and it is useful to benchmark the performance of designs with respect to several optimality criteria. Cornell writes that

since the [traditional optimality] criteria . . . are variance-minimizing criteria, . . . a design that is optimal for a given model using one of the . . . criteria is usually near-optimal for the same model with respect to the other criteria.

Indeed, there are several classes of designs for which all the traditional optimality-criteria agree, according to the theory of "universal optimality" of Kiefer. The experience of practitioners like Cornell and the "universal optimality" theory of Kiefer suggest that robustness with respect to changes in the optimality-criterion is much greater than robustness with respect to changes in the model.
Flexible optimality criteria and convex analysis

High-quality statistical software provides libraries of optimal designs and iterative methods for constructing approximately optimal designs, depending on the model specified and the optimality criterion. Users may use a standard optimality-criterion or may program a custom-made criterion. All of the traditional optimality-criteria are convex (or concave) functions, and therefore optimal designs are amenable to the mathematical theory of convex analysis and their computation can use specialized methods of convex minimization. The practitioner need not select exactly one traditional optimality-criterion, but can specify a custom criterion. In particular, the practitioner can specify a convex criterion using maxima of convex optimality-criteria and nonnegative combinations of optimality criteria (since these operations preserve convex functions). For convex optimality criteria, the Kiefer–Wolfowitz equivalence theorem allows the practitioner to verify that a given design is globally optimal. The Kiefer–Wolfowitz equivalence theorem is related to the Legendre–Fenchel conjugacy for convex functions.
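In its best-known form, for D-optimality (a standard statement, with M(ξ) = ∫ f(x)f(x)' ξ(dx) the information matrix of a design measure ξ and p the number of parameters), the equivalence theorem says that ξ* is D-optimal if and only if the standardized prediction variance satisfies

$$ d(x,\xi^{*}) \;=\; f(x)^{\top} M(\xi^{*})^{-1} f(x) \;\le\; p \quad \text{for all } x \text{ in the design space}, $$

with equality at the support points of ξ*; equivalently, a D-optimal design measure is also G-optimal.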
If an optimality-criterion lacks convexity, then finding a global optimum and verifying its optimality often are difficult.
Model selection

When scientists wish to test several theories, a statistician can design an experiment that allows optimal tests between specified models. Such "discrimination experiments" are especially important in the biostatistics supporting pharmacokinetics and pharmacodynamics, following the work of Cox and Atkinson.
Bayesian experimental design

When practitioners need to consider multiple models, they can specify a probability-measure on the models and then select any design maximizing the expected value of such an experiment. Such probability-based optimal designs are called optimal Bayesian designs. Such Bayesian designs are used especially for generalized linear models (where the response follows an exponential-family distribution).
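To make "maximizing the expected value" concrete, one standard formalization (given here only as an illustration, not attributed to a particular source) puts a prior π on the unknown parameter θ and maximizes the expected log-determinant of the information matrix M(ξ; θ):

$$ \Phi(\xi) \;=\; \int \log \det M(\xi;\theta)\, \pi(\mathrm{d}\theta). $$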
The use of a Bayesian design does not force statisticians to use Bayesian methods to analyze the data, however. Indeed, the "Bayesian" label for probability-based experimental designs is disliked by some researchers. Alternative terminology for "Bayesian" optimality includes "on-average" optimality or "population" optimality.
Iterative experimentation

Scientific experimentation is an iterative process, and statisticians have developed several approaches to the optimal design of sequential experiments.

Sequential analysis

Sequential analysis was pioneered by Abraham Wald. In 1972, Herman Chernoff wrote an overview of optimal sequential designs, while adaptive designs were surveyed later by S. Zacks. Of course, much work on the optimal design of experiments is related to the theory of optimal decisions, especially the statistical decision theory of Abraham Wald.
Response-surface methodology

Optimal designs for response-surface models are discussed in the textbook by Atkinson, Donev and Tobias, in the survey by Gaffke and Heiligers, and in the mathematical text by Pukelsheim. The blocking of optimal designs is discussed in the textbook by Atkinson, Donev and Tobias and also in the monograph by Goos.
The earliest optimal designs were developed to estimate the parameters of regression models with continuous variables, for example, by J. D. Gergonne in 1815 (Stigler). In English, two early contributions were made by Charles S. Peirce and Kirstine Smith.
Pioneering designs for multivariate response-surfaces were proposed by George E. P. Box. However, Box's designs have few optimality properties. Indeed, the Box–Behnken design requires excessive experimental runs when the number of variables exceeds three. Box's "central-composite" designs require more experimental runs than do the optimal designs of Kôno.
System identification and stochastic approximation

The optimization of sequential experimentation is studied also in stochastic programming and in systems and control. Popular methods include stochastic approximation and other methods of stochastic optimization. Much of this research has been associated with the subdiscipline of system identification.
In computational optimal control, D. Judin & A. Nemirovskii and Boris Polyak have described methods that are more efficient than the (Armijo-style) step-size rules introduced by G. E. P. Box in response-surface methodology. Adaptive designs are used in clinical trials, and optimal adaptive designs are surveyed in the Handbook of Experimental Designs chapter by Shelemyahu Zacks.
Using a computer to find a good design

There are several methods of finding an optimal design, given an a priori restriction on the number of experimental runs or replications. Some of these methods are discussed by Atkinson, Donev and Tobias and in the paper by Hardin and Sloane. Of course, fixing the number of experimental runs a priori would be impractical. Prudent statisticians also examine other optimal designs, whose numbers of experimental runs differ.
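As an illustration of such methods, here is a minimal greedy point-exchange sketch in the spirit of Fedorov-type algorithms (the candidate set, model matrix and loop structure are expository assumptions, not the specific procedures implemented in SAS, R, or the Hardin–Sloane paper):

```python
import itertools
import numpy as np

def log_d_criterion(X):
    """log-determinant of the information matrix X'X (larger is better);
    -inf for a singular design."""
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

def point_exchange(candidates, n_runs, n_sweeps=20, seed=0):
    """Greedy point-exchange search for a D-optimal n_runs-point design,
    choosing rows from `candidates` (the model matrix of candidate points)."""
    rng = np.random.default_rng(seed)
    design = list(rng.integers(0, len(candidates), size=n_runs))
    for _ in range(n_sweeps):
        improved = False
        for i in range(n_runs):                 # try to swap out each run in turn
            current = log_d_criterion(candidates[design])
            for j in range(len(candidates)):    # against every candidate point
                trial = design.copy()
                trial[i] = j
                value = log_d_criterion(candidates[trial])
                if value > current + 1e-10:
                    design, current, improved = trial, value, True
        if not improved:                        # stop at a local optimum
            break
    return candidates[design]

# Candidate set: the 3x3 grid on [-1, 1]^2 for the model 1 + x1 + x2 + x1*x2.
grid = np.array(list(itertools.product([-1, 0, 1], repeat=2)), dtype=float)
candidates = np.column_stack([np.ones(len(grid)), grid, grid[:, 0] * grid[:, 1]])
print(point_exchange(candidates, n_runs=4))  # tends to the four corner points
```

Being greedy, such a search can stop at a local optimum, which is why practical implementations use many random restarts.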
Discretizing probability-measure designs

In the mathematical theory of optimal experiments, an optimal design can be a probability measure that is supported on an infinite set of observation-locations. Such optimal probability-measure designs solve a mathematical problem that neglects to specify the cost of observations and experimental runs. Nonetheless, such optimal probability-measure designs can be discretized to furnish approximately optimal designs.
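A minimal sketch of such a discretization (simple largest-remainder rounding of the design weights; more refined "efficient rounding" schemes exist, and the function here is illustrative):

```python
import numpy as np

def apportion(weights, n_runs):
    """Round a probability-measure design (support weights summing to 1)
    to integer replication counts summing to n_runs."""
    weights = np.asarray(weights, dtype=float)
    counts = np.floor(weights * n_runs).astype(int)
    remainders = weights * n_runs - counts
    # Hand the leftover runs to the support points with the largest remainders.
    for idx in np.argsort(remainders)[::-1][: n_runs - counts.sum()]:
        counts[idx] += 1
    return counts

# A D-optimal measure for quadratic regression on [-1, 1] puts weight 1/3
# on each of the points -1, 0, 1; discretize it to a 10-run design.
print(apportion([1/3, 1/3, 1/3], 10))  # counts summing to 10, e.g. [3, 3, 4]
```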
In some cases, a finite set of observation-locations suffices to support an optimal design. Such a result was proved by Kôno and Kiefer in their works on response-surface designs for quadratic models. The Kôno–Kiefer analysis explains why optimal designs for response-surfaces can have discrete supports that are very similar to those of the less efficient designs traditional in response-surface methodology.
History

The prophet of scientific experimentation, Francis Bacon, foresaw that experimental designs should be improved. Researchers who improved experiments were praised in Bacon's utopian novel New Atlantis:

Then after divers meetings and consults of our whole number, to consider of the former labors and collections, we have three that take care out of them to direct new experiments, of a higher light, more penetrating into nature than the former. These we call lamps.
In 1815, an article on optimal designs for polynomial regression was published by Joseph Diaz Gergonne, according to Stigler.
Charles S. Peirce proposed an economic theory of scientific experimentation in 1876, which sought to maximize the precision of the estimates. Peirce's optimal allocation immediately improved the accuracy of gravitational experiments and was used for decades by Peirce and his colleagues. In his 1882 published lecture at Johns Hopkins University, Peirce introduced experimental design with these words:

Logic will not undertake to inform you what kind of experiments you ought to make in order best to determine the acceleration of gravity, or the value of the Ohm; but it will tell you how to proceed to form a plan of experimentation.

[....] Unfortunately practice generally precedes theory, and it is the usual fate of mankind to get things done in some boggling way first, and find out afterward how they could have been done much more easily and perfectly.
Like Bacon, Peirce was aware that experimental methods should strive for substantial improvement (even optimality).
Kirstine Smith proposed optimal designs for polynomial models in 1918. (Kirstine Smith had been a student of the Danish statistician Thorvald N. Thiele and was working with Karl Pearson in London.)
See also

- Bayesian experimental design
- Blocking (statistics)
- Convex function
- Convex minimization
- Design of experiments
- Efficiency (statistics)
- Entropy (information theory)
- Fisher information
- Glossary of experimental design
- Hadamard's maximal determinant problem
- Information theory
- Kiefer, Jack
- Replication (statistics)
- Response surface methodology
- Statistical model
- Wald, Abraham
- Wolfowitz, Jacob

Textbooks emphasizing regression and response-surface methodology

The textbook by Atkinson, Donev and Tobias has been used for short courses for industrial practitioners as well as university courses.

Textbooks emphasizing block designs

Optimal block designs are discussed by Bailey and by Bapat. The first chapter of Bapat's book reviews the linear algebra used by Bailey (or the advanced books below). Bailey's exercises and discussion of randomization both emphasize statistical concepts (rather than algebraic computations). Draft available on-line. (Especially Chapter 11.8 "Optimality") (Chapter 5 "Block designs and optimality", pages 99–111)
Optimal block designs are discussed in the advanced monograph by Shah and Sinha and in the survey-articles by Cheng and by Majumdar, which are cited below.

Articles and chapters

- R. H. Hardin and N. J. A. Sloane, "A New Approach to the Construction of Optimal Designs", Journal of Statistical Planning and Inference, vol. 37, 1993, pp. 339–369.