Optimal discriminant analysis
Optimal discriminant analysis (ODA) and the related classification tree analysis (CTA) are statistical methods that maximize predictive accuracy. For any specific sample and exploratory or confirmatory hypothesis, optimal discriminant analysis identifies the statistical model that yields maximum predictive accuracy, assesses the exact Type I error rate, and evaluates potential cross-generalizability. Optimal discriminant analysis may be applied to one or more dimensions, with the one-dimensional case referred to as UniODA and the multidimensional case referred to as MultiODA. Classification tree analysis is a generalization of optimal discriminant analysis to non-orthogonal trees; it has more recently been called "hierarchical optimal discriminant analysis". Optimal discriminant analysis and classification tree analysis may be used to find the combination of variables and cut points that best separate classes of objects or events. These variables and cut points may then be used to reduce dimensions and then build a statistical model that optimally describes the data.
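
The one-dimensional case can be illustrated with a brute-force search for the cut point that maximizes the percentage of correctly classified observations. The sketch below is a minimal illustration only, assuming a binary class variable and a single ordered predictor; the function name best_cut_point and the toy data are invented for this example and it is not an implementation of any published ODA software.

# Minimal sketch of the idea behind a one-dimensional cut-point search:
# exhaustively try candidate cut points on one ordered predictor and keep
# the cut (and direction) that maximizes classification accuracy.
# Illustrative only; not taken from any ODA package.

def best_cut_point(values, labels):
    """Return (cut, accuracy, direction) maximizing percent correctly classified.

    direction is '<=' if cases with value <= cut are assigned to class 1,
    otherwise '>'.
    """
    candidates = sorted(set(values))
    best = (None, 0.0, None)
    for cut in candidates:
        for direction in ('<=', '>'):
            predicted = [
                1 if ((v <= cut) if direction == '<=' else (v > cut)) else 0
                for v in values
            ]
            accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
            if accuracy > best[1]:
                best = (cut, accuracy, direction)
    return best

# Toy example: one measurement and a two-class outcome.
x = [1.2, 2.3, 2.9, 3.1, 4.0, 4.8, 5.5, 6.1]
y = [0,   0,   0,   1,   0,   1,   1,   1]
cut, acc, direction = best_cut_point(x, y)
print(f"assign class 1 when x {direction} {cut}: accuracy = {acc:.2f}")

Exhaustive search over the observed values finds the accuracy-maximizing cut point for a single attribute; the multidimensional case and classification tree analysis apply the same accuracy-maximizing criterion to combinations of variables and cut points.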

Optimal discriminant analysis may be thought of as a generalization of Fisher's linear discriminant analysis. It is an alternative to ANOVA (analysis of variance) and regression analysis, which attempt to express one dependent variable as a linear combination of other features or measurements. However, ANOVA and regression analysis yield a numerical dependent variable, while optimal discriminant analysis yields a dependent variable that is a class variable.
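
The difference in the type of dependent variable can be made concrete with a short sketch: a least-squares regression line returns a numeric prediction, whereas a discriminant-style cut-point rule returns a class label. The data and the fixed cut point below are invented for illustration.

# Illustrative contrast between a numeric and a class dependent variable.
x = [1.2, 2.3, 2.9, 3.1, 4.0, 4.8, 5.5, 6.1]
y_numeric = [2.5, 4.1, 5.6, 6.0, 7.9, 9.4, 11.2, 12.0]   # regression target
y_class   = [0,   0,   0,   1,   0,   1,   1,    1]       # classification target

# Ordinary least squares for the numeric outcome (closed form, one predictor).
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y_numeric) / n
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y_numeric)) / \
        sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x
print(f"regression predicts a number: f(3.5) = {intercept + slope * 3.5:.2f}")

# A discriminant-style rule predicts a class instead (cut point chosen for illustration).
cut = 3.0
print(f"classification predicts a label: x = 3.5 -> class {1 if 3.5 > cut else 0}")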

See also

  • Data mining
  • Decision tree
  • Factor analysis
  • Linear classifier
  • Logit (for logistic regression)
  • Machine learning
  • Multidimensional scaling
  • Perceptron
  • Preference regression
  • Quadratic classifier
  • Statistics
