Multivariate analysis
Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest.
Uses for multivariate analysis include:
- Design for capability (also known as capability-based design)
- Inverse design, where any variable can be treated as an independent variable
- Analysis of Alternatives (AoA), the selection of concepts to fulfill a customer need
- Analysis of concepts with respect to changing scenarios
- Identification of critical design drivers and correlations across hierarchical levels
Multivariate analysis can be complicated by the desire to include physics-based analysis to calculate the effects of variables for a hierarchical "system-of-systems." Studies that aim to use multivariate analysis are often stalled by the dimensionality of the problem. These concerns are often eased through the use of surrogate models, highly accurate approximations of the physics-based code. Because surrogate models take the form of an equation, they can be evaluated very quickly. This makes large-scale MVA studies feasible: a Monte Carlo simulation across the design space is difficult with physics-based codes, but it becomes computationally cheap when evaluating surrogate models, which often take the form of response surface equations.
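As a rough illustration of the surrogate idea, the sketch below fits a second-order response surface equation to a handful of runs of a slow model and then runs the Monte Carlo study on the fitted equation only. Everything named here is an assumption made for the example: `expensive_model` is a hypothetical stand-in for a physics-based code, the two design variables are scaled to [-1, 1], and the nine training runs form a three-level full factorial design. Because the stand-in happens to be exactly quadratic, the fit recovers it exactly; a real code would leave some approximation error that has to be checked.

```python
import numpy as np

def expensive_model(x1, x2):
    """Hypothetical stand-in for a slow physics-based code (not from the article)."""
    return 3.0 + 2.0 * x1 - 1.5 * x2 + 0.8 * x1 * x2 + 0.5 * x1**2

def quadratic_features(x1, x2):
    """Columns of a full second-order response surface in two variables."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

def fit_response_surface(x1, x2, y):
    """Least-squares estimate of the response surface coefficients."""
    coeffs, *_ = np.linalg.lstsq(quadratic_features(x1, x2), y, rcond=None)
    return coeffs

def evaluate_surrogate(coeffs, x1, x2):
    """Evaluate the fitted equation; this is only a matrix-vector product."""
    return quadratic_features(x1, x2) @ coeffs

rng = np.random.default_rng(0)

# Training runs: a 3-level full factorial over [-1, 1]^2 (9 calls to the slow model).
levels = np.array([-1.0, 0.0, 1.0])
g1, g2 = np.meshgrid(levels, levels)
x1_train, x2_train = g1.ravel(), g2.ravel()
y_train = expensive_model(x1_train, x2_train)

coeffs = fit_response_surface(x1_train, x2_train, y_train)

# Monte Carlo across the design space, using the surrogate instead of the slow model.
x1_mc = rng.uniform(-1.0, 1.0, size=100_000)
x2_mc = rng.uniform(-1.0, 1.0, size=100_000)
y_mc = evaluate_surrogate(coeffs, x1_mc, x2_mc)

mc_mean = y_mc.mean()
mc_p95 = np.percentile(y_mc, 95)
print("coefficients:", np.round(coeffs, 3))
print("Monte Carlo mean and 95th percentile:", mc_mean, mc_p95)
```

The slow model is called nine times, while the 100,000 Monte Carlo samples cost only one matrix product. In practice the surrogate would also be validated against held-out runs of the original code before being trusted.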
Factor analysis
Factor analysis is used to uncover the latent structure (dimensions) of a set of variables. It reduces attribute space from a larger number of variables to a smaller number of factors. Factor analysis originated in the early twentieth century with Charles Spearman's attempts to show that a wide variety of mental tests could be explained by a single underlying intelligence factor.
Applications:
• To reduce a large number of variables to a smaller number of factors for data modeling.
• To validate a scale or index by demonstrating that its constituent items load on the same factor, and to drop proposed scale items which cross-load on more than one factor.
• To select a subset of variables from a larger set, based on which original variables have the highest correlations with the principal component factors.
• To create a set of factors to be treated as uncorrelated variables, as one approach to handling multicollinearity in such procedures as multiple regression.
Factor analysis is part of the general linear model (GLM) family of procedures and makes many of the same assumptions as multiple regression.
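The reduction from many variables to a few uncorrelated factors can be sketched with the principal component method of factor extraction, which is only one of several extraction methods and is used here because it needs nothing beyond an eigendecomposition. The data are synthetic and hypothetical: two latent factors drive six observed variables through an assumed loading matrix. The rule of retaining factors with eigenvalue above 1 (the Kaiser criterion) is likewise an assumption of this example, not something the text prescribes.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Hypothetical data: two independent latent factors drive six observed variables.
latent = rng.normal(size=(n, 2))
true_loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.8],
])
noise = rng.normal(scale=0.4, size=(n, 6))
X = latent @ true_loadings.T + noise

# Standardise the variables, then eigendecompose their correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Assumed retention rule (Kaiser criterion): keep factors with eigenvalue > 1.
k = int(np.sum(eigvals > 1.0))

# Loadings: correlations between the original variables and the retained factors.
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# Factor scores are uncorrelated by construction, so they can replace
# collinear predictors in a later regression.
scores = Z @ eigvecs[:, :k]
score_corr = np.corrcoef(scores, rowvar=False)

# The factor on which each variable loads most strongly (sign is arbitrary).
dominant = np.argmax(np.abs(loadings), axis=1)
print("factors retained:", k)
print("dominant factor per variable:", dominant)
```

Variables that share a dominant factor are candidates for a single scale, and a variable with comparable loadings on several factors would be a candidate for removal as a cross-loading item.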
History
Anderson's 1958 textbook, An Introduction to Multivariate Statistical Analysis, educated a generation of theorists and applied statisticians. The book emphasizes hypothesis testing via likelihood ratio tests and the properties of power functions: admissibility, unbiasedness and monotonicity.
See also
- Univariate analysis
- Bivariate analysis
- Pattern recognition
- Exploratory data analysis
- Principal component analysis (PCA)
- Design of experiments (DoE)
- Soft independent modelling of class analogies (SIMCA)
- Regression analysis
- Ordinary least squares (OLS)
- Partial least squares regression
Software and tools
- TMVA - Toolkit for Multivariate Data Analysis in ROOT
- XLSTAT, an add-in for Excel for statistics and multivariate analysis
- The Unscrambler (free-to-try commercial MVA software for Windows)
- ControlMV, PharmaMV and WaterMV from Perceptive Engineering
- SIMCA-P+ (professional MVA software, free demo)
- R (see Task View: Multivariate for relevant packages)
- OCCAM, a web-based discrete multivariate modeling tool based on the methodology of reconstructability analysis, from Portland State University
Further reading
- Feinstein, A. R. (1996) Multivariable Analysis. New Haven, CT: Yale University Press.
- Hair, J. F. Jr. (1995) Multivariate Data Analysis with Readings, 4th ed. Prentice-Hall.
- Schafer, J. L. (1997) Analysis of Incomplete Multivariate Data. CRC Press. (Advanced)
- Sharma, S. (1996) Applied Multivariate Techniques. Wiley. (Informal, applied)