Latent variable
In statistics, latent variables (as opposed to observable variables) are variables that are not directly observed but are rather inferred (through a mathematical model) from other variables that are observed (directly measured). Mathematical models that aim to explain observed variables in terms of latent variables are called latent variable models. Latent variable models are used in many disciplines, including economics, machine learning/artificial intelligence, bioinformatics, natural language processing, psychology, and the social sciences.
Sometimes latent variables correspond to aspects of physical reality, which could in principle be measured, but may not be for practical reasons. In this situation, the term hidden variables is commonly used (reflecting the fact that the variables are "really there", but hidden). Other times, latent variables correspond to abstract concepts, like categories, behavioral or mental states, or data structures. The terms hypothetical variables or hypothetical constructs may be used in these situations.
One advantage of using latent variables is that they reduce the dimensionality of data. A large number of observable variables can be aggregated in a model to represent an underlying concept, making it easier to understand the data. In this sense, latent variables serve a function similar to that of scientific theories. At the same time, latent variables link observable ("sub-symbolic") data in the real world to symbolic data in the modeled world.
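The dimensionality reduction described above can be illustrated with a minimal sketch, assuming NumPy is available. Six observed variables are generated from two latent ones, and principal component analysis (computed here through a singular value decomposition) compresses them back to two scores per sample. The function name `pca_scores`, the sample size, the noise level and the random mixing matrix are all assumptions made for illustration; principal component analysis is only one of several ways to obtain such a low-dimensional summary.

```python
import numpy as np

def pca_scores(x, n_components):
    """Project the rows of x onto the leading principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions; s holds the singular values.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # share of variance per component
    scores = centered @ vt[:n_components].T  # low-dimensional representation
    return scores, explained

# Hypothetical data: 2 latent variables drive 6 observed variables.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 2))
mixing = rng.standard_normal((2, 6))
observed = latent @ mixing + 0.1 * rng.standard_normal((500, 6))

scores, explained = pca_scores(observed, n_components=2)
print(scores.shape)              # (500, 2)
print(explained[:2].sum())       # share of variance kept by two components
```

Because the simulated noise is small relative to the signal, the first two components retain nearly all of the variance of the six observed variables.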
Latent variables, as created by factor analytic methods, generally represent 'shared' variance, or the degree to which variables 'move' together. Variables that have no correlation cannot result in a latent construct based on the common factor model.
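The idea of 'shared' variance can be made concrete with a small simulation, sketched here under stated assumptions rather than taken from any particular factor-analysis package. Each observed variable is a loading times a single latent variable plus independent noise. The loadings, the noise level and the name `simulate_one_factor` are hypothetical. The fourth variable is given a loading of zero, so it shares no variance with the others.

```python
import numpy as np

def simulate_one_factor(n_samples, loadings, noise_sd=0.5, seed=0):
    """Draw x_j = loading_j * z + noise_j for a single latent variable z."""
    rng = np.random.default_rng(seed)
    loadings = np.asarray(loadings, dtype=float)
    z = rng.standard_normal(n_samples)  # latent variable, hidden in practice
    noise = noise_sd * rng.standard_normal((n_samples, loadings.size))
    x = z[:, None] * loadings[None, :] + noise
    return z, x

# Assumed loadings: three indicators of z and one unrelated variable.
z, x = simulate_one_factor(5000, loadings=[0.9, 0.8, 0.7, 0.0])

corr = np.corrcoef(x, rowvar=False)      # 4 x 4 correlation matrix
composite = x[:, :3].mean(axis=1)        # crude score from the indicators
score_corr = np.corrcoef(composite, z)[0, 1]

print(np.round(corr, 2))
print(round(score_corr, 2))
```

The three indicators correlate with one another only through the latent variable, while the zero-loading variable is essentially uncorrelated with them and so could not contribute to a common factor. Even a plain average of the indicators tracks the latent variable closely in this setup; a fitted factor model would weight the indicators instead of averaging them equally.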
Economics
Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness and conservatism: these are all variables which cannot be measured directly. However, given an economic model linking these latent variables to other, observable variables (such as GDP), the values of the latent variables can be inferred from measurements of the observable variables.
Psychology
- The "Big Five personality traits" have been inferred using factor analysis.
- extraversion
- spatial ability
- wisdom: "Two of the more predominant means of assessing wisdom include wisdom-related performance and latent variable measures."
Common methods for inferring latent variables
- Hidden Markov models
- Factor analysis
- Principal component analysis
- Latent semantic analysis and probabilistic latent semantic analysis
- EM algorithms
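As one illustration of the methods listed above, the following is a minimal sketch of an EM algorithm for a mixture of two one-dimensional Gaussians, assuming NumPy. The latent variable is the unobserved component that generated each data point. The function name `em_two_gaussians`, the quartile-based initialisation, the fixed iteration count and the simulated data are assumptions made for this example, not a reference implementation.

```python
import numpy as np

def em_two_gaussians(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25, 75]).astype(float)  # crude starting means
    sigma = np.array([x.std(), x.std()])
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
        dens /= sigma * np.sqrt(2.0 * np.pi)
        resp = weight * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the soft assignments.
        nk = resp.sum(axis=0)
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    # resp holds the responsibilities computed in the final E-step.
    return weight, mu, sigma, resp

# Hypothetical data: the component labels are generated but never given to EM.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 0.7, 300), rng.normal(3.0, 1.0, 700)])

weight, mu, sigma, resp = em_two_gaussians(data)
print(np.round(weight, 2), np.round(mu, 2), np.round(sigma, 2))
```

The rows of `resp` are the inferred distributions over the latent component for each observation. With well-separated components, as in this simulated data, the estimated means and weights land close to the values used to generate it.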
Bayesian algorithms and methods
Bayesian statistics is often used for inferring latent variables.
- Latent Dirichlet allocation
- The Chinese Restaurant Process is often used to provide a prior distribution over assignments of objects to latent categories.
- The Indian buffet process is often used to provide a prior distribution over assignments of latent binary features to objects.
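To show how such a prior assigns objects to latent categories, here is a minimal sketch of sampling from a Chinese restaurant process using only the Python standard library. Each new object joins an existing category with probability proportional to that category's current size, or opens a new category with probability proportional to a concentration parameter. The names `sample_crp` and `alpha`, and the values chosen in the usage lines, are assumptions for illustration.

```python
import random

def sample_crp(n_customers, alpha, seed=0):
    """Sample table assignments from a Chinese restaurant process (alpha > 0)."""
    rng = random.Random(seed)
    assignments = []   # table index chosen by each customer
    table_sizes = []   # number of customers at each table
    for _ in range(n_customers):
        # Existing table k has weight table_sizes[k]; a new table has weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table (new latent category)
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_sizes

assignments, table_sizes = sample_crp(n_customers=50, alpha=1.0)
print(table_sizes)
```

The number of categories is not fixed in advance: it grows with the number of objects, and larger values of `alpha` tend to produce more categories.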