Poisson regression
In statistics, Poisson regression is a form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable Y has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables.
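If $x \in \mathbb{R}^n$ is a vector of independent variables, then the model takes the form
$$\log(\operatorname{E}(Y \mid x)) = a + b' x,$$
where $a \in \mathbb{R}$ and $b \in \mathbb{R}^n$. Sometimes this is written more compactly as
$$\log(\operatorname{E}(Y \mid x)) = \theta' x,$$
where x is now an (n+1)-dimensional vector consisting of n independent variables concatenated to some constant, usually 1. Here θ is simply the intercept a concatenated to the coefficient vector b.
Thus, when given a Poisson regression model θ and an input vector x, the predicted mean of the associated Poisson distribution is given by
$$\operatorname{E}(Y \mid x) = e^{\theta' x}.$$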
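As a small illustration of the compact form, the predicted mean is the exponential of the inner product of θ and x. The sketch below assumes x already carries the constant 1 as its first entry; the function name and the numeric values are made up for illustration.

```python
import math

def predicted_mean(theta, x):
    """Return exp(theta' x), the predicted Poisson mean.

    theta and x are equal-length sequences; x is assumed to already
    include the constant 1 as its first entry.
    """
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    linear_predictor = sum(t * v for t, v in zip(theta, x))
    return math.exp(linear_predictor)

# Hypothetical model: a = 0.5 and b = (0.2, -0.1), so theta = (a, b1, b2).
theta = [0.5, 0.2, -0.1]
x = [1.0, 3.0, 2.0]  # constant 1 followed by two independent variables
mean = predicted_mean(theta, x)  # exp(0.5 + 0.6 - 0.2) = exp(0.9)
```

Because the mean is an exponential, it is positive for any real value of the linear predictor, which is what a Poisson mean requires.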
If $Y_i$ are independent observations with corresponding values $x_i$ of the predictor variable, then θ can be estimated by maximum likelihood. The maximum-likelihood estimates lack a closed-form expression and must be found by numerical methods. The log-likelihood for Poisson regression is always concave (equivalently, the negative log-likelihood is convex), making Newton-Raphson or other gradient-based methods appropriate estimation techniques.
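One way to picture the numerical estimation is a bare Newton-Raphson iteration for a model with an intercept and a single predictor. This is a minimal sketch, not a production routine: the data are made up, the starting point (the intercept-only fit) and the fixed iteration count are assumptions, and there is no step-size safeguard.

```python
import math

def newton_poisson(xs, ys, iterations=25):
    """Fit log E(Y|x) = a + b*x by Newton-Raphson; returns (a, b)."""
    # Start from the intercept-only fit: a = log(mean(y)), b = 0.
    a, b = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        mus = [math.exp(a + b * x) for x in xs]
        # Gradient (score) of the log-likelihood.
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum((y - mu) * x for x, y, mu in zip(xs, ys, mus))
        # Negative Hessian: sum over i of mu_i * [[1, x_i], [x_i, x_i^2]].
        h00 = sum(mus)
        h01 = sum(mu * x for x, mu in zip(xs, mus))
        h11 = sum(mu * x * x for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve (negative Hessian) * step = gradient (2x2 case).
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # made-up predictor values
ys = [1, 2, 4, 7, 12]           # made-up counts
a_hat, b_hat = newton_poisson(xs, ys)
```

At the fitted values the gradient is numerically zero, which for a model with an intercept means the fitted means sum to the observed total count.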
Poisson regression models are generalized linear models with the logarithm as the (canonical) link function, and the Poisson distribution as the assumed probability distribution of the response.
Maximum likelihood-based parameter estimation
Given a set of parameters θ and an input vector x, the mean of the predicted Poisson distribution, as stated above, is given by
$$\operatorname{E}(Y \mid x) = e^{\theta' x},$$
and thus, the Poisson distribution's probability mass function is given by
$$p(y \mid x; \theta) = \frac{[\operatorname{E}(Y \mid x)]^{y}\, e^{-\operatorname{E}(Y \mid x)}}{y!} = \frac{e^{y\,\theta' x}\, e^{-e^{\theta' x}}}{y!}.$$
Now suppose we are given a data set consisting of m vectors $x_i \in \mathbb{R}^{n+1}$, $i = 1, \ldots, m$, along with a set of m non-negative integer values $y_1, \ldots, y_m$. Then, for a given set of parameters θ, the probability of attaining this particular set of data is given by
$$p(y_1, \ldots, y_m \mid x_1, \ldots, x_m; \theta) = \prod_{i=1}^{m} \frac{e^{y_i\,\theta' x_i}\, e^{-e^{\theta' x_i}}}{y_i!}.$$
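The probability mass function and the probability of a whole data set can be evaluated directly for small examples. The sketch below assumes each x includes the constant 1 and that the observations are independent; the parameter values and data are hypothetical.

```python
import math

def poisson_pmf(y, theta, x):
    """p(y | x; theta) for a Poisson model with mean exp(theta' x)."""
    mean = math.exp(sum(t * v for t, v in zip(theta, x)))
    return mean ** y * math.exp(-mean) / math.factorial(y)

def data_probability(theta, xs, ys):
    """Product of the per-observation probabilities (independence assumed)."""
    prob = 1.0
    for x, y in zip(xs, ys):
        prob *= poisson_pmf(y, theta, x)
    return prob

theta = [0.0, 0.5]                         # hypothetical parameters
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # constant 1 plus one variable
ys = [1, 2, 3]
p = data_probability(theta, xs, ys)
```

For larger data sets this product underflows quickly, which is one practical reason for working with logarithms instead.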
By the method of maximum likelihood, we wish to find the set of parameters θ that makes this probability as large as possible. To do this, the equation is first rewritten as a likelihood function in terms of θ:
$$L(\theta \mid X, Y) = \prod_{i=1}^{m} \frac{e^{y_i\,\theta' x_i}\, e^{-e^{\theta' x_i}}}{y_i!}.$$
Note that the expression on the right-hand side has not actually changed. A formula in this form is typically difficult to work with; instead, one uses the log-likelihood:
$$\ell(\theta \mid X, Y) = \log L(\theta \mid X, Y) = \sum_{i=1}^{m} \left( y_i\,\theta' x_i - e^{\theta' x_i} - \log(y_i!) \right).$$
Notice that the parameters θ appear only in the first two terms of each summand. Therefore, given that we are only interested in finding the best value for θ, we may drop the $\log(y_i!)$ term and simply write
$$\ell(\theta \mid X, Y) = \sum_{i=1}^{m} \left( y_i\,\theta' x_i - e^{\theta' x_i} \right).$$
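A quick numerical check shows why the factorial term can be dropped: the gap between the full and the reduced log-likelihood is the same for every θ, so both are maximized at the same point. The sketch below uses made-up data and a hypothetical `include_constant` flag to switch the term on and off.

```python
import math

def log_likelihood(theta, xs, ys, include_constant=True):
    """Sum over i of y_i * theta'x_i - exp(theta'x_i) [- log(y_i!)]."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = sum(t * v for t, v in zip(theta, x))  # theta' x_i
        total += y * eta - math.exp(eta)
        if include_constant:
            total -= math.lgamma(y + 1)  # log(y_i!) does not involve theta
    return total

xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # made-up data, constant 1 first
ys = [1, 2, 3]
theta_a = [0.0, 0.5]
theta_b = [0.3, 0.1]

# The gap between the full and reduced forms is the same for every theta.
gap_a = log_likelihood(theta_a, xs, ys) - log_likelihood(theta_a, xs, ys, False)
gap_b = log_likelihood(theta_b, xs, ys) - log_likelihood(theta_b, xs, ys, False)
```

Here both gaps equal minus the sum of log(y_i!), which depends only on the observed counts.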
To find a maximum, we need to solve the equation $\frac{\partial \ell(\theta \mid X, Y)}{\partial \theta} = 0$, which has no closed-form solution. However, the negative log-likelihood, $-\ell(\theta \mid X, Y)$, is a convex function, and so standard convex optimization techniques such as gradient descent can be applied to find the optimal value of θ.
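Gradient descent on the negative log-likelihood can be sketched in a few lines. The gradient of the negative log-likelihood with respect to θ is the sum over observations of (e^{θ'x_i} − y_i) x_i. The fixed step size and iteration count below are assumptions tuned to this made-up data set; a practical implementation would use a line search or a library optimizer.

```python
import math

def fit_gradient_descent(xs, ys, step=0.001, iterations=20000):
    """Minimize the negative log-likelihood by fixed-step gradient descent."""
    theta = [0.0] * len(xs[0])
    for _ in range(iterations):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            mu = math.exp(sum(t * v for t, v in zip(theta, x)))
            for j, v in enumerate(x):
                grad[j] += (mu - y) * v  # gradient of the negative log-likelihood
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta

# Made-up data; each x starts with the constant 1.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
ys = [1, 2, 4, 7, 12]
theta_hat = fit_gradient_descent(xs, ys)
```

Convexity is what makes this safe: any point where the gradient vanishes is a global minimum of the negative log-likelihood.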
Poisson regression in practice
Poisson regression is appropriate when the dependent variable is a count, for instance of events such as the arrival of a telephone call at a call centre. The events must be independent in the sense that the arrival of one call will not make another more or less likely, but the probability per unit time of events is understood to be related to covariates such as time of day.
"Exposure" and offset
Poisson regression is also appropriate for rate data, where the rate is a count of events occurring to a particular unit of observation, divided by some measure of that unit's exposure. For example, biologists may count the number of tree species in a forest, and the rate would be the number of species per square kilometre. Demographers may model death rates in geographic areas as the count of deaths divided by person-years. More generally, event rates can be calculated as events per unit time, which allows the observation window to vary for each unit. In these examples, the exposure is, respectively, area, person-years and time. In Poisson regression this is handled as an offset, where the exposure variable enters on the right-hand side of the equation, but with a parameter estimate (for log(exposure)) constrained to 1:
$$\log(\operatorname{E}(Y \mid x)) = \log(\text{exposure}) + \theta' x,$$
which implies
$$\log(\operatorname{E}(Y \mid x)) - \log(\text{exposure}) = \log\left(\frac{\operatorname{E}(Y \mid x)}{\text{exposure}}\right) = \theta' x.$$
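The effect of an offset can be seen by computing expected counts for two units that share the same covariates but differ in exposure. In this sketch the parameter values and exposures (think of them as person-years) are hypothetical; the point is only that the expected count scales with exposure while the modeled rate stays the same.

```python
import math

def expected_count(theta, x, exposure):
    """E(Y | x) = exp(log(exposure) + theta' x) = exposure * exp(theta' x)."""
    eta = sum(t * v for t, v in zip(theta, x))
    # log(exposure) enters the linear predictor with its coefficient fixed at 1.
    return math.exp(math.log(exposure) + eta)

theta = [-4.0, 0.3]  # hypothetical parameters
x = [1.0, 2.0]       # constant 1 plus one covariate

count_small = expected_count(theta, x, exposure=1000.0)
count_large = expected_count(theta, x, exposure=5000.0)

# Dividing by exposure recovers the same rate, exp(theta' x), in both cases.
rate_small = count_small / 1000.0
rate_large = count_large / 5000.0
```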
Overdispersion
A characteristic of the Poisson distribution is that its mean is equal to its variance. In certain circumstances, it will be found that the observed variance is greater than the mean; this is known as overdispersion and indicates that the model is not appropriate. A common reason is the omission of relevant explanatory variables. Under some circumstances, the problem of overdispersion can be solved by using a negative binomial distribution instead.
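A crude first look at overdispersion compares the sample variance of the counts with their sample mean. The sketch below ignores covariates entirely, which is an assumption; in a fitted regression the comparison would be made relative to the fitted means. The counts are made up.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)

counts = [0, 0, 1, 1, 2, 3, 5, 8, 13, 21]  # made-up counts
ratio = dispersion_ratio(counts)
overdispersed = ratio > 1.0
```

A ratio well above 1 suggests more variability than a single Poisson distribution with that mean would produce.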
Another common problem with Poisson regression is excess zeros: if there are two processes at work, one determining whether there are zero events or any events, and a Poisson process determining how many events there are, there will be more zeros than a Poisson regression would predict. An example would be the distribution of cigarettes smoked in an hour by members of a group where some individuals are non-smokers.
Other generalized linear models such as the negative binomial model may function better in these cases.
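Excess zeros can be spotted by comparing the observed share of zero counts with the share a Poisson distribution with the same mean would predict, which is e^{−mean}. This sketch again ignores covariates, and the cigarette counts are invented to mimic a group containing several non-smokers.

```python
import math

def zero_summary(counts):
    """Return (observed share of zeros, Poisson-predicted share of zeros)."""
    mean = sum(counts) / len(counts)
    observed = sum(1 for c in counts if c == 0) / len(counts)
    expected = math.exp(-mean)  # P(Y = 0) for a Poisson with this mean
    return observed, expected

# Made-up hourly cigarette counts; the zeros include non-smokers.
cigarettes = [0, 0, 0, 0, 0, 0, 1, 2, 2, 3, 4]
observed_zero_share, poisson_zero_share = zero_summary(cigarettes)
```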
Use in survival analysis
Poisson regression creates proportional hazards models, one class of survival analysis: see proportional hazards models for descriptions of Cox models.
Regularized Poisson regression
When estimating the parameters for Poisson regression, one typically tries to find values for θ that maximize the log-likelihood, an expression of the form
$$\sum_{i=1}^{m} \log p(y_i; e^{\theta' x_i}),$$
where m is the number of examples in the data set, and $p(y_i; e^{\theta' x_i})$ is the probability mass function of the Poisson distribution with the mean set to $e^{\theta' x_i}$. Regularization can be added to this optimization problem by instead maximizing
$$\sum_{i=1}^{m} \log p(y_i; e^{\theta' x_i}) - \lambda \left\| \theta \right\|_2^2,$$
for some positive constant λ. This technique, similar to ridge regression, can reduce overfitting.
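The shrinking effect of the penalty can be illustrated by fitting the same made-up data with and without it. The sketch below uses fixed-step gradient ascent, with step size, iteration count and λ = 5 chosen only for illustration; as in the penalized expression, the whole of θ is penalized, including the intercept.

```python
import math

def fit_penalized(xs, ys, lam, step=0.001, iterations=20000):
    """Maximize sum_i log p(y_i; exp(theta'x_i)) - lam * ||theta||^2."""
    theta = [0.0] * len(xs[0])
    for _ in range(iterations):
        # Gradient of the penalized objective (log(y_i!) has zero gradient).
        grad = [-2.0 * lam * t for t in theta]
        for x, y in zip(xs, ys):
            mu = math.exp(sum(t * v for t, v in zip(theta, x)))
            for j, v in enumerate(x):
                grad[j] += (y - mu) * v
        theta = [t + step * g for t, g in zip(theta, grad)]  # ascent step
    return theta

# Made-up data; each x starts with the constant 1.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
ys = [1, 2, 4, 7, 12]
theta_plain = fit_penalized(xs, ys, lam=0.0)
theta_ridge = fit_penalized(xs, ys, lam=5.0)
norm_plain = math.sqrt(sum(t * t for t in theta_plain))
norm_ridge = math.sqrt(sum(t * t for t in theta_ridge))
```

The penalized estimate has a smaller Euclidean norm than the unpenalized one, which is the sense in which the penalty pulls θ toward zero.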
Implementations
Some statistics packages include implementations of Poisson regression.
- MATLAB Statistics Toolbox: Poisson regression can be performed using the "glmfit" and "glmval" functions.
- Microsoft Excel: Excel is not capable of doing Poisson regression by default. One of the Excel add-ins for Poisson regression is XPost.
- R: The function for fitting a generalized linear model in R is glm, and it can be used for Poisson regression.
- SAS: Poisson regression in SAS is done by using GENMOD.
- SPSS: In SPSS, Poisson regression is done by using the GENLIN command.
- Stata: Stata has a procedure for Poisson regression named "poisson".