Generalized Linear Model
A generalized linear model (GLM) extends the classic Linear Regression model in two ways:
- Allows the conditional distribution of the outcome $y$ given the predictors $x$ to be a member of the Exponential Family, not only Gaussian
- Allows a link between the outcome and the predictors that satisfies $g(\mathbb{E}[y \mid x]) = x^\top \beta$
- The function $g$ is called the link function
- The regression function is $f(x) = \mathbb{E}[y \mid x] = g^{-1}(x^\top \beta)$.
The generalized linear model can also be combined with Generalized Linear Regression, where the linear predictor is $\beta^\top \psi(x)$ for a feature map $\psi$: it is still linear in the parameter $\beta$, but $x$ could be transformed non-linearly into a new feature space. Together, we have the regression function $f(x) = g^{-1}(\beta^\top \psi(x))$.
- 📗 If $y \mid x \sim \mathcal{N}(x^\top \beta, \sigma^2)$, then the identity link $g(\mu) = \mu$ with $f(x) = x^\top \beta$ satisfies the requirement, as $g(\mathbb{E}[y \mid x]) = \mathbb{E}[y \mid x] = x^\top \beta$.
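To make the definition concrete, here is a minimal Python sketch (function names are illustrative, not from any particular library) of the regression function $f(x) = g^{-1}(x^\top \beta)$ for two common links:

```python
import numpy as np

def inv_link_identity(eta):
    # Identity link: g(mu) = mu, so g^{-1}(eta) = eta (Gaussian linear model).
    return eta

def inv_link_logit(eta):
    # Logit link: g(mu) = log(mu / (1 - mu)), so g^{-1} is the sigmoid.
    return 1.0 / (1.0 + np.exp(-eta))

def regression_function(x, beta, inv_link):
    # E[y | x] = g^{-1}(x^T beta): the linear predictor passed through g^{-1}.
    return inv_link(x @ beta)

beta = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])  # x^T beta = 0.0
print(regression_function(x, beta, inv_link_identity))  # 0.0
print(regression_function(x, beta, inv_link_logit))     # 0.5
```

The same `regression_function` covers both models; only the inverse link changes.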
Remarks
- Note that just like a Gaussian Linear Model, a GLM is a model for the data-generating distribution that enables tractable Regression methods. In other words, it is an assumption rather than a specific algorithm.
- We can use regression methods tailored for a GLM even if the assumption does not hold. If we do so, we should choose the GLM that best describes the data.
- Since the data-generating model is parametrized by $\beta$, the Regression task reduces to the Estimation of $\beta$, and MLE is a natural method for this task.
- Here the regression function is chosen to be the conditional mean. One can also extend the model to consider other regression functions.
Canonical GLM
Conventionally, GLM often refers to the canonical form that follows a natural exponential dispersion model:
$$p(y \mid \theta) = \exp\left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right),$$
where $\phi > 0$ is called a dispersion parameter (and could be known); see more in Exponential Family. We remark that not every exponential family can be cast into this natural exponential dispersion model, but many common distributions can be.
Clearly, the distribution of $y$ depends on the natural parameter $\theta$. In canonical GLM, we model the natural parameter linearly: $\theta = x^\top \beta$, and link it to the conditional mean:
$$g(\mu) = \theta = x^\top \beta, \quad \text{where } \mu = \mathbb{E}[y \mid x].$$
This $g$ is called the canonical link.
As calculated in Moments of Dispersion Exponential Family, we have $\mathbb{E}[y \mid \theta] = b'(\theta)$. Thus, we have $\mu = b'(x^\top \beta)$, i.e., the canonical link is $g = (b')^{-1}$.
Furthermore, $\operatorname{Var}[y \mid \theta] = \phi\, b''(\theta)$. Therefore, if $\operatorname{Var}[y \mid \theta] > 0$ and $\phi > 0$, we have $b''(\theta) > 0$, so $b'$ is strictly increasing. ^bpp
A positive variance means the problem is not degenerate. And the strict monotonicity also ensures the invertibility of $b'$ and $g$, which further enables the identifiability of $\beta$.
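The moment identities can be checked numerically. A quick sanity check, assuming the Bernoulli case with $\phi = 1$, whose log-partition is the standard $b(\theta) = \log(1 + e^\theta)$: finite differences of $b$ recover the mean $b'(\theta)$ and the variance $b''(\theta)$, and the latter is positive.

```python
import numpy as np

def b(theta):
    # Bernoulli log-partition function (phi = 1).
    return np.log1p(np.exp(theta))

theta, h = 0.7, 1e-4
b_prime = (b(theta + h) - b(theta - h)) / (2 * h)               # finite-difference b'
b_second = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2  # finite-difference b''

mu = 1.0 / (1.0 + np.exp(-theta))     # Bernoulli mean as a function of theta
print(abs(b_prime - mu))              # ~ 0: E[y] = b'(theta)
print(abs(b_second - mu * (1 - mu)))  # ~ 0: Var[y] = b''(theta) > 0
```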
MLE for GLM
We focus on MLE for canonical GLM. We use the nice properties of the natural exponential dispersion family to derive the likelihood of $\beta$. Given i.i.d. samples $(x_i, y_i)_{i=1}^n$, we first have
$$p(y_i \mid x_i; \beta) = \exp\left( \frac{y_i \theta_i - b(\theta_i)}{\phi} + c(y_i, \phi) \right),$$
where $\theta_i$ is the natural parameter of $y_i$ given $x_i$. Then the log-likelihood is
$$\ell(\beta) = \sum_{i=1}^n \frac{y_i \theta_i - b(\theta_i)}{\phi} + \text{const}.$$
If we use the canonical link, then $g(\mu_i) = x_i^\top \beta$, and thus $\theta_i = g(\mu_i) = x_i^\top \beta$, i.e., the natural parameter is linear in $\beta$; modeling $\theta$ by $x^\top \beta$ is the motivation of the canonical link. Further, since $\phi$ and $c(y_i, \phi)$ do not depend on $\beta$, the MLE reduces to
$$\hat{\beta} = \arg\max_{\beta} \sum_{i=1}^n \left( y_i\, x_i^\top \beta - b(x_i^\top \beta) \right).$$
Moreover, a positive $\phi$, a positive variance (^bpp), and a full column rank design matrix $X$ give a negative definite Hessian of the log-likelihood:
$$\nabla^2 \ell(\beta) = -\frac{1}{\phi} \sum_{i=1}^n b''(x_i^\top \beta)\, x_i x_i^\top \prec 0.$$
Therefore, $\hat{\beta}$ is unique and can be attained by either Convex Optimization or solving for the zero of the gradient:
$$\nabla \ell(\beta) = \frac{1}{\phi} \sum_{i=1}^n \left( y_i - b'(x_i^\top \beta) \right) x_i = 0.$$
^zerog
- 💡 The asymptotic normality of MLE applies to GLMs.
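Since the log-likelihood is concave, Newton's method on the gradient and Hessian is a natural solver. A minimal sketch (function names are illustrative, not a library API), assuming $\phi = 1$ and instantiated with the Bernoulli case, where $b'(\theta)$ is the sigmoid:

```python
import numpy as np

def fit_canonical_glm(X, y, b_prime, b_second, n_iter=25):
    """Newton's method for the canonical-GLM log-likelihood (phi = 1)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        grad = X.T @ (y - b_prime(eta))             # gradient of log-likelihood
        hess = -X.T @ (b_second(eta)[:, None] * X)  # negative definite Hessian
        beta = beta - np.linalg.solve(hess, grad)   # Newton ascent step
    return beta

# Bernoulli instance: b(theta) = log(1 + e^theta), so b' = sigmoid.
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
b_second = lambda t: sigmoid(t) * (1.0 - sigmoid(t))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.binomial(1, sigmoid(X @ np.array([-0.5, 1.5])))  # synthetic labels
beta_hat = fit_canonical_glm(X, y, sigmoid, b_second)
print(beta_hat)  # MLE coefficients (intercept, slope)
```

The same routine fits any canonical GLM once $b'$ and $b''$ are supplied.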
Applications
Ordinary Least Squares by Gaussian Distribution
When the GLM is the Gaussian Linear Model with the canonical link, the MLE gives Ordinary Least Squares, the first regression method. In this case, $y$ is a continuous numerical variable.
To see this, the natural exponential dispersion form of the Gaussian $\mathcal{N}(\mu, \sigma^2)$ has $\theta = \mu$ and $b(\theta) = \theta^2 / 2$, with a dispersion parameter $\phi = \sigma^2$. Then, $b'(\theta) = \theta$ is linear, making the zero point of the gradient of the log-likelihood (^zerog) satisfy
$$\sum_{i=1}^n \left( y_i - x_i^\top \beta \right) x_i = 0, \quad \text{i.e.,} \quad X^\top X \beta = X^\top y,$$
which is exactly the normal equation of Ordinary Least Squares.
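A quick numeric check on synthetic data (the design and coefficients are arbitrary choices for illustration): the solution of the OLS normal equations makes the GLM score $\sum_i (y_i - x_i^\top \beta)\, x_i$ vanish.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # OLS via normal equations
grad = X.T @ (y - X @ beta_ols)               # Gaussian-GLM score at beta_ols
print(np.linalg.norm(grad))  # ~ 0
```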
Logistic Regression by Bernoulli Distribution
When the GLM is the Bernoulli Distribution with the canonical link, the MLE gives Logistic Regression. In this case, $y$ is a categorical variable with two possible outcomes.
To see this, we have
$$p(y \mid \mu) = \mu^y (1 - \mu)^{1 - y} = \exp\left( y \log\frac{\mu}{1 - \mu} + \log(1 - \mu) \right),$$
so $\theta = \log\frac{\mu}{1 - \mu}$, $b(\theta) = \log(1 + e^\theta)$, and $\phi = 1$.
Therefore, the canonical link function is
$$g(\mu) = \log\frac{\mu}{1 - \mu}.$$
This link is called the logit link, and $g^{-1}$ is called the sigmoid function:
$$g^{-1}(\theta) = \frac{1}{1 + e^{-\theta}}.$$
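A quick check in Python that the logit link and the sigmoid are inverses of each other:

```python
import numpy as np

logit = lambda mu: np.log(mu / (1.0 - mu))      # canonical link g
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))    # inverse link g^{-1}

mus = np.linspace(0.01, 0.99, 9)
print(np.allclose(sigmoid(logit(mus)), mus))    # True
print(logit(0.5), sigmoid(0.0))                 # 0.0 0.5
```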
Poisson Regression by Poisson Distribution
When the GLM is the Poisson Distribution with the canonical link, the MLE gives Poisson Regression. In this case, $y$ is a count variable, i.e., a non-negative integer. Here $\theta = \log \mu$ and $b(\theta) = e^\theta$, so the canonical link is the log link $g(\mu) = \log \mu$.
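A Poisson-regression sketch on synthetic counts, assuming the canonical log link so that $b'(\theta) = e^\theta$; plain gradient ascent on the log-likelihood is used here (the step size and iteration count are illustrative choices, not tuned recommendations):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.uniform(-1.0, 1.0, size=300)])
y = rng.poisson(np.exp(X @ np.array([0.5, 1.0])))  # counts with log-linear mean

beta = np.zeros(2)
for _ in range(5000):
    grad = X.T @ (y - np.exp(X @ beta)) / len(y)   # averaged GLM score
    beta += 0.05 * grad                            # gradient ascent step
print(beta)  # MLE estimate of the coefficients
```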
Other
- Multinomial Distribution with MLE gives multinomial or ordinal regression, where $y$ is a categorical variable with more than two outcomes.
- Gamma Distribution with MLE gives Gamma regression, where $y$ is positive, e.g., the amount of an insurance claim.