Generalized Linear Model
A generalized linear model (GLM) extends the classic Linear Regression model in two ways:
- Allows the conditional distribution of the outcome $y$ given the predictors $x$ to be a member of the Exponential Family, not only Gaussian
- Allows a link between the outcome and the predictors that satisfies $g(\mathbb{E}[y \mid x]) = x^\top \beta$
- The function $g$ is called the link function
- The regression function is $f(x) = \mathbb{E}[y \mid x] = g^{-1}(x^\top \beta)$.
The generalized linear model can also be combined with Generalized Linear Regression, where the linear predictor is $\beta^\top \psi(x)$ for a feature map $\psi$: it is still linear in the parameter $\beta$, but $x$ could be transformed non-linearly into a new feature space. Together, we have the regression function $f(x) = g^{-1}(\beta^\top \psi(x))$.
- 📗 If $y \mid x \sim \mathcal{N}(x^\top \beta, \sigma^2)$, then the identity link $g(\mu) = \mu$ with $f(x) = x^\top \beta$ satisfies the requirement, as $g(\mathbb{E}[y \mid x]) = \mathbb{E}[y \mid x] = x^\top \beta$.
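To make the definition concrete, here is a minimal Python sketch (function names are illustrative, not from any particular library) of the regression function $f(x) = g^{-1}(x^\top \beta)$ for two common links:

```python
import numpy as np

def inv_link_identity(eta):
    # Identity link: g(mu) = mu, so g^{-1}(eta) = eta (Gaussian linear model).
    return eta

def inv_link_logit(eta):
    # Logit link: g(mu) = log(mu / (1 - mu)), so g^{-1} is the sigmoid.
    return 1.0 / (1.0 + np.exp(-eta))

def regression_function(x, beta, inv_link):
    # E[y | x] = g^{-1}(x^T beta): the linear predictor passed through g^{-1}.
    return inv_link(x @ beta)

beta = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])  # x^T beta = 0.0
print(regression_function(x, beta, inv_link_identity))  # 0.0
print(regression_function(x, beta, inv_link_logit))     # 0.5
```

The same `regression_function` covers both models; only the inverse link changes.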
Remarks
- Note that just like a Gaussian Linear Model, a GLM is a model for the data-generating distribution that enables tractable Regression methods. In other words, it is an assumption rather than a specific algorithm.
- We can use regression methods tailored for a GLM even if the assumption does not hold. If we do so, we should choose the GLM that best describes the data.
- Since the data-generating model is parametrized by $\beta$, the Regression task reduces to the Estimation of $\beta$, and MLE is a natural method for this task.
- Here the regression function is chosen to be the conditional mean. One can also extend the model to consider other regression functions.
Canonical GLM
Conventionally, GLM often refers to the canonical form that follows a natural exponential dispersion model:
$$p(y \mid \theta) = \exp\left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right),$$
where $\phi > 0$ is called a dispersion parameter (and could be known); see more in Exponential Family. We remark that not every exponential family can be cast into this natural exponential dispersion model, but many common distributions can be.
Clearly, the distribution of $y$ depends on the natural parameter $\theta$. In canonical GLM, we model the natural parameter linearly: $\theta = x^\top \beta$, and link it to the conditional mean:
$$g(\mu) = \theta = x^\top \beta, \quad \text{where } \mu = \mathbb{E}[y \mid x].$$
This $g$ is called the canonical link.
As calculated in Moments of Dispersion Exponential Family, we have $\mathbb{E}[y \mid \theta] = b'(\theta)$. Thus, we have $\mu = b'(x^\top \beta)$, i.e., the canonical link is $g = (b')^{-1}$.
Furthermore, $\operatorname{Var}[y \mid \theta] = \phi\, b''(\theta)$. Therefore, if $\operatorname{Var}[y \mid \theta] > 0$ and $\phi > 0$, we have $b''(\theta) > 0$, so $b'$ is strictly increasing. ^bpp
A positive variance means the problem is not degenerate. And the strict monotonicity also ensures the invertibility of $b'$ and $g$, which further enables the identifiability of $\beta$.
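The moment identities can be checked numerically. A quick sanity check, assuming the Bernoulli case with $\phi = 1$, whose log-partition is the standard $b(\theta) = \log(1 + e^\theta)$: finite differences of $b$ recover the mean $b'(\theta)$ and the variance $b''(\theta)$, and the latter is positive.

```python
import numpy as np

def b(theta):
    # Bernoulli log-partition function (phi = 1).
    return np.log1p(np.exp(theta))

theta, h = 0.7, 1e-4
b_prime = (b(theta + h) - b(theta - h)) / (2 * h)               # finite-difference b'
b_second = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2  # finite-difference b''

mu = 1.0 / (1.0 + np.exp(-theta))     # Bernoulli mean as a function of theta
print(abs(b_prime - mu))              # ~ 0: E[y] = b'(theta)
print(abs(b_second - mu * (1 - mu)))  # ~ 0: Var[y] = b''(theta) > 0
```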
MLE for GLM
We focus on MLE for canonical GLM. We use the nice properties of the natural exponential dispersion family to derive the likelihood of $\beta$. Given i.i.d. samples $(x_i, y_i)_{i=1}^n$, we first have
$$p(y_i \mid x_i; \beta) = \exp\left( \frac{y_i \theta_i - b(\theta_i)}{\phi} + c(y_i, \phi) \right),$$
where $\theta_i$ is the natural parameter of $y_i$ given $x_i$. Then the log-likelihood is
$$\ell(\beta) = \sum_{i=1}^n \frac{y_i \theta_i - b(\theta_i)}{\phi} + \text{const}.$$
If we use the canonical link, then $g(\mu_i) = x_i^\top \beta$, and thus $\theta_i = g(\mu_i) = x_i^\top \beta$, i.e., the natural parameter is linear in $\beta$; modeling $\theta$ by $x^\top \beta$ is the motivation of the canonical link. Further, since $\phi$ and $c(y_i, \phi)$ do not depend on $\beta$, the MLE reduces to
$$\hat{\beta} = \arg\max_{\beta} \sum_{i=1}^n \left( y_i\, x_i^\top \beta - b(x_i^\top \beta) \right).$$
Moreover, a positive $\phi$, a positive variance (^bpp), and a full column rank design matrix $X$ give a negative definite Hessian of the log-likelihood:
$$\nabla^2 \ell(\beta) = -\frac{1}{\phi} \sum_{i=1}^n b''(x_i^\top \beta)\, x_i x_i^\top \prec 0.$$
Therefore, $\hat{\beta}$ is unique and can be attained by either Convex Optimization or solving for the zero of the gradient:
$$\nabla \ell(\beta) = \frac{1}{\phi} \sum_{i=1}^n \left( y_i - b'(x_i^\top \beta) \right) x_i = 0.$$
^zerog
- 💡 The asymptotic normality of MLE applies to GLMs.
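Since the log-likelihood is concave, Newton's method on the gradient and Hessian is a natural solver. A minimal sketch (function names are illustrative, not a library API), assuming $\phi = 1$ and instantiated with the Bernoulli case, where $b'(\theta)$ is the sigmoid:

```python
import numpy as np

def fit_canonical_glm(X, y, b_prime, b_second, n_iter=25):
    """Newton's method for the canonical-GLM log-likelihood (phi = 1)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        grad = X.T @ (y - b_prime(eta))             # gradient of log-likelihood
        hess = -X.T @ (b_second(eta)[:, None] * X)  # negative definite Hessian
        beta = beta - np.linalg.solve(hess, grad)   # Newton ascent step
    return beta

# Bernoulli instance: b(theta) = log(1 + e^theta), so b' = sigmoid.
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
b_second = lambda t: sigmoid(t) * (1.0 - sigmoid(t))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.binomial(1, sigmoid(X @ np.array([-0.5, 1.5])))  # synthetic labels
beta_hat = fit_canonical_glm(X, y, sigmoid, b_second)
print(beta_hat)  # MLE coefficients (intercept, slope)
```

The same routine fits any canonical GLM once $b'$ and $b''$ are supplied.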
Applications
Ordinary Least Squares by Gaussian Distribution
When the GLM is the Gaussian Linear Model with the canonical link, the MLE gives Ordinary Least Squares, the first regression method. In this case, $y$ is a continuous numerical variable.
To see this, the natural exponential dispersion form of the Gaussian $\mathcal{N}(\mu, \sigma^2)$ has $\theta = \mu$ and $b(\theta) = \theta^2 / 2$, with a dispersion parameter $\phi = \sigma^2$. Then, $b'(\theta) = \theta$ is linear, making the zero point of the gradient of the log-likelihood (^zerog) satisfy
$$\sum_{i=1}^n \left( y_i - x_i^\top \beta \right) x_i = 0, \quad \text{i.e.,} \quad X^\top X \beta = X^\top y,$$
which is exactly the normal equation of Ordinary Least Squares.
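A quick numeric check on synthetic data (the design and coefficients are arbitrary choices for illustration): the solution of the OLS normal equations makes the GLM score $\sum_i (y_i - x_i^\top \beta)\, x_i$ vanish.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # OLS via normal equations
grad = X.T @ (y - X @ beta_ols)               # Gaussian-GLM score at beta_ols
print(np.linalg.norm(grad))  # ~ 0
```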
Logistic Regression by Bernoulli Distribution
When the GLM is the Bernoulli Distribution with the canonical link, the MLE gives Logistic Regression. In this case, $y$ is a categorical variable with two possible outcomes.
To see this, we have
$$p(y \mid \mu) = \mu^y (1 - \mu)^{1 - y} = \exp\left( y \log\frac{\mu}{1 - \mu} + \log(1 - \mu) \right),$$
so $\theta = \log\frac{\mu}{1 - \mu}$, $b(\theta) = \log(1 + e^\theta)$, and $\phi = 1$.
Therefore, the canonical link function is
$$g(\mu) = \log\frac{\mu}{1 - \mu}.$$
This link is called the logit link, and $g^{-1}$ is called the sigmoid function:
$$g^{-1}(\theta) = \frac{1}{1 + e^{-\theta}}.$$
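A quick check in Python that the logit link and the sigmoid are inverses of each other:

```python
import numpy as np

logit = lambda mu: np.log(mu / (1.0 - mu))      # canonical link g
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))    # inverse link g^{-1}

mus = np.linspace(0.01, 0.99, 9)
print(np.allclose(sigmoid(logit(mus)), mus))    # True
print(logit(0.5), sigmoid(0.0))                 # 0.0 0.5
```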
Poisson Regression by Poisson Distribution
When the GLM is the Poisson Distribution with the canonical link, the MLE gives Poisson Regression. In this case, $y$ is a count variable, i.e., a non-negative integer. Here $\theta = \log \mu$ and $b(\theta) = e^\theta$, so the canonical link is the log link $g(\mu) = \log \mu$.
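A Poisson-regression sketch on synthetic counts, assuming the canonical log link so that $b'(\theta) = e^\theta$; plain gradient ascent on the log-likelihood is used here (the step size and iteration count are illustrative choices, not tuned recommendations):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.uniform(-1.0, 1.0, size=300)])
y = rng.poisson(np.exp(X @ np.array([0.5, 1.0])))  # counts with log-linear mean

beta = np.zeros(2)
for _ in range(5000):
    grad = X.T @ (y - np.exp(X @ beta)) / len(y)   # averaged GLM score
    beta += 0.05 * grad                            # gradient ascent step
print(beta)  # MLE estimate of the coefficients
```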
Other
- Multinomial Distribution with MLE gives multinomial or ordinal regression, where $y$ is a categorical variable with more than two outcomes.
- Gamma Distribution with MLE gives Gamma regression, where $y$ is positive, e.g., the amount of an insurance claim.