Gaussian Linear Model
A Gaussian linear model is a linear model with Gaussian noise:
$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n).$$
Suppose we have a sample of size $n$, with response $Y \in \mathbb{R}^n$, design matrix $X \in \mathbb{R}^{n \times d}$, and unknown coefficients $\beta \in \mathbb{R}^d$. This note focuses on the regression task with a fixed design matrix $X$, which reduces to the Estimation of $\beta$.
Least Squares and Maximum Likelihood Estimation
Ordinary Least Squares gives the Maximum Likelihood Estimation of $\beta$:
$$\hat\beta_{\mathrm{OLS}} = \arg\min_{\beta \in \mathbb{R}^d} \|Y - X\beta\|_2^2 = (X^\top X)^{-1} X^\top Y,$$
assuming $X$ has full column rank. We can show that
$$\hat\beta_{\mathrm{OLS}} \sim \mathcal{N}\big(\beta, \sigma^2 (X^\top X)^{-1}\big).$$
See A Probabilistic View Maximum Likelihood Estimation for the derivation.
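As a quick sanity check of this sampling distribution, here is a minimal Monte Carlo sketch (assuming NumPy; the dimensions, noise level, and true $\beta$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 2.0
X = rng.normal(size=(n, d))                      # fixed design, full column rank a.s.
beta = np.array([1.0, -2.0, 0.5])

# Monte Carlo check that beta_hat = (X^T X)^{-1} X^T Y ~ N(beta, sigma^2 (X^T X)^{-1})
hats = []
for _ in range(5000):
    Y = X @ beta + sigma * rng.normal(size=n)
    hats.append(np.linalg.solve(X.T @ X, X.T @ Y))   # OLS = MLE
hats = np.array(hats)

print("empirical mean:\n", hats.mean(axis=0))        # approximately beta
print("empirical cov:\n", np.cov(hats, rowvar=False)) # approx sigma^2 (X^T X)^{-1}
print("theoretical cov:\n", sigma**2 * np.linalg.inv(X.T @ X))
```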
Bayes Estimator
For the prior $\beta \sim \mathcal{N}(0, \tau^2 I_d)$ and a Bowl-Shaped Loss $\rho$, the Bayes Optimal Estimator is
$$\hat\beta_{\text{Bayes}} = \left(X^\top X + \frac{\sigma^2}{\tau^2} I_d\right)^{-1} X^\top Y.$$
1st Proof
We note that this is essentially Bayesian Linear Regression. Specifically, the posterior is also a Gaussian (Conjugate Prior) that satisfies
$$\beta \mid Y \sim \mathcal{N}(\bar\mu, \bar\Sigma),$$
where
$$\bar\Sigma = \left(\frac{1}{\sigma^2} X^\top X + \frac{1}{\tau^2} I_d\right)^{-1}, \qquad \bar\mu = \frac{1}{\sigma^2} \bar\Sigma X^\top Y = \left(X^\top X + \frac{\sigma^2}{\tau^2} I_d\right)^{-1} X^\top Y.$$
By Anderson’s Lemma, $\hat\beta_{\text{Bayes}} = \bar\mu$. Moreover, the Bayes risk is $\mathbb{E}[\rho(W)]$, where $W \sim \mathcal{N}(0, \bar\Sigma)$.
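The conjugate update above is easy to verify numerically. Below is a minimal sketch (assuming NumPy and the Gaussian prior $\mathcal{N}(0, \tau^2 I_d)$ used above; all concrete values are illustrative) that computes $\bar\mu$ and $\bar\Sigma$ for one simulated data set:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma, tau = 30, 2, 1.0, 2.0
X = rng.normal(size=(n, d))
beta = tau * rng.normal(size=d)                  # draw the truth from the prior N(0, tau^2 I_d)
Y = X @ beta + sigma * rng.normal(size=n)

# Conjugate Gaussian posterior: beta | Y ~ N(mu_bar, Sigma_bar)
Sigma_bar = np.linalg.inv(X.T @ X / sigma**2 + np.eye(d) / tau**2)
mu_bar = Sigma_bar @ X.T @ Y / sigma**2          # Bayes estimator under a bowl-shaped loss

print("posterior mean:", mu_bar)
print("posterior covariance:\n", Sigma_bar)
```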
2nd Proof
Similarly, by Anderson’s Lemma, we know that $\hat\beta_{\text{Bayes}}$ is the posterior mean. Since for a normal distribution the mean and the mode coincide, we have
$$\hat\beta_{\text{Bayes}} = \arg\max_{\beta} p(\beta \mid Y) = \arg\min_{\beta} \left\{ \frac{1}{\sigma^2} \|Y - X\beta\|_2^2 + \frac{1}{\tau^2} \|\beta\|_2^2 \right\}.$$
This corresponds to a Ridge Regression with penalty $\lambda = \sigma^2/\tau^2$, whose solution is
$$\hat\beta_{\text{Bayes}} = \left(X^\top X + \frac{\sigma^2}{\tau^2} I_d\right)^{-1} X^\top Y.$$
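Since both proofs give a closed form, one can check numerically that the ridge solution with penalty $\sigma^2/\tau^2$ coincides with the posterior mean from the 1st proof. A minimal sketch, assuming NumPy and arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma, tau = 30, 2, 1.0, 2.0
X = rng.normal(size=(n, d))
Y = rng.normal(size=n)                           # any response vector works for this identity

# MAP / ridge solution with lambda = sigma^2 / tau^2
lam = sigma**2 / tau**2
ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Posterior mean from the conjugate formulas of the 1st proof
Sigma_bar = np.linalg.inv(X.T @ X / sigma**2 + np.eye(d) / tau**2)
posterior_mean = Sigma_bar @ X.T @ Y / sigma**2

print(np.allclose(ridge, posterior_mean))        # True: the two proofs agree
```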
Minimax Estimator
For a Bowl-Shaped Loss $\rho$ and a full column rank design matrix $X$, Ordinary Least Squares also gives the Minimax Optimal Estimator of $\beta$:
$$\sup_{\beta \in \mathbb{R}^d} \mathbb{E}_\beta\big[\rho(\hat\beta_{\mathrm{OLS}} - \beta)\big] = \inf_{\hat\beta} \sup_{\beta \in \mathbb{R}^d} \mathbb{E}_\beta\big[\rho(\hat\beta - \beta)\big],$$
where the infimum is over all estimators $\hat\beta = \hat\beta(Y)$.
Proof
Recall that $\hat\beta_{\mathrm{OLS}} - \beta \sim \mathcal{N}\big(0, \sigma^2 (X^\top X)^{-1}\big)$. Thus, the risk is
$$R(\hat\beta_{\mathrm{OLS}}, \beta) := \mathbb{E}_\beta\big[\rho(\hat\beta_{\mathrm{OLS}} - \beta)\big] = \mathbb{E}\big[\rho(W)\big], \qquad W \sim \mathcal{N}\big(0, \sigma^2 (X^\top X)^{-1}\big).$$
Case I. $d = n$ and $X = I_n$. Note that the risk of the least squares estimator $\hat\beta_{\mathrm{OLS}} = Y$ is independent of $\beta$, and thus to show it's minimax optimal, we aim to find a sequence of priors whose Bayes risks match it in the limit (see Minimax via Bayes). Recall (see Bayes Estimator) that given a normal prior $\beta \sim \mathcal{N}(0, \tau^2 I_n)$, the posterior is $\mathcal{N}(\bar\mu_\tau, \bar\Sigma_\tau)$ with $\bar\Sigma_\tau = \big(\tfrac{1}{\sigma^2} + \tfrac{1}{\tau^2}\big)^{-1} I_n$, and the Bayes risk w.r.t. $\rho$ is
$$r_\tau = \mathbb{E}\big[\rho(W_\tau)\big], \qquad W_\tau \sim \mathcal{N}(0, \bar\Sigma_\tau).$$
Since $\rho$ is convex hence continuous, by the continuous mapping theorem,
$$\lim_{\tau \to \infty} r_\tau = \mathbb{E}\big[\rho(W)\big], \qquad W \sim \mathcal{N}(0, \sigma^2 I_n),$$
which is exactly the constant risk of $\hat\beta_{\mathrm{OLS}} = Y$. Hence the minimax risk equals this constant risk, and $\hat\beta_{\mathrm{OLS}}$ is minimax optimal in this case.
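To make the limit concrete, consider the special case of squared-error loss $\rho(u) = \|u\|_2^2$ (an assumption for illustration only; the argument above covers any bowl-shaped loss), where both risks have closed forms:

```python
import numpy as np

n, sigma = 5, 1.0
ols_risk = n * sigma**2                          # constant risk of beta_hat = Y in Case I

for tau in [1.0, 10.0, 100.0, 1000.0]:
    # Bayes risk = E||W_tau||^2, W_tau ~ N(0, (1/sigma^2 + 1/tau^2)^{-1} I_n)
    bayes_risk = n / (1.0 / sigma**2 + 1.0 / tau**2)
    print(f"tau = {tau:7.1f}   Bayes risk = {bayes_risk:.4f}   OLS risk = {ols_risk:.4f}")
```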
Case II. $d = n$ and $X$ invertible. We define a new loss function $\tilde\rho$ by $\tilde\rho(u) := \rho(X^{-1} u)$. One can check that $\tilde\rho$ is also bowl-shaped. For a fixed design matrix $X$, we consider a general estimator $\hat\beta = \delta(Y)$ determined by the data $Y$. We have the equivalence:
$$R(\delta, \beta) = \mathbb{E}_\beta\big[\rho(\delta(Y) - \beta)\big] = \mathbb{E}_\beta\big[\tilde\rho\big(X\delta(Y) - X\beta\big)\big] = \mathbb{E}_\theta\big[\tilde\rho\big(\tilde\delta(Y) - \theta\big)\big],$$
where the last equation applies the notation change:
$$\theta := X\beta, \qquad \tilde\delta(Y) := X\delta(Y), \qquad Y \sim \mathcal{N}(\theta, \sigma^2 I_n).$$
In words, $\tilde\delta$ is an estimator of the new parameter $\theta$ based on data $Y$. One can see that the new model $Y \sim \mathcal{N}(\theta, \sigma^2 I_n)$ together with the bowl-shaped loss $\tilde\rho$ is exactly the setting of Case I.
We denote by $\tilde R(\tilde\delta, \theta)$ the risk of an estimator $\tilde\delta$ of $\theta$ w.r.t. the new loss $\tilde\rho$. Then, applying Case I to $\tilde\rho$ gives
$$\inf_{\tilde\delta} \sup_{\theta} \tilde R(\tilde\delta, \theta) = \sup_{\theta} \tilde R(Y, \theta),$$
where $\tilde\delta$ is a general estimator of $\theta$ and $\tilde\delta(Y) = Y$ is least squares in the new model. Since $X$ is invertible, every estimator $\delta$ of $\beta$ corresponds to the estimator $X\delta$ of $\theta$ with the same worst-case risk, so we have
$$\inf_{\delta} \sup_{\beta} R(\delta, \beta) \ge \inf_{\tilde\delta} \sup_{\theta} \tilde R(\tilde\delta, \theta).$$
On the other hand, we have
$$\sup_{\theta} \tilde R(Y, \theta) = \sup_{\beta} \mathbb{E}_\beta\big[\rho\big(X^{-1}Y - \beta\big)\big] = \sup_{\beta} R(\hat\beta_{\mathrm{OLS}}, \beta) \ge \inf_{\delta} \sup_{\beta} R(\delta, \beta),$$
where the last inequality uses the fact that $\hat\beta_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top Y = X^{-1} Y$ is one particular estimator of $\beta$. Combining the above three inequalities gives
$$\sup_{\beta} R(\hat\beta_{\mathrm{OLS}}, \beta) = \inf_{\delta} \sup_{\beta} R(\delta, \beta).$$
Thus, $\hat\beta_{\mathrm{OLS}}$ is minimax optimal.
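The reparametrization at the heart of Case II can be checked directly: an estimator of $\beta$ and its induced estimator of $\theta = X\beta$ incur identical losses on every data set. A minimal sketch (assuming NumPy, squared-error $\rho$, and a hypothetical shrinkage estimator chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = d = 3
X = rng.normal(size=(n, d))                      # invertible with probability one
beta = rng.normal(size=d)
theta = X @ beta
sigma = 1.0

rho = lambda u: float(np.sum(u**2))              # a bowl-shaped loss (illustrative choice)
rho_tilde = lambda u: rho(np.linalg.solve(X, u)) # new loss: rho(X^{-1} u)

delta = lambda y: 0.9 * np.linalg.solve(X, y)    # hypothetical estimator of beta
delta_tilde = lambda y: X @ delta(y)             # induced estimator of theta = X beta

Y = X @ beta + sigma * rng.normal(size=n)
print(rho(delta(Y) - beta), rho_tilde(delta_tilde(Y) - theta))   # identical losses
```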
Case III. $d < n$. Let $V \in \mathbb{R}^{n \times d}$ have orthonormal columns spanning the column space of $X$ (e.g., from a reduced QR decomposition $X = V \tilde X$ with $\tilde X := V^\top X \in \mathbb{R}^{d \times d}$ invertible). Then $VV^\top$ is the orthogonal projection onto the column space of $X$. Then, we have
$$\tilde Y := V^\top Y = V^\top X \beta + V^\top \varepsilon \sim \mathcal{N}\big(\tilde X \beta, \sigma^2 I_d\big).$$
Equivalently, the original data gives a new Gaussian linear model:
$$\tilde Y = \tilde X \beta + \tilde\varepsilon, \qquad \tilde\varepsilon := V^\top \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d).$$
This Gaussian linear model reduces to Case II. Specifically, with the fixed design matrix $\tilde X$, let $\tilde\delta(\tilde Y)$ be a general estimator of $\beta$, and denote by $\tilde R(\tilde\delta, \beta)$ the risk w.r.t. this new Gaussian linear model. Then, Case II gives
$$\inf_{\tilde\delta} \sup_{\beta} \tilde R(\tilde\delta, \beta) = \sup_{\beta} \tilde R\big(\tilde X^{-1}\tilde Y, \beta\big), \qquad \tilde X^{-1}\tilde Y = (X^\top X)^{-1} X^\top Y = \hat\beta_{\mathrm{OLS}},$$
i.e., least squares in the new model coincides with least squares in the original model.
Therefore, we are left to show that for any general estimator $\delta(Y)$ corresponding to the original Gaussian linear model, there exists an induced estimator $\tilde\delta(\tilde Y)$ such that $\tilde R(\tilde\delta, \beta) = R(\delta, \beta)$ for all $\beta$. We claim this is true with the following randomized induced estimator:
$$\tilde\delta(\tilde Y) := \delta(Y'),$$
where $Y' \sim \mathcal{L}\big(Y \mid V^\top Y = \tilde Y\big)$.
The induced estimator works as follows: upon observing $\tilde Y$, we simulate $Y'$ from the conditional distribution $\mathcal{L}(Y \mid V^\top Y = \tilde Y)$, and then apply the original estimator $\delta$ to obtain $\tilde\delta(\tilde Y) = \delta(Y')$. In our case, since we actually observe $Y$, we directly have $Y' = Y$. However, we can still interpret $Y$ as being generated from the conditional distribution $\mathcal{L}(Y \mid V^\top Y = \tilde Y)$.
Suppose $\tilde Y = V^\top Y$ is a Sufficient Statistic for $\beta$; then the above claim is true:
$$\tilde R(\tilde\delta, \beta) = \mathbb{E}_\beta\big[\rho(\delta(Y') - \beta)\big] = \mathbb{E}_\beta\big[\rho(\delta(Y) - \beta)\big] = R(\delta, \beta),$$
since sufficiency guarantees that $\mathcal{L}(Y \mid V^\top Y)$ does not depend on $\beta$, so that $Y'$ has the same distribution as $Y$ under every $\beta$.
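A Monte Carlo sketch of this claim (assuming NumPy; the shrunken least squares estimator $\delta$ and all constants are hypothetical illustrative choices): the induced estimator sees only $\tilde Y = V^\top Y$, fills in the orthogonal component with simulated noise, and attains the same risk as $\delta$ applied to the original data.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, sigma = 6, 2, 1.0
X = rng.normal(size=(n, d))
V, _ = np.linalg.qr(X)                           # orthonormal basis of col(X), V in R^{n x d}
P_perp = np.eye(n) - V @ V.T                     # projection onto the orthogonal complement
beta = rng.normal(size=d)

delta = lambda y: 0.8 * np.linalg.solve(X.T @ X, X.T @ y)   # hypothetical estimator of beta

loss_orig, loss_induced = [], []
for _ in range(20000):
    Y = X @ beta + sigma * rng.normal(size=n)
    Y_tilde = V.T @ Y                            # what the new model observes
    # simulate Y' from the beta-free conditional law of Y given V^T Y = Y_tilde
    Y_prime = V @ Y_tilde + P_perp @ (sigma * rng.normal(size=n))
    loss_orig.append(np.sum((delta(Y) - beta) ** 2))
    loss_induced.append(np.sum((delta(Y_prime) - beta) ** 2))

print(np.mean(loss_orig), np.mean(loss_induced))  # approximately equal risks
```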
Therefore, we are left to show that $\tilde Y = V^\top Y$ is indeed a sufficient statistic for $\beta$. We provide three methods.
Method I. Intuition
Let $Z := (I_n - VV^\top) Y$. Then,
$$Y = VV^\top Y + (I_n - VV^\top) Y = V \tilde Y + Z, \qquad \operatorname{Cov}(\tilde Y, Z) = \sigma^2 V^\top (I_n - VV^\top) = 0.$$
Note that for a Gaussian vector, zero correlation implies independence. And since perpendicular vectors have zero correlation, we have $\tilde Y \perp Z$. Moreover, since $Z = (I_n - VV^\top)\varepsilon \sim \mathcal{N}\big(0, \sigma^2 (I_n - VV^\top)\big)$, we can simulate $Z$ without knowing $\beta$. Thus $\tilde Y$ is sufficient.
In words, $\tilde Y$ captures all the information of $Y$ about $\beta$, obtained by left-multiplying $Y$ by $V^\top$. The remaining part $Z$ is pure noise perpendicular to the column space of $X$ and does not depend on $\beta$. This is illustrated in the following plot.
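This intuition can be checked in simulation (a minimal sketch assuming NumPy; dimensions and $\beta$ are arbitrary): the cross-covariance between $\tilde Y$ and $Z$ is numerically zero, and $Z$ has mean zero no matter how large the signal is.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, sigma = 5, 2, 1.0
X = rng.normal(size=(n, d))
V, _ = np.linalg.qr(X)                           # orthonormal basis of col(X)
beta = np.array([3.0, -1.0])                     # a large signal, to make the point visible

T, Z = [], []
for _ in range(20000):
    Y = X @ beta + sigma * rng.normal(size=n)
    T.append(V.T @ Y)                            # carries the signal: mean V^T X beta
    Z.append((np.eye(n) - V @ V.T) @ Y)          # residual part, perpendicular to col(X)

T, Z = np.array(T), np.array(Z)
cross_cov = np.cov(np.hstack([T, Z]), rowvar=False)[:d, d:]
print(np.max(np.abs(cross_cov)))                 # approx 0: uncorrelated, hence independent
print(np.max(np.abs(Z.mean(axis=0))))            # approx 0: Z carries no information about beta
```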
Method II. Fisher-Neyman Factorization
We first show that $X^\top Y$ is sufficient:
$$p_\beta(y) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\|y - X\beta\|_2^2}{2\sigma^2}\right) = \underbrace{\exp\left(\frac{\beta^\top X^\top y}{\sigma^2} - \frac{\|X\beta\|_2^2}{2\sigma^2}\right)}_{g_\beta(X^\top y)} \cdot \underbrace{(2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\|y\|_2^2}{2\sigma^2}\right)}_{h(y)}.$$
By the Fisher-Neyman Factorization Theorem, $X^\top Y$ is sufficient. Since $X$ is full column-rank, $(X^\top V)^{-1}$ exists, and $\tilde Y = V^\top Y = (X^\top V)^{-1} X^\top Y$ (because $X^\top Y = (X^\top V)\tilde Y$, using $(I_n - VV^\top)X = 0$) is a bijective function of $X^\top Y$, hence also sufficient.
Method III. Conditional Simulation
We first show that $\tilde Y = V^\top Y$ and $Z := (I_n - VV^\top)Y$ are independent, and that $Z$ does not depend on $\beta$.
Note that $(\tilde Y, Z)$ is jointly Gaussian with
$$\operatorname{Cov}(\tilde Y, Z) = V^\top \big(\sigma^2 I_n\big) (I_n - VV^\top)^\top = \sigma^2 \big(V^\top - V^\top V V^\top\big) = 0.$$
Therefore, $\tilde Y \perp Z$, and
$$Z = (I_n - VV^\top)\varepsilon \sim \mathcal{N}\big(0, \sigma^2 (I_n - VV^\top)\big).$$
Finally, since $Y = V\tilde Y + Z$, the conditional distribution
$$Y \mid \tilde Y = t \;\sim\; \mathcal{N}\big(V t, \sigma^2 (I_n - VV^\top)\big)$$
does not depend on $\beta$ and can be simulated given $\tilde Y$ alone. Thus, $\tilde Y$ is sufficient.