Bayes Optimal Estimator
Given a loss function $L(a, \theta)$ and a prior $\pi$ over the parameter $\theta$, the Bayes optimal estimator, often shortened to Bayes estimator, minimizes the Bayes risk:

$$\hat{\theta}_{\mathrm{Bayes}} = \arg\min_{\hat{\theta}} R_{\mathrm{Bayes}}(\hat{\theta}) = \arg\min_{\hat{\theta}} \mathbb{E}_{\theta \sim \pi}\!\left[\mathbb{E}_{X \sim P_\theta}\!\left[L\big(\hat{\theta}(X), \theta\big)\right]\right].$$

Due to the Greedy principle, we have

$$\hat{\theta}_{\mathrm{Bayes}}(x) = \arg\min_{a \in \mathcal{A}} \mathbb{E}\!\left[L(a, \theta) \mid X = x\right],$$

i.e., the Bayes estimator minimizes the posterior risk pointwise at each observed $x$.
This can also be interpreted as a Bayesian approach:
- Have a prior $\pi$ over $\theta$;
- Observe $X = x$ and form the posterior $\pi(\theta \mid x)$;
- Act optimally according to the posterior.
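To make these three steps concrete, here is a minimal numerical sketch in Python (the grid, the Gaussian prior and likelihood, and the squared loss are all my own illustrative assumptions, not from the note). It forms the posterior on a grid and then greedily minimizes the posterior risk over candidate actions:

```python
import numpy as np

# Assumed setup (illustrative only): discrete grid for theta, Gaussian prior and likelihood.
theta_grid = np.linspace(-5, 5, 1001)            # candidate parameter values
prior = np.exp(-0.5 * (theta_grid / 2.0) ** 2)   # N(0, 2^2) prior, unnormalized
prior /= prior.sum()

x_obs = 1.3                                      # a single observation, noise sigma = 1
likelihood = np.exp(-0.5 * (x_obs - theta_grid) ** 2)

# Form the posterior by Bayes' rule (normalized over the grid).
posterior = prior * likelihood
posterior /= posterior.sum()

def posterior_risk(a, loss):
    """E[L(a, theta) | X = x], approximated on the grid."""
    return np.sum(loss(a, theta_grid) * posterior)

squared_loss = lambda a, t: (a - t) ** 2

# Act optimally: pick the action with the smallest posterior risk.
actions = theta_grid
risks = np.array([posterior_risk(a, squared_loss) for a in actions])
bayes_action = actions[np.argmin(risks)]

print("Bayes action (grid argmin):", bayes_action)
print("posterior mean:            ", np.sum(theta_grid * posterior))  # should be close
```

Under squared loss the grid minimizer coincides (up to grid resolution) with the posterior mean, anticipating the results below.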
Uniqueness
The Bayes estimator may not be unique. However, we have the following result.
Suppose the action space $\mathcal{A}$¹ is convex, $L(a, \theta)$ is strictly convex in the first argument for any $\theta$, and for any $x$, there exists $a \in \mathcal{A}$ such that $\mathbb{E}[L(a, \theta) \mid X = x] < \infty$. Then, the Bayes estimator is unique.
Proof of Uniqueness
Since $L(\cdot, \theta)$ is strictly convex, we have, for any $a_1 \neq a_2$ in $\mathcal{A}$ and $\lambda \in (0, 1)$,

$$L\big(\lambda a_1 + (1 - \lambda) a_2, \theta\big) < \lambda L(a_1, \theta) + (1 - \lambda) L(a_2, \theta).$$

The posterior risk

$$r(a \mid x) := \mathbb{E}\!\left[L(a, \theta) \mid X = x\right]$$

is also strictly convex due to the linearity of expectation, and is proper because of the assumption. Therefore,

$$\hat{\theta}^*(x) := \arg\min_{a \in \mathcal{A}} r(a \mid x)$$

is uniquely defined.
We now need to show that every Bayes optimal estimator is the minimizer of $r(\cdot \mid x)$. Suppose $\tilde{\theta}$ is another Bayes optimal estimator. By definition,

$$\mathbb{E}_X\!\left[r\big(\tilde{\theta}(X) \mid X\big)\right] = \mathbb{E}_X\!\left[r\big(\hat{\theta}^*(X) \mid X\big)\right].$$

However, by the definition of $\hat{\theta}^*$, $r\big(\tilde{\theta}(X) \mid X\big) - r\big(\hat{\theta}^*(X) \mid X\big)$ is a non-negative random variable. Therefore, we must have

$$r\big(\tilde{\theta}(X) \mid X\big) = r\big(\hat{\theta}^*(X) \mid X\big) \quad \text{almost surely.}$$

By the uniqueness of the minimizer of $r(\cdot \mid x)$, we have

$$\tilde{\theta}(X) = \hat{\theta}^*(X) \quad \text{almost surely.}$$
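As a concrete check of these conditions (a worked example of mine, not from the original note), take the squared loss $L(a, \theta) = (a - \theta)^2$, which is strictly convex in $a$. Provided the posterior has a finite second moment, completing the square gives

$$\mathbb{E}\!\left[(a - \theta)^2 \mid X = x\right] = \big(a - \mathbb{E}[\theta \mid X = x]\big)^2 + \mathrm{Var}(\theta \mid X = x),$$

which is strictly convex in $a$ and minimized uniquely at $a = \mathbb{E}[\theta \mid X = x]$, the posterior mean, consistent with the results in the next section.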
Minimizing Posterior Risk
After calculating the posterior (see Bayesian Inference for a discussion of how to compute it), the next question is how to calculate the Bayes estimator, i.e., find the minimizer of the posterior risk.
For certain loss functions, their posterior risk minimizers are common functionals of the posterior:
- Posterior median: $L(a, \theta) = |a - \theta|$ implies $\hat{\theta}(x) = \operatorname{median}\big(\pi(\theta \mid x)\big)$.
- Posterior mode: $L(a, \theta) = \mathbb{1}\{a \neq \theta\}$ implies $\hat{\theta}(x) = \arg\max_{\theta} \pi(\theta \mid x)$ (the MAP estimator).
More often, the posterior mean happens to be the Bayes estimator:

$$\hat{\theta}(x) = \mathbb{E}[\theta \mid X = x].$$
This is the case when
- $L$ is the Mean Squared Error; or
- $L$ is a Bowl-Shaped Loss and the posterior is Gaussian.
For the first case, see Optimal Estimation. The second case is given by Anderson's Lemma.
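As a quick numerical check of this loss-to-functional correspondence (my own sketch; the discrete posterior below is an arbitrary illustrative example), the following snippet minimizes the posterior risk under absolute, 0-1, and squared loss and compares the minimizers with the posterior median, mode, and mean:

```python
import numpy as np

# Assumed discrete posterior over a grid of theta values (illustrative only).
theta = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
post = np.array([0.10, 0.35, 0.25, 0.20, 0.10])  # sums to 1

def argmin_posterior_risk(loss, actions):
    """Action in `actions` minimizing the posterior risk E[L(a, theta) | x]."""
    risks = [np.sum(loss(a, theta) * post) for a in actions]
    return actions[int(np.argmin(risks))]

fine = np.linspace(0.0, 4.0, 4001)               # fine action grid for continuous losses

abs_loss = lambda a, t: np.abs(a - t)
zero_one = lambda a, t: (a != t).astype(float)
sq_loss = lambda a, t: (a - t) ** 2

print("abs-loss minimizer:", argmin_posterior_risk(abs_loss, fine))   # ~ posterior median
print("0-1-loss minimizer:", argmin_posterior_risk(zero_one, theta))  # = posterior mode
print("squared minimizer :", argmin_posterior_risk(sq_loss, fine))    # ~ posterior mean

# Reference functionals of the posterior:
cdf = np.cumsum(post)
print("posterior median  :", theta[np.searchsorted(cdf, 0.5)])
print("posterior mode    :", theta[int(np.argmax(post))])
print("posterior mean    :", np.sum(theta * post))
```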
Example: Gaussian Mean
Consider the Mean Squared Error. Suppose $X_1, \dots, X_n \mid \theta \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\theta, \sigma^2)$ and $\theta \sim \mathcal{N}(\mu_0, \sigma_0^2)$. The posterior is also Gaussian:

$$\theta \mid X_1, \dots, X_n \sim \mathcal{N}(\mu_n, \sigma_n^2),$$

where

$$\mu_n = w \mu_0 + (1 - w) \bar{X}, \qquad w = \frac{\sigma^2 / n}{\sigma_0^2 + \sigma^2 / n}, \qquad \sigma_n^2 = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}.$$
We can see that the posterior mean is a convex combination of the prior mean and the sample mean, with $w$ being the weight on the prior mean, which increases with more information in the prior ($\sigma_0^2$ decreases) and decreases with more information in the sample ($n$ increases).
Thus, the Bayes optimal estimator is

$$\hat{\theta}(X_1, \dots, X_n) = \mu_n = w \mu_0 + (1 - w) \bar{X}.$$

When $\mu_0 = 0$, we have $\hat{\theta} = \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2 / n}\,\bar{X}$, i.e., a regularized sample mean that shrinks $\bar{X}$ toward zero.
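A short sketch of this conjugate update (the prior parameters, noise level, and data below are illustrative assumptions of mine, not from the note):

```python
import numpy as np

def gaussian_posterior(x, mu0, sigma0, sigma):
    """Posterior N(mu_n, sigma_n^2) for theta, given x_1..x_n ~ N(theta, sigma^2)
    and prior theta ~ N(mu0, sigma0^2)."""
    n = len(x)
    xbar = np.mean(x)
    w = (sigma**2 / n) / (sigma0**2 + sigma**2 / n)   # weight on the prior mean
    mu_n = w * mu0 + (1 - w) * xbar
    var_n = 1.0 / (1.0 / sigma0**2 + n / sigma**2)
    return mu_n, var_n

# Illustrative values (assumed): prior N(0, 1), noise sigma = 1, n = 20 samples.
rng = np.random.default_rng(0)
theta_true = 0.7
x = rng.normal(theta_true, 1.0, size=20)

mu_n, var_n = gaussian_posterior(x, mu0=0.0, sigma0=1.0, sigma=1.0)
print("posterior mean (Bayes estimator under MSE):", mu_n)
print("posterior variance:", var_n)
print("sample mean for comparison:", x.mean())
```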
The following figure plots the risk (MSE, for fixed $n$ and $\sigma$) for different estimators. We can see that, unlike the sample mean or the sample median, which have a constant risk regardless of $\theta$, the regularized sample mean has a lower risk when $|\theta|$ is small, but underestimates $\theta$ (and hence incurs a higher risk) when $|\theta|$ is large. Therefore, if the prior puts more belief on small $|\theta|$, the regularized sample mean has a smaller Bayes risk.
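A Monte Carlo sketch can reproduce this comparison numerically (all settings here are my own assumptions: $n = 10$, $\sigma = 1$, and prior $\mathcal{N}(0, 1)$, giving shrinkage factor $n/(n+1)$):

```python
import numpy as np

# Monte Carlo approximation of the risk (MSE) as a function of theta for three estimators.
# All settings are assumed for illustration: n = 10, sigma = 1, prior N(0, 1).
rng = np.random.default_rng(1)
n, sigma, sigma0 = 10, 1.0, 1.0
shrink = sigma0**2 / (sigma0**2 + sigma**2 / n)   # weight on the sample mean, here n/(n+1)

thetas = np.linspace(-3, 3, 25)
reps = 20_000

estimators = {
    "sample mean": lambda x: x.mean(axis=1),
    "sample median": lambda x: np.median(x, axis=1),
    "regularized mean": lambda x: shrink * x.mean(axis=1),
}

for name, estimator in estimators.items():
    risks = []
    for theta in thetas:
        x = rng.normal(theta, sigma, size=(reps, n))
        risks.append(np.mean((estimator(x) - theta) ** 2))
    print(f"{name:>16}: risk at theta=0 is {risks[len(thetas) // 2]:.4f}, "
          f"at theta=3 is {risks[-1]:.4f}")
```

Under these assumed settings the sample mean's risk is flat at $\sigma^2 / n = 0.1$, while the shrinkage estimator's risk is roughly $0.083 + \theta^2 / 121$: lower near $\theta = 0$ and higher for large $|\theta|$.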
Footnotes
- Note that the action space $\mathcal{A}$ need not coincide with the parameter space $\Theta$. Usually $\mathcal{A} = \Theta$ for an estimation task. ↩