Bayes Optimal Estimator
Given a loss function $L(a, \theta)$ and a prior $\pi$ over the parameter $\theta$, the Bayes optimal estimator, often shortened to Bayes estimator, minimizes the Bayes risk:

$$\hat{\theta}_{\mathrm{Bayes}} = \arg\min_{\hat{\theta}} R_{\mathrm{Bayes}}(\hat{\theta}) = \arg\min_{\hat{\theta}} \mathbb{E}_{\theta \sim \pi}\!\left[\mathbb{E}_{X \sim P_\theta}\!\left[L\big(\hat{\theta}(X), \theta\big)\right]\right].$$

Due to the Greedy principle, we have

$$\hat{\theta}_{\mathrm{Bayes}}(x) = \arg\min_{a \in \mathcal{A}} \mathbb{E}\!\left[L(a, \theta) \mid X = x\right],$$

i.e., the Bayes estimator minimizes the posterior risk pointwise at each observed $x$.
This can also be interpreted as a Bayesian approach:
- Have a prior $\pi$ over $\theta$;
- Observe $X = x$ and form the posterior $\pi(\theta \mid x)$;
- Act optimally according to the posterior.
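To make these three steps concrete, here is a minimal numerical sketch in Python (the grid, the Gaussian prior and likelihood, and the squared loss are all my own illustrative assumptions, not from the note). It forms the posterior on a grid and then greedily minimizes the posterior risk over candidate actions:

```python
import numpy as np

# Assumed setup (illustrative only): discrete grid for theta, Gaussian prior and likelihood.
theta_grid = np.linspace(-5, 5, 1001)            # candidate parameter values
prior = np.exp(-0.5 * (theta_grid / 2.0) ** 2)   # N(0, 2^2) prior, unnormalized
prior /= prior.sum()

x_obs = 1.3                                      # a single observation, noise sigma = 1
likelihood = np.exp(-0.5 * (x_obs - theta_grid) ** 2)

# Form the posterior by Bayes' rule (normalized over the grid).
posterior = prior * likelihood
posterior /= posterior.sum()

def posterior_risk(a, loss):
    """E[L(a, theta) | X = x], approximated on the grid."""
    return np.sum(loss(a, theta_grid) * posterior)

squared_loss = lambda a, t: (a - t) ** 2

# Act optimally: pick the action with the smallest posterior risk.
actions = theta_grid
risks = np.array([posterior_risk(a, squared_loss) for a in actions])
bayes_action = actions[np.argmin(risks)]

print("Bayes action (grid argmin):", bayes_action)
print("posterior mean:            ", np.sum(theta_grid * posterior))  # should be close
```

Under squared loss the grid minimizer coincides (up to grid resolution) with the posterior mean, anticipating the results below.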
Uniqueness
The Bayes estimator may not be unique. However, we have the following result.
Suppose the action space $\mathcal{A}$¹ is convex, $L(a, \theta)$ is strictly convex in the first argument for any $\theta$, and for any $x$, there exists $a \in \mathcal{A}$ such that $\mathbb{E}[L(a, \theta) \mid X = x] < \infty$. Then, the Bayes estimator is unique.
Proof of Uniqueness
Since $L(\cdot, \theta)$ is strictly convex, we have, for any $a_1 \neq a_2$ in $\mathcal{A}$ and $\lambda \in (0, 1)$,

$$L\big(\lambda a_1 + (1 - \lambda) a_2, \theta\big) < \lambda L(a_1, \theta) + (1 - \lambda) L(a_2, \theta).$$

The posterior risk

$$r(a \mid x) := \mathbb{E}\!\left[L(a, \theta) \mid X = x\right]$$

is also strictly convex due to the linearity of expectation, and is proper because of the assumption. Therefore,

$$\hat{\theta}^*(x) := \arg\min_{a \in \mathcal{A}} r(a \mid x)$$

is uniquely defined.
We now need to show that every Bayes optimal estimator is the minimizer of $r(\cdot \mid x)$. Suppose $\tilde{\theta}$ is another Bayes optimal estimator. By definition,

$$\mathbb{E}_X\!\left[r\big(\tilde{\theta}(X) \mid X\big)\right] = \mathbb{E}_X\!\left[r\big(\hat{\theta}^*(X) \mid X\big)\right].$$

However, by the definition of $\hat{\theta}^*$, $r\big(\tilde{\theta}(X) \mid X\big) - r\big(\hat{\theta}^*(X) \mid X\big)$ is a non-negative random variable. Therefore, we must have

$$r\big(\tilde{\theta}(X) \mid X\big) = r\big(\hat{\theta}^*(X) \mid X\big) \quad \text{almost surely.}$$

By the uniqueness of the minimizer of $r(\cdot \mid x)$, we have

$$\tilde{\theta}(X) = \hat{\theta}^*(X) \quad \text{almost surely.}$$
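As a concrete check of these conditions (a worked example of mine, not from the original note), take the squared loss $L(a, \theta) = (a - \theta)^2$, which is strictly convex in $a$. Provided the posterior has a finite second moment, completing the square gives

$$\mathbb{E}\!\left[(a - \theta)^2 \mid X = x\right] = \big(a - \mathbb{E}[\theta \mid X = x]\big)^2 + \mathrm{Var}(\theta \mid X = x),$$

which is strictly convex in $a$ and minimized uniquely at $a = \mathbb{E}[\theta \mid X = x]$, the posterior mean, consistent with the results in the next section.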
Minimizing Posterior Risk
After calculating the posterior (see Bayesian Inference for a discussion of how to compute it), the next question is how to calculate the Bayes estimator, i.e., find the minimizer of the posterior risk.
For certain loss functions, their posterior risk minimizers are common functionals of the posterior:
- Posterior median: $L(a, \theta) = |a - \theta|$ implies $\hat{\theta}(x) = \operatorname{median}\big(\pi(\theta \mid x)\big)$.
- Posterior mode: $L(a, \theta) = \mathbb{1}\{a \neq \theta\}$ implies $\hat{\theta}(x) = \arg\max_{\theta} \pi(\theta \mid x)$ (the MAP estimator).
More often, the posterior mean happens to be the Bayes estimator:

$$\hat{\theta}(x) = \mathbb{E}[\theta \mid X = x].$$
This is the case when
- $L$ is the Mean Squared Error; or
- $L$ is a Bowl-Shaped Loss and the posterior is Gaussian.
For the first case, see Optimal Estimation. The second case is given by Anderson's Lemma.
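As a quick numerical check of this loss-to-functional correspondence (my own sketch; the discrete posterior below is an arbitrary illustrative example), the following snippet minimizes the posterior risk under absolute, 0-1, and squared loss and compares the minimizers with the posterior median, mode, and mean:

```python
import numpy as np

# Assumed discrete posterior over a grid of theta values (illustrative only).
theta = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
post = np.array([0.10, 0.35, 0.25, 0.20, 0.10])  # sums to 1

def argmin_posterior_risk(loss, actions):
    """Action in `actions` minimizing the posterior risk E[L(a, theta) | x]."""
    risks = [np.sum(loss(a, theta) * post) for a in actions]
    return actions[int(np.argmin(risks))]

fine = np.linspace(0.0, 4.0, 4001)               # fine action grid for continuous losses

abs_loss = lambda a, t: np.abs(a - t)
zero_one = lambda a, t: (a != t).astype(float)
sq_loss = lambda a, t: (a - t) ** 2

print("abs-loss minimizer:", argmin_posterior_risk(abs_loss, fine))   # ~ posterior median
print("0-1-loss minimizer:", argmin_posterior_risk(zero_one, theta))  # = posterior mode
print("squared minimizer :", argmin_posterior_risk(sq_loss, fine))    # ~ posterior mean

# Reference functionals of the posterior:
cdf = np.cumsum(post)
print("posterior median  :", theta[np.searchsorted(cdf, 0.5)])
print("posterior mode    :", theta[int(np.argmax(post))])
print("posterior mean    :", np.sum(theta * post))
```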
Example: Gaussian Mean
Consider the Mean Squared Error. Suppose $X_1, \dots, X_n \mid \theta \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\theta, \sigma^2)$ and $\theta \sim \mathcal{N}(\mu_0, \sigma_0^2)$. The posterior is also Gaussian:

$$\theta \mid X_1, \dots, X_n \sim \mathcal{N}(\mu_n, \sigma_n^2),$$

where

$$\mu_n = w \mu_0 + (1 - w) \bar{X}, \qquad w = \frac{\sigma^2 / n}{\sigma_0^2 + \sigma^2 / n}, \qquad \sigma_n^2 = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}.$$
We can see that the posterior mean is a convex combination of the prior mean and the sample mean, with $w$ being the weight on the prior mean, which increases with more information in the prior ($\sigma_0^2$ decreases) and decreases with more information in the sample ($n$ increases).
Thus, the Bayes optimal estimator is

$$\hat{\theta}(X_1, \dots, X_n) = \mu_n = w \mu_0 + (1 - w) \bar{X}.$$

When $\mu_0 = 0$, we have $\hat{\theta} = \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2 / n}\,\bar{X}$, i.e., a regularized sample mean that shrinks $\bar{X}$ toward zero.
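A short sketch of this conjugate update (the prior parameters, noise level, and data below are illustrative assumptions of mine, not from the note):

```python
import numpy as np

def gaussian_posterior(x, mu0, sigma0, sigma):
    """Posterior N(mu_n, sigma_n^2) for theta, given x_1..x_n ~ N(theta, sigma^2)
    and prior theta ~ N(mu0, sigma0^2)."""
    n = len(x)
    xbar = np.mean(x)
    w = (sigma**2 / n) / (sigma0**2 + sigma**2 / n)   # weight on the prior mean
    mu_n = w * mu0 + (1 - w) * xbar
    var_n = 1.0 / (1.0 / sigma0**2 + n / sigma**2)
    return mu_n, var_n

# Illustrative values (assumed): prior N(0, 1), noise sigma = 1, n = 20 samples.
rng = np.random.default_rng(0)
theta_true = 0.7
x = rng.normal(theta_true, 1.0, size=20)

mu_n, var_n = gaussian_posterior(x, mu0=0.0, sigma0=1.0, sigma=1.0)
print("posterior mean (Bayes estimator under MSE):", mu_n)
print("posterior variance:", var_n)
print("sample mean for comparison:", x.mean())
```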
The following figure plots the risk (MSE, for fixed $n$ and $\sigma$) for different estimators. We can see that, unlike the sample mean or the sample median, which have a constant risk regardless of $\theta$, the regularized sample mean has a lower risk when $|\theta|$ is small, but underestimates $\theta$ (and hence incurs a higher risk) when $|\theta|$ is large. Therefore, if the prior puts more belief on small $|\theta|$, the regularized sample mean has a smaller Bayes risk.
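A Monte Carlo sketch can reproduce this comparison numerically (all settings here are my own assumptions: $n = 10$, $\sigma = 1$, and prior $\mathcal{N}(0, 1)$, giving shrinkage factor $n/(n+1)$):

```python
import numpy as np

# Monte Carlo approximation of the risk (MSE) as a function of theta for three estimators.
# All settings are assumed for illustration: n = 10, sigma = 1, prior N(0, 1).
rng = np.random.default_rng(1)
n, sigma, sigma0 = 10, 1.0, 1.0
shrink = sigma0**2 / (sigma0**2 + sigma**2 / n)   # weight on the sample mean, here n/(n+1)

thetas = np.linspace(-3, 3, 25)
reps = 20_000

estimators = {
    "sample mean": lambda x: x.mean(axis=1),
    "sample median": lambda x: np.median(x, axis=1),
    "regularized mean": lambda x: shrink * x.mean(axis=1),
}

for name, estimator in estimators.items():
    risks = []
    for theta in thetas:
        x = rng.normal(theta, sigma, size=(reps, n))
        risks.append(np.mean((estimator(x) - theta) ** 2))
    print(f"{name:>16}: risk at theta=0 is {risks[len(thetas) // 2]:.4f}, "
          f"at theta=3 is {risks[-1]:.4f}")
```

Under these assumed settings the sample mean's risk is flat at $\sigma^2 / n = 0.1$, while the shrinkage estimator's risk is roughly $0.083 + \theta^2 / 121$: lower near $\theta = 0$ and higher for large $|\theta|$.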
Footnotes
- Note that the action space $\mathcal{A}$ need not coincide with the parameter space $\Theta$. Usually $\mathcal{A} = \Theta$ for an estimation task. ↩