Minimax Optimal Estimator

A minimax optimal estimator, often shortened to minimax estimator, minimizes the worst-case risk; that is, it attains the minimax risk:

$$\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) = \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P),$$

where $\mathcal{P}$ is the statistical model and $R(\hat{\theta}, P)$ denotes the risk of the estimator $\hat{\theta}$ under the distribution $P$.

This note discusses three strategies for finding minimax estimators: via constant risk plus generalization, via Bayes risk, and via constant risk plus Bayes risk.

Generalization

Minimax estimators have a simple generalization result: suppose $\hat{\theta}$ is minimax w.r.t. a Statistical Model $\mathcal{P}_0$, which is contained in a larger model $\mathcal{P} \supseteq \mathcal{P}_0$, and we have

$$\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) = \sup_{P \in \mathcal{P}_0} R(\hat{\theta}, P).$$

Then $\hat{\theta}$ is also minimax w.r.t. $\mathcal{P}$.

Proof of Generalization

By minimax optimality of $\hat{\theta}$ w.r.t. $\mathcal{P}_0$ and the set inclusion $\mathcal{P}_0 \subseteq \mathcal{P}$,

$$\sup_{P \in \mathcal{P}_0} R(\hat{\theta}, P) = \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}_0} R(\tilde{\theta}, P) \le \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P).$$

On the other hand,

$$\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) \ge \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P).$$

Since the LHS of both inequalities are equal by assumption, we have

$$\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) = \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P),$$

i.e., $\hat{\theta}$ is also minimax w.r.t. $\mathcal{P}$.

Minimax via Constant Risk + Generalization

The generalization property provides us with a way to find minimax estimators. A more specific case is when the risk $R(\hat{\theta}, P)$ is constant w.r.t. $P \in \mathcal{P}$, so the supremum over $\mathcal{P}$ trivially equals the supremum over $\mathcal{P}_0$. Formally, if $\hat{\theta}$ is minimax w.r.t. $\mathcal{P}_0 \subseteq \mathcal{P}$ and $R(\hat{\theta}, P)$ is constant w.r.t. $P \in \mathcal{P}$, then $\hat{\theta}$ is also minimax w.r.t. $\mathcal{P}$.

Minimax via constant risk + generalization

  1. Choose a subset $\mathcal{P}_0$ of the original model $\mathcal{P}$.
  2. Find a minimax estimator $\hat{\theta}$ w.r.t. $\mathcal{P}_0$.
  3. Show that the risk $R(\hat{\theta}, P)$ is constant w.r.t. $P \in \mathcal{P}$.
  4. Conclude that $\hat{\theta}$ is minimax w.r.t. $\mathcal{P}$.

Worst-case and conservative

The generalization property, which stems from the “worst-case” nature of the minimax risk, indicates that this criterion can be very conservative. Even if an estimator performs well for almost all distributions in the larger model, it may still fail to be minimax if there is a single distribution for which it performs poorly.

Example: Non-Parametric Mean

We give an interesting example of a non-parametric model $\mathcal{P}$, which is the collection of distributions with finite mean and variance $\sigma^2$. Consider the squared-error loss $\ell(\hat{\mu}, \mu) = (\hat{\mu} - \mu)^2$. Then, the sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$ of i.i.d. observations $X_1, \dots, X_n \sim P$ is a minimax estimator of the mean $\mu(P) = \mathbb{E}_P[X]$ w.r.t. $\mathcal{P}$.

This is because $\bar{X}_n$ is a minimax estimator of $\mu$ w.r.t. a smaller parametric model $\mathcal{P}_0 = \{\mathcal{N}(\mu, \sigma^2) : \mu \in \mathbb{R}\}$, the collection of Gaussian distributions with finite mean and variance $\sigma^2$ (see Example: Gaussian Mean). Then, since the risk $R(\bar{X}_n, P) = \sigma^2 / n$ is constant w.r.t. $P \in \mathcal{P}$, we can apply the generalization property to conclude that $\bar{X}_n$ is also minimax w.r.t. $\mathcal{P}$.
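
To make the constant-risk step concrete, here is a minimal numerical sketch (the specific distributions, $n$, and $\sigma^2$ below are arbitrary illustrative choices): it estimates the MSE of the sample mean under several distributions sharing the same variance $\sigma^2$ and compares each against $\sigma^2 / n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, trials = 50, 4.0, 100_000

# Several distributions in the non-parametric model P: different shapes, same variance sigma^2.
samplers = {
    "gaussian":    lambda size: rng.normal(0.0, np.sqrt(sigma2), size),
    "uniform":     lambda size: rng.uniform(-np.sqrt(3 * sigma2), np.sqrt(3 * sigma2), size),
    "exponential": lambda size: rng.exponential(np.sqrt(sigma2), size),  # scale = sd
}
true_means = {"gaussian": 0.0, "uniform": 0.0, "exponential": np.sqrt(sigma2)}

for name, sample in samplers.items():
    x = sample((trials, n))                               # `trials` datasets of size n
    mse = np.mean((x.mean(axis=1) - true_means[name]) ** 2)
    print(f"{name:12s} empirical MSE = {mse:.4f}   sigma^2/n = {sigma2 / n:.4f}")
```

In each case the empirical MSE should be close to $\sigma^2 / n = 0.08$, reflecting that $R(\bar{X}_n, P) = \mathrm{Var}_P(X) / n$ depends on $P$ only through its variance.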

Minimax via Bayes

  • 💡 Picking an unfavorable prior (playing the role of “nature” or the “environment”) to prove worst-case/hardness results is a common technique in Statistics and theoretical computer science.

In the same spirit, we can use Bayes risks to find minimax estimators. The prior whose Bayes risk is largest is called the least favorable prior.

Since a supremum is no smaller than any average, every prior $\pi$ gives an immediate lower bound on the minimax risk: for any estimator $\tilde{\theta}$,

$$\sup_{P \in \mathcal{P}} R(\tilde{\theta}, P) \ge \mathbb{E}_{P \sim \pi}\big[R(\tilde{\theta}, P)\big] \ge \inf_{\hat{\theta}} \mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}, P)\big] =: R_{\text{Bayes}}(\pi).$$

Therefore, we have the following strategy:

Minimax via Bayes

  1. Find a prior $\pi$ with a large Bayes risk $R_{\text{Bayes}}(\pi) = \mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}_\pi, P)\big]$, where $\hat{\theta}_\pi$ is a Bayes Optimal Estimator w.r.t. $\pi$.
  2. Find an estimator $\hat{\theta}$ such that $\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) \le R_{\text{Bayes}}(\pi)$.
  3. Conclude that $\hat{\theta}$ is minimax optimal.

This strategy is valid because taking the infimum over $\tilde{\theta}$ on both sides of the lower bound gives

$$\inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P) \ge R_{\text{Bayes}}(\pi) \ge \sup_{P \in \mathcal{P}} R(\hat{\theta}, P),$$

where the second inequality is Step 2; this indicates that $\hat{\theta}$ is minimax.

Remarks

  • The lower bound $\inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P) \ge R_{\text{Bayes}}(\pi)$ holds for any prior $\pi$; thus we only need to find one $\pi$ for which Step 2 succeeds.
  • We do not need $\hat{\theta} = \hat{\theta}_\pi$, but only that $\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) \le R_{\text{Bayes}}(\pi)$.
    • However, quite often $\hat{\theta} = \hat{\theta}_\pi$.

Example: Gaussian Mean

From Example: Gaussian Mean, we know that for the Mean Squared Error, the posterior mean is the Bayes optimal estimator of the Gaussian mean $\mu$, and the Bayes risk is

$$R_{\text{Bayes}}(\pi_\tau) = \frac{\sigma^2 \tau^2}{\sigma^2 + n \tau^2},$$

where $\pi_\tau$ denotes the prior $\mu \sim \mathcal{N}(0, \tau^2)$ and $\sigma^2$ is the known variance of each observation $X_i \sim \mathcal{N}(\mu, \sigma^2)$. By letting $\tau^2 \to \infty$, we obtain priors whose Bayes risks are arbitrarily close to $\sigma^2 / n$.
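
For completeness, here is a brief sketch of where this Bayes risk comes from, assuming the observations are i.i.d. and the prior is centered at $0$ (the prior center does not affect the Bayes risk):

```latex
\begin{aligned}
\mu \mid X_{1:n} &\sim \mathcal{N}\!\left(\frac{n\tau^2}{n\tau^2 + \sigma^2}\,\bar{X}_n,\;
  \Big(\tfrac{n}{\sigma^2} + \tfrac{1}{\tau^2}\Big)^{-1}\right),\\[4pt]
R_{\text{Bayes}}(\pi_\tau) &= \mathbb{E}\big[\operatorname{Var}(\mu \mid X_{1:n})\big]
  = \Big(\tfrac{n}{\sigma^2} + \tfrac{1}{\tau^2}\Big)^{-1}
  = \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2}
  \;\xrightarrow{\;\tau^2 \to \infty\;}\; \frac{\sigma^2}{n}.
\end{aligned}
```

The first line is standard Gaussian conjugacy; the second uses that, under MSE, the Bayes risk of the posterior mean equals the expected posterior variance (which here is non-random).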

Now let $\bar{X}_n$ be the sample mean. We have $R(\bar{X}_n, \mu) = \sigma^2 / n$, which is constant w.r.t. $\mu$. Therefore, we have

$$\sup_{\mu \in \mathbb{R}} R(\bar{X}_n, \mu) = \frac{\sigma^2}{n} = \lim_{\tau^2 \to \infty} R_{\text{Bayes}}(\pi_\tau) \le \inf_{\tilde{\mu}} \sup_{\mu \in \mathbb{R}} R(\tilde{\mu}, \mu) \le \sup_{\mu \in \mathbb{R}} R(\bar{X}_n, \mu).$$

Thus, we conclude that for MSE, the sample mean is minimax; it is also the (generalized) Bayes estimator under the flat prior on $\mu$ obtained by letting $\tau^2 \to \infty$.

Minimax via Constant Risk + Bayes

From the above example we can see that, if an estimator has a constant risk w.r.t. $P \in \mathcal{P}$, it is easier to prove its minimax optimality. This motivates the following strategy:

Minimax via constant risk + Bayes

  1. Show that the risk $R(\hat{\theta}, P)$ is constant w.r.t. $P \in \mathcal{P}$.
  2. Find a prior $\pi$ such that $\hat{\theta}$ is Bayes w.r.t. $\pi$.
  3. Conclude that $\hat{\theta}$ is minimax optimal.

The first step establishes that $\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) = \mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}, P)\big]$ for any prior $\pi$, and the second step establishes that $\mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}, P)\big] = R_{\text{Bayes}}(\pi)$.
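
Spelled out in the notation above, the two steps chain together as

```latex
\begin{aligned}
\sup_{P \in \mathcal{P}} R(\hat{\theta}, P)
  &= \mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}, P)\big]
     && \text{(Step 1: constant risk)}\\
  &= \inf_{\tilde{\theta}} \mathbb{E}_{P \sim \pi}\big[R(\tilde{\theta}, P)\big]
     = R_{\text{Bayes}}(\pi)
     && \text{(Step 2: $\hat{\theta}$ is Bayes w.r.t.\ $\pi$)}\\
  &\le \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P)
     && \text{(a supremum dominates the average)}\\
  &\le \sup_{P \in \mathcal{P}} R(\hat{\theta}, P),
     && \text{(the infimum ranges over all estimators, including $\hat{\theta}$)}
\end{aligned}
```

so every inequality is an equality and $\hat{\theta}$ attains the minimax risk.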

Example: Bernoulli Mean

We will show that with MSE, the sample mean $\bar{X}_n$ of $n$ Bernoulli trials $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p)$ is not a minimax estimator of $p$, and we will find a minimax estimator.

Recall that the Conjugate Prior of the binomial is the Beta Distribution:

$$p \sim \mathrm{Beta}(\alpha, \beta) \;\Longrightarrow\; p \mid X_{1:n} \sim \mathrm{Beta}\Big(\alpha + \sum_{i=1}^n X_i,\; \beta + n - \sum_{i=1}^n X_i\Big),$$

whose posterior mean is

$$\hat{p}_{\alpha, \beta} = \mathbb{E}[p \mid X_{1:n}] = \frac{\alpha + \sum_{i=1}^n X_i}{\alpha + \beta + n}.$$
We need to find a prior such that $\hat{p}_{\alpha, \beta}$ has a constant risk w.r.t. $p$. The risk is

$$R(\hat{p}_{\alpha, \beta}, p) = \mathrm{Var}\big(\hat{p}_{\alpha, \beta}\big) + \mathrm{Bias}^2\big(\hat{p}_{\alpha, \beta}\big) = \frac{n p (1 - p) + \big(\alpha - (\alpha + \beta) p\big)^2}{(\alpha + \beta + n)^2}.$$

Thus, to make the above quantity independent of $p$, we let

$$\alpha = \beta = \frac{\sqrt{n}}{2},$$

which gives

$$\hat{p} := \hat{p}_{\sqrt{n}/2,\, \sqrt{n}/2} = \frac{\sum_{i=1}^n X_i + \sqrt{n}/2}{n + \sqrt{n}} = \frac{\sqrt{n}}{\sqrt{n} + 1} \cdot \bar{X}_n + \frac{1}{\sqrt{n} + 1} \cdot \frac{1}{2},$$

where $\frac{1}{2}$ can be thought of as the prior mean of $p$, and $\frac{\sqrt{n}}{\sqrt{n} + 1}$ is the information proportion (the weight placed on the data), which approaches $1$ at a rate of $O(1/\sqrt{n})$.
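
To see where this choice comes from, one can expand the numerator of the risk in powers of $p$ and force the coefficients of $p$ and $p^2$ to vanish:

```latex
\begin{aligned}
n p (1 - p) + \big(\alpha - (\alpha + \beta) p\big)^2
  &= \big((\alpha + \beta)^2 - n\big)\, p^2 + \big(n - 2\alpha(\alpha + \beta)\big)\, p + \alpha^2,\\[2pt]
(\alpha + \beta)^2 = n \;\text{ and }\; 2\alpha(\alpha + \beta) = n
  \;&\Longrightarrow\; \alpha = \beta = \frac{\sqrt{n}}{2}.
\end{aligned}
```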

We conclude, by the constant risk + Bayes strategy, that $\hat{p}$ is minimax, and it has a constant risk w.r.t. $p$:

$$R(\hat{p}, p) = \frac{n/4}{(n + \sqrt{n})^2} = \frac{1}{4 (\sqrt{n} + 1)^2}.$$
On the other hand, the sample mean has a risk of

$$R(\bar{X}_n, p) = \frac{p (1 - p)}{n}.$$

When $p = \frac{1}{2}$, we have

$$R\Big(\bar{X}_n, \tfrac{1}{2}\Big) = \frac{1}{4n} > \frac{1}{4 (\sqrt{n} + 1)^2} = \sup_{p} R(\hat{p}, p).$$
Thus, the sample mean is not minimax. However, as illustrated in the figure below, the sample mean’s risk is only larger than this minimax estimator’s risk when $p$ is near $\frac{1}{2}$; as $n \to \infty$, the gap between their worst-case risks vanishes, and the sample mean’s risk is smaller than this minimax estimator’s for almost all values of $p$.
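
As a rough sketch of how such a figure could be produced (the choice $n = 25$ below is an arbitrary illustrative value), one can plot both risk functions over $p \in [0, 1]$:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 25
p = np.linspace(0.0, 1.0, 501)

risk_sample_mean = p * (1 - p) / n                                   # R(X_bar_n, p) = p(1-p)/n
risk_minimax = np.full_like(p, 1.0 / (4 * (np.sqrt(n) + 1) ** 2))    # constant risk of p_hat

plt.plot(p, risk_sample_mean, label="sample mean")
plt.plot(p, risk_minimax, label=r"minimax estimator $\hat{p}$")
plt.xlabel("p")
plt.ylabel("MSE")
plt.title(f"Risk comparison for n = {n}")
plt.legend()
plt.show()
```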