Minimax Optimal Estimator

A minimax optimal estimator, often shortened to minimax estimator, minimizes the worst-case risk; that is, it attains the minimax risk:

$$\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) = \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P),$$

where $\mathcal{P}$ is the statistical model and $R(\hat{\theta}, P)$ denotes the risk of the estimator $\hat{\theta}$ under the distribution $P$.

This note discusses three strategies for finding minimax estimators: via constant risk plus generalization, via Bayes risk, and via constant risk plus Bayes risk.

Generalization

Minimax estimators have a simple generalization result: suppose $\hat{\theta}$ is minimax w.r.t. a Statistical Model $\mathcal{P}_0$, which is contained in a larger model $\mathcal{P} \supseteq \mathcal{P}_0$, and we have

$$\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) = \sup_{P \in \mathcal{P}_0} R(\hat{\theta}, P).$$

Then $\hat{\theta}$ is also minimax w.r.t. $\mathcal{P}$.

Proof of Generalization

By minimax optimality of $\hat{\theta}$ w.r.t. $\mathcal{P}_0$ and the set inclusion $\mathcal{P}_0 \subseteq \mathcal{P}$,

$$\sup_{P \in \mathcal{P}_0} R(\hat{\theta}, P) = \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}_0} R(\tilde{\theta}, P) \le \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P).$$

On the other hand,

$$\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) \ge \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P).$$

Since the LHS of both inequalities are equal by assumption, we have

$$\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) = \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P),$$

i.e., $\hat{\theta}$ is also minimax w.r.t. $\mathcal{P}$.

Minimax via Constant Risk + Generalization

The generalization property provides us with a way to find minimax estimators. A more specific case is when the risk $R(\hat{\theta}, P)$ is constant w.r.t. $P \in \mathcal{P}$, so the supremum over $\mathcal{P}$ trivially equals the supremum over $\mathcal{P}_0$. Formally, if $\hat{\theta}$ is minimax w.r.t. $\mathcal{P}_0 \subseteq \mathcal{P}$ and $R(\hat{\theta}, P)$ is constant w.r.t. $P \in \mathcal{P}$, then $\hat{\theta}$ is also minimax w.r.t. $\mathcal{P}$.

Minimax via constant risk + generalization

  1. Choose a subset $\mathcal{P}_0$ of the original model $\mathcal{P}$.
  2. Find a minimax estimator $\hat{\theta}$ w.r.t. $\mathcal{P}_0$.
  3. Show that the risk $R(\hat{\theta}, P)$ is constant w.r.t. $P \in \mathcal{P}$.
  4. Conclude that $\hat{\theta}$ is minimax w.r.t. $\mathcal{P}$.

Worst-case and conservative

The generalization property, which stems from the “worst-case” nature of the minimax risk, indicates that this criterion can be very conservative. Even if an estimator performs well for almost all distributions in the larger model, it may still fail to be minimax if there is a single distribution for which it performs poorly.

Example: Non-Parametric Mean

We give an interesting example of a non-parametric model $\mathcal{P}$, which is the collection of distributions with finite mean and variance $\sigma^2$. Consider the squared-error loss $\ell(\hat{\mu}, \mu) = (\hat{\mu} - \mu)^2$. Then, the sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$ of i.i.d. observations $X_1, \dots, X_n \sim P$ is a minimax estimator of the mean $\mu(P) = \mathbb{E}_P[X]$ w.r.t. $\mathcal{P}$.

This is because $\bar{X}_n$ is a minimax estimator of $\mu$ w.r.t. a smaller parametric model $\mathcal{P}_0 = \{\mathcal{N}(\mu, \sigma^2) : \mu \in \mathbb{R}\}$, the collection of Gaussian distributions with finite mean and variance $\sigma^2$ (see Example: Gaussian Mean). Then, since the risk $R(\bar{X}_n, P) = \sigma^2 / n$ is constant w.r.t. $P \in \mathcal{P}$, we can apply the generalization property to conclude that $\bar{X}_n$ is also minimax w.r.t. $\mathcal{P}$.
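
To make the constant-risk step concrete, here is a minimal numerical sketch (the specific distributions, $n$, and $\sigma^2$ below are arbitrary illustrative choices): it estimates the MSE of the sample mean under several distributions sharing the same variance $\sigma^2$ and compares each against $\sigma^2 / n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, trials = 50, 4.0, 100_000

# Several distributions in the non-parametric model P: different shapes, same variance sigma^2.
samplers = {
    "gaussian":    lambda size: rng.normal(0.0, np.sqrt(sigma2), size),
    "uniform":     lambda size: rng.uniform(-np.sqrt(3 * sigma2), np.sqrt(3 * sigma2), size),
    "exponential": lambda size: rng.exponential(np.sqrt(sigma2), size),  # scale = sd
}
true_means = {"gaussian": 0.0, "uniform": 0.0, "exponential": np.sqrt(sigma2)}

for name, sample in samplers.items():
    x = sample((trials, n))                               # `trials` datasets of size n
    mse = np.mean((x.mean(axis=1) - true_means[name]) ** 2)
    print(f"{name:12s} empirical MSE = {mse:.4f}   sigma^2/n = {sigma2 / n:.4f}")
```

In each case the empirical MSE should be close to $\sigma^2 / n = 0.08$, reflecting that $R(\bar{X}_n, P) = \mathrm{Var}_P(X) / n$ depends on $P$ only through its variance.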

Minimax via Bayes

  • 💡 Picking an unfavorable prior (playing the role of “nature” or the “environment”) to prove worst-case/hardness results is a common technique in Statistics and theoretical computer science.

In the same spirit, we can use Bayes risks to find minimax estimators. The prior whose Bayes risk is largest is called the least favorable prior.

Since a supremum is no smaller than any average, every prior $\pi$ gives an immediate lower bound on the minimax risk: for any estimator $\tilde{\theta}$,

$$\sup_{P \in \mathcal{P}} R(\tilde{\theta}, P) \ge \mathbb{E}_{P \sim \pi}\big[R(\tilde{\theta}, P)\big] \ge \inf_{\hat{\theta}} \mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}, P)\big] =: R_{\text{Bayes}}(\pi).$$

Therefore, we have the following strategy:

Minimax via Bayes

  1. Find a prior $\pi$ with a large Bayes risk $R_{\text{Bayes}}(\pi) = \mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}_\pi, P)\big]$, where $\hat{\theta}_\pi$ is a Bayes Optimal Estimator w.r.t. $\pi$.
  2. Find an estimator $\hat{\theta}$ such that $\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) \le R_{\text{Bayes}}(\pi)$.
  3. Conclude that $\hat{\theta}$ is minimax optimal.

This strategy is valid because taking the infimum over $\tilde{\theta}$ on both sides of the lower bound gives

$$\inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P) \ge R_{\text{Bayes}}(\pi) \ge \sup_{P \in \mathcal{P}} R(\hat{\theta}, P),$$

where the second inequality is Step 2; this indicates that $\hat{\theta}$ is minimax.

Remarks

  • The lower bound $\inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P) \ge R_{\text{Bayes}}(\pi)$ holds for any prior $\pi$; thus we only need to find one $\pi$ for which Step 2 succeeds.
  • We do not need $\hat{\theta} = \hat{\theta}_\pi$, but only that $\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) \le R_{\text{Bayes}}(\pi)$.
    • However, quite often $\hat{\theta} = \hat{\theta}_\pi$.

Example: Gaussian Mean

From Example: Gaussian Mean, we know that for the Mean Squared Error, the posterior mean is the Bayes optimal estimator of the Gaussian mean $\mu$, and the Bayes risk is

$$R_{\text{Bayes}}(\pi_\tau) = \frac{\sigma^2 \tau^2}{\sigma^2 + n \tau^2},$$

where $\pi_\tau$ denotes the prior $\mu \sim \mathcal{N}(0, \tau^2)$ and $\sigma^2$ is the known variance of each observation $X_i \sim \mathcal{N}(\mu, \sigma^2)$. By letting $\tau^2 \to \infty$, we obtain priors whose Bayes risks are arbitrarily close to $\sigma^2 / n$.
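
For completeness, here is a brief sketch of where this Bayes risk comes from, assuming the observations are i.i.d. and the prior is centered at $0$ (the prior center does not affect the Bayes risk):

```latex
\begin{aligned}
\mu \mid X_{1:n} &\sim \mathcal{N}\!\left(\frac{n\tau^2}{n\tau^2 + \sigma^2}\,\bar{X}_n,\;
  \Big(\tfrac{n}{\sigma^2} + \tfrac{1}{\tau^2}\Big)^{-1}\right),\\[4pt]
R_{\text{Bayes}}(\pi_\tau) &= \mathbb{E}\big[\operatorname{Var}(\mu \mid X_{1:n})\big]
  = \Big(\tfrac{n}{\sigma^2} + \tfrac{1}{\tau^2}\Big)^{-1}
  = \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2}
  \;\xrightarrow{\;\tau^2 \to \infty\;}\; \frac{\sigma^2}{n}.
\end{aligned}
```

The first line is standard Gaussian conjugacy; the second uses that, under MSE, the Bayes risk of the posterior mean equals the expected posterior variance (which here is non-random).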

Now let $\bar{X}_n$ be the sample mean. We have $R(\bar{X}_n, \mu) = \sigma^2 / n$, which is constant w.r.t. $\mu$. Therefore, we have

$$\sup_{\mu \in \mathbb{R}} R(\bar{X}_n, \mu) = \frac{\sigma^2}{n} = \lim_{\tau^2 \to \infty} R_{\text{Bayes}}(\pi_\tau) \le \inf_{\tilde{\mu}} \sup_{\mu \in \mathbb{R}} R(\tilde{\mu}, \mu) \le \sup_{\mu \in \mathbb{R}} R(\bar{X}_n, \mu).$$

Thus, we conclude that for MSE, the sample mean is minimax; it is also the (generalized) Bayes estimator under the flat prior on $\mu$ obtained by letting $\tau^2 \to \infty$.

Minimax via Constant Risk + Bayes

From the above example we can see that, if an estimator has a constant risk w.r.t. $P \in \mathcal{P}$, it is easier to prove its minimax optimality. This motivates the following strategy:

Minimax via constant risk + Bayes

  1. Show that the risk $R(\hat{\theta}, P)$ is constant w.r.t. $P \in \mathcal{P}$.
  2. Find a prior $\pi$ such that $\hat{\theta}$ is Bayes w.r.t. $\pi$.
  3. Conclude that $\hat{\theta}$ is minimax optimal.

The first step establishes that $\sup_{P \in \mathcal{P}} R(\hat{\theta}, P) = \mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}, P)\big]$ for any prior $\pi$, and the second step establishes that $\mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}, P)\big] = R_{\text{Bayes}}(\pi)$.
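
Spelled out in the notation above, the two steps chain together as

```latex
\begin{aligned}
\sup_{P \in \mathcal{P}} R(\hat{\theta}, P)
  &= \mathbb{E}_{P \sim \pi}\big[R(\hat{\theta}, P)\big]
     && \text{(Step 1: constant risk)}\\
  &= \inf_{\tilde{\theta}} \mathbb{E}_{P \sim \pi}\big[R(\tilde{\theta}, P)\big]
     = R_{\text{Bayes}}(\pi)
     && \text{(Step 2: $\hat{\theta}$ is Bayes w.r.t.\ $\pi$)}\\
  &\le \inf_{\tilde{\theta}} \sup_{P \in \mathcal{P}} R(\tilde{\theta}, P)
     && \text{(a supremum dominates the average)}\\
  &\le \sup_{P \in \mathcal{P}} R(\hat{\theta}, P),
     && \text{(the infimum ranges over all estimators, including $\hat{\theta}$)}
\end{aligned}
```

so every inequality is an equality and $\hat{\theta}$ attains the minimax risk.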

Example: Bernoulli Mean

We will show that with MSE, the sample mean $\bar{X}_n$ of $n$ Bernoulli trials $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p)$ is not a minimax estimator of $p$, and we will find a minimax estimator.

Recall that the Conjugate Prior of the binomial is the Beta Distribution:

$$p \sim \mathrm{Beta}(\alpha, \beta) \;\Longrightarrow\; p \mid X_{1:n} \sim \mathrm{Beta}\Big(\alpha + \sum_{i=1}^n X_i,\; \beta + n - \sum_{i=1}^n X_i\Big),$$

whose posterior mean is

$$\hat{p}_{\alpha, \beta} = \mathbb{E}[p \mid X_{1:n}] = \frac{\alpha + \sum_{i=1}^n X_i}{\alpha + \beta + n}.$$
We need to find a prior such that $\hat{p}_{\alpha, \beta}$ has a constant risk w.r.t. $p$. The risk is

$$R(\hat{p}_{\alpha, \beta}, p) = \mathrm{Var}\big(\hat{p}_{\alpha, \beta}\big) + \mathrm{Bias}^2\big(\hat{p}_{\alpha, \beta}\big) = \frac{n p (1 - p) + \big(\alpha - (\alpha + \beta) p\big)^2}{(\alpha + \beta + n)^2}.$$

Thus, to make the above quantity independent of $p$, we let

$$\alpha = \beta = \frac{\sqrt{n}}{2},$$

which gives

$$\hat{p} := \hat{p}_{\sqrt{n}/2,\, \sqrt{n}/2} = \frac{\sum_{i=1}^n X_i + \sqrt{n}/2}{n + \sqrt{n}} = \frac{\sqrt{n}}{\sqrt{n} + 1} \cdot \bar{X}_n + \frac{1}{\sqrt{n} + 1} \cdot \frac{1}{2},$$

where $\frac{1}{2}$ can be thought of as the prior mean of $p$, and $\frac{\sqrt{n}}{\sqrt{n} + 1}$ is the information proportion (the weight placed on the data), which approaches $1$ at a rate of $O(1/\sqrt{n})$.
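
To see where this choice comes from, one can expand the numerator of the risk in powers of $p$ and force the coefficients of $p$ and $p^2$ to vanish:

```latex
\begin{aligned}
n p (1 - p) + \big(\alpha - (\alpha + \beta) p\big)^2
  &= \big((\alpha + \beta)^2 - n\big)\, p^2 + \big(n - 2\alpha(\alpha + \beta)\big)\, p + \alpha^2,\\[2pt]
(\alpha + \beta)^2 = n \;\text{ and }\; 2\alpha(\alpha + \beta) = n
  \;&\Longrightarrow\; \alpha = \beta = \frac{\sqrt{n}}{2}.
\end{aligned}
```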

We conclude, by the constant risk + Bayes strategy, that $\hat{p}$ is minimax, and it has a constant risk w.r.t. $p$:

$$R(\hat{p}, p) = \frac{n/4}{(n + \sqrt{n})^2} = \frac{1}{4 (\sqrt{n} + 1)^2}.$$
On the other hand, the sample mean has a risk of

$$R(\bar{X}_n, p) = \frac{p (1 - p)}{n}.$$

When $p = \frac{1}{2}$, we have

$$R\Big(\bar{X}_n, \tfrac{1}{2}\Big) = \frac{1}{4n} > \frac{1}{4 (\sqrt{n} + 1)^2} = \sup_{p} R(\hat{p}, p).$$
Thus, the sample mean is not minimax. However, as illustrated in the figure below, the sample mean’s risk is only larger than this minimax estimator’s risk when $p$ is near $\frac{1}{2}$; as $n \to \infty$, the gap between their worst-case risks vanishes, and the sample mean’s risk is smaller than this minimax estimator’s for almost all values of $p$.
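
As a rough sketch of how such a figure could be produced (the choice $n = 25$ below is an arbitrary illustrative value), one can plot both risk functions over $p \in [0, 1]$:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 25
p = np.linspace(0.0, 1.0, 501)

risk_sample_mean = p * (1 - p) / n                                   # R(X_bar_n, p) = p(1-p)/n
risk_minimax = np.full_like(p, 1.0 / (4 * (np.sqrt(n) + 1) ** 2))    # constant risk of p_hat

plt.plot(p, risk_sample_mean, label="sample mean")
plt.plot(p, risk_minimax, label=r"minimax estimator $\hat{p}$")
plt.xlabel("p")
plt.ylabel("MSE")
plt.title(f"Risk comparison for n = {n}")
plt.legend()
plt.show()
```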