Estimation

Quick Reference

flowchart
subgraph BB[Prediction]
direction TB
E["Empirical Risk Minimization"]
E --"geneneralizes"--> F["Regression"]
F --"contains"--> FF@{shape: processes, label: "... ...", w: 100px}
end
subgraph AA[Estimation]
direction TB
A["M-Estimator"] --"generalizes"--> B["Maximum Likelihood Estimator"]
C["Z-Estimator"] --"generalizes"--> A
C["Z-Estimator"] --"generalizes"--> D["Moment Estimator"]
B <--"same for exponential family"--> D
B --"add a prior"--> M["Maximum a Posteriori "]
D --"contains"--> D1["Sample Mean"]
D --"contains"--> D2["Sample Variance"]
end
E  <--"same form"--> A

Point Estimation

A point estimator/statistic recovers a quantity of interest from data samples. Formally, it is any algorithm/measurable function that returns a point in the parameter space $\Theta$ given the sample:

$$\hat{\theta} = \hat{\theta}(X_1, \dots, X_n) \in \Theta$$

The parameter space $\Theta$ can be one-dimensional, multi-dimensional, or even a function space. When the sample has a sample size/dimension of $n$, we also conventionally write $\hat{\theta}_n$ to denote the point estimator.
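
As a minimal illustration (with hypothetical Gaussian data), the sample mean and sample variance below are point estimators: plain functions that map the observed sample to a point in the parameter space.

```python
import numpy as np

def sample_mean(x: np.ndarray) -> float:
    """Point estimator for the location parameter: a measurable function of the sample only."""
    return float(np.mean(x))

def sample_variance(x: np.ndarray) -> float:
    """Point estimator for the variance (ddof=1 applies Bessel's correction for unbiasedness)."""
    return float(np.var(x, ddof=1))

# Hypothetical usage: n = 1000 draws from N(mu = 2, sigma^2 = 9), then estimate (mu, sigma^2).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1000)
theta_hat_n = (sample_mean(x), sample_variance(x))  # a point in the parameter space R x (0, inf)
print(theta_hat_n)
```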

In contrast to point estimation, a Confidence Interval/region returns a subset of the parameter space $\Theta$, and Bayesian Inference returns a distribution over the parameter space $\Theta$.

Comparison of Estimation Methods

MLE vs MoM

  • For quadratic risks, MLE is generally more accurate.
  • MLE still gives good results even for misspecified models, while the Method of Moments is more sensitive to model misspecification.
  • Sometimes MLE is computationally intractable, whereas the Method of Moments only requires solving polynomial moment equations; see the sketch after this list.
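
A minimal comparison sketch, assuming i.i.d. Gamma-distributed data: the Method of Moments estimate has a closed form in the first two sample moments, while the MLE has no closed form for the shape parameter and is computed numerically here via scipy.stats.gamma.fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=3.0, size=5000)  # hypothetical data; true k = 2, theta = 3

# Method of Moments: match mean = k * theta and var = k * theta^2,
# a pair of polynomial equations with a closed-form solution.
m, v = data.mean(), data.var(ddof=1)
k_mom, theta_mom = m**2 / v, v / m

# Maximum Likelihood: the shape equation involves the digamma function,
# so the fit is done numerically (loc fixed at 0 for the two-parameter Gamma).
k_mle, _, theta_mle = stats.gamma.fit(data, floc=0)

print(f"MoM: k = {k_mom:.3f}, theta = {theta_mom:.3f}")
print(f"MLE: k = {k_mle:.3f}, theta = {theta_mle:.3f}")
```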

Bayesian Estimation

  • Maximum a Posteriori, which returns the mode of the posterior distribution.
  • Bayes Optimal Estimator, which returns the
    • mean of the posterior distribution for Mean Squared Error, or any Bowl-Shaped Loss with a Gaussian posterior;
    • median of the posterior distribution for the absolute error loss $\ell(\theta, \hat{\theta}) = |\theta - \hat{\theta}|$;
    • mode of the posterior distribution for the zero-one loss $\ell(\theta, \hat{\theta}) = \mathbf{1}\{\theta \neq \hat{\theta}\}$ (a Beta-Bernoulli sketch of these estimators follows below).
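
A minimal numerical sketch under an assumed conjugate Beta-Bernoulli model: with a Beta(2, 2) prior and k successes in n coin flips, the posterior is Beta(2 + k, 2 + n - k), and the three Bayesian point estimates above are its mode (MAP), mean (squared-error loss), and median (absolute-error loss).

```python
from scipy import stats

# Assumed setup: Bernoulli(p) data with a conjugate Beta(a0, b0) prior on p.
a0, b0 = 2.0, 2.0
n, k = 50, 32                          # hypothetical data: 32 successes in 50 trials

# Conjugacy: the posterior is Beta(a0 + k, b0 + n - k).
a, b = a0 + k, b0 + (n - k)
posterior = stats.beta(a, b)

map_estimate = (a - 1) / (a + b - 2)   # posterior mode   -> optimal for zero-one loss
post_mean = posterior.mean()           # posterior mean   -> optimal for squared error
post_median = posterior.median()       # posterior median -> optimal for absolute error

print(f"MAP (mode):       {map_estimate:.4f}")
print(f"Posterior mean:   {post_mean:.4f}")
print(f"Posterior median: {post_median:.4f}")
```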

Bayes vs Frequentist

  • The Bayesian approach has been criticized for its over-reliance on convenient priors and lack of robustness.
  • The frequentist approach, such as MLE, has been criticized for its inflexibility (failure to incorporate prior information) and incoherence (failure to process information systematically).
  • For large sample sizes ($n \to \infty$), or when the prior is uniform, the Bayesian method tends to yield results similar to those of the classical likelihood approach; the sketch below spells this out for the Bernoulli model.
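
A worked instance of this claim, under the assumed Bernoulli model with $k$ successes in $n$ trials and a uniform Beta(1, 1) prior:

```latex
% Uniform prior: the posterior is proportional to the likelihood,
% so the MAP estimate coincides with the MLE.
p(\theta \mid x_{1:n}) \;\propto\; p(x_{1:n} \mid \theta)\,\underbrace{p(\theta)}_{\text{const.}}
\quad\Longrightarrow\quad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} p(x_{1:n} \mid \theta) = \hat{\theta}_{\mathrm{MLE}}

% With a Beta(1, 1) prior the posterior is Beta(k + 1, n - k + 1), so the
% posterior mean differs from the MLE k/n only by a vanishing correction:
\mathbb{E}[\theta \mid x_{1:n}] = \frac{k + 1}{n + 2} \;\approx\; \frac{k}{n} = \hat{\theta}_{\mathrm{MLE}}
\quad \text{for large } n
```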