Sufficient Statistic

Recall that a Statistic is a “measurement” of the sample. A statistic is sufficient if we can “recover” the sample distribution without knowing the true parameter. Formally, a statistic $T(X)$ is sufficient for a Statistical Model $\{P_\theta : \theta \in \Theta\}$ if the conditional distribution of the sample $X$ given $T(X)$ does not depend on $\theta$. In symbolic form, we have

$$P_\theta\big(X \in A \mid T(X) = t\big) \text{ does not depend on } \theta.$$

In the language of causal inference, the statistic $T(X)$ “blocks” the causal chain

$$\theta \longrightarrow T(X) \longrightarrow X.$$

Info

  • The intuition is that, if I know $T(X)$ but not $\theta$, I can simulate $X$ as well as1 someone who knows $\theta$.
  • A sufficient statistic is a “lossless” compression of the data that has all the information about the parameter.
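A minimal sketch of this simulation idea (a hypothetical Bernoulli example, not from the notes): for iid Bernoulli($p$) data, $T(X) = \sum_i X_i$ is sufficient, and given $T = t$ the sample is uniform over all 0/1 vectors with $t$ ones — so we can draw from $X \mid T$ with no knowledge of $p$.

```python
import random
from collections import Counter

# iid Bernoulli(p) sample; T(x) = sum(x) is sufficient.
# Conditional on T = t, every 0/1 vector with t ones is equally
# likely -- the conditional law does not involve p at all.
def conditional_draw(n, t, rng):
    """Simulate X | T = t without knowing p."""
    x = [1] * t + [0] * (n - t)
    rng.shuffle(x)
    return tuple(x)

rng = random.Random(0)
n, t = 4, 2
draws = Counter(conditional_draw(n, t, rng) for _ in range(60000))
# All C(4, 2) = 6 arrangements appear with frequency close to 1/6.
print(len(draws))  # → 6
```

Feeding these conditional draws through any downstream computation reproduces the sampling distribution exactly, which is the sense in which $T$ loses nothing.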

Fisher-Neyman Factorization Theorem

Thm

Suppose $X \sim P_\theta$, where the distribution $P_\theta$ has density $p_\theta$ for all $\theta \in \Theta$. Then, a statistic $T$ is sufficient iff

$$p_\theta(x) = g_\theta\big(T(x)\big)\, h(x),$$

for some functions $g_\theta$ and $h$.

  • ❗️ The result also holds for PMFs.

The proof is straightforward:

  • ❗️ $h$ can be a constant. Then $p_\theta(x) = g_\theta(T(x))$, and thus the density depends on $x$ only through $T(x)$, which is just a reformulation of sufficiency. See !todo homework 1.1.

Examples

  • The full sample $T(X) = X$ is always a sufficient statistic.

  • The Order Statistics $\big(X_{(1)}, \dots, X_{(n)}\big)$ is sufficient for any iid model.

  • For iid Gaussian r.v.s $X_1, \dots, X_n \sim \mathcal{N}(\theta, 1)$ with unit variance,

    $$p_\theta(x) = (2\pi)^{-n/2} \exp\Big(-\tfrac{1}{2}\sum_{i=1}^n (x_i - \theta)^2\Big) = \underbrace{\exp\Big(\theta \sum_{i=1}^n x_i - \tfrac{n\theta^2}{2}\Big)}_{g_\theta(T(x))} \cdot \underbrace{(2\pi)^{-n/2} e^{-\sum_i x_i^2 / 2}}_{h(x)}.$$

    Thus, $T(x) = \sum_{i=1}^n x_i$ is sufficient, with $g_\theta(t) = e^{\theta t - n\theta^2/2}$ and $h(x) = (2\pi)^{-n/2} e^{-\|x\|^2/2}$.

  • ❗️ For iid Gaussian r.v.s with a known variance, the Order Statistics is considered “bigger” than $\sum_i X_i$, and the latter is more “compressed”.
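A quick numeric check of the factorization in the unit-variance Gaussian example (a sketch; the particular samples and $\theta$ values below are arbitrary): since $p_\theta(x) = g_\theta(\sum_i x_i)\,h(x)$, the likelihood ratio between any two values of $\theta$ must depend on $x$ only through $\sum_i x_i$.

```python
import math

def log_density(x, theta):
    # joint log-density of an iid N(theta, 1) sample
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (xi - theta) ** 2 for xi in x)

# Two different samples with the same value of T(x) = sum(x) ...
x1 = [0.5, 1.5, -1.0]
x2 = [1.0, 1.0, -1.0]

# ... have identical log-likelihood ratios across any pair of thetas,
# because all theta-dependence enters through g_theta(T(x)).
gaps = []
for theta_a, theta_b in [(0.0, 1.0), (-2.0, 0.7)]:
    r1 = log_density(x1, theta_a) - log_density(x1, theta_b)
    r2 = log_density(x2, theta_a) - log_density(x2, theta_b)
    gaps.append(abs(r1 - r2))
print(max(gaps) < 1e-9)  # → True
```

Algebraically, the ratio is $(\theta_a - \theta_b)\sum_i x_i + n(\theta_b^2 - \theta_a^2)/2$, so equal sums force equal ratios.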

Gaussian Linear Model

For a fixed overdetermined ($n > p$) design matrix $X \in \mathbb{R}^{n \times p}$ of full column rank, the Gaussian linear model is $Y = X\beta + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$ with $\sigma^2$ known. Let $\hat\beta = (X^\top X)^{-1} X^\top Y$. Then, $\hat\beta$ is sufficient for $\beta$.

One way to prove this is to express $Y$ in terms of $\hat\beta$ and show that the remaining part does not depend on $\beta$. We have

$$Y = X\hat\beta + (Y - X\hat\beta) = HY + (I_n - H)Y, \qquad H := X(X^\top X)^{-1} X^\top.$$

Note that $H$ is a projector and $(I_n - H)X = 0$. Thus,

$$(I_n - H)Y = (I_n - H)(X\beta + \varepsilon) = (I_n - H)\varepsilon.$$

Specifically,

$$(I_n - H)\varepsilon \sim \mathcal{N}\big(0, \sigma^2 (I_n - H)\big),$$

which does not involve $\beta$.

Intuitively, the data is high-dimensional ($n$) while the useful information is low-dimensional ($p$). The statistic maps the data, via $X\hat\beta = HY$, to the column space of $X$, which is also of dimension $p$. The remaining part $(I_n - H)Y$ is pure noise, orthogonal to the column space of $X$.
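The two algebraic facts the proof rests on — $H$ is a projector and $(I_n - H)X = 0$, so the residual part of $Y$ is $\beta$-free — can be checked numerically (a sketch with arbitrary dimensions, seed, and $\beta$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
X = rng.standard_normal((n, p))          # overdetermined design, n > p
H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix: projector onto col(X)

assert np.allclose(H @ H, H)             # idempotent: H is a projector
assert np.allclose((np.eye(n) - H) @ X, 0)  # (I - H) annihilates col(X)

# Hence (I - H) Y = (I - H) eps: the residual carries no trace of beta.
beta = np.array([1.0, -2.0, 0.5])
eps = rng.standard_normal(n)
Y = X @ beta + eps
assert np.allclose((np.eye(n) - H) @ Y, (np.eye(n) - H) @ eps)
```

Changing `beta` leaves `(I - H) @ Y` unchanged, which is exactly the statement that the part of $Y$ not captured by $\hat\beta$ is pure noise.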

See Gaussian Linear Model for more details.

Rmk

The sufficient statistic is not unique.

Rao-Blackwell Theorem

Thm

Suppose the action space $\mathcal{A}$ is convex, the loss function $L(\theta, a)$ is convex in $a$, and $T$ is a sufficient statistic for $\theta$. Then, for any statistical procedure $\delta(X)$, consider

$$\delta'(X) := \mathbb{E}\big[\delta(X) \mid T(X)\big].$$

Sufficiency of $T$ guarantees that $\delta'$ is a legitimate statistic: the conditional expectation can be computed without knowing $\theta$. We have

$$R(\theta, \delta') \le R(\theta, \delta) \quad \text{for all } \theta \in \Theta.$$
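A classic Monte Carlo illustration (a hypothetical Poisson setup, not from the notes): to estimate $e^{-\theta} = P(X_1 = 0)$ from iid Poisson($\theta$) data, start with the crude unbiased $\delta(X) = 1\{X_1 = 0\}$ and condition on the sufficient statistic $T = \sum_i X_i$; given $T = t$, $X_1 \sim \mathrm{Binomial}(t, 1/n)$, so $\delta'(T) = ((n-1)/n)^T$ in closed form.

```python
import math
import random
import statistics

rng = random.Random(1)

def poisson(lam, rng):
    # Knuth's multiplication method; fine for small lam
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

n, theta, reps = 10, 2.0, 20000
crude, rb = [], []
for _ in range(reps):
    x = [poisson(theta, rng) for _ in range(n)]
    crude.append(1.0 if x[0] == 0 else 0.0)   # delta(X) = 1{X_1 = 0}
    t = sum(x)                                # sufficient statistic T
    rb.append(((n - 1) / n) ** t)             # E[delta | T = t], closed form

# Both are unbiased for exp(-theta); conditioning on T shrinks the
# variance dramatically, as Rao-Blackwell guarantees.
print(statistics.variance(crude) > statistics.variance(rb))  # → True
```

The Rao-Blackwellized version inherits the bias of $\delta$ exactly (tower property) and can only lower the risk.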

Applications

Gaussian Median

Rao-Blackwell theorem gives a better estimator for the median of a Gaussian distribution with known variance than the sample median. Since $\bar{X}$ is sufficient, we can do a symmetric sampling such that the new sample $X'$ satisfies $X' \mid \bar{X} \overset{d}{=} X \mid \bar{X}$. Then, we use the sample median of the new sample, averaged over the resampling, as the estimator.
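In fact, by the symmetry of the Gaussian, $\mathbb{E}[\mathrm{med}(X) \mid \bar{X}] = \bar{X}$, so the Rao-Blackwellized sample median is just the sample mean. A Monte Carlo sketch (hypothetical $n = 9$, $\mu = 1$, unit variance) of the variance gap:

```python
import random
import statistics

rng = random.Random(0)
n, mu, reps = 9, 1.0, 20000   # hypothetical sample size and true median

medians, means = [], []
for _ in range(reps):
    x = [rng.gauss(mu, 1.0) for _ in range(n)]
    medians.append(sorted(x)[n // 2])  # sample median
    means.append(sum(x) / n)           # = E[median(X) | X-bar] by symmetry

# Both estimate mu (the Gaussian mean and median coincide); conditioning
# on the sufficient statistic X-bar strictly reduces the variance.
print(statistics.variance(medians) > statistics.variance(means))  # → True
```

Asymptotically the sample median has variance $\pi/(2n)$ versus $1/n$ for the mean, a factor of about $1.57$.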

Order Should Not Matter

Rao-Blackwell theorem is useful in proving Admissibility. Suppose the loss function $L(\theta, a)$ is strictly convex in $a$. If $\delta(X_1, \dots, X_n)$ is not order-invariant in its arguments, then $\delta$ is not admissible.

To see this, we use the Order Statistics $X_{(\cdot)} = (X_{(1)}, \dots, X_{(n)})$ as the sufficient statistic. Since $L$ is convex, by Rao-Blackwell theorem, we consider

$$\delta'(X) := \mathbb{E}\big[\delta(X) \mid X_{(\cdot)}\big] = \frac{1}{n!} \sum_{\pi \in S_n} \delta\big(X_{\pi(1)}, \dots, X_{\pi(n)}\big),$$

where $S_n$ is the permutation group of $n$ elements and the last equality is because, given the order statistics of an iid sample, all $n!$ orderings are equally likely, so the summation is over all permutations with equal weight. Clearly, $\delta'$ is order-invariant. Further, by the strict convexity of $L$, we have

$$R(\theta, \delta') = \mathbb{E}\, L\Big(\theta, \frac{1}{n!} \sum_{\pi} \delta(X_\pi)\Big) \le \frac{1}{n!} \sum_{\pi} \mathbb{E}\, L\big(\theta, \delta(X_\pi)\big) = \mathbb{E}\, L\big(\theta, \delta(X)\big) = R(\theta, \delta),$$

where the penultimate equality is because $\delta(X_\pi) \overset{d}{=} \delta(X)$ due to iidness, and the strict inequality is by Jensen’s inequality; the equality holds iff $\delta(X_{\pi_1}) = \delta(X_{\pi_2})$ a.s. for any two permutations $\pi_1, \pi_2$, which implies $\delta$ is order-invariant.
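A sketch of this symmetrization (hypothetical weights): averaging an order-dependent linear estimator over all permutations collapses it to the sample mean, with the same bias but strictly smaller MSE.

```python
import itertools
import random
import statistics

rng = random.Random(2)
n, mu, reps = 3, 0.5, 20000

def delta(x):
    # an order-dependent estimator: weights the coordinates unequally
    return 0.6 * x[0] + 0.3 * x[1] + 0.1 * x[2]

def mse(values):
    return statistics.mean((v - mu) ** 2 for v in values)

raw, sym = [], []
for _ in range(reps):
    x = [rng.gauss(mu, 1.0) for _ in range(n)]
    raw.append(delta(x))
    # delta' = average of delta over all n! permutations; since each
    # coordinate gets average weight 1/3, this is the sample mean
    sym.append(statistics.mean(delta(p) for p in itertools.permutations(x)))

# Same expectation (both unbiased for mu), smaller MSE after symmetrizing:
# variance drops from 0.6^2 + 0.3^2 + 0.1^2 = 0.46 to 1/3.
print(mse(raw) > mse(sym))  # → True
```

The permutation average is exactly the conditional expectation given the order statistics, so this is Rao-Blackwell in miniature.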

If the loss function is Mean Squared Error, we have an alternative proof that has a better interpretation. Recall the MSE decomposition:

$$\mathrm{MSE}(\delta) = \mathrm{Bias}(\delta)^2 + \mathrm{Var}(\delta).$$

Note that

$$\mathbb{E}\big[\delta'(X)\big] = \frac{1}{n!} \sum_{\pi} \mathbb{E}\big[\delta(X_\pi)\big] = \mathbb{E}\big[\delta(X)\big],$$

so $\delta$ and $\delta'$ have the same bias.

So we only need to compare the variance:

$$\mathrm{Var}(\delta') = \frac{1}{(n!)^2} \sum_{\pi_1, \pi_2} \mathrm{Cov}\big(\delta(X_{\pi_1}), \delta(X_{\pi_2})\big) \le \frac{1}{(n!)^2} \sum_{\pi_1, \pi_2} \sqrt{\mathrm{Var}\big(\delta(X_{\pi_1})\big)\,\mathrm{Var}\big(\delta(X_{\pi_2})\big)} = \mathrm{Var}(\delta),$$

where the inequality is due to Cauchy-Schwarz Inequality, and the equality holds if and only if

$$\delta(X_{\pi_1}) - \mathbb{E}\big[\delta(X_{\pi_1})\big] = \delta(X_{\pi_2}) - \mathbb{E}\big[\delta(X_{\pi_2})\big] \ \text{a.s.} \iff \delta(X_{\pi_1}) = \delta(X_{\pi_2}) \ \text{a.s.},$$

where the first equivalence is because $\delta(X_{\pi_1}) \overset{d}{=} \delta(X_{\pi_2})$, so the two means coincide.

Footnotes

  1. Equal in distribution.