Law of Large Numbers
Let $X_1, X_2, \dots$ be iid Random Variables with finite mean $\mu = \mathbb{E}[X_1]$. Denote $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then
$$\bar{X}_n \xrightarrow{p} \mu,$$
where $\xrightarrow{p}$ means convergence in probability. This is the weak law of large numbers (WLLN).
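As a quick sanity check, the statement can be illustrated numerically. The sketch below is a Monte Carlo estimate of $\mathbb{P}(|\bar{X}_n - \mu| > \epsilon)$ for growing $n$; the choice of Uniform(0,1) samples (so $\mu = 1/2$), $\epsilon$, and the trial count are all arbitrary:

```python
import random

def deviation_prob(n, eps, trials=2000, seed=0):
    """Monte Carlo estimate of P(|Xbar_n - mu| > eps) for Uniform(0,1) draws (mu = 1/2)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) > eps:
            count += 1
    return count / trials

# WLLN: the deviation probability shrinks as n grows.
probs = [deviation_prob(n, eps=0.05) for n in (10, 100, 1000)]
print(probs)
```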
Substituting the Convergence in Probability with the Strong Convergence (almost sure convergence) gives the strong law of large numbers (SLLN) under the same condition (finite mean):
$$\bar{X}_n \xrightarrow{a.s.} \mu.$$
Proof
WLLN Assuming Finite Variance
- ❗️ The following proof only works for r.v.s with finite variance.
- 💡 prerequisite: Chebyshev Inequality
Suppose $X_1$ has finite variance $\sigma^2$. For any $\epsilon > 0$, by Chebyshev Inequality,
$$\mathbb{P}\left(|\bar{X}_n - \mu| \geq \epsilon\right) \leq \frac{\operatorname{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2} \to 0 \quad \text{as } n \to \infty.$$
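The bound can be checked numerically. A minimal sketch, assuming Uniform(0,1) samples (so $\sigma^2 = 1/12$) and arbitrary Monte Carlo settings, compares the empirical deviation probability to $\sigma^2 / (n \epsilon^2)$:

```python
import random

def empirical_vs_chebyshev(n, eps, trials=5000, seed=1):
    """Empirical P(|Xbar_n - 1/2| >= eps) for Uniform(0,1) draws, against the
    Chebyshev bound sigma^2 / (n * eps^2) with sigma^2 = 1/12."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) >= eps:
            hits += 1
    bound = (1 / 12) / (n * eps ** 2)
    return hits / trials, bound

emp, bound = empirical_vs_chebyshev(n=200, eps=0.05)
print(emp, bound)  # the empirical probability stays below the bound
```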
WLLN with General Case
- 💡 prerequisite: Characteristic Function
We now remove the condition of finite variance. Since $X_1, \dots, X_n$ are independent, the characteristic function (CF) of $\bar{X}_n$ is
$$\varphi_{\bar{X}_n}(t) = \mathbb{E}\left[e^{\mathrm{i} t \bar{X}_n}\right] = \prod_{i=1}^n \varphi_{X_1}\!\left(\frac{t}{n}\right) = \left[\varphi_{X_1}\!\left(\frac{t}{n}\right)\right]^n.$$
By the Taylor expansion and the fact that $\varphi_{X_1}'(0) = \mathrm{i} \mu$,
$$\varphi_{\bar{X}_n}(t) = \left[1 + \frac{\mathrm{i} \mu t}{n} + o\!\left(\frac{1}{n}\right)\right]^n \to e^{\mathrm{i} \mu t}.$$
Since $e^{\mathrm{i} \mu t}$ is also the CF of the constant $\mu$, by the inverse property and Convergence of Characteristic Functions, $\bar{X}_n \xrightarrow{d} \mu$. And since $\mu$ is a constant, convergence in distribution implies convergence in probability.
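The CF convergence can also be observed empirically. The sketch below (Uniform(0,1) samples again; `trials`, `t`, and `n` are arbitrary choices) estimates $\mathbb{E}[e^{\mathrm{i} t \bar{X}_n}]$ by Monte Carlo and compares it to $e^{\mathrm{i} \mu t}$:

```python
import cmath
import random

def ecf_of_mean(n, t, trials=3000, seed=2):
    """Monte Carlo estimate of E[exp(i * t * Xbar_n)] for Uniform(0,1) draws."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        total += cmath.exp(1j * t * xbar)
    return total / trials

t = 2.0
limit = cmath.exp(1j * t * 0.5)  # CF of the constant mu = 1/2
gap = abs(ecf_of_mean(1000, t) - limit)
print(gap)  # small: the empirical CF is close to exp(i * mu * t)
```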
Comparison with the CLT Proof
Both proofs use the Characteristic Function, but the proof of the WLLN only uses the first-order Taylor expansion, while the proof of the Central Limit Theorem uses the second-order expansion. This is because the WLLN concerns only what the convergence point is (the mean), while the CLT characterizes how the sequence converges (via the variance). In this sense, the WLLN is less informative than the CLT, but requires weaker assumptions.
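The difference in scale can be seen in simulation: the raw deviation $\bar{X}_n - \mu$ shrinks with $n$ (WLLN), while the rescaled deviation $\sqrt{n}(\bar{X}_n - \mu)$ keeps a spread of about $\sigma$ (CLT). A sketch with Uniform(0,1) samples ($\sigma = \sqrt{1/12} \approx 0.289$; trial counts are arbitrary):

```python
import random
import statistics

def spread_of_scaled_mean(n, scale, trials=1000, seed=3):
    """Empirical std. dev. of scale * (Xbar_n - mu) across trials, Uniform(0,1) draws."""
    rng = random.Random(seed)
    vals = []
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        vals.append(scale * (xbar - 0.5))
    return statistics.pstdev(vals)

# Raw deviations vanish (WLLN); sqrt(n)-scaled ones stabilize near sigma (CLT).
for n in (100, 2500):
    print(n, spread_of_scaled_mean(n, 1), spread_of_scaled_mean(n, n ** 0.5))
```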
SLLN
- 💡 prerequisite: Borel-Cantelli Lemma
- 📎 Reference: Etemadi, N. An elementary proof of the strong law of large numbers. Z. Wahrscheinlichkeitstheorie verw. Gebiete 55, 119–122 (1981), and Terence Tao: The strong law of large numbers.
Note that the almost sure convergence $\bar{X}_n \xrightarrow{a.s.} \mu$ is equivalent to
$$\mathbb{P}\left(\bar{X}_n \not\to \mu\right) = 0 \iff \forall \epsilon > 0, \ \mathbb{P}\left(|\bar{X}_n - \mu| \geq \epsilon \text{ infinitely often}\right) = 0,$$
where in the second equivalence, we say the event $\{\bar{X}_n \not\to \mu\}$ is true if $\bar{X}_n(\omega)$ does not converge to $\mu$. This hints us to use the Borel-Cantelli Lemma.
Before invoking the lemma, we make several reductions. First, let $X_n^+ = \max(X_n, 0)$ and $X_n^- = \max(-X_n, 0)$. Then, if $\frac{1}{n} \sum_{i=1}^n X_i^+ \xrightarrow{a.s.} \mathbb{E}[X_1^+]$ and $\frac{1}{n} \sum_{i=1}^n X_i^- \xrightarrow{a.s.} \mathbb{E}[X_1^-]$, by the property of Convergence under Transformations, we have
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i^+ - \frac{1}{n} \sum_{i=1}^n X_i^- \xrightarrow{a.s.} \mathbb{E}[X_1^+] - \mathbb{E}[X_1^-] = \mu.$$
Thus, WLOG, we assume $X_n \geq 0$.
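This first reduction is easy to check numerically: averaging the positive and negative parts separately and subtracting recovers the sample mean exactly. A small sketch with Gaussian samples (distribution, sample size, and seed are arbitrary choices):

```python
import random

def split_means(xs):
    """Sample means of the positive and negative parts of xs."""
    n = len(xs)
    plus = sum(max(x, 0.0) for x in xs) / n
    minus = sum(max(-x, 0.0) for x in xs) / n
    return plus, minus

rng = random.Random(4)
xs = [rng.gauss(1.0, 2.0) for _ in range(50000)]
p, m = split_means(xs)
print(p - m, sum(xs) / len(xs))  # the two agree up to floating-point error
```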
Second, instead of considering all $n$, we consider the subsequence $n_k = \lfloor \alpha^k \rfloor$ for some fixed $\alpha > 1$. Denote $S_n = \sum_{i=1}^n X_i$. We know $S_n$ is monotonic since $X_n \geq 0$. Therefore, for any $n$, let $k$ be such that $n_k \leq n < n_{k+1}$; we have
$$\frac{S_n}{n} \leq \frac{S_{n_{k+1}}}{n_k} = \frac{n_{k+1}}{n_k} \cdot \frac{S_{n_{k+1}}}{n_{k+1}}.$$
Similarly, we have $\frac{S_n}{n} \geq \frac{n_k}{n_{k+1}} \cdot \frac{S_{n_k}}{n_k}$. If $\frac{S_{n_k}}{n_k}$ converges to $\mu$ almost surely, then since $\frac{n_{k+1}}{n_k} \to \alpha$, almost surely $\frac{\mu}{\alpha} \leq \liminf_n \frac{S_n}{n} \leq \limsup_n \frac{S_n}{n} \leq \alpha \mu$; by the arbitrariness of $\alpha > 1$, we get the desired result. Thus, we focus on the subsequence $S_{n_k}$.
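The constant in this sandwich bound comes from the growth rate of the subsequence: $n_{k+1}/n_k \to \alpha$. A quick check for $\alpha = 1.5$ (an arbitrary choice):

```python
ALPHA = 1.5  # any alpha > 1 works; 1.5 is an arbitrary choice
n_k = [int(ALPHA ** k) for k in range(1, 30)]  # n_k = floor(alpha^k)
ratios = [b / a for a, b in zip(n_k, n_k[1:])]
print(n_k[:6], ratios[-3:])  # consecutive ratios approach ALPHA
```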
Third, we focus on a truncated sequence of $X_n$. Let $Y_n = X_n \mathbb{1}\{X_n \leq n\}$ and $T_n = \sum_{i=1}^n Y_i$. Then, $Y_n$ has finite second moment and we can apply Chebyshev Inequality. Importantly, the error due to truncation is negligible because of the property of Expectation:
$$\sum_{n=1}^\infty \mathbb{P}(X_n \neq Y_n) = \sum_{n=1}^\infty \mathbb{P}(X_n > n) = \sum_{n=1}^\infty \mathbb{P}(X_1 > n) \leq \mathbb{E}[X_1] < \infty.$$
Therefore, by the Borel-Cantelli Lemma, $X_n$ differs from $Y_n$ only finitely many times almost surely, so $\frac{S_n - T_n}{n} \to 0$ almost surely. Thus, we can focus on the truncated sequence $T_n$.
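The summability condition is easy to verify for a concrete distribution. For $X \sim \mathrm{Exp}(1)$ (an arbitrary choice with $\mathbb{E}[X] = 1$), the tail is $\mathbb{P}(X > n) = e^{-n}$:

```python
import math

# For X ~ Exponential(1): P(X > n) = exp(-n) and E[X] = 1.
tail_sum = sum(math.exp(-n) for n in range(1, 200))
print(tail_sum)  # finite, and indeed below E[X] = 1
```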
Finally, we apply Chebyshev Inequality and Borel-Cantelli Lemma to the truncated subsequence: for any $\epsilon > 0$,
$$\sum_{k=1}^\infty \mathbb{P}\left(\left|\frac{T_{n_k} - \mathbb{E}[T_{n_k}]}{n_k}\right| \geq \epsilon\right) \overset{(a)}{\leq} \sum_{k=1}^\infty \frac{\operatorname{Var}(T_{n_k})}{\epsilon^2 n_k^2} \overset{(b)}{=} \frac{1}{\epsilon^2} \sum_{k=1}^\infty \frac{1}{n_k^2} \sum_{i=1}^{n_k} \operatorname{Var}(Y_i) \overset{(c)}{\leq} \frac{C}{\epsilon^2} \sum_{i=1}^\infty \frac{\operatorname{Var}(Y_i)}{i^2} \overset{(d)}{\leq} \frac{C}{\epsilon^2} \sum_{i=1}^\infty \frac{\mathbb{E}[Y_i^2]}{i^2} \overset{(e)}{<} \infty.$$
See Details for the derivation of (a)–(e). Thus, by the Borel-Cantelli Lemma,
$$\frac{T_{n_k} - \mathbb{E}[T_{n_k}]}{n_k} \xrightarrow{a.s.} 0.$$
Finally, we claim that
$$\frac{\mathbb{E}[T_{n_k}]}{n_k} \to \mu.$$
Thus, $\frac{T_{n_k}}{n_k} \xrightarrow{a.s.} \mu$. Unrolling the reductions we made gives the desired result:
$$\bar{X}_n \xrightarrow{a.s.} \mu.$$
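The theorem can be observed on a single sample path: the running mean of one realization settles at $\mu$. A sketch with Exponential(1) draws ($\mu = 1$; the seed and path length are arbitrary choices):

```python
import random

def running_means(n, seed=5):
    """One sample path of running means Xbar_1..Xbar_n for Exponential(1) draws."""
    rng = random.Random(seed)
    total, path = 0.0, []
    for i in range(1, n + 1):
        total += rng.expovariate(1.0)
        path.append(total / i)
    return path

path = running_means(100000)
print(path[99], path[9999], path[99999])  # one path settling near mu = 1
```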
Details
- (a) By Chebyshev Inequality.
- (b) By the independence of $Y_i$, inherited from $X_i$.
- (c) By the definition of $n_k = \lfloor \alpha^k \rfloor$ and geometric series: swapping the order of summation, $\sum_{k: n_k \geq i} n_k^{-2} \leq C i^{-2}$ for some constant $C$ depending only on $\alpha$.
- (d) Central moment is smaller than the raw moment: $\operatorname{Var}(Y_i) \leq \mathbb{E}[Y_i^2]$.
- (e) By the truncation of $Y_i$: $\mathbb{E}[Y_i^2] = \mathbb{E}[X_1^2 \mathbb{1}\{X_1 \leq i\}]$, and $\sum_{i=1}^\infty \mathbb{E}[X_1^2 \mathbb{1}\{X_1 \leq i\}] / i^2 \leq C' \mathbb{E}[X_1] < \infty$ for some constant $C'$.
- Note that $\frac{\mathbb{E}[T_{n_k}]}{n_k} = \frac{1}{n_k} \sum_{i=1}^{n_k} \mathbb{E}[Y_i]$ and $\mathbb{E}[Y_i] = \mathbb{E}[X_1 \mathbb{1}\{X_1 \leq i\}]$ for all $i$:
- First we have $\mathbb{E}[Y_i] = \mathbb{E}[X_1 \mathbb{1}\{X_1 \leq i\}] \to \mathbb{E}[X_1] = \mu$ by monotone convergence.
Thus, for any $\epsilon > 0$, there exists $N$ such that for all $i \geq N$, $|\mathbb{E}[Y_i] - \mu| \leq \epsilon$. Then, for all $n_k \geq N$, we have
$$\left|\frac{\mathbb{E}[T_{n_k}]}{n_k} - \mu\right| \leq \frac{1}{n_k} \sum_{i=1}^{n_k} \left|\mathbb{E}[Y_i] - \mu\right| \leq \frac{1}{n_k} \sum_{i=1}^{N-1} \left|\mathbb{E}[Y_i] - \mu\right| + \epsilon,$$
which gives
$$\limsup_{k \to \infty} \left|\frac{\mathbb{E}[T_{n_k}]}{n_k} - \mu\right| \leq \epsilon.$$
By the arbitrariness of $\epsilon$, we conclude $\frac{\mathbb{E}[T_{n_k}]}{n_k} \to \mu$.
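The first step ($\mathbb{E}[Y_i] \to \mu$) can be made concrete: for $X \sim \mathrm{Exp}(1)$ (an arbitrary example), direct integration gives $\mathbb{E}[X \mathbb{1}\{X \leq i\}] = 1 - (i + 1) e^{-i}$, which increases to $\mathbb{E}[X] = 1$:

```python
import math

# Direct integration for X ~ Exponential(1): E[X * 1{X <= i}] = 1 - (i + 1) * exp(-i).
trunc_means = [1 - (i + 1) * math.exp(-i) for i in (1, 5, 20)]
print(trunc_means)  # increases toward E[X] = 1
```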