Law of Large Numbers
Let $X_1, X_2, \dots$ be iid Random Variables with finite mean $\mu = \mathbb{E}[X_1]$. Denote $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then
$$\bar{X}_n \xrightarrow{p} \mu,$$
where $\xrightarrow{p}$ means convergence in probability. This is the weak law of large numbers (WLLN).
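As a quick sanity check, the statement can be illustrated numerically. The sketch below is a Monte Carlo estimate of $\mathbb{P}(|\bar{X}_n - \mu| > \epsilon)$ for growing $n$; the choice of Uniform(0,1) samples (so $\mu = 1/2$), $\epsilon$, and the trial count are all arbitrary:

```python
import random

def deviation_prob(n, eps, trials=2000, seed=0):
    """Monte Carlo estimate of P(|Xbar_n - mu| > eps) for Uniform(0,1) draws (mu = 1/2)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) > eps:
            count += 1
    return count / trials

# WLLN: the deviation probability shrinks as n grows.
probs = [deviation_prob(n, eps=0.05) for n in (10, 100, 1000)]
print(probs)
```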
Substituting the Convergence in Probability with the Strong Convergence (almost sure convergence) gives the strong law of large numbers (SLLN) under the same condition (finite mean):
$$\bar{X}_n \xrightarrow{a.s.} \mu.$$
Proof
WLLN Assuming Finite Variance
- ❗️ The following proof only works for r.v.s with finite variance.
- 💡 prerequisite: Chebyshev Inequality
Suppose $X_1$ has finite variance $\sigma^2$. For any $\epsilon > 0$, by Chebyshev Inequality,
$$\mathbb{P}\left(|\bar{X}_n - \mu| \geq \epsilon\right) \leq \frac{\operatorname{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2} \to 0 \quad \text{as } n \to \infty.$$
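The bound can be checked numerically. A minimal sketch, assuming Uniform(0,1) samples (so $\sigma^2 = 1/12$) and arbitrary Monte Carlo settings, compares the empirical deviation probability to $\sigma^2 / (n \epsilon^2)$:

```python
import random

def empirical_vs_chebyshev(n, eps, trials=5000, seed=1):
    """Empirical P(|Xbar_n - 1/2| >= eps) for Uniform(0,1) draws, against the
    Chebyshev bound sigma^2 / (n * eps^2) with sigma^2 = 1/12."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) >= eps:
            hits += 1
    bound = (1 / 12) / (n * eps ** 2)
    return hits / trials, bound

emp, bound = empirical_vs_chebyshev(n=200, eps=0.05)
print(emp, bound)  # the empirical probability stays below the bound
```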
WLLN with General Case
- 💡 prerequisite: Characteristic Function
We now remove the condition of finite variance. Since $X_1, \dots, X_n$ are independent, the characteristic function (CF) of $\bar{X}_n$ is
$$\varphi_{\bar{X}_n}(t) = \mathbb{E}\left[e^{\mathrm{i} t \bar{X}_n}\right] = \prod_{i=1}^n \varphi_{X_1}\!\left(\frac{t}{n}\right) = \left[\varphi_{X_1}\!\left(\frac{t}{n}\right)\right]^n.$$
By the Taylor expansion and the fact that $\varphi_{X_1}'(0) = \mathrm{i} \mu$,
$$\varphi_{\bar{X}_n}(t) = \left[1 + \frac{\mathrm{i} \mu t}{n} + o\!\left(\frac{1}{n}\right)\right]^n \to e^{\mathrm{i} \mu t}.$$
Since $e^{\mathrm{i} \mu t}$ is also the CF of the constant $\mu$, by the inverse property and Convergence of Characteristic Functions, $\bar{X}_n \xrightarrow{d} \mu$. And since $\mu$ is a constant, convergence in distribution implies convergence in probability.
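The CF convergence can also be observed empirically. The sketch below (Uniform(0,1) samples again; `trials`, `t`, and `n` are arbitrary choices) estimates $\mathbb{E}[e^{\mathrm{i} t \bar{X}_n}]$ by Monte Carlo and compares it to $e^{\mathrm{i} \mu t}$:

```python
import cmath
import random

def ecf_of_mean(n, t, trials=3000, seed=2):
    """Monte Carlo estimate of E[exp(i * t * Xbar_n)] for Uniform(0,1) draws."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        total += cmath.exp(1j * t * xbar)
    return total / trials

t = 2.0
limit = cmath.exp(1j * t * 0.5)  # CF of the constant mu = 1/2
gap = abs(ecf_of_mean(1000, t) - limit)
print(gap)  # small: the empirical CF is close to exp(i * mu * t)
```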
Comparison with the CLT Proof
Both proofs use the Characteristic Function, but the proof of the WLLN only uses the first-order Taylor expansion, while the proof of the Central Limit Theorem uses the second-order expansion. This is because the WLLN concerns only what the convergence point is (the mean), while the CLT characterizes how the sequence converges (via the variance). In this sense, the WLLN is less informative than the CLT, but requires weaker assumptions.
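The difference in scale can be seen in simulation: the raw deviation $\bar{X}_n - \mu$ shrinks with $n$ (WLLN), while the rescaled deviation $\sqrt{n}(\bar{X}_n - \mu)$ keeps a spread of about $\sigma$ (CLT). A sketch with Uniform(0,1) samples ($\sigma = \sqrt{1/12} \approx 0.289$; trial counts are arbitrary):

```python
import random
import statistics

def spread_of_scaled_mean(n, scale, trials=1000, seed=3):
    """Empirical std. dev. of scale * (Xbar_n - mu) across trials, Uniform(0,1) draws."""
    rng = random.Random(seed)
    vals = []
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        vals.append(scale * (xbar - 0.5))
    return statistics.pstdev(vals)

# Raw deviations vanish (WLLN); sqrt(n)-scaled ones stabilize near sigma (CLT).
for n in (100, 2500):
    print(n, spread_of_scaled_mean(n, 1), spread_of_scaled_mean(n, n ** 0.5))
```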
SLLN
- 💡 prerequisite: Borel-Cantelli Lemma
- 📎 Reference: Etemadi, N. An elementary proof of the strong law of large numbers. Z. Wahrscheinlichkeitstheorie verw. Gebiete 55, 119–122 (1981), and Terence Tao: The strong law of large numbers.
Note that the almost sure convergence $\bar{X}_n \xrightarrow{a.s.} \mu$ is equivalent to
$$\mathbb{P}\left(\bar{X}_n \not\to \mu\right) = 0 \iff \forall \epsilon > 0, \ \mathbb{P}\left(|\bar{X}_n - \mu| \geq \epsilon \text{ infinitely often}\right) = 0,$$
where in the second equivalence, we say the event $\{\bar{X}_n \not\to \mu\}$ is true if $\bar{X}_n(\omega)$ does not converge to $\mu$. This hints us to use the Borel-Cantelli Lemma.
Before invoking the lemma, we make several reductions. First, let $X_n^+ = \max(X_n, 0)$ and $X_n^- = \max(-X_n, 0)$. Then, if $\frac{1}{n} \sum_{i=1}^n X_i^+ \xrightarrow{a.s.} \mathbb{E}[X_1^+]$ and $\frac{1}{n} \sum_{i=1}^n X_i^- \xrightarrow{a.s.} \mathbb{E}[X_1^-]$, by the property of Convergence under Transformations, we have
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i^+ - \frac{1}{n} \sum_{i=1}^n X_i^- \xrightarrow{a.s.} \mathbb{E}[X_1^+] - \mathbb{E}[X_1^-] = \mu.$$
Thus, WLOG, we assume $X_n \geq 0$.
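This first reduction is easy to check numerically: averaging the positive and negative parts separately and subtracting recovers the sample mean exactly. A small sketch with Gaussian samples (distribution, sample size, and seed are arbitrary choices):

```python
import random

def split_means(xs):
    """Sample means of the positive and negative parts of xs."""
    n = len(xs)
    plus = sum(max(x, 0.0) for x in xs) / n
    minus = sum(max(-x, 0.0) for x in xs) / n
    return plus, minus

rng = random.Random(4)
xs = [rng.gauss(1.0, 2.0) for _ in range(50000)]
p, m = split_means(xs)
print(p - m, sum(xs) / len(xs))  # the two agree up to floating-point error
```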
Second, instead of considering all $n$, we consider the subsequence $n_k = \lfloor \alpha^k \rfloor$ for some fixed $\alpha > 1$. Denote $S_n = \sum_{i=1}^n X_i$. We know $S_n$ is monotonic since $X_n \geq 0$. Therefore, for any $n$, let $k$ be such that $n_k \leq n < n_{k+1}$; we have
$$\frac{S_n}{n} \leq \frac{S_{n_{k+1}}}{n_k} = \frac{n_{k+1}}{n_k} \cdot \frac{S_{n_{k+1}}}{n_{k+1}}.$$
Similarly, we have $\frac{S_n}{n} \geq \frac{n_k}{n_{k+1}} \cdot \frac{S_{n_k}}{n_k}$. If $\frac{S_{n_k}}{n_k}$ converges to $\mu$ almost surely, then since $\frac{n_{k+1}}{n_k} \to \alpha$, almost surely $\frac{\mu}{\alpha} \leq \liminf_n \frac{S_n}{n} \leq \limsup_n \frac{S_n}{n} \leq \alpha \mu$; by the arbitrariness of $\alpha > 1$, we get the desired result. Thus, we focus on the subsequence $S_{n_k}$.
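The constant in this sandwich bound comes from the growth rate of the subsequence: $n_{k+1}/n_k \to \alpha$. A quick check for $\alpha = 1.5$ (an arbitrary choice):

```python
ALPHA = 1.5  # any alpha > 1 works; 1.5 is an arbitrary choice
n_k = [int(ALPHA ** k) for k in range(1, 30)]  # n_k = floor(alpha^k)
ratios = [b / a for a, b in zip(n_k, n_k[1:])]
print(n_k[:6], ratios[-3:])  # consecutive ratios approach ALPHA
```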
Third, we focus on a truncated sequence of $X_n$. Let $Y_n = X_n \mathbb{1}\{X_n \leq n\}$ and $T_n = \sum_{i=1}^n Y_i$. Then, $Y_n$ has finite second moment and we can apply Chebyshev Inequality. Importantly, the error due to truncation is negligible because of the property of Expectation:
$$\sum_{n=1}^\infty \mathbb{P}(X_n \neq Y_n) = \sum_{n=1}^\infty \mathbb{P}(X_n > n) = \sum_{n=1}^\infty \mathbb{P}(X_1 > n) \leq \mathbb{E}[X_1] < \infty.$$
Therefore, by the Borel-Cantelli Lemma, $X_n$ differs from $Y_n$ only finitely many times almost surely, so $\frac{S_n - T_n}{n} \to 0$ almost surely. Thus, we can focus on the truncated sequence $T_n$.
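The summability condition is easy to verify for a concrete distribution. For $X \sim \mathrm{Exp}(1)$ (an arbitrary choice with $\mathbb{E}[X] = 1$), the tail is $\mathbb{P}(X > n) = e^{-n}$:

```python
import math

# For X ~ Exponential(1): P(X > n) = exp(-n) and E[X] = 1.
tail_sum = sum(math.exp(-n) for n in range(1, 200))
print(tail_sum)  # finite, and indeed below E[X] = 1
```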
Finally, we apply Chebyshev Inequality and Borel-Cantelli Lemma to the truncated subsequence: for any $\epsilon > 0$,
$$\sum_{k=1}^\infty \mathbb{P}\left(\left|\frac{T_{n_k} - \mathbb{E}[T_{n_k}]}{n_k}\right| \geq \epsilon\right) \overset{(a)}{\leq} \sum_{k=1}^\infty \frac{\operatorname{Var}(T_{n_k})}{\epsilon^2 n_k^2} \overset{(b)}{=} \frac{1}{\epsilon^2} \sum_{k=1}^\infty \frac{1}{n_k^2} \sum_{i=1}^{n_k} \operatorname{Var}(Y_i) \overset{(c)}{\leq} \frac{C}{\epsilon^2} \sum_{i=1}^\infty \frac{\operatorname{Var}(Y_i)}{i^2} \overset{(d)}{\leq} \frac{C}{\epsilon^2} \sum_{i=1}^\infty \frac{\mathbb{E}[Y_i^2]}{i^2} \overset{(e)}{<} \infty.$$
See Details for the derivation of (a)–(e). Thus, by the Borel-Cantelli Lemma,
$$\frac{T_{n_k} - \mathbb{E}[T_{n_k}]}{n_k} \xrightarrow{a.s.} 0.$$
Finally, we claim that
$$\frac{\mathbb{E}[T_{n_k}]}{n_k} \to \mu.$$
Thus, $\frac{T_{n_k}}{n_k} \xrightarrow{a.s.} \mu$. Unrolling the reductions we made gives the desired result:
$$\bar{X}_n \xrightarrow{a.s.} \mu.$$
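The theorem can be observed on a single sample path: the running mean of one realization settles at $\mu$. A sketch with Exponential(1) draws ($\mu = 1$; the seed and path length are arbitrary choices):

```python
import random

def running_means(n, seed=5):
    """One sample path of running means Xbar_1..Xbar_n for Exponential(1) draws."""
    rng = random.Random(seed)
    total, path = 0.0, []
    for i in range(1, n + 1):
        total += rng.expovariate(1.0)
        path.append(total / i)
    return path

path = running_means(100000)
print(path[99], path[9999], path[99999])  # one path settling near mu = 1
```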
Details
- (a) By Chebyshev Inequality.
- (b) By the independence of $Y_i$, inherited from $X_i$.
- (c) By the definition of $n_k = \lfloor \alpha^k \rfloor$ and geometric series: swapping the order of summation, $\sum_{k: n_k \geq i} n_k^{-2} \leq C i^{-2}$ for some constant $C$ depending only on $\alpha$.
- (d) Central moment is smaller than the raw moment: $\operatorname{Var}(Y_i) \leq \mathbb{E}[Y_i^2]$.
- (e) By the truncation of $Y_i$: $\mathbb{E}[Y_i^2] = \mathbb{E}[X_1^2 \mathbb{1}\{X_1 \leq i\}]$, and $\sum_{i=1}^\infty \mathbb{E}[X_1^2 \mathbb{1}\{X_1 \leq i\}] / i^2 \leq C' \mathbb{E}[X_1] < \infty$ for some constant $C'$.
- Note that $\frac{\mathbb{E}[T_{n_k}]}{n_k} = \frac{1}{n_k} \sum_{i=1}^{n_k} \mathbb{E}[Y_i]$ and $\mathbb{E}[Y_i] = \mathbb{E}[X_1 \mathbb{1}\{X_1 \leq i\}]$ for all $i$:
- First we have $\mathbb{E}[Y_i] = \mathbb{E}[X_1 \mathbb{1}\{X_1 \leq i\}] \to \mathbb{E}[X_1] = \mu$ by monotone convergence.
Thus, for any $\epsilon > 0$, there exists $N$ such that for all $i \geq N$, $|\mathbb{E}[Y_i] - \mu| \leq \epsilon$. Then, for all $n_k \geq N$, we have
$$\left|\frac{\mathbb{E}[T_{n_k}]}{n_k} - \mu\right| \leq \frac{1}{n_k} \sum_{i=1}^{n_k} \left|\mathbb{E}[Y_i] - \mu\right| \leq \frac{1}{n_k} \sum_{i=1}^{N-1} \left|\mathbb{E}[Y_i] - \mu\right| + \epsilon,$$
which gives
$$\limsup_{k \to \infty} \left|\frac{\mathbb{E}[T_{n_k}]}{n_k} - \mu\right| \leq \epsilon.$$
By the arbitrariness of $\epsilon$, we conclude $\frac{\mathbb{E}[T_{n_k}]}{n_k} \to \mu$.
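The first step ($\mathbb{E}[Y_i] \to \mu$) can be made concrete: for $X \sim \mathrm{Exp}(1)$ (an arbitrary example), direct integration gives $\mathbb{E}[X \mathbb{1}\{X \leq i\}] = 1 - (i + 1) e^{-i}$, which increases to $\mathbb{E}[X] = 1$:

```python
import math

# Direct integration for X ~ Exponential(1): E[X * 1{X <= i}] = 1 - (i + 1) * exp(-i).
trunc_means = [1 - (i + 1) * math.exp(-i) for i in (1, 5, 20)]
print(trunc_means)  # increases toward E[X] = 1
```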