# Convergence of Random Variables
## Relation between Convergence Modes

$$
X_n \overset{ \text{a.s.} }{ \longrightarrow } X \implies X_n \overset{ p }{ \longrightarrow } X \implies X_n \overset{ d }{ \longrightarrow } X, \qquad X_n \overset{ L^p }{ \longrightarrow } X \implies X_n \overset{ p }{ \longrightarrow } X.
$$

Convergence in distribution is also implied by $f_n \to f$, $p_n \to p$, or $\phi_n \to \phi$, where $f$, $p$, and $\phi$ are the PDF (for continuous r.v.s), PMF (for discrete r.v.s), and characteristic function, respectively, and their convergences are in the sense of pointwise convergence of functions: e.g. $f_n(x) \to f(x)$ for all $x$.

- If $X$ follows a point-mass/Dirac distribution $\delta_{c}$, then $X_n \overset{ d }{ \longrightarrow } X \implies X_n \overset{ p }{ \longrightarrow } X$.
As a [[Random Variable]] involves many different elements, we can define various modes of convergence for a sequence of random variables. Some definitions view random variables as [[Measurable Function|measurable functions]], others as probability measures, and some work through their associated functions, such as the [[Cumulative Distribution Function|CDF]] and the [[Characteristic Function]].
In this note, we denote a sequence of random variables as $X_1, X_2, \dots$, or simply $X_n$, and its candidate limit as $X$.
## Almost Sure/ Strong Convergence
- Definition: We say that $X_n$ converges ==almost surely/ almost everywhere/ with probability 1/ strongly== to $X$ if
$$
\mathbb{P}(\omega\in\Omega:\lim_{n\to\infty} X_n(\omega) = X(\omega)) = 1,
$$
where $\Omega$ is the [[Probability Space|event space]].
- Remark: $X_n$ are generally highly dependent for this convergence to hold. Typical examples:
    - Partial sums converging to an infinite sum.
    - R.v.s defined as converging functions of a single underlying r.v.

## Convergence in Probability

- Definition: We say that $X_n$ converges ==in probability== to $X$ if for any $\epsilon > 0$,
$$
\lim_{n\to\infty} \mathbb{P}(|X_n - X| > \epsilon) = 0 .
$$
- Notation: $X_n \overset{ p }{ \longrightarrow } X$.
- Remark: Convergence in probability has a similar interpretation as almost sure convergence, and it also generally requires $X_n$ to be dependent. However, it's weaker than almost sure convergence as it's not uniform: the exceptional event $\{ \omega : |X_n(\omega) - X(\omega)| > \epsilon \}$ may be a different set of outcomes for each $n$, so no single sample path is required to converge.
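To see the gap between the two modes concretely, here is a minimal simulation sketch (assuming numpy; the choice of independent $X_n \sim \operatorname{Bernoulli}(1/n)$ is my own illustrative example, not from the note):

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent X_n ~ Bernoulli(1/n): P(|X_n - 0| > eps) = 1/n -> 0,
# so X_n -> 0 in probability.  But sum(1/n) diverges, so by the second
# Borel-Cantelli lemma X_n = 1 infinitely often almost surely: sample
# paths never settle at 0, and X_n does NOT converge almost surely.

n_paths, horizon = 2000, 1000
ns = np.arange(1, horizon + 1)
X = rng.random((n_paths, horizon)) < 1.0 / ns  # X[i, n-1] = X_n on path i

# Convergence in probability: P(X_n = 1) shrinks with n.
p_early = X[:, 9].mean()    # estimates P(X_10 = 1)   ~ 1/10
p_late = X[:, 999].mean()   # estimates P(X_1000 = 1) ~ 1/1000
assert p_late < p_early

# Failure of a.s. convergence: about half the paths still hit 1
# somewhere in n = 501..1000 (exact probability: 1 - 500/1000 = 0.5).
frac_late_one = X[:, 500:].any(axis=1).mean()
print(f"P(X_10=1) ≈ {p_early:.3f}, P(X_1000=1) ≈ {p_late:.3f}")
print(f"fraction of paths with a 1 after n=500: {frac_late_one:.3f}")
```

The telescoping product $\prod_{n=501}^{1000} (1 - 1/n) = 500/1000$ makes the last quantity exactly $0.5$ in expectation, which the simulation reproduces.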
## Convergence in Distribution/ Weak Convergence
- Definition: We say that $X_n$ converges ==in distribution/ in law/ weakly== to $X$ if
$$
\lim_{n\to\infty} F_{X_n}(x) = F_X(x)
$$
for all $x$ at which $F_X$ is continuous, where $F$ is a [[Cumulative Distribution Function|CDF]].
- Notation: $X_n \overset{d}{\longrightarrow} X$.
- Alternative interpretation: Weak convergence inspects the convergence of the associated CDF of a random variable sequence, and thus it's _weaker_ and generally requires no dependency between $X_{n}$. It's equivalent to the convergence of the [[Characteristic Function]].
- Remark: Weak convergence is consistent with the convergence of real numbers. If $X_n \overset{ \text{a.s.} }{ = } a_n\in\mathbb{R}$ and $X\overset{ \text{a.s.} }{ = }a\in\mathbb{R}$, then $X_n\overset{ d }{ \to }X \iff a_n\to a$.
    - This consistency does not hold if we require $F_{X_n}(x)\to F_{X}(x)$ for all $x$.

### Portmanteau Lemma

Several important statements equivalent to convergence in distribution are given by the **Portmanteau Lemma**:
1. $\mathbb{E}g(X_n) \to \mathbb{E} g(X)$ for any bounded, continuous/Lipschitz function $g$.
2. $\liminf_{n \to \infty } \mathbb{E}g(X_n) \geq \mathbb{E} g(X)$ for any nonnegative, continuous function $g$.
3. $\liminf_{n \to \infty } P(X_n\in B) \geq P(X\in B)$ for any open set $B$.
4. $\limsup_{n \to \infty } P(X_n\in B) \le P(X\in B)$ for any closed set $B$.
5. $P(X_n\in B) \to P(X\in B)$ for any continuity set[^3] $B$,
    - which is further equivalent to $\left|\int _{B} f_{X_n}(x) - f_{X}(x) \, dx\right| \to 0$.

[^3]: A continuity set has a zero-measure boundary.

### Convergence of PDF/PMF

Suppose $X_n\overset{ d }{ \to }X$. The convergence of the associated PDF/PMF is unclear:
- It is possible for $X_n$ to be discrete and $X$ to be continuous.
    - $X_n = \frac{1}{n}\operatorname{Unif}\{ 1,\dots, n \} \to \operatorname{Unif}[0,1]$.
- It is possible for $X_n$ to be continuous and $X$ to be discrete.
    - $X_n = \operatorname{Unif}[0,1/n] \to 0$.
- If $X_n$ and $X$ are continuous, it is possible that the [[Probability Density Function|PDF]] does not converge: $f_n \not\to f$.
    - $F_n(x) = x - \sin(2\pi nx) /(2\pi n) \to F_{X}(x) = x$ on $[0,1]$, but $f_n(x) = 1 - \cos(2\pi nx) \not\to f_{X}(x) = 1$.

For the other direction, we have:
- If $X_n$ and $X$ are continuous, the convergence of the [[Probability Density Function|PDF]] implies the convergence in distribution (Scheffé's lemma).
- If $X_n$ and $X$ are discrete, the convergence in distribution **is equivalent** to the convergence of the [[Probability Mass Function|PMF]].

### Convergence of Characteristic Functions

Convergence in distribution is equivalent to the convergence of [[Characteristic Function]]s:
$$
\lim_{n\to\infty} \phi_{X_n}(t) = \phi_X(t), \quad \forall t .
$$
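The discrete-to-continuous example above can be checked numerically. A small sketch (assuming numpy; the test function $g = \cos$ is my own choice for the Portmanteau check):

```python
import numpy as np

# X_n = (1/n) * Unif{1,...,n} is discrete, yet it converges in
# distribution to the continuous Unif[0,1]: the CDFs converge even
# though there is no PMF-to-PDF convergence.

def F_n(x, n):
    # CDF of X_n: P(X_n <= x) = floor(n*x)/n for x in [0, 1].
    return np.floor(n * x) / n

xs = np.linspace(0, 1, 1001)
for n in (10, 100, 1000):
    gap = np.max(np.abs(F_n(xs, n) - xs))  # sup-distance to the Unif[0,1] CDF
    print(f"n={n:5d}  sup|F_n - F| = {gap:.4f}")  # shrinks like 1/n

# Portmanteau check: E g(X_n) -> E g(X) for bounded continuous g = cos.
n = 10_000
Eg_n = np.mean(np.cos(np.arange(1, n + 1) / n))  # E cos(X_n), a Riemann sum
Eg = np.sin(1.0)                                 # integral of cos over [0,1]
assert abs(Eg_n - Eg) < 1e-3
```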
## Convergence in $L^p$ Norm

- Definition: We say that $X_n$ converges ==in $L^p$ norm/ in $p$th mean== to $X$ if
$$
\lim_{n\to\infty} \mathbb{E}[|X_n - X|^p] = 0 .
$$
- Notation: $X_n \overset{ L^p }{ \longrightarrow } X$.
- Remark: For $p > q \geq 1$, we have $X_n \overset{ L^p }{ \longrightarrow } X \implies X_n \overset{ L^q }{ \longrightarrow } X$, but not the other way around.
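Convergence in probability does not imply $L^p$ convergence either. A standard counterexample can be verified with a few lines of exact arithmetic (the example $X_n = n$ with probability $1/n$, else $0$, is my own choice, not from the note):

```python
# X_n takes the value n with probability 1/n and 0 otherwise.
# Then P(|X_n - 0| > eps) = 1/n -> 0, so X_n -> 0 in probability,
# but E|X_n - 0|^p = n^p * (1/n) = n^(p-1) does not vanish for p >= 1:
# convergence in probability without convergence in pth mean.

def prob_exceeds(n, eps=0.5):
    return 1.0 / n          # P(|X_n - 0| > eps), exact

def pth_mean(n, p):
    return n ** (p - 1)     # E|X_n - 0|^p, exact

print([prob_exceeds(n) for n in (10, 100, 1000)])   # -> 0
print([pth_mean(n, p=1) for n in (10, 100, 1000)])  # stays 1
print([pth_mean(n, p=2) for n in (10, 100, 1000)])  # blows up
```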
## Convergence under Transformations
### Continuous Mapping Theorem
Let $X_n$ be a sequence of random variables that converges almost surely/ in probability/ in distribution to $X$. Let $g$ be a continuous function. Then $g(X_n)$ converges in the same mode to $g(X)$.
- The continuous mapping theorem does not apply to convergence in $L^p$ norm.
Further, if $g$ is continuous at a constant $c$ and $X_n \overset{ p }{ \longrightarrow } c$, then $g(X_n) \overset{ p }{ \longrightarrow } g(c)$. $g$ need not be continuous everywhere in this case.
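A quick numerical illustration of the constant-limit case (assuming numpy; the choice of Exp(1) data and $g(x) = 1/x$, which is continuous at the limit $1$ but not at $0$, is my own):

```python
import numpy as np

rng = np.random.default_rng(1)

# X_n = mean of n iid Exp(1) draws -> 1 in probability (LLN).
# g(x) = 1/x is continuous at 1 (though not everywhere), so by the
# continuous mapping theorem g(X_n) -> g(1) = 1 in probability.

def g(x):
    return 1.0 / x

for n in (10, 1000, 100_000):
    X_n = rng.exponential(1.0, size=n).mean()
    print(f"n={n:6d}  X_n={X_n:.4f}  g(X_n)={g(X_n):.4f}")

X_big = rng.exponential(1.0, size=1_000_000).mean()
assert abs(g(X_big) - 1.0) < 0.01
```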
### Slutsky’s Theorem
Let $X_n$ and $Y_n$ be sequences of random variables that converge in distribution to $X$ and $c$ respectively, where $c$ is a constant. Then
$$
X_n \circ Y_n \overset{ d }{ \longrightarrow } X \circ c,
$$
where $\circ \in \{ +, -, \times, \div \}$ is a binary operation. For $X_n \div Y_n$ to be defined, $c$ must be nonzero.
- The theorem also holds for convergence in probability.
For almost sure convergence, convergence in probability, and convergence in $L^p$ norm, we have stronger results:
- Suppose $X_n \overset{ \text{a.s.} }{ \longrightarrow } X$ and $Y_n \overset{ \text{a.s.} }{ \longrightarrow } Y$ (resp. in probability). Then $X_n \pm Y_n$ and $X_n Y_n$ converge almost surely (resp. in probability) to $X \pm Y$ and $XY$.
- Suppose $X_n \overset{ L^p }{ \longrightarrow } X$ and $Y_n \overset{ L^p }{ \longrightarrow } Y$. Then $X_n \pm Y_n \overset{ L^p }{ \longrightarrow } X \pm Y$.
Note that we no longer restrict $Y_n$ to converge to a constant.
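The textbook use of Slutsky's theorem is the studentized mean: the CLT gives the numerator, the law of large numbers gives the denominator, and Slutsky combines them. A simulation sketch (assuming numpy; Exp(1) data is my own choice, so $\mu = \sigma = 1$):

```python
import numpy as np

rng = np.random.default_rng(2)

# For iid Exp(1) data:
#   sqrt(n) * (Xbar_n - mu)  -> N(0, sigma^2)  in distribution (CLT),
#   S_n (sample std)         -> sigma           in probability,
# so by Slutsky, T_n = sqrt(n) * (Xbar_n - mu) / S_n -> N(0, 1).

n, reps = 1000, 5000
data = rng.exponential(1.0, size=(reps, n))
xbar = data.mean(axis=1)
s = data.std(axis=1, ddof=1)
T = np.sqrt(n) * (xbar - 1.0) / s

# Compare with the standard normal: P(T <= 1.96) should be near 0.975.
p = (T <= 1.96).mean()
print(f"P(T_n <= 1.96) ≈ {p:.3f} (N(0,1) gives 0.975)")
assert abs(p - 0.975) < 0.02
```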
## Sum of IID Random Variables
A good example to illustrate different modes of convergence is the sum, or average, of iid random variables. Suppose $X_1, X_2, \dots$ are iid with finite mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Then, the following theorems claim that $\bar{X}_n$ converges to $\mu$,
- in distribution, by the Central Limit Theorem;
- in probability, by the Weak Law of Large Numbers;
- and almost surely, by the Strong Law of Large Numbers.
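Both phenomena are easy to see in simulation. A minimal sketch (assuming numpy; Unif[0,1] data is my own choice, so $\mu = 1/2$ and $\sigma^2 = 1/12$):

```python
import numpy as np

rng = np.random.default_rng(3)

# iid Unif[0,1]: the sample mean Xbar_n concentrates at mu (LLN),
# while the normalized fluctuation sqrt(n)*(Xbar_n - mu)/sigma keeps
# an approximately N(0,1) spread (CLT).

mu, sigma = 0.5, np.sqrt(1 / 12)
n, reps = 2000, 2000
xbar = rng.random((reps, n)).mean(axis=1)

# LLN: every replicate's mean is already close to mu at n = 2000.
assert np.max(np.abs(xbar - mu)) < 0.05

# CLT: the normalized fluctuations have roughly unit spread.
z = np.sqrt(n) * (xbar - mu) / sigma
print(f"mean(z) ≈ {z.mean():.3f}, std(z) ≈ {z.std():.3f}")  # near 0 and 1
assert abs(z.std() - 1.0) < 0.08
```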