# Convergence of Random Variables

## Relation between Convergence Modes

In the relations among the modes, convergence of the PDF (for continuous r.v.s), the PMF (for discrete r.v.s), and the characteristic function is understood as convergence of functions, e.g. $f_{X_n}(x) \to f_X(x)$ for all $x$.

## Convergence under transformations

| Mode \ Operation | CMT (g)[^1] | Addition (+) | Multiplication (×) | Division (÷)[^2] | Joint Distribution (·,·) |
| --- | --- | --- | --- | --- | --- |

Because a random variable involves several different objects, we can define various modes of convergence for a sequence of random variables. Some definitions view random variables as measurable functions, others as probability measures, and others through their associated functions, such as the CDF and the characteristic function.

In this note, we denote a sequence of random variables as $\{X_n\}_{n \geq 1}$, or simply $X_n$.

## Almost Sure/ Strong Convergence

- Definition: We say that $X_n$ converges ==almost surely/ almost everywhere/ with probability 1/ strongly== to $X$ if
  $$
  \mathbb{P}(\lim_{n\to\infty} X_n = X) = 1 .
  $$
- Notation: $X_n \overset{\text{a.s.}}{\longrightarrow} X$.
- Alternative interpretation: Viewing a random variable as a [[Random Variable#law|measure of events]], the above convergence is equivalent to
  $$
  \mathbb{P}(\{\omega\in\Omega:\lim_{n\to\infty} X_n(\omega) = X(\omega)\}) = 1,
  $$
  where $\Omega$ is the [[Probability Space|sample space]].
- Remark: $X_n$ are generally highly dependent for this convergence to hold. Typical cases:
    - Partial sum -> infinite sum
    - R.v.s defined as converging functions of a single underlying r.v.

## Convergence in Probability

- Definition: We say that $X_n$ converges ==in probability== to $X$ if for any $\epsilon > 0$,
  $$
  \lim_{n\to\infty} \mathbb{P}(|X_n - X| > \epsilon) = 0 .
  $$
- Notation: $X_n \overset{p}{\longrightarrow} X$.
- Remark: Convergence in probability has a similar interpretation to almost sure convergence, and it also generally requires $X_n$ to be dependent. However, it's weaker than almost sure convergence because it's not uniform over the tail of the sequence: it only controls $\mathbb{P}(|X_n - X| > \epsilon)$ for each $n$ separately, whereas almost sure convergence is equivalent to $\mathbb{P}(\sup_{m \geq n} |X_m - X| > \epsilon) \to 0$ for every $\epsilon > 0$.
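A standard example that separates the two modes is the so-called typewriter sequence on $\Omega = [0,1)$ with the uniform measure. The sketch below is only illustrative, and the function names (`typewriter`, `prob_nonzero`, `hits_per_block`) are made up for this note. Writing $n = 2^k + j$ with $0 \le j < 2^k$, let $X_n$ be the indicator of $[j/2^k, (j+1)/2^k)$. Then $\mathbb{P}(|X_n| > \epsilon) = 2^{-k} \to 0$ for $\epsilon \in (0,1)$, so $X_n \to 0$ in probability, yet every $\omega$ is hit once in each block of indices, so $X_n(\omega)$ does not converge.

```python
def typewriter(n: int, omega: float) -> int:
    """X_n(omega) for the typewriter sequence on Omega = [0, 1), n >= 1.

    With n = 2**k + j and 0 <= j < 2**k, X_n is the indicator of the
    interval [j / 2**k, (j + 1) / 2**k).
    """
    k = n.bit_length() - 1          # largest k with 2**k <= n
    j = n - (1 << k)
    width = 1.0 / (1 << k)          # exact in floating point (power of two)
    return 1 if j * width <= omega < (j + 1) * width else 0


def prob_nonzero(n: int) -> float:
    """P(X_n = 1) under the uniform measure: the interval length 2**-k."""
    return 1.0 / (1 << (n.bit_length() - 1))


def hits_per_block(omega: float, num_blocks: int) -> list:
    """Count how often X_n(omega) = 1 within each block 2**k <= n < 2**(k+1)."""
    return [
        sum(typewriter(n, omega) for n in range(1 << k, 1 << (k + 1)))
        for k in range(num_blocks)
    ]


if __name__ == "__main__":
    # The probabilities shrink to 0 (convergence in probability) ...
    print([prob_nonzero(n) for n in (1, 2, 4, 8, 1024)])
    # ... but a fixed omega keeps being hit once per block (no a.s. convergence).
    print(hits_per_block(0.3, 10))
```

The second printout shows one hit in every block, which is why $\limsup_n X_n(\omega) = 1$ while $\liminf_n X_n(\omega) = 0$ for every $\omega$.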

## Convergence in Distribution/ Weak Convergence

- Definition: We say that $X_n$ converges ==in distribution/ in law/ weakly== to $X$ if
  $$
  \lim_{n\to\infty} F_{X_n}(x) = F_X(x) ,
  $$
  for all $x$ at which $F_X$ is continuous, where $F$ is a cumulative distribution function (CDF).
- Notation: $X_n \overset{d}{\longrightarrow} X$.
- Alternative interpretation: Weak convergence inspects the convergence of the associated CDFs of a random variable sequence, and thus it's _weaker_ and generally requires no dependency between $X_{n}$. It's equivalent to the convergence of the [[Characteristic Function]].
- Remark: Weak convergence is consistent with the convergence of real numbers. If $X_n \overset{ \text{a.s.} }{ = } a_n\in\R$ and $X\overset{ \text{a.s.} }{ = }a\in\R$, then $X_n\overset{ d }{ \to }X \iff a_n\to a$.
    - This consistency does not hold if we require $F_{X_n}\to F_{X}$ for all $x$.

### Portmanteau Lemma

Several important statements equivalent to convergence in distribution are given by the **Portmanteau Lemma**:

1. $\mathbb{E}g(X_n) \to \mathbb{E} g(X)$ for any bounded continuous (or bounded Lipschitz) function $g$.
2. $\liminf_{n \to \infty } \mathbb{E}g(X_n) \geq \mathbb{E} g(X)$ for any nonnegative and continuous function $g$.
3. $\liminf_{n \to \infty } P(X_n\in B) \geq P(X\in B)$ for any open set $B$.
4. $\limsup_{n \to \infty } P(X_n\in B) \le P(X\in B)$ for any closed set $B$.
5. $P(X_n\in B) \to P(X\in B)$ for any continuity set[^3] $B$.
    - When $X_n$ and $X$ have densities, this reads $\left|\int _{B} \left(f_{X_n}(x) - f_{X}(x)\right) \, \d x\right| \to 0$.

[^3]: A continuity set of $X$ is a set $B$ whose boundary has probability zero: $P(X \in \partial B) = 0$.

### Convergence of PDF/PMF

Suppose $X_n\overset{ d }{ \to }X$. This alone says little about the convergence of the associated PDF/PMF:

- It is possible for $X_n$ to be discrete and $X$ to be continuous.
    - $X_n = \frac{1}{n}\operatorname{Unif}\{ 1,\dots, n \} \to \operatorname{Unif}[0,1]$.
- It is possible for $X_n$ to be continuous and $X$ to be discrete.
    - $X_n = \operatorname{Unif}[0,1/n] \to 0$.
- If $X_n$ and $X$ are continuous, it is possible that the [[Probability Density Function|PDF]] does not converge, i.e. $f_n \not\to f$.
    - On $[0,1]$, $F_n(x) = x + \frac{\cos(2\pi nx) - 1}{2\pi n} \to F_{X}(x) = x$ but $f_n(x) = 1 - \sin(2\pi nx) \not\to f_{X}(x) = 1$.

For the other direction, we have:

- If $X_n$ and $X$ are continuous, the convergence of [[Probability Density Function|PDF]] implies the convergence in distribution.
- If $X_n$ and $X$ are discrete and supported on a common set without accumulation points (e.g. the integers), the convergence in distribution **is equivalent** to the convergence of [[Probability Mass Function|PMF]]. Without the common support this can fail: $X_n = 1/n \to 0$ in distribution, but the PMF at $0$ does not converge.

### Convergence of Characteristic Functions

Convergence in distribution is equivalent to the convergence of [[Characteristic Function]]s:

$$
\lim_{n\to\infty} \phi_{X_n}(t) = \phi_X(t), \quad \forall t \in \R .
$$
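As a numerical check of this equivalence, the sketch below revisits the discrete-uniform example $X_n = \frac{1}{n}\operatorname{Unif}\{1,\dots,n\}$ from the PDF/PMF discussion and compares both its CDF and its characteristic function with those of $\operatorname{Unif}[0,1]$. The function names are hypothetical helpers for this note, and the closed forms used ($F_{X_n}(x) = \lfloor nx \rfloor / n$ on $[0,1)$ and $\phi_X(t) = (e^{it} - 1)/(it)$) are standard facts rather than something stated in the note.

```python
import cmath
import math


def cdf_discrete_uniform(n: int, x: float) -> float:
    """CDF of X_n = U / n, where U is uniform on {1, ..., n}."""
    if x < 0:
        return 0.0
    if x >= 1:
        return 1.0
    return math.floor(n * x) / n


def cf_discrete_uniform(n: int, t: float) -> complex:
    """Characteristic function E[exp(i t X_n)] of X_n = U / n."""
    return sum(cmath.exp(1j * t * k / n) for k in range(1, n + 1)) / n


def cf_uniform01(t: float) -> complex:
    """Characteristic function of Unif[0, 1]."""
    if t == 0:
        return 1 + 0j
    return (cmath.exp(1j * t) - 1) / (1j * t)


if __name__ == "__main__":
    x, t = 0.37, 2.0
    for n in (10, 100, 1000):
        cdf_gap = abs(cdf_discrete_uniform(n, x) - x)
        cf_gap = abs(cf_discrete_uniform(n, t) - cf_uniform01(t))
        print(n, cdf_gap, cf_gap)
```

Both gaps shrink roughly like $1/n$, even though $X_n$ has a PMF and the limit has a PDF.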

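## Convergence in $L^p$ Norm

- Definition: For $p \geq 1$, we say that $X_n$ converges ==in $L^p$ norm/ in $p$th mean== to $X$ if
  $$
  \lim_{n\to\infty} \mathbb{E}[|X_n - X|^p] = 0 .
  $$
- Notation: $X_n \overset{L^p}{\longrightarrow} X$.
- Remark: For $p > q \geq 1$, we have $X_n \overset{L^p}{\longrightarrow} X \implies X_n \overset{L^q}{\longrightarrow} X$, but not the other way around.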
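To see that convergence in a lower-order mean need not give convergence in a higher-order mean, consider a standard counterexample (not taken from this note): let $U \sim \operatorname{Unif}[0,1]$ and $X_n = n \, \mathbf{1}\{U \le 1/n^2\}$. Then

$$
\mathbb{E}|X_n - 0| = n \cdot \frac{1}{n^2} = \frac{1}{n} \to 0,
\qquad
\mathbb{E}|X_n - 0|^2 = n^2 \cdot \frac{1}{n^2} = 1 \not\to 0,
$$

so $X_n \to 0$ in $L^1$ but not in $L^2$.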

## Convergence under Transformations

### Continuous Mapping Theorem

Let $X_n$ be a sequence of random variables that converges almost surely/ in probability/ in distribution to $X$. Let $g$ be a continuous function. Then $g(X_n)$ converges to $g(X)$ in the same mode.

- The continuous mapping theorem does not apply to convergence in $L^p$ norm.

Further, if $g$ is continuous at a constant $c$ and $X_n \to c$ in one of the three modes above, then $g(X_n) \to g(c)$ in the same mode. $g$ need not be continuous elsewhere in this case.
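A small Monte Carlo sketch can make the theorem concrete for convergence in probability. The setup is an assumption chosen for illustration, not part of the note: $X \sim N(0,1)$, $X_n = X + Z/n$ with an independent $Z \sim N(0,1)$, and $g(x) = x^2$. The helper `exceed_prob` is a hypothetical name; it estimates $\mathbb{P}(|g(X_n) - g(X)| > \epsilon)$, which should shrink as $n$ grows.

```python
import random


def g(x: float) -> float:
    """A continuous map; the square is chosen purely for illustration."""
    return x * x


def exceed_prob(n: int, eps: float = 0.1, samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|g(X_n) - g(X)| > eps) with X_n = X + Z / n."""
    rng = random.Random(seed)       # fixed seed so the run is reproducible
    count = 0
    for _ in range(samples):
        x = rng.gauss(0.0, 1.0)                 # the limit X
        x_n = x + rng.gauss(0.0, 1.0) / n       # X_n -> X in probability
        if abs(g(x_n) - g(x)) > eps:
            count += 1
    return count / samples


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, exceed_prob(n))
```

The estimates are noisy, so only the downward trend in $n$ is meaningful, not the individual values.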

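### Slutsky's Theorem

Let $X_n$ and $Y_n$ be sequences of random variables that converge in distribution to $X$ and $c$ respectively, where $c$ is a constant. Then

$$
X_n \circ Y_n \overset{d}{\longrightarrow} X \circ c, \quad \circ \in \{+, \times, \div\},
$$

where $\{+, \times, \div\}$ is a set of binary operations. For $\div$ to be defined, $c$ must be nonzero.

- The theorem also holds for convergence in probability.

For almost sure convergence, convergence in probability, and in $L^p$ norm, we have stronger results:

Suppose $X_n \to X$ and $Y_n \to Y$, both almost surely or both in probability. Then, in the same mode,

$$
X_n \circ Y_n \to X \circ Y, \quad \circ \in \{+, \times, \div\},
$$

where division requires $Y \neq 0$ almost surely.

Suppose $X_n \overset{L^p}{\longrightarrow} X$ and $Y_n \overset{L^p}{\longrightarrow} Y$. Then

$$
X_n + Y_n \overset{L^p}{\longrightarrow} X + Y .
$$

Note that we no longer restrict $Y_n$ to converge to a constant.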
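The multiplication case of Slutsky's theorem can be checked numerically. The sketch below uses an assumed toy setup, not one from the note: $X_n \sim N(0,1)$ for every $n$ (so trivially $X_n \overset{d}{\to} X \sim N(0,1)$) and $Y_n = c + W/n$ with $W \sim N(0,1)$ (so $Y_n \overset{p}{\to} c$). The theorem then predicts $X_n Y_n \overset{d}{\to} cX \sim N(0, c^2)$. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def product_cdf_estimate(n: int, c: float = 2.0, x: float = 1.0,
                         samples: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(X_n * Y_n <= x)."""
    rng = random.Random(seed)       # fixed seed so the run is reproducible
    hits = 0
    for _ in range(samples):
        x_n = rng.gauss(0.0, 1.0)               # X_n ~ N(0, 1)
        y_n = c + rng.gauss(0.0, 1.0) / n       # Y_n -> c in probability
        if x_n * y_n <= x:
            hits += 1
    return hits / samples


def limit_cdf(c: float = 2.0, x: float = 1.0) -> float:
    """CDF at x of the predicted limit c * X ~ N(0, c**2)."""
    return NormalDist(0.0, abs(c)).cdf(x)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, product_cdf_estimate(n), limit_cdf())
```

For small $n$ the estimate differs visibly from the limit because $Y_n$ is still far from constant; the gap closes as $n$ grows, up to Monte Carlo error.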

## Sum of IID Random Variables

A good example to illustrate different modes of convergence is the sum, or average, of iid random variables. Suppose $X_1, X_2, \dots$ are iid with finite mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then, the following theorems claim that $\bar{X}_n$ converges to $\mu$,
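As a numerical illustration, the sketch below simulates iid $\operatorname{Unif}[0,1]$ draws (an assumed choice, with $\mu = 1/2$ and $\sigma^2 = 1/12$) and looks at two classical results for iid averages: the law of large numbers, under which the sample average approaches $\mu$, and the central limit theorem, under which $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ is approximately standard normal. The function names are hypothetical.

```python
import math
import random

MU = 0.5                        # mean of Unif[0, 1]
SIGMA = math.sqrt(1.0 / 12.0)   # standard deviation of Unif[0, 1]


def sample_mean(n: int, rng: random.Random) -> float:
    """Average of n iid Unif[0, 1] draws."""
    return sum(rng.random() for _ in range(n)) / n


def lln_error(n: int, seed: int = 0) -> float:
    """|sample mean - mu| for one simulated sample of size n."""
    return abs(sample_mean(n, random.Random(seed)) - MU)


def clt_coverage(n: int = 200, reps: int = 2000, seed: int = 0) -> float:
    """Fraction of standardized averages falling in [-1.96, 1.96].

    If the standardized average is close to N(0, 1), this is near 0.95.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(reps):
        z = math.sqrt(n) * (sample_mean(n, rng) - MU) / SIGMA
        if abs(z) <= 1.96:
            inside += 1
    return inside / reps


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, lln_error(n))
    print(clt_coverage())
```

A single simulated path only suggests, and cannot prove, almost sure convergence; the simulation is meant as intuition for how the average and its standardized version behave.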

[^1]: The function $g$ is applied to $X_n$ and $X$, and is continuous on the range of $X$.

[^2]: For the division operation to be well-defined, the denominator sequence and its limit must be nonzero.