Convergence of Random Variables

Relation between Convergence Modes

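$$
\begin{aligned}
X_{n} \xrightarrow{L^{p'}} X \;&\overset{p' \geq p}{\Longrightarrow}\; X_{n} \xrightarrow{L^{p}} X \;\Longrightarrow\; X_{n} \xrightarrow{P} X \;\Longrightarrow\; X_{n} \xrightarrow{d} X, \\
X_{n} \xrightarrow{a.s.} X \;&\Longrightarrow\; X_{n} \xrightarrow{P} X, \\
f_{X_{n}} \to f_{X} \;&\Longrightarrow\; X_{n} \xrightarrow{d} X, \\
p_{X_{n}} \to p_{X} \;&\Longleftrightarrow\; X_{n} \xrightarrow{d} X \;\Longleftrightarrow\; ϕ_{X_{n}} \to ϕ_{X},
\end{aligned}
$$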
where $f$, $p$, and $ϕ$ are the PDF (for continuous r.v.s), the PMF (for discrete r.v.s taking values in a common set of isolated points, such as the integers), and the characteristic function, respectively. Their convergence is pointwise convergence of functions: $g_{n} (t) \to g (t)$ for all $t$.

📎 If $X$ follows a point-mass/Dirac distribution $δ_{c}$, then $X_{n} \xrightarrow{d} X ⟹ X_{n} \xrightarrow{P} X$.
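One way to see why a point-mass limit upgrades the mode is the short derivation below. It is a sketch that assumes only the CDF definition of convergence in distribution. For any $ϵ > 0$:

$$
\begin{aligned}
P (|X_{n} - c| > ϵ) &\leq F_{X_{n}} (c - ϵ) + 1 - F_{X_{n}} (c + ϵ) \\
&\to F_{X} (c - ϵ) + 1 - F_{X} (c + ϵ) = 0 + 1 - 1 = 0,
\end{aligned}
$$

where the limit uses that $F_{X} (x) = 1 \{x \geq c\}$ is continuous at both $c - ϵ$ and $c + ϵ$.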

Convergence under transformations

| Mode \ Operation | CMT (g) ¹ | Addition (+) | Multiplication (×) | Division (÷) ² | Joint Distribution (·,·) |
|---|---|---|---|---|---|
| $X_{n} \xrightarrow{d} X, Y_{n} \xrightarrow{d} c$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $X_{n} \xrightarrow{d} X, Y_{n} \xrightarrow{d} Y$ | ✓ | | | | |
| $X_{n} \xrightarrow{P} X, Y_{n} \xrightarrow{P} Y$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $X_{n} \xrightarrow{a.s.} X, Y_{n} \xrightarrow{a.s.} Y$ | ✓ | ✓ | ✓ | ✓ | ✓ |
| $X_{n} \xrightarrow{L^{p}} X, Y_{n} \xrightarrow{L^{p}} Y$ | | ✓ | | | ✓ |

Because a random variable involves several different objects, we can define various modes of convergence for a sequence of random variables. Some definitions view random variables as measurable functions, others as probability measures, and some work through their associated functions, such as the CDF and the characteristic function.

In this note, we denote a sequence of random variables as $X_{n}$ .

Almost Sure/ Strong Convergence

Definition: We say that $X_{n}$ converges almost surely/ almost everywhere/ with probability 1/ strongly to $X$ if

$$P \left( \lim_{n \to \infty} X_{n} = X \right) = 1.$$

Notation: $X_{n} \xrightarrow{a.s.} X$.
Alternative interpretation: Viewing a random variable as a measurable function on the sample space, the above convergence reads

$$P \left( \{ ω \in Ω : \lim_{n \to \infty} X_{n} (ω) = X (ω) \} \right) = 1,$$

where $Ω$ is the sample space.

❗️ Remark: $X_{n}$ are generally highly dependent for this convergence to hold.
- 📗 Partial sum → infinite sum
- 📗 R.v.s defined as converging functions of a single underlying r.v.
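The partial-sum example can be sketched in a few lines. The specific series below, $X_{n} = \sum_{k \leq n} B_{k} / 2^{k}$ with fair coin flips $B_{k}$, is an illustrative assumption and not taken from the note; the names `partial_sums`, `bits`, and the seed are likewise hypothetical. The point is that every sample path converges, with a deterministic tail bound $2^{-n}$.

```python
import random

def partial_sums(bits):
    """Return the running sums X_n = sum_{k<=n} B_k / 2^k for one sample path."""
    total, path = 0.0, []
    for k, b in enumerate(bits, start=1):
        total += b / 2**k
        path.append(total)
    return path

rng = random.Random(0)                          # fixed seed, an arbitrary choice
bits = [rng.randint(0, 1) for _ in range(40)]   # one outcome omega, truncated at 40 flips
path = partial_sums(bits)
limit = path[-1]                                # proxy for the infinite sum X(omega)

# The remaining tail is at most sum_{k>n} 2^{-k} = 2^{-n} on every path.
for n in (5, 10, 20):
    gap = abs(path[n - 1] - limit)
    print(n, gap, gap <= 2.0**-n)
```

Because all $X_{n}$ are built from the same coin flips, they are strongly dependent, which is what lets the whole path settle down.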

Convergence in Probability

Definition: We say that $X_{n}$ converges in probability to $X$ if for any $ϵ > 0$ ,

$$\lim_{n \to \infty} P (|X_{n} - X| > ϵ) = 0.$$

Notation: $X_{n} \xrightarrow{P} X$.
❗️ Remark: Convergence in probability has a similar interpretation to almost sure convergence, and it also generally requires the $X_{n}$ to be dependent. However, it is weaker: it only bounds the deviation probability $P (|X_{n} - X| > ϵ)$ at each single $n$, whereas almost sure convergence is equivalent to $P (\sup_{m \geq n} |X_{m} - X| > ϵ) \to 0$ for every $ϵ > 0$, which controls the whole tail of the sequence at once.
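A standard illustration of the gap between the two modes is the "typewriter" sequence on $Ω = [0, 1)$, which is not part of the original note and is included here only as a sketch. Writing $n = 2^{k} + j$ with $0 \leq j < 2^{k}$, $X_{n}$ is the indicator of $[j / 2^{k}, (j + 1) / 2^{k})$. The function names and the choice $ω = 0.3$ are hypothetical.

```python
def typewriter(n, omega):
    """X_n(omega) for the typewriter sequence on Omega = [0, 1).

    Write n = 2**k + j with 0 <= j < 2**k; X_n is the indicator of
    the interval [j / 2**k, (j + 1) / 2**k).
    """
    k = n.bit_length() - 1
    j = n - 2**k
    return 1 if j / 2**k <= omega < (j + 1) / 2**k else 0

def prob_nonzero(n):
    """P(|X_n - 0| > eps) for any 0 < eps < 1: the interval length 2**-k."""
    k = n.bit_length() - 1
    return 2.0**-k

omega = 0.3  # one fixed outcome, an arbitrary choice
hits = [n for n in range(1, 2**10) if typewriter(n, omega) == 1]
print(prob_nonzero(1000))   # small, so X_n -> 0 in probability
print(len(hits))            # one hit per dyadic level k = 0, ..., 9
```

The deviation probability shrinks to zero, yet every fixed $ω$ is hit once per dyadic level, so $X_{n} (ω) = 1$ infinitely often and the sequence does not converge almost surely.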

Convergence in Distribution/ Weak Convergence

Definition: We say that $X_{n}$ converges in distribution/ in law/ weakly to $X$ if

$$\lim_{n \to \infty} F_{X_{n}} (x) = F_{X} (x),$$

for all $x$ at which $F_{X}$ is continuous, where $F$ is a Cumulative Distribution Function.

Notation: $X_{n} \xrightarrow{d} X$.
Alternative interpretation: Weak convergence only inspects the CDFs of the sequence, that is, the marginal distribution of each $X_{n}$. It is therefore weaker than the other modes and requires no dependence between the $X_{n}$, which need not even be defined on the same probability space. It is equivalent to pointwise convergence of the characteristic functions.
❗️ Remark: Weak convergence is consistent with the convergence of real numbers. If $X_{n} \overset{a.s.}{=} a_{n} \in \mathbb{R}$ and $X \overset{a.s.}{=} a \in \mathbb{R}$, then $X_{n} \xrightarrow{d} X ⟺ a_{n} \to a$.
- ❗️ This consistency does not hold if we require $F_{X_{n}} (x) \to F_{X} (x)$ for all $x$: with $a_{n} = 1/n$ and $a = 0$, $F_{X_{n}} (0) = 0$ for every $n$ while $F_{X} (0) = 1$.
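The role of continuity points can be checked directly on point masses. The snippet below is a minimal sketch with a hypothetical helper `point_mass_cdf`; it evaluates the CDF of the point mass at $1/n$ at three values of $x$ and compares it with the CDF of the point mass at $0$.

```python
def point_mass_cdf(a):
    """CDF of the point mass at a: F(x) = 1 if x >= a else 0."""
    return lambda x: 1.0 if x >= a else 0.0

F_limit = point_mass_cdf(0.0)
for x in (-0.5, 0.0, 0.5):
    values = [point_mass_cdf(1.0 / n)(x) for n in (1, 10, 100, 1000)]
    print(x, values, F_limit(x))
```

At $x = -0.5$ and $x = 0.5$ the values agree with the limit for large $n$; only at $x = 0$, the single discontinuity of the limiting CDF, do they stay at $0$ while the limit is $1$.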

Portmanteau Lemma

Several important statements equivalent to convergence in distribution are given by the Portmanteau Lemma:

$E g (X_{n}) \to E g (X)$ for any bounded continuous function $g$ (equivalently, for any bounded Lipschitz function $g$).
$\liminf_{n \to \infty} E g (X_{n}) \geq E g (X)$ for any nonnegative continuous function $g$.
$\liminf_{n \to \infty} P (X_{n} \in B) \geq P (X \in B)$ for any open set $B$.
$\limsup_{n \to \infty} P (X_{n} \in B) \leq P (X \in B)$ for any closed set $B$.
$P (X_{n} \in B) \to P (X \in B)$ for any continuity set³ $B$.
- which, when $X_{n}$ and $X$ have PDFs, is further equivalent to $\int_{B} (f_{X_{n}} (x) - f_{X} (x)) \, d x \to 0$.
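The first statement can be checked numerically on a simple case. The sketch below assumes $X_{n}$ uniform on $\{1/n, \dots, n/n\}$, $X \sim \mathrm{Unif} [0, 1]$, and the test function $g = \cos$; these choices and the helper name `expect_g_discrete` are illustrative, not prescribed by the lemma.

```python
import math

def expect_g_discrete(g, n):
    """E g(X_n) for X_n uniform on {1/n, 2/n, ..., n/n} (exact finite sum)."""
    return sum(g(k / n) for k in range(1, n + 1)) / n

g = math.cos                  # bounded and continuous
limit = math.sin(1.0)         # E cos(U) = integral of cos over [0, 1], U ~ Unif[0, 1]
for n in (10, 100, 1000):
    print(n, expect_g_discrete(g, n), limit)
```

The finite sums are Riemann sums of $\int_{0}^{1} \cos x \, d x$, so the gap to the limit shrinks at rate $1/n$.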

Convergence of PDF/PMF

Suppose $X_{n} \xrightarrow{d} X$. Convergence of the associated PDF/PMF is not guaranteed:

It is possible for $X_{n}$ to be discrete and $X$ to be continuous.
- 📗 $X_{n} \sim \frac{1}{n} \mathrm{Unif} \{1, \dots, n\} \xrightarrow{d} \mathrm{Unif} [0, 1]$.
It is possible for $X_{n}$ to be continuous and $X$ to be discrete.
- 📗 $X_{n} \sim \mathrm{Unif} [0, 1/n] \xrightarrow{d} 0$.
❗️ If $X_{n}$ and $X$ are continuous, it is possible that the PDF does not converge: $f_{n} \not\to f$.
- 📗 On $[0, 1]$, $F_{n} (x) = x + (\cos (2 π n x) - 1) / (2 π n) \to F_{X} (x) = x$, but $f_{n} (x) = 1 - \sin (2 π n x) \not\to f_{X} (x) = 1$. (The constant $-1$ ensures $F_{n} (0) = 0$ and $F_{n} (1) = 1$.)
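The oscillating-density example can be evaluated numerically. In this sketch the CDF is shifted by a constant so that $F_{n} (0) = 0$ and $F_{n} (1) = 1$; that normalization, the evaluation point $x = 0.3$, and the function names are assumptions made for the illustration.

```python
import math

def F_n(x, n):
    """CDF on [0, 1]; the -1 shift is added here so that F_n(0) = 0."""
    return x + (math.cos(2 * math.pi * n * x) - 1) / (2 * math.pi * n)

def f_n(x, n):
    """Its derivative, the PDF 1 - sin(2*pi*n*x)."""
    return 1 - math.sin(2 * math.pi * n * x)

x = 0.3
for n in (10, 100, 1000):
    print(n, abs(F_n(x, n) - x))          # CDF gap, bounded by 1 / (pi * n)

pdf_values = [f_n(x, n) for n in range(1001, 1006)]
print(pdf_values)                          # keeps oscillating instead of settling at 1
```

The CDF gap is bounded by $1 / (π n)$, while the PDF at $x = 0.3$ cycles through the same few values for every block of five consecutive $n$.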

For the other direction, we have:

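If $X_{n}$ and $X$ are continuous, pointwise convergence of the PDFs implies convergence in distribution (Scheffé's lemma).
If $X_{n}$ and $X$ are discrete and take values in a common set of isolated points, such as the integers, convergence in distribution is equivalent to convergence of the PMF. Without a common support this can fail: $X_{n} = 1/n \xrightarrow{d} 0$, yet $p_{X_{n}} (0) = 0 \not\to 1 = p_{X} (0)$.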

Convergence of Characteristic Functions

Convergence in distribution is equivalent to the convergence of Characteristic Functions:

$$\lim_{n \to \infty} ϕ_{X_{n}} (t) = ϕ_{X} (t), \quad \forall t.$$
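As a small numerical check of this equivalence, the sketch below compares the characteristic function of $X_{n}$ uniform on $\{1/n, \dots, n/n\}$ with that of $\mathrm{Unif} [0, 1]$, namely $(e^{i t} - 1) / (i t)$. The choice of sequence and the function names `cf_discrete` and `cf_uniform` are illustrative assumptions.

```python
import cmath

def cf_discrete(t, n):
    """Characteristic function of X_n uniform on {1/n, ..., n/n}."""
    return sum(cmath.exp(1j * t * k / n) for k in range(1, n + 1)) / n

def cf_uniform(t):
    """Characteristic function of Unif[0, 1]: (e^{it} - 1) / (it), equal to 1 at t = 0."""
    return 1.0 if t == 0 else (cmath.exp(1j * t) - 1) / (1j * t)

for t in (0.0, 1.0, 5.0):
    print(t, abs(cf_discrete(t, 1000) - cf_uniform(t)))
```

The printed gaps are small at each fixed $t$, which matches the pointwise convergence required above.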

Convergence in $L^{p}$ Norm

Definition: For $p \geq 1$, we say that $X_{n}$ converges in $L^{p}$ norm/ in $p$-th mean to $X$ if

$$\lim_{n \to \infty} E [|X_{n} - X|^{p}] = 0.$$

Notation: $X_{n} \xrightarrow{L^{p}} X$.
❗️ Remark: For $p_{1} > p_{2} \geq 1$, we have $X_{n} \xrightarrow{L^{p_{1}}} X ⟹ X_{n} \xrightarrow{L^{p_{2}}} X$, but not the other way around.
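The implication in the remark follows from Jensen's inequality. As a short sketch, apply it to the convex function $y \mapsto y^{p_{1} / p_{2}}$ on $[0, \infty)$ and the random variable $|X_{n} - X|^{p_{2}}$:

$$
\left( E |X_{n} - X|^{p_{2}} \right)^{p_{1} / p_{2}} \leq E |X_{n} - X|^{p_{1}}
\quad \Longrightarrow \quad
E |X_{n} - X|^{p_{2}} \leq \left( E |X_{n} - X|^{p_{1}} \right)^{p_{2} / p_{1}} \to 0.
$$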

Convergence under Transformations

Continuous Mapping Theorem

Let $X_{n}$ be a sequence of random variables that converges almost surely/ in probability/ in distribution to $X$. Let $g$ be a continuous function. Then $g (X_{n})$ converges to $g (X)$ in the same mode.

❗️ The continuous mapping theorem does not apply to convergence in $L^{p}$ norm.

Further, if $g$ is continuous at $c$ and $X_{n} \xrightarrow{P} c$, then $g (X_{n}) \xrightarrow{P} g (c)$. In this case $g$ need not be continuous anywhere else.
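To see why the theorem fails in norm, consider a hypothetical example that is not from the note: $X_{n} = \sqrt{n} \, 1 \{U \leq 1/n\}$ with $U \sim \mathrm{Unif} [0, 1]$, and $g (x) = x^{2}$. The moments are available in closed form, $E |X_{n}|^{p} = n^{p/2 - 1}$, so no simulation is needed.

```python
def abs_moment(n, p):
    """E|X_n|^p for X_n = sqrt(n) * 1{U <= 1/n}, U ~ Unif[0, 1]: n**(p/2) * (1/n)."""
    return n ** (p / 2) / n

for n in (10, 1000, 100000):
    # L^1 distance of X_n from 0, then L^1 distance of g(X_n) = X_n**2 from g(0) = 0
    print(n, abs_moment(n, 1), abs_moment(n, 2))
```

The first column of moments tends to $0$, so $X_{n} \to 0$ in $L^{1}$, while $E |g (X_{n}) - g (0)| = 1$ for every $n$, so $g (X_{n})$ does not converge to $g (0)$ in $L^{1}$ even though $g$ is continuous.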

Slutsky’s Theorem

Let $X_{n}$ and $Y_{n}$ be sequences of random variables that converge in distribution to $X$ and $c$ respectively, where $c$ is a constant. Then

$$B (X_{n}, Y_{n}) \xrightarrow{d} B (X, c),$$

where $B \in \{+, \times, \div, (\cdot, \cdot)\}$ is any of the listed binary operations. For $\div$ to be defined, $Y_{n}$ and $c$ must be nonzero.

❗️ The theorem also holds for convergence in probability.

For almost sure convergence, convergence in probability, and convergence in $L^{p}$ norm, we have stronger results:

Suppose $X_{n} \xrightarrow{a.s./P} X$ and $Y_{n} \xrightarrow{a.s./P} Y$. Then

$$B (X_{n}, Y_{n}) \xrightarrow{a.s./P} B (X, Y).$$

Suppose $X_{n} \xrightarrow{L^{p}} X$ and $Y_{n} \xrightarrow{L^{p}} Y$. Then

$$X_{n} + Y_{n} \xrightarrow{L^{p}} X + Y.$$

Note that we no longer restrict $Y_{n}$ to converge to a constant.
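A common use of Slutsky's theorem is the studentized mean, sketched below as a Monte Carlo experiment. The setup is an assumption made for illustration: $\mathrm{Exp} (1)$ data, with $X_{n} = \sqrt{n} (\text{mean} - μ)$ converging in distribution to a normal and $Y_{n}$ the sample standard deviation converging in probability to the constant $σ = 1$. The sample size, repetition count, and seed are arbitrary.

```python
import math
import random

def t_statistic(sample, mu):
    """sqrt(n) * (mean - mu) / s, i.e. X_n / Y_n with Y_n the sample std. dev."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * (mean - mu) / math.sqrt(var)

rng = random.Random(1)            # fixed seed, an arbitrary choice
n, reps, mu = 200, 2000, 1.0      # Exp(1) data has mean 1 and std. dev. 1
stats = [t_statistic([rng.expovariate(1.0) for _ in range(n)], mu)
         for _ in range(reps)]

# If X_n / Y_n is close to N(0, 1), roughly 95% of draws fall in [-1.96, 1.96].
coverage = sum(abs(t) <= 1.96 for t in stats) / reps
print(coverage)
```

The ratio behaves approximately like a standard normal even though the data are skewed, because the random denominator can be replaced by its constant limit.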

Sum of IID Random Variables

A good example to illustrate different modes of convergence is the average of iid random variables. Suppose $X_{i}$ are iid with finite mean $μ$ and variance $σ^{2}$. Let $\overline{X}_{n} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}$ be the sample mean. Then the following theorems imply that $\overline{X}_{n}$ converges to $μ$,

in distribution, as a consequence of the Central Limit Theorem, which gives the finer statement $\sqrt{n} (\overline{X}_{n} - μ) \xrightarrow{d} N (0, σ^{2})$;
in probability by the Weak Law of Large Numbers;
and almost surely by the Strong Law of Large Numbers.
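These statements can be watched in a small simulation. The sketch below assumes $\mathrm{Unif} [0, 1]$ draws, so $μ = 1/2$ and $σ^{2} = 1/12$; the seed, sample sizes, and the helper `sample_mean` are arbitrary choices for the illustration.

```python
import math
import random

rng = random.Random(2)                 # fixed seed, an arbitrary choice
mu, sigma = 0.5, math.sqrt(1 / 12)     # mean and std. dev. of Unif[0, 1]

def sample_mean(n):
    """Average of n iid Unif[0, 1] draws."""
    return sum(rng.random() for _ in range(n)) / n

# Law of large numbers: the average settles near mu as n grows.
for n in (10, 1000, 100000):
    print(n, sample_mean(n))

# CLT scaling: sqrt(n) * (mean - mu) / sigma keeps a spread of about 1.
n, reps = 400, 2000
z = [math.sqrt(n) * (sample_mean(n) - mu) / sigma for _ in range(reps)]
spread = math.sqrt(sum(v * v for v in z) / reps)
print(spread)
```

The raw average concentrates at $μ$, while the rescaled deviation does not shrink: its root mean square stays near $1$, reflecting the nondegenerate normal limit.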

¹ The function $g$ is applied to $Y_{n}$ and $Y$, and is continuous on the range of $Y$.
² For the division operation to be well-defined, the denominator sequence and its limit must be nonzero.
³ A continuity set $B$ has a boundary of probability zero under the law of $X$: $P (X \in \partial B) = 0$.
