Normal Distribution
A Random Variable $X$ is said to be normally distributed if it has the PDF
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),$$
where $\mu = E[X]$ and $\sigma^2 = \operatorname{Var}(X)$; we write $X \sim \mathcal{N}(\mu, \sigma^2)$.
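As a quick sanity check on the density formula, here is a minimal sketch (the helper `normal_pdf` is our own illustrative function, not from any library):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

# Symmetry about the mean: f(mu - d) = f(mu + d).
assert abs(normal_pdf(1.0, mu=2.0) - normal_pdf(3.0, mu=2.0)) < 1e-15

# The density integrates to 1 (Riemann sum over [-8, 8], step 0.001).
integral = sum(normal_pdf(-8.0 + i * 0.001) for i in range(16001)) * 0.001
assert abs(integral - 1.0) < 1e-6
```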
Because of the Central Limit Theorem, in practice, many random phenomena obey, at least approximately, a normal probability distribution.
Equivalent Definitions for Multivariate Normal Distribution
We can define a random vector $X$ to be (nondegenerate multivariate) normal if it has the PDF
$$f_X(x) = \frac{1}{(2\pi)^{n/2} \det(\Sigma)^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right),$$
where $\mu$ is the mean vector and $\Sigma \succ 0$ is the covariance matrix.
1. Or, if it has the form $X = AZ + \mu$ for some matrix $A$ and vector $\mu$, where $Z$ is a random vector whose components are independent standard normal random variables. ^7bb02c
2. Or, if for any real vector $a$, the random variable $a^\top X$ is normal.
These two alternative definitions also cover degenerate normal distributions, i.e., cases where some components are constant or the distribution is concentrated on a proper subspace of $\mathbb{R}^n$.
Equivalence
First Alternative
By change of variables, we can see that the definition via the PDF implies the first alternative definition. We now show that if $A$ is not singular, $X = AZ + \mu$ is a nondegenerate normal r.v. with a PDF of the form above. By the relationship of Derived Distribution, we have
$$f_X(x) = \frac{1}{|\det A|}\, f_Z\left( A^{-1} (x - \mu) \right).$$
Plugging in the PDF of the standard normal distribution, we get
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\, |\det A|} \exp\left( -\tfrac{1}{2} (x - \mu)^\top (A A^\top)^{-1} (x - \mu) \right),$$
which is of the desired form with $\Sigma = A A^\top$, and $\Sigma$ is the covariance of $X$.
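A quick simulation (an illustrative sketch; the matrix `A`, vector `mu`, and seed are arbitrary choices) confirms that the covariance of $X = AZ + \mu$ is $AA^\top$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0], [1.0, 1.0]])   # arbitrary nonsingular matrix
mu = np.array([1.0, -1.0])

Z = rng.standard_normal((100_000, 2))    # rows are i.i.d. standard normal vectors
X = Z @ A.T + mu                          # row-wise X = A Z + mu

# Empirical mean and covariance approach mu and A A^T = Sigma.
assert np.allclose(X.mean(axis=0), mu, atol=0.05)
assert np.allclose(np.cov(X, rowvar=False), A @ A.T, atol=0.1)
```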
Second Alternative
The implication from the first alternative definition to the second is direct: $a^\top X = (a^\top A) Z + a^\top \mu$ is a linear combination of independent normals plus a constant, hence normal. Now suppose $a^\top X$ is normal for any vector $a$.
Wrong proof
We inspect the MGF of $X$:
$$M_X(s) = E\left[ e^{s^\top X} \right] = \exp\left( s^\top \mu + \tfrac{1}{2} s^\top \Sigma s \right),$$
where we use the fact that $s^\top X$ is normal with mean $s^\top \mu$ and variance $s^\top \Sigma s$. On the other hand, let $Y = \Sigma^{1/2} Z + \mu$, where $Z$ is a standard normal random vector. Then,
$$M_Y(s) = \exp\left( s^\top \mu + \tfrac{1}{2} s^\top \Sigma s \right),$$
so $Y$ is multivariate normal and has the same MGF as $X$. By the inversion theorem of MGF, $X$ and $Y$ have the same distribution.
- ❗️ The above proof is wrong, as we want to show $X = \Sigma^{1/2} Z + \mu$ for some standard normal $Z$ (so that $X$ itself satisfies the first alternative definition), but not only $X \stackrel{d}{=} \Sigma^{1/2} Z + \mu$.
To rigorously prove the equivalence, we consider two cases.
Case I. $\Sigma \succ 0$. Thus, $\Sigma^{1/2}$ is invertible and we let $Z = \Sigma^{-1/2} (X - \mu)$. We want to show that $Z$ is standard normal, so that $X = \Sigma^{1/2} Z + \mu$ satisfies the first alternative definition. We first have $E[Z] = 0$ and $\operatorname{Cov}(Z) = I$. We then inspect the MGF (transform) of $Z$:
$$M_Z(s) = E\left[ e^{s^\top Z} \right] = \exp\left( \tfrac{1}{2} s^\top s \right),$$
where we use that $s^\top Z$, a linear function of $X$, is normal with mean $0$ and variance $s^\top s$. This is the MGF of a standard normal random vector. Therefore, $Z \sim \mathcal{N}(0, I)$.
Case II. $\Sigma$ is singular. Then, there exists a nonzero vector $a$ such that $\Sigma a = 0$. For simplicity, suppose $\mu = 0$. Note that
$$\operatorname{Var}(a^\top X) = a^\top \Sigma a = 0.$$
Thus, $a^\top X = 0$ almost surely, indicating the linear dependence of the components of $X$. WLOG, suppose $\operatorname{rank}(\Sigma) = m < n$, the first $m$ components $X_1, \dots, X_m$ of $X$ are linearly independent, and each of $X_{m+1}, \dots, X_n$ is a linear combination of $X_1, \dots, X_m$; i.e., $X = M \tilde{X}$ for $\tilde{X} = (X_1, \dots, X_m)$ and some $n \times m$ matrix $M$. Since $X$ satisfies the second alternative definition, $\tilde{X}$ also satisfies it. By Case I, we know $\tilde{X} = \tilde{A} \tilde{Z}$ for some positive definite $\tilde{A}$ and standard normal $\tilde{Z}$. Let $Z'$ be another $(n - m)$-dimensional standard normal r.v. independent of $\tilde{Z}$. Then, we can write
$$X = M \tilde{A} \tilde{Z} = \begin{pmatrix} M \tilde{A} & 0 \end{pmatrix} \begin{pmatrix} \tilde{Z} \\ Z' \end{pmatrix},$$
so $X$ satisfies the first alternative definition with the $n$-dimensional standard normal vector $(\tilde{Z}, Z')$.
Properties
- (Sufficiency) The mean and covariance of a multivariate normal distribution form a Sufficient Statistic. ^prop-suff
    - In other words, the distribution of a multivariate normal random vector is completely determined by its mean and covariance.
    - 📎 See Sufficiency for proof.
- (Affine transformation) The Affine Transformation $Y = AX + b$ of a normal random variable $X$ is also a normal random variable.
    - As a special case, any sub-vector of a normal random vector is also normal.
    - As a special case, any component of a normal random vector is also normal.
    - If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $aX + b \sim \mathcal{N}(a\mu + b, a^2 \sigma^2)$.
    - 📎 The proof follows the alternative definition 1 above.
- (Independent Gaussians are jointly Gaussian) Independent normal random variables are jointly normal; in particular, the sum of independent normal random variables is also a normal random variable. ^prop-ind-joint
    - 📎 Prove this using the Inversion Theorem.
    - ❗️ Note that this is generally not true for dependent random variables.
    - ❗️ More generally, if a random vector with normal components is not jointly normal, then its affine transformation is not necessarily normal! See also Independence, Correlation, and Jointly Normal.
- (Independent iff Uncorrelated) For a multivariate normal random vector, its components are independent if and only if they are uncorrelated.
    - 📎 We can use the sufficiency property to prove this. For $X$ with uncorrelated components, we can construct $Y$ with independent components that have the same mean and variance. Then $X \stackrel{d}{=} Y$, and thus $X$ has independent components. See also Independence, Correlation, and Jointly Normal.
    - 📎 For a nondegenerate normal random vector, we can also factorize the PDF to show independence.
    - ❗️ The statement is not true for general Random Variables, for which the mean and variance are not sufficient; see Independence, Correlation, and Jointly Normal.
- Hence, if $X \sim \mathcal{N}(\mu, \sigma^2)$, then $Z = (X - \mu) / \sigma$ is normal with mean 0 and variance 1; $Z$ is said to have a standard or unit normal distribution.
    - We write the CDF of a standard normal distribution as $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt$.
- (Symmetry) $\Phi(-x) = 1 - \Phi(x)$.
- Let $Z \sim \mathcal{N}(0, 1)$; then, $Z^2 \sim \chi^2_1$ (see Chi-Square Distribution).
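The symmetry identity and the chi-square fact can be checked numerically. A sketch, with `Phi` implemented via the standard library's `erf` (an implementation choice of ours, not from the notes):

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Symmetry: Phi(-x) = 1 - Phi(x).
for x in (0.3, 1.0, 2.5):
    assert abs(Phi(-x) - (1.0 - Phi(x))) < 1e-15

# Z^2 ~ chi-square(1): P(Z^2 <= t) = 2 * Phi(sqrt(t)) - 1, checked by Monte Carlo.
rng = np.random.default_rng(7)
z = rng.standard_normal(1_000_000)
for t in (0.5, 1.0, 4.0):
    assert abs(np.mean(z**2 <= t) - (2.0 * Phi(sqrt(t)) - 1.0)) < 0.005
```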
Proofs
Sufficiency
Note that a PDF or CDF completely determines the distribution of a random variable. Therefore, by the definition of the nondegenerate multivariate normal distribution via its PDF, which depends only on the mean and covariance matrix, we can see that the mean and covariance matrix are sufficient.
For general (possibly degenerate) normal random vectors, we can seek help from the MGF. Note that an MGF also completely determines a distribution. We use the second alternative definition. For a normal random vector $X$ with mean $\mu$ and covariance $\Sigma$, its MGF is the multivariate transform:
$$M_X(s) = E\left[ e^{s^\top X} \right] = \exp\left( s^\top \mu + \tfrac{1}{2} s^\top \Sigma s \right),$$
where we use the definition that $s^\top X$ is normal, with mean $s^\top \mu$ and variance $s^\top \Sigma s$. As we can see, the mean and covariance are sufficient to determine the MGF, and thus the distribution.
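The MGF formula can be verified by Monte Carlo in the scalar case (the parameters below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, s = 0.5, 1.2, 0.7            # arbitrary parameters

x = rng.normal(mu, sigma, size=1_000_000)
empirical = np.exp(s * x).mean()                       # estimate of E[exp(s X)]
closed_form = np.exp(s * mu + 0.5 * s**2 * sigma**2)   # MGF formula
assert abs(empirical - closed_form) / closed_form < 0.01
```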
Independence, Correlation, and Jointly Normal
Normal components do not imply jointly normal.
It is not true that if $X$ and $Y$ are both normal, then the joint distribution of $(X, Y)$ is normal. For example, let $X \sim \mathcal{N}(0, 1)$ and $Y = WX$, where $W = \pm 1$ with equal probability, independent of $X$. Then, it is easy to verify that $\Phi$ is also the CDF of $Y$, and thus $Y \sim \mathcal{N}(0, 1)$. However, if $(X, Y)$ were jointly normal, we would have $X + Y$ normal, which is not true because $P(X + Y = 0) = 1/2$.
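A simulation of the sign-flip counterexample, with $X \sim \mathcal{N}(0,1)$ and $Y = WX$ for a random sign $W$ independent of $X$ (sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.standard_normal(n)
w = rng.choice([-1.0, 1.0], size=n)   # random sign, independent of x
y = w * x                              # y is also standard normal by symmetry

# Marginal of y matches N(0, 1) in its first two moments.
assert abs(y.mean()) < 0.01 and abs(y.var() - 1.0) < 0.02

# But (x, y) is not jointly normal: x + y has an atom at 0 with probability 1/2.
assert abs(np.mean(x + y == 0.0) - 0.5) < 0.01
```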
Independent normal components imply jointly normal.
The above statement becomes true once we impose the independence condition. We use the second alternative definition above to prove this. Let $X = (X_1, \dots, X_n)$ have independent normal components. Then, for any vector $a$, we have $a^\top X = \sum_{i} a_i X_i$. Note that $a_1 X_1, \dots, a_n X_n$ are independent normal random variables, and thus their sum is normal by Property ^prop-ind-joint, or the Inversion Theorem. By the alternative definition, $X$ is jointly normal.
Jointly normal with zero correlation implies independence.
Suppose that the components of a jointly normal vector $X$ are uncorrelated, i.e., its covariance matrix is diagonal. Consider another random vector $Y$ whose components are independent, with each $Y_i$ having the same mean and variance as $X_i$. By Property ^prop-ind-joint, $Y$ is jointly normal. Since $X$ and $Y$ have the same mean and covariance, by Property ^prop-suff, $X$ and $Y$ have the same distribution, and thus the components of $X$ are independent.
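To see the factorization concretely, a small sketch drawing a jointly normal vector with diagonal covariance (the specific variances and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
cov = [[1.0, 0.0], [0.0, 4.0]]        # diagonal covariance, arbitrary variances
x = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)

# Independence: joint probabilities factorize,
# e.g. P(X1 > 0, X2 > 0) = P(X1 > 0) P(X2 > 0) = 1/4.
joint = np.mean((x[:, 0] > 0) & (x[:, 1] > 0))
assert abs(joint - 0.25) < 0.005
```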
Zero correlation does not imply independence for general random variables.
For example, let $X \sim \mathcal{N}(0, 1)$ and $Y = X^2$. Certainly, $X$ and $Y$ are not independent, but they are uncorrelated:
$$\operatorname{Cov}(X, Y) = E[X^3] - E[X]\, E[X^2] = 0.$$
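A simulation of the $Y = X^2$ example (seed and sample size arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = x**2                               # dependent on x by construction

# Uncorrelated: Cov(X, X^2) = E[X^3] = 0 by symmetry.
assert abs(np.cov(x, y)[0, 1]) < 0.02

# Dependent: on the event |X| > 1 we always have Y > 1.
assert not np.any(y[np.abs(x) > 1.0] <= 1.0)
```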
Sample Mean and Sample Variance
Thm
If $X_1, \dots, X_n$ is a sample from a normal population having mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and the sample variance $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ are independent random variables, with $\bar{X} \sim \mathcal{N}(\mu, \sigma^2/n)$ and $(n-1) S^2 / \sigma^2 \sim \chi^2_{n-1}$. Then we have $\frac{\sqrt{n}(\bar{X} - \mu)}{S} \sim t_{n-1}$.
- ❗️ This independence of $\bar{X}$ and $S^2$ is a unique property of the normal distribution.
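The theorem can be checked by simulating many small samples (the population parameters, sample size, and seed below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 2.0, 3.0, 5, 200_000   # arbitrary illustrative parameters

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)                  # sample means
s2 = samples.var(axis=1, ddof=1)             # unbiased sample variances

assert abs(xbar.mean() - mu) < 0.02               # E[xbar] = mu
assert abs(xbar.var() - sigma**2 / n) < 0.05      # Var(xbar) = sigma^2 / n
assert abs(s2.mean() - sigma**2) < 0.1            # E[S^2] = sigma^2
assert abs(np.corrcoef(xbar, s2)[0, 1]) < 0.02    # uncorrelated (indeed independent)
```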
General Bivariate Normal Distribution
We skip the introduction of the standard bivariate normal random variable, as it is more convenient, and not conceptually harder, to deal directly with the general bivariate normal distribution.
Let $X$ and $Y$ be two random vectors with a joint normal distribution. That is,
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix} \right).$$
Suppose $\Sigma_{XX} \succ 0$ (positive definite, so that it is invertible). We have
- $E[Y \mid X] = \mu_Y + \Sigma_{YX} \Sigma_{XX}^{-1} (X - \mu_X)$.
- $Y - E[Y \mid X]$ is independent of $X$, and thus independent of any function of $X$.
- $\operatorname{Var}(Y \mid X) = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}$.
Before proving the above results, we recover the standard bivariate normal distribution from them. Let $X, Y$ be scalar, $\mu_X = \mu_Y = 0$, $\sigma_X = \sigma_Y = 1$, and $\operatorname{Cov}(X, Y) = \rho$. Then, we have
$$E[Y \mid X] = \rho X, \qquad \operatorname{Var}(Y \mid X) = 1 - \rho^2,$$
i.e., conditioned on $X = x$, $Y \sim \mathcal{N}(\rho x, 1 - \rho^2)$.
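A simulation of the standard bivariate case (with an arbitrary $\rho = 0.6$) recovers the conditional mean and variance:

```python
import numpy as np

rng = np.random.default_rng(5)
rho, n = 0.6, 500_000                   # arbitrary correlation and sample size

z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = z1
y = rho * z1 + np.sqrt(1.0 - rho**2) * z2   # (x, y) standard bivariate normal

# The best linear predictor of y from x has slope rho, matching E[Y | X] = rho X.
slope = np.cov(x, y)[0, 1] / np.var(x)
assert abs(slope - rho) < 0.01

# The residual y - rho x has variance 1 - rho^2 and is uncorrelated with x.
resid = y - rho * x
assert abs(np.var(resid) - (1.0 - rho**2)) < 0.01
assert abs(np.cov(resid, x)[0, 1]) < 0.01
```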
Proofs
WLOG, we assume $\mu_X = 0$ and $\mu_Y = 0$ for simplicity. Let $\hat{Y} = \Sigma_{YX} \Sigma_{XX}^{-1} X$. We have
$$\operatorname{Cov}(Y - \hat{Y}, X) = \Sigma_{YX} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XX} = 0.$$
Note that $(Y - \hat{Y}, X)$ is a linear transformation of $(X, Y)$, and thus jointly normal; hence the above uncorrelatedness implies independence. This also implies, for any function $g$:
$$E\left[ (Y - \hat{Y})\, g(X) \right] = E[Y - \hat{Y}]\, E[g(X)] = 0.$$
By the General Definition of Conditional Expectation, the above condition says that
$$E[Y \mid X] = \hat{Y} = \Sigma_{YX} \Sigma_{XX}^{-1} X.$$
Finally, we compute $\operatorname{Var}(Y \mid X)$. Since $\hat{Y}$ is also a function of $X$, $Y - \hat{Y}$ is independent of it. Thus, we have
$$\operatorname{Var}(Y \mid X) = \operatorname{Var}(Y - \hat{Y} \mid X) = \operatorname{Var}(Y - \hat{Y}) = \Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}.$$