Gaussian Properties
A real-valued random variable (r.v.) is called a normal/Gaussian r.v. if it admits the following probability density function (PDF):
- $\displaystyle f(x)=\frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
- $f(\mathbf{x}) = (2 \pi)^{-k / 2} |\boldsymbol{\Sigma}|^{-1 / 2} \exp \left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$ for a $k$-dimensional normal random vector with mean $\boldsymbol{\mu}$ and positive-definite covariance matrix $\boldsymbol{\Sigma}$
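As a quick sanity check of these two density formulas, the sketch below evaluates them directly and compares against library implementations (the specific parameter values are arbitrary, chosen only for illustration):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Univariate: evaluate the PDF formula directly and compare with scipy.
mu, sigma, x = 1.0, 2.0, 0.5
pdf_manual = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
print(np.isclose(pdf_manual, norm.pdf(x, loc=mu, scale=sigma)))  # True

# Multivariate (k = 2): evaluate the density with an explicit determinant
# and inverse, and compare with scipy's multivariate_normal.
mu_vec = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])          # positive definite
x_vec = np.array([0.3, -0.2])
k = len(mu_vec)
diff = x_vec - mu_vec
quad = diff @ np.linalg.inv(Sigma) @ diff
pdf_manual = (2 * np.pi) ** (-k / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)
print(np.isclose(pdf_manual, multivariate_normal(mean=mu_vec, cov=Sigma).pdf(x_vec)))  # True
```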
Normal r.v.s have many nice properties, each of which gives a partial answer to why they are so common in nature.
Parametrized Model
A parametrized model is a family of probability distributions whose elements are completely determined by a finite number of parameters. The normal distribution is a parametrized model with two parameters: the mean $\mu$ and the variance $\sigma^{2}$. In other words, once we know the values of $\mu$ and $\sigma^{2}$, we know everything about the normal distribution.
The parameterization has many implications in Statistics. For example, suppose the variance $\sigma^{2}$ is known, and we want to do some statistical inference on a normal distribution with i.i.d. samples $X_{1}, \ldots, X_{n}$. Then, the sample mean $\bar{X}=\frac{1}{n}\sum_{i=1}^{n} X_{i}$ is a Sufficient Statistic for the distribution. That is, we can compress the data from an $n$-dimensional vector to a single real number, without losing any information about the distribution.
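A small sketch of this sufficiency claim (assuming the variance is known; the data and the parameter grid below are arbitrary): the log-likelihood computed from the full sample and the one computed from only $(\bar{X}, n)$ differ by a constant that does not depend on $\mu$, so they lead to identical inference about $\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5                                  # known standard deviation (variance sigma^2 known)
x = rng.normal(loc=2.0, scale=sigma, size=100)
n, xbar = len(x), x.mean()

def loglik_full(mu):
    """Log-likelihood of mu computed from every observation."""
    return -n / 2 * np.log(2 * np.pi * sigma**2) - np.sum((x - mu) ** 2) / (2 * sigma**2)

def loglik_compressed(mu):
    """Log-likelihood of mu computed from (xbar, n) only, up to a constant in mu."""
    return -n * (xbar - mu) ** 2 / (2 * sigma**2)

mus = np.linspace(0.0, 4.0, 5)
diffs = np.array([loglik_full(m) - loglik_compressed(m) for m in mus])
# The difference does not depend on mu, so the sample mean carries all the
# information about mu: it is a sufficient statistic.
print(np.allclose(diffs, diffs[0]))  # True
```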
Affine Transformation Invariance
Any Affine Transformation of a normal r.v. is also normal. Specifically,
- If $X \sim \mathcal{N}(\mu,\Sigma)$, then $BX+a \sim \mathcal{N}(B\mu+a,B \Sigma B^{T})$
- As a special case, any sub-vector of a normal random vector is also normal
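A simulation sketch of the affine-invariance property just stated (the matrices and vectors below are arbitrary): sample $X \sim \mathcal{N}(\mu, \Sigma)$, apply $BX + a$, and check that the empirical mean and covariance match $B\mu + a$ and $B\Sigma B^{T}$.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
B = np.array([[1.0, 2.0],
              [0.0, -1.0]])
a = np.array([0.5, 3.0])

# Sample X ~ N(mu, Sigma) and apply the affine map Y = B X + a.
X = rng.multivariate_normal(mean=mu, cov=Sigma, size=200_000)
Y = X @ B.T + a

# The empirical mean and covariance of Y should match B mu + a and B Sigma B^T.
print(np.allclose(Y.mean(axis=0), B @ mu + a, atol=0.05))
print(np.allclose(np.cov(Y.T), B @ Sigma @ B.T, atol=0.1))
```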
The affine transformation invariance is central to the normal distribution. In fact, the normal distribution can be defined through affine transformations. We have the following two alternative definitions: a random vector $X$ is (jointly) normal
- if it has the form $X = AZ + b$ for some matrix $A$ and vector $b$, where $Z$ is a random vector whose components are independent standard normal random variables $Z_i \sim \mathcal{N}(0,1)$;
- or, if for any real vector $a$, the random variable $a^{\top}X$ is normal.
Symmetry
The normal distribution is symmetric around its mean $\mu$, meaning that $X-\mu$ and $\mu-X$ have the same distribution for any normal r.v. $X$.
Graphically, the PDF of a normal r.v. is bell-shaped, symmetric around the mean $\mu$. Formally, we denote the CDF of the standard normal r.v. as $\Phi$; then,
$$\Phi(-x) = 1 - \Phi(x).$$
For a general normal r.v. $X \sim \mathcal{N}(\mu, \sigma^{2})$, we know its CDF satisfies $F_{X}(x)=\Phi\left(\frac{x-\mu}{\sigma}\right)$, because
$$F_{X}(x)=\mathbb{P}(X \leq x)=\mathbb{P}\left(\frac{X-\mu}{\sigma} \leq \frac{x-\mu}{\sigma}\right)=\Phi\left(\frac{x-\mu}{\sigma}\right),$$
where $\frac{X-\mu}{\sigma}$ is a standard normal r.v. Therefore, we have the symmetry $F_{X}(\mu-t)=1-F_{X}(\mu+t)$ for all $t$.
We often rely on the above transformation to reduce a general normal r.v. to a standard normal r.v. for the ease of analysis.
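A minimal illustration of this standardization (the values are chosen arbitrarily), computing $\mathbb{P}(X \le x)$ for a general normal r.v. both directly and through the standard normal CDF $\Phi$:

```python
from scipy.stats import norm

mu, sigma, x = 5.0, 2.0, 7.3

# P(X <= x) from the N(mu, sigma^2) CDF directly ...
p_direct = norm.cdf(x, loc=mu, scale=sigma)
# ... and via standardization: P(X <= x) = Phi((x - mu) / sigma).
p_standardized = norm.cdf((x - mu) / sigma)

print(abs(p_direct - p_standardized) < 1e-12)  # True
```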
Moments
The central moments of the normal distribution have a nice closed form:
$$\mathbb{E}\left[(X-\mu)^{p}\right]=\begin{cases}0, & p \text{ odd}, \\ \sigma^{p}(p-1)!!, & p \text{ even},\end{cases}$$
where $(p-1)!!=(p-1)(p-3)\cdots 3\cdot 1$ denotes the double factorial.
So do its central absolute moments:
$$\mathbb{E}\left[|X-\mu|^{p}\right]=\sigma^{p} \cdot \frac{2^{p / 2}\, \Gamma\!\left(\frac{p+1}{2}\right)}{\sqrt{\pi}}.$$
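The closed forms above can be checked by Monte Carlo; here is a sketch (the sample size and parameters are arbitrary) comparing empirical central and central absolute moments against $\sigma^{p}(p-1)!!$ and $\sigma^{p}\, 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$:

```python
import numpy as np
from scipy.special import factorial2, gamma

rng = np.random.default_rng(2)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=2_000_000)

for p in range(1, 7):
    emp_central = np.mean((x - mu) ** p)
    emp_abs = np.mean(np.abs(x - mu) ** p)
    # Closed forms: E[(X - mu)^p] = 0 (odd p) or sigma^p (p-1)!! (even p);
    # E[|X - mu|^p] = sigma^p * 2^(p/2) * Gamma((p+1)/2) / sqrt(pi).
    theo_central = 0.0 if p % 2 == 1 else sigma**p * factorial2(p - 1)
    theo_abs = sigma**p * 2 ** (p / 2) * gamma((p + 1) / 2) / np.sqrt(np.pi)
    print(p, round(emp_central, 2), round(float(theo_central), 2),
          round(emp_abs, 2), round(float(theo_abs), 2))
```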
Independence, Correlation, and Jointly Normal
Normal components do not imply joint normality.
It is not true that if $X$ and $Y$ are both normal, then the joint distribution of $(X, Y)$ is normal. For example, let $X \sim \mathcal{N}(0,1)$ and $Y = WX$, where $W = \pm 1$ with equal probability, independent of $X$. Then, it is easy to verify that $\Phi$ is also the CDF of $Y$, and thus $Y \sim \mathcal{N}(0,1)$. However, if $(X, Y)$ were jointly normal, we would have $X + Y$ is normal, which is not true because $\mathbb{P}(X+Y=0)=\mathbb{P}(W=-1)=\frac{1}{2}>0$.
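A simulation sketch of this counterexample: $Y = WX$ behaves like a standard normal marginally, yet $X + Y$ puts probability about $1/2$ on the single point $0$, so $(X, Y)$ cannot be jointly normal.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.normal(size=n)
w = rng.choice([-1.0, 1.0], size=n)   # W = +/-1 with equal probability, independent of X
y = w * x

# Marginally, Y behaves like a standard normal ...
print(round(y.mean(), 3), round(y.std(), 3))   # approximately 0 and 1
# ... but X + Y equals exactly 0 whenever W = -1, i.e. about half the time,
# so X + Y is not normal and (X, Y) is not jointly normal.
print(np.mean(x + y == 0.0))                    # approximately 0.5
```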
Independent normal components imply joint normality.
The above statement becomes true once we impose the independence condition. We use the second alternative definition above to prove this. Let $X=(X_{1}, \ldots, X_{k})$ with independent normal components. Then, for any vector $a$, we have $a^{\top}X=\sum_{i=1}^{k} a_{i} X_{i}$. Note that $a_{1}X_{1}, \ldots, a_{k}X_{k}$ are independent normal random variables, and thus their sum is normal by Property ^prop-ind-joint, or the Inversion Theorem. By the alternative definition, $X$ is jointly normal.
Joint normality with zero correlation implies independence.
Suppose that the components of a jointly normal random vector $X$ are uncorrelated, i.e., its covariance matrix $\Sigma$ is diagonal. Consider another random vector $Y$ such that $Y_{i} \sim \mathcal{N}(\mu_{i}, \Sigma_{ii})$ and $Y_{1}, \ldots, Y_{k}$ are independent. By Property ^prop-ind-joint, $Y$ is jointly normal. Since $X$ and $Y$ have the same mean and covariance, by Property ^prop-suff, $X$ and $Y$ have the same distribution, and thus the components of $X$ are independent.
Zero correlation does not imply independence for general random variables.
For example, let $X \sim \mathcal{N}(0,1)$ and $Y = X^{2}$. Certainly, $X$ and $Y$ are not independent, but they are uncorrelated:
$$\operatorname{Cov}(X, Y)=\mathbb{E}\left[X^{3}\right]-\mathbb{E}[X]\, \mathbb{E}\left[X^{2}\right]=0-0=0.$$
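A quick numerical check of this example: the sample correlation between $X$ and $X^{2}$ is near zero, while the dependence is obvious, e.g., knowing $|X| > 2$ forces $Y > 4$.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
y = x ** 2

# The sample correlation between X and Y = X^2 is close to 0 ...
print(round(np.corrcoef(x, y)[0, 1], 3))
# ... yet X and Y are clearly dependent: |X| > 2 forces Y = X^2 > 4.
print(bool(y[np.abs(x) > 2].min() > 4.0))   # True
```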
Tail Bound
“Tail” refers to the area under the PDF curve that is far away from the mean. The tail of a standard normal r.v. $Z$ is bounded by Mill’s inequality:
$$\mathbb{P}(|Z|>t) \leq \sqrt{\frac{2}{\pi}} \cdot \frac{e^{-t^{2} / 2}}{t} \quad \text{for } t>0,$$
which is tight in the sense that a matching lower bound of the same order holds:
$$\mathbb{P}(Z>t) \geq\left(\frac{1}{t}-\frac{1}{t^{3}}\right) \frac{e^{-t^{2} / 2}}{\sqrt{2 \pi}}.$$
The Chernoff bound of a normal r.v. gives a slightly looser bound, often referred to as the sub-Gaussian tail bound:
$$\mathbb{P}(|Z|>t) \leq 2 e^{-t^{2} / 2}.$$
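The sketch below compares the exact Gaussian tail $\mathbb{P}(|Z| > t)$ with Mill’s bound and the Chernoff (sub-Gaussian) bound at a few values of $t$, matching the two-sided forms written above:

```python
import numpy as np
from scipy.stats import norm

for t in [1.0, 2.0, 3.0, 4.0]:
    exact = 2 * norm.sf(t)                                 # P(|Z| > t), exact
    mills = np.sqrt(2 / np.pi) * np.exp(-t**2 / 2) / t     # Mill's inequality
    chernoff = 2 * np.exp(-t**2 / 2)                       # Chernoff / sub-Gaussian bound
    print(f"t={t}: exact={exact:.2e}  mills={mills:.2e}  chernoff={chernoff:.2e}")
```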
It turns out that such a light tail (squared-exponential rate) is actually very common, so common that an important class of r.v.s in probability and statistics is called ==Sub-Gaussian==, defined as r.v.s with a sub-Gaussian tail bound (perhaps with a different constant in the exponent).
And it turns out that such a sub-Gaussian bound is not much looser than Mill’s Gaussian tail bound. Specifically, the following properties are equivalent definitions of sub-Gaussian r.v.s:
- There exists $\sigma>0$ such that $\mathbb{P}(|X|>t) \leq 2 e^{-t^{2} /\left(2 \sigma^{2}\right)}$ for all $t>0$;
- There exists a constant $C>0$ and a Gaussian r.v. $Z \sim \mathcal{N}\left(0, \tau^{2}\right)$ such that $\mathbb{P}(|X|>t) \leq C\, \mathbb{P}(|Z|>t)$ for all $t>0$.
The second property says that any sub-Gaussian tail bound is essentially bounded by a Gaussian tail bound: the constant $C$ is harmless because the squared-exponential decay dominates it for large $t$.
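As an illustration of the sub-Gaussian class beyond the Gaussian itself (this example is an addition, not from the text above): a normalized sum of independent Rademacher ($\pm 1$) variables satisfies $\mathbb{P}(|S_n/\sqrt{n}| > t) \le 2e^{-t^2/2}$ by Hoeffding’s inequality, which the Monte Carlo sketch below checks.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 100, 100_000

# S_n / sqrt(n), where S_n is a sum of n independent Rademacher (+/-1) variables.
s = rng.choice([-1.0, 1.0], size=(reps, n)).sum(axis=1) / np.sqrt(n)

for t in [1.0, 2.0, 3.0]:
    empirical = np.mean(np.abs(s) > t)
    bound = 2 * np.exp(-t**2 / 2)   # Hoeffding: P(|S_n/sqrt(n)| > t) <= 2 exp(-t^2/2)
    print(f"t={t}: empirical tail={empirical:.4f} <= bound={bound:.4f}")
```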
Bayesian Inference
In Bayesian inference, we always need to calculate the posterior distribution of the parameter $\theta$ given the observed data $x$ by Bayes’ rule:
$$p(\theta \mid x)=\frac{p(x \mid \theta)\, p(\theta)}{p(x)} \propto p(x \mid \theta)\, p(\theta).$$
A nice thing about the normal distribution is that the posterior obtained from a normal prior and a normal likelihood is also normal. A specific example in Bayesian Linear Regression is: with prior $w \sim \mathcal{N}(0, \tau^{2} I)$ and likelihood $y \mid X, w \sim \mathcal{N}(Xw, \sigma^{2} I)$, the posterior is
$$w \mid X, y \;\sim\; \mathcal{N}\!\left(\left(X^{\top}X + \tfrac{\sigma^{2}}{\tau^{2}} I\right)^{-1} X^{\top} y,\;\; \sigma^{2}\left(X^{\top}X + \tfrac{\sigma^{2}}{\tau^{2}} I\right)^{-1}\right).$$
Additionally, other common operations on Gaussian distributions also preserve Gaussianity, including affine transformation, Convolution, conditioning, and marginalization. As a result, the other distributions that arise in Bayesian inference with Gaussian models, such as the posterior predictive, are also Gaussian.
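A minimal sketch of this Gaussian-prior / Gaussian-likelihood conjugacy in Bayesian Linear Regression, using the closed-form posterior above (the synthetic data and the prior/noise scales $\tau$, $\sigma$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data: y = X w_true + Gaussian noise.
n, d = 200, 3
sigma, tau = 0.5, 1.0                     # noise std and prior std (illustrative values)
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=sigma, size=n)

# Prior w ~ N(0, tau^2 I), likelihood y | X, w ~ N(Xw, sigma^2 I).
# The posterior is again Gaussian, with the closed form given above.
A = X.T @ X + (sigma**2 / tau**2) * np.eye(d)
post_mean = np.linalg.solve(A, X.T @ y)
post_cov = sigma**2 * np.linalg.inv(A)

print(post_mean)                       # close to w_true
print(np.sqrt(np.diag(post_cov)))      # posterior standard deviations
```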